|
Hello,
Our upgrade process is blocked, we cannot upgrade our production servers as their security is based on replication.
Is there anything we can do to get past this problem or help resolve it ?
|
|
Hi jppo
To try to recreate this, could you please provide more information on the following:
- The clusters (topology)
- my.cnf files
- SHOW CREATE TABLE on the tables with sequences whose later INSERTs cause the slave to lock
- The queries with the INSERTs that cause the locks
Thank you.
|
|
Hello,
I add two files (INSERT_1, INSERT_2) used to test the applications.
If you needs more I can get some more.
Regards
JP P
|
|
Thanks, jppo.
In the file SEQ_AND_TABLES, I am a bit confused by table SUIVI_V2 and its key SUPPRESSION when there is no column SUPPRESSION defined for that table - I cannot create this table on its own.
I'm not able to recreate this simply with creating the sequence and table insertion, so could you also please send...
- output of the command used for to see the metadata locks?
- output of SHOW PROCESSLIST (if not the above)
- error and SQL logs before and up to the lock occurring (if possible?) (If they are too big, you can upload them to the private folder of this FTP server: https://mariadb.com/kb/en/meta/mariadb-ftp-server/)
Also, is the problem seen with only one cluster? If not, how many clusters are there, and how are they connected together (from server.cnf I assume Galera is not used)?
Thank you.
|
|
Hello,
Yes the index on "SUPPRESSION" colums is irrelevant.
The SQL used to show METADATA_LOCK_INFO :
select *
from information_schema.METADATA_LOCK_INFO
where length(TABLE_NAME) > 0
and TABLE_SCHEMA not in ('mysql');*
The problem is "old", so The server is notr able to activate the save (binlogs are maximum 3 days old) :
Could not find GTID state requested by slave in any binlog files. Probably the slave state is too old and required binlog files have been purged.
Processlist is very small as the slave proceeses are stopped :
| 4 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 1 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge coordinator |
NULL |
0.000 |
| 3 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 2 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 5 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB shutdown handler |
NULL |
0.000 |
| 7 |
event_scheduler |
localhost |
NULL |
Daemon |
445538 |
Waiting on empty queue |
NULL |
0.000 |
| 72150 |
replic_user |
10.216.51.62:41696 |
NULL |
Binlog Dump |
41987 |
Master has sent all binlog to slave; waiting for binlog to be updated |
NULL |
0.000 |
| 79637 |
admin_coyote |
localhost |
mysql |
Query |
0 |
Init |
show processlist |
0.000 |
---------------------------------------------------------------------------------------------
If nexessary I can restore the data from the other node and connect the slave, but the database is quite big and it is a very long process. I can restore for today as I will not have enough time until 18:00.
I can do it tomorrow if it helps to solve the problem.
Regards
JP Pozzi
|
|
Hello,
I made the second machine in SYNC (without restoring) and I get the difference between the two servers :
Server 1 (the "master") :
---------------------------------------------------------------------------------------+
| THREAD_ID |
LOCK_MODE |
LOCK_DURATION |
LOCK_TYPE |
TABLE_SCHEMA |
TABLE_NAME |
---------------------------------------------------------------------------------------+
| 185148 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
id_suivi |
| 185148 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
LOCAL_APPAREIL_M |
---------------------------------------------------------------------------------------+
Sur le serveur 2 (the slave which is not working) :
---------------------------------------------------------------------------------------+
| THREAD_ID |
LOCK_MODE |
LOCK_DURATION |
LOCK_TYPE |
TABLE_SCHEMA |
TABLE_NAME |
---------------------------------------------------------------------------------------+
| 80803 |
MDL_EXCLUSIVE |
NULL |
Table metadata lock |
coyote |
id_suivi |
| 80803 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
LOCAL_APPAREIL_M |
---------------------------------------------------------------------------------------+
I hope it will help !
Regards
JP Pozzi
|
|
Hi jppo,
Thanks for that. While it is active, could you please also do SHOW PROCESSLIST on both servers? And also SHOW SLAVE STATUS and SHOW VARIABLES?
The SQL and/or binary logs that show the queries that produce the locks would be useful as well, if available.
Thank you.
|
|
Hello,
I restart the slave and get a lock after 1 second, here is the log :
---------------------------------------------------------------------------------------+
| THREAD_ID |
LOCK_MODE |
LOCK_DURATION |
LOCK_TYPE |
TABLE_SCHEMA |
TABLE_NAME |
---------------------------------------------------------------------------------------+
| 81843 |
MDL_EXCLUSIVE |
NULL |
Table metadata lock |
coyote |
id_suivi |
| 81843 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
LOCAL_APPAREIL_M |
| 81845 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
LOCAL_APPAREIL_M |
---------------------------------------------------------------------------------------+
The processlist shows the threads 81843 and 81845 "Waiting for prior transaction to commit" :
------------------------------------------------------------------------------------------------------------------------------------------------------------
| Id |
User |
Host |
db |
Command |
Time |
State |
Info |
Progress |
------------------------------------------------------------------------------------------------------------------------------------------------------------
| 4 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 1 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge coordinator |
NULL |
0.000 |
| 3 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 2 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 5 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB shutdown handler |
NULL |
0.000 |
| 7 |
event_scheduler |
localhost |
NULL |
Daemon |
457721 |
Waiting on empty queue |
NULL |
0.000 |
| 72150 |
replic_user |
10.216.51.62:41696 |
NULL |
Binlog Dump |
54169 |
Master has sent all binlog to slave; waiting for binlog to be updated |
NULL |
0.000 |
| 81833 |
admin_coyote |
localhost |
mysql |
Query |
0 |
Init |
show processlist |
0.000 |
| 81840 |
system user |
|
NULL |
Slave_IO |
125 |
Waiting for master to send event |
NULL |
0.000 |
| 81842 |
system user |
|
NULL |
Slave_worker |
125 |
Waiting for work from SQL thread |
NULL |
0.000 |
| 81843 |
system user |
|
NULL |
Slave_worker |
123 |
Waiting for prior transaction to commit |
NULL |
0.000 |
| 81845 |
system user |
|
NULL |
Slave_worker |
123 |
Waiting for prior transaction to commit |
NULL |
0.000 |
| 81844 |
system user |
|
NULL |
Slave_worker |
123 |
Waiting for table metadata lock |
NULL |
0.000 |
| 81841 |
system user |
|
NULL |
Slave_SQL |
123 |
Waiting for room in worker thread event queue |
NULL |
0.000 |
------------------------------------------------------------------------------------------------------------------------------------------------------------
Regards
JP Pozzi
|
|
And the "show processlist" on the master :
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Id |
User |
Host |
db |
Command |
Time |
State |
Info |
Progress |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
| 1 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 2 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 3 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge worker |
NULL |
0.000 |
| 4 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB purge coordinator |
NULL |
0.000 |
| 5 |
system user |
|
NULL |
Daemon |
NULL |
InnoDB shutdown handler |
NULL |
0.000 |
| 7 |
event_scheduler |
localhost |
NULL |
Daemon |
313594 |
Waiting on empty queue |
NULL |
0.000 |
| 174975 |
system user |
|
NULL |
Slave_IO |
54442 |
Waiting for master to send event |
NULL |
0.000 |
| 174977 |
system user |
|
NULL |
Slave_worker |
54442 |
Waiting for work from SQL thread |
NULL |
0.000 |
| 174978 |
system user |
|
NULL |
Slave_worker |
54442 |
Waiting for work from SQL thread |
NULL |
0.000 |
| 174979 |
system user |
|
NULL |
Slave_worker |
54442 |
Waiting for work from SQL thread |
NULL |
0.000 |
| 174980 |
system user |
|
NULL |
Slave_worker |
54442 |
Waiting for work from SQL thread |
NULL |
0.000 |
| 174976 |
system user |
|
NULL |
Slave_SQL |
54442 |
Slave has read all relay log; waiting for the slave I/O thread to update it |
NULL |
0.000 |
| 185091 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33240 |
coyote |
Sleep |
60 |
|
NULL |
0.000 |
| 185092 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33242 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185094 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33250 |
coyote |
Sleep |
4 |
|
NULL |
0.000 |
| 185118 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33642 |
coyote |
Sleep |
16 |
|
NULL |
0.000 |
| 185121 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33654 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185122 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33662 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185123 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33664 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185124 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33672 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185125 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33678 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185126 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33680 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185127 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33682 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185128 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33694 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185129 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33698 |
coyote |
Query |
0 |
Commit |
INSERT INTO CENTRE_POI_STAT_TEMPS_REEL ( ID_RADAR_M , PAYS , TYPE_POI , TYPE_INSERTION , IC |
0.000 |
| 185131 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33738 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185132 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33756 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185135 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33814 |
coyote |
Sleep |
1 |
|
NULL |
0.000 |
| 185136 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33816 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185137 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33818 |
coyote |
Query |
0 |
Commit |
INSERT INTO CENTRE_POI_STAT_TEMPS_REEL ( ID_RADAR_M , PAYS , TYPE_POI , TYPE_INSERTION , IC |
0.000 |
| 185146 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33968 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185148 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33990 |
coyote |
Query |
0 |
Commit |
INSERT INTO CENTRE_POI_ZONE_RT_ELEMENT(ID_ZONE_RT, VITESSE, CAP, ZONE_LAT_A, ZONE_LON_A, ZONE_LAT_B, |
0.000 |
| 185149 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:33992 |
coyote |
Sleep |
0 |
|
NULL |
0.000 |
| 185161 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:56712 |
coyote |
Sleep |
39 |
|
NULL |
0.000 |
| 185162 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:56718 |
coyote_poi |
Sleep |
39 |
|
NULL |
0.000 |
| 185163 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:56720 |
coyote |
Sleep |
39 |
|
NULL |
0.000 |
| 185168 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:59140 |
coyote |
Sleep |
34 |
|
NULL |
0.000 |
| 185169 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:59146 |
coyote_poi |
Sleep |
34 |
|
NULL |
0.000 |
| 185170 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:59148 |
coyote |
Sleep |
34 |
|
NULL |
0.000 |
| 185188 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:56754 |
coyote_poi |
Sleep |
32 |
|
NULL |
0.000 |
| 185192 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:34538 |
coyote |
Sleep |
32 |
|
NULL |
0.000 |
| 185216 |
coyotadmin |
10.218.60.4:59070 |
coyote |
Sleep |
26 |
|
NULL |
0.000 |
| 185248 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:34628 |
coyote |
Sleep |
18 |
|
NULL |
0.000 |
| 185255 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:34634 |
coyote |
Sleep |
16 |
|
NULL |
0.000 |
| 185259 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:59210 |
coyote_poi |
Sleep |
71 |
|
NULL |
0.000 |
| 185299 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:59256 |
coyote_poi |
Sleep |
71 |
|
NULL |
0.000 |
| 185403 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:56948 |
coyote_poi |
Sleep |
273 |
|
NULL |
0.000 |
| 188133 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:59496 |
coyote |
Sleep |
16 |
|
NULL |
0.000 |
| 220686 |
coyotadmin |
zvr-lrts031.preprod.abcdef.local:36126 |
coyote |
Sleep |
5 |
|
NULL |
0.000 |
| 231455 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:43828 |
coyote_poi |
Sleep |
46 |
|
NULL |
0.000 |
| 234726 |
replic_user |
10.216.51.61:56070 |
NULL |
Binlog Dump |
397 |
Master has sent all binlog to slave; waiting for binlog to be updated |
NULL |
0.000 |
| 235245 |
coyotadmin |
10.221.131.229:59476 |
coyote |
Sleep |
19 |
|
NULL |
0.000 |
| 235298 |
coyotadmin |
zvr-lrts032.preprod.abcdef.local:47490 |
coyote |
Sleep |
47 |
|
NULL |
0.000 |
| 235313 |
coyotadmin |
zvr-lmin001.preprod.abcdef.local:60328 |
coyote |
Sleep |
32 |
|
NULL |
0.000 |
| 235338 |
admin_coyote |
localhost |
mysql |
Query |
0 |
Init |
show processlist |
0.000 |
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
55 rows in set (0,000 sec)
That'sall folks.
|
|
Slave state :
*************************** 1. row ***************************
|
Connection_name: lmdb001
|
Slave_SQL_State: Slave has read all relay log; waiting for the slave I/O thread to update it
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: 10.216.51.61
|
Master_User: replic_user
|
Master_Port: 3306
|
Connect_Retry: 20
|
Master_Log_File: mysql-bin.000021
|
Read_Master_Log_Pos: 385
|
Relay_Log_File: relay-bin-lmdb001.000002
|
Relay_Log_Pos: 684
|
Relay_Master_Log_File: mysql-bin.000021
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB: centre,centre_eu,centre_poi,geocentre,LOG_TRAMES,mysql,test,traffic
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0
|
Last_Error:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 385
|
Relay_Log_Space: 995
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: 0
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 0
|
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 201
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: Slave_Pos
|
Gtid_IO_Pos: 0-201-834653345
|
Replicate_Do_Domain_Ids:
|
Replicate_Ignore_Domain_Ids:
|
Parallel_Mode: conservative
|
SQL_Delay: 0
|
SQL_Remaining_Delay: NULL
|
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
|
Slave_DDL_Groups: 3
|
Slave_Non_Transactional_Groups: 0
|
Slave_Transactional_Groups: 0
|
Retried_transactions: 0
|
Max_relay_log_size: 1073741824
|
Executed_log_entries: 62
|
Slave_received_heartbeats: 7583
|
Slave_heartbeat_period: 30.000
|
Gtid_Slave_Pos: 0-202-860080803
|
|
|
The mariadb error log does not show anything about the problem and is very short.
|
|
List of variables sent to ftp as : MDEV-29621_variables.txt
|
|
Thanks jppo. Would it be possible to get the relay log to see what the slave is trying to execute?
|
|
Hello,
Relay log transfered as MDEV-29621_relay-bin-lmdb002.000002.gz
Regards
|
|
Hi jppo!
Good observation with the MDL lock difference. It looks like this may be a side-effect of MDEV-28487 (fixed in 10.4.26). Prior to the MDEV-28487 fix, sequence updates in row format would always be logged in minimal mode for binlog_row_image. Now that it can use full mode (the default), the MDL lock mode uses exclusive locking, which may be causing your issue. I'm going to continue to look deeper, but in the mean time, if your data meets the requirements of using minimal mode for binlog_row_image, switching to that may bypass your issue. Alternatively you can try 10.4.25.
The requirements, for easy reference:
Note that to safely change this setting from the default, the table
|
being replicated to must contain identical primary key definitions,
|
and columns must be present, in the same order, and use the same
|
data types as the original table. If these conditions are not met,
|
matches may not be correctly determined and updates and deletes
|
may diverge on the replica, with no warnings or errors returned.
|
|
|
Hello,
Thanks for the explanations.
I will try with 10.4.25, but I don't know how to get the server package for Centos7 X86-64.
Regards
|
|
Hello,
I try to downgrade the "slave" server to 10.4.25 with no result :
The lowks are always here with same characteristics :
{ HEAD_ID | LOCK_MODE | LOCK_DURATION | LOCK_TYPE | TABLE_SCHEMA | TABLE_NAME |
------------------------------------------------------------------------------------------+
| 50 |
MDL_EXCLUSIVE |
NULL |
Table metadata lock |
coyote |
id_suivi |
| 50 |
MDL_SHARED_WRITE |
NULL |
Table metadata lock |
coyote |
LOCAL_APPAREIL_M |
------------------------------------------------------------------------------------------+}}
And the slave is always "Waiting for room in worker thread event queue".
Will try to downgrade the master.
Regards
JP P
|
|
Great! Thanks for the update. I'll continue working to fix the deadlock under this ticket, so you'll know which future release you can upgrade to next.
|
|
Hi jppo,
Did you ever see the deadlock happen from the sequence id_alert? The relay log you provided only has updates using the sequence id_alert (I didn't see id_suivi used anywhere in that relay log). I recreated a similar event stream from that uploaded relay log; however could not recreate the deadlock. Do you have any binary or relay logs with the events that caused the deadlock, i.e. inserts that use the id_suivi sequence? A log that would come from 10.4.25 is fine. It also looks like the tables you are inserting into have triggers, would you be able to show them as well?
Thanks!
|
|
Hello Brandon,
I had forgotten that problem as I was very busy on other subjects ...
It is very difficult to get that kind of information as the systems are working at a very high rate (30000 to 50000 update rows every minute and a little less inserts).
|
|
In another customer report the following picture is presented.
A slave worker (thread id 125982 below) requests IX type of Global-read-lock which must be in conflict with
other worker locks, most probably with X type one (125979).
MDL locks and the processlist are as the following.
|
+-----------+-------------------------+---------------+---------------------+--------------+-------------------+
|
| THREAD_ID | LOCK_MODE | LOCK_DURATION | LOCK_TYPE | TABLE_SCHEMA | TABLE_NAME |
|
+-----------+-------------------------+---------------+---------------------+--------------+-------------------+
|
| 125982 | MDL_INTENTION_EXCLUSIVE | NULL | Global read lock | | |
|
| 125979 | MDL_INTENTION_EXCLUSIVE | NULL | Commit lock | | |
|
| 125981 | MDL_INTENTION_EXCLUSIVE | NULL | Commit lock | | |
|
| 125980 | MDL_INTENTION_EXCLUSIVE | NULL | Commit lock | | |
|
| 403322 | MDL_SHARED_READ | NULL | Table metadata lock | mysql | user |
|
| 125981 | MDL_SHARED_WRITE | NULL | Table metadata lock | iwork | pl |
|
| 125978 | MDL_SHARED_READ | NULL | Table metadata lock | mysql | gtid_slave_pos |
|
| 125979 | MDL_SHARED_WRITE | NULL | Table metadata lock | mysql | gtid_slave_pos |
|
| 125981 | MDL_SHARED_WRITE | NULL | Table metadata lock | mysql | gtid_slave_pos |
|
| 125980 | MDL_SHARED_WRITE | NULL | Table metadata lock | mysql | gtid_slave_pos |
|
| 125981 | MDL_SHARED_WRITE | NULL | Table metadata lock | iwork | audit_ |
|
| 125980 | MDL_SHARED_WRITE | NULL | Table metadata lock | iwork | trade_explain |
|
| 125979 | MDL_EXCLUSIVE | NULL | Table metadata lock | iwork | seq_trade_explain |
|
+-----------+-------------------------+---------------+---------------------+--------------+-------------------+
|
|
+--------+--------------------+--------------------+-------+--------------+---------+-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|
| Id | User | Host | db | Command | Time | State | Info | Progress |
|
+--------+--------------------+--------------------+-------+--------------+---------+-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
|
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
|
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
|
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
|
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
|
| 125977 | system user | | NULL | Slave_IO | 1779166 | Waiting for master to send event | NULL | 0.000 |
|
| 125980 | system user | | NULL | Slave_worker | 20315 | Waiting for prior transaction to commit | NULL | 0.000 |
|
| 125979 | system user | | iwork | Slave_worker | 20315 | Waiting for prior transaction to commit | COMMIT | 0.000 |
|
| 125982 | system user | | NULL | Slave_worker | 20315 | Waiting for table metadata lock | NULL | 0.000 |
|
| 125981 | system user | | NULL | Slave_worker | 20315 | Waiting for prior transaction to commit | NULL | 0.000 |
|
| 125978 | system user | | NULL | Slave_SQL | 20323 | Waiting for room in worker thread event queue | NULL | 0.000 |
|
| 403322 | monuser | 10.67.21.126:59496 | NULL | Sleep | 115 | | NULL | 0.000 |
|
| 405701 | tmtb_tableau_iwork | 10.68.22.59:58046 | iwork | Query | 11282 | Waiting for table metadata lock | SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'iwork' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' ) ) TABLES | 0.000 |
|
| 405792 | tmtb_tableau_iwork | 10.68.22.59:33726 | iwork | Query | 10633 | Waiting for table metadata lock | SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'iwork' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' ) ) TABLES | 0.000 |
|
| 406462 | tmtb_tableau_iwork | 10.68.22.6:56774 | iwork | Query | 6400 | Waiting for table metadata lock | SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'iwork' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' ) ) TABLES | 0.000 |
|
| 411908 | tmtb_tableau_iwork | 10.68.22.6:48418 | iwork | Query | 1648 | Waiting for table metadata lock | SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'iwork' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' ) ) TABLES | 0.000 |
|
| 412662 | tmtb_tableau_iwork | 10.68.22.59:36440 | iwork | Query | 743 | Waiting for table metadata lock | SELECT TABLE_NAME, TABLE_COMMENT, TABLE_TYPE, TABLE_SCHEMA FROM ( SELECT * FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_SCHEMA LIKE 'iwork' AND ( TABLE_TYPE='BASE TABLE' OR TABLE_TYPE='VIEW' ) ) TABLES | 0.000 |
|
| 413116 | myoper | localhost | NULL | Query | 0 | Init | show full processlist | 0.000 |
|
+--------+--------------------+--------------------+-------+--------------+---------+-----------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+
|
For further analysis it's important to know what is the transaction that the slave worker 125982 is on.
The transaction must be next to the last executed one on slave (gtid_slave_pos).
When this sort of hang occurs next time we will need to know
1. all details of at least slave_parallel_workers + 1 transactions starting off gtid_slave_pos.
mysqlbinlog -v for those or the whole pertinent master binlog file should be very helpful.
with the n of at least the number of slave workers, and
2. table definitions of the transaction range tables.
|
|
Binlog and schema are provided.
|
|
The 1st part of the fixes proves the fixes with the NEW master -> NEW slave.
It's ready for review at bb-10.4-andrei.
You can start on reviewing while the 2nd part that provides OLD -> NEW is at construction/testing.
|
|
Finished first round of review, left a few notes on commits bae30af and bdeaee3
|
|
Yes, I reviewed the patch already. It was mostly in good standing, and I imagine should be in the next release.
|
|
Approved the latest commit
|
|
Testing with the fixes looked okay for pushing.
|