Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Not a Bug
-
10.1.9
Description
A MariaDB 10.1.9 slave was running for over 24 hours with no change in Relay_Master_Log_File:
MariaDB [db]> show slave status \G
|
*************************** 1. row ***************************
|
Slave_IO_State: Waiting for master to send event
|
Master_Host: master
|
Master_User: replication
|
Master_Port: 3306
|
Connect_Retry: 60
|
Master_Log_File: mysql-bin.000157
|
Read_Master_Log_Pos: 161873700
|
Relay_Log_File: relay_log.000143
|
Relay_Log_Pos: 654
|
Relay_Master_Log_File: mysql-bin.000027
|
Slave_IO_Running: Yes
|
Slave_SQL_Running: Yes
|
Replicate_Do_DB:
|
Replicate_Ignore_DB:
|
Replicate_Do_Table:
|
Replicate_Ignore_Table:
|
Replicate_Wild_Do_Table:
|
Replicate_Wild_Ignore_Table:
|
Last_Errno: 0
|
Last_Error:
|
Skip_Counter: 0
|
Exec_Master_Log_Pos: 366
|
Relay_Log_Space: 703475834482
|
Until_Condition: None
|
Until_Log_File:
|
Until_Log_Pos: 0
|
Master_SSL_Allowed: No
|
Master_SSL_CA_File:
|
Master_SSL_CA_Path:
|
Master_SSL_Cert:
|
Master_SSL_Cipher:
|
Master_SSL_Key:
|
Seconds_Behind_Master: 115303
|
Master_SSL_Verify_Server_Cert: No
|
Last_IO_Errno: 0
|
Last_IO_Error:
|
Last_SQL_Errno: 0
|
Last_SQL_Error:
|
Replicate_Ignore_Server_Ids:
|
Master_Server_Id: 1
|
Master_SSL_Crl:
|
Master_SSL_Crlpath:
|
Using_Gtid: No
|
Gtid_IO_Pos:
|
Replicate_Do_Domain_Ids:
|
Replicate_Ignore_Domain_Ids:
|
Parallel_Mode: optimistic
|
It was assumed that the slave was stuck, so the user executed STOP SLAVE. This also hung. SHOW FULL PROCESSLIST output was obtained at that time:
MariaDB [db]> show full processlist;
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+-------+--------------------------------------------------------
|
---------------+--------------------------------+----------+
|
| Id | User | Host | db |
|
Command | Time | State
|
| Info | Progress |
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+-------+--------------------------------------------------------
|
---------------+--------------------------------+----------+
|
| 2 | event_scheduler | localhost | NULL |
|
Daemon | 99968 | Waiting on empty queue
|
| NULL | 0.000 |
|
| 4 | system user | | NULL |
|
Connect | 81773 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 5 | system user | | NULL |
|
Connect | 99968 | Waiting for master to send event
|
| NULL | 0.000 |
|
| 6 | system user | | NULL |
|
Connect | 0 | Write_rows_log_event::write_row(-1)
|
| NULL | 0.000 |
|
| 7 | system user | | NULL |
|
Connect | 80392 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 8 | system user | | NULL |
|
Connect | 82490 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 9 | system user | | NULL |
|
Connect | 82490 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 10 | system user | | NULL |
|
Connect | 82490 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 11 | system user | | NULL |
|
Connect | 81774 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 12 | system user | | NULL |
|
Connect | 82490 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 13 | system user | | NULL |
|
Connect | 81779 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 14 | system user | | NULL |
|
Connect | 82489 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 15 | system user | | NULL |
|
Connect | 81774 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 16 | system user | | NULL |
|
Connect | 81773 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 17 | system user | | NULL |
|
Connect | 44823 | Reading event from the relay log
|
| NULL | 0.000 |
|
| 19 | replication | master:47237 | NULL |
|
Binlog Dump | 99910 | Master has sent all binlog to slave; waiting for
|
binlog to be updated | NULL | 0.000 |
|
| 94 | root | master:50113 | NULL |
|
Query | 2652 | Filling schema table
|
| SHOW /*innotop*/ GLOBAL STATUS | 0.000 |
|
| 95 | root | localhost | db |
|
Killed | 2378 | init
|
| show slave status | 0.000 |
|
| 96 | root | localhost | db |
|
Query | 2730 | Killing slave
|
| stop slave | 0.000 |
|
| 97 | root | localhost | NULL |
|
Killed | 2421 | Filling schema table
|
| show status | 0.000 |
|
| 100 | root | localhost | NULL |
|
Killed | 2351 | Filling schema table
|
| show global status | 0.000 |
|
| 103 | root | localhost | NULL |
|
Killed | 1670 | Filling schema table
|
| show status | 0.000 |
|
| 106 | root | localhost | db |
|
Killed | 86 | Filling schema table
|
| show global variables | 0.000 |
|
| 111 | root | localhost | db |
|
Query | 0 | init
|
| show full processlist | 0.000 |
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+-------+--------------------------------------------------------
|
---------------+--------------------------------+----------+
|
We thought that the one active worker thread may be holding things up, so we stopped that by executing `KILL 6`, but the SQL thread automatically restarted the thread.
We killed each thread one by one to see if that would free up any mutexes, but eventually all replication-related threads were killed, except for the IO thread (which had automatically restarted after being killed), and the user executed STOP SLAVE again, which hung again:
MariaDB [(none)]> show processlist;
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+--------+-------------------------------------------------------
|
----------------+--------------------------------+----------+
|
| Id | User | Host | db |
|
Command | Time | State
|
| Info | Progress |
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+--------+-------------------------------------------------------
|
----------------+--------------------------------+----------+
|
| 2 | event_scheduler | localhost | NULL |
|
Daemon | 106027 | Waiting on empty queue
|
| NULL | 0.000 |
|
| 4 | system user | | NULL |
|
Killed | 87833 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 5 | system user | | NULL |
|
Killed | 106027 | Waiting for slave mutex on exit
|
| NULL | 0.000 |
|
| 6 | system user | | NULL |
|
Killed | 0 | Write_rows_log_event::write_row(-1)
|
| NULL | 0.000 |
|
| 7 | system user | | NULL |
|
Killed | 86452 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 8 | system user | | NULL |
|
Killed | 88549 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 9 | system user | | NULL |
|
Killed | 88549 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 10 | system user | | NULL |
|
Killed | 88549 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 11 | system user | | NULL |
|
Killed | 87833 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 12 | system user | | NULL |
|
Killed | 88549 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 13 | system user | | NULL |
|
Killed | 87838 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 14 | system user | | NULL |
|
Killed | 88549 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 15 | system user | | NULL |
|
Killed | 87833 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 16 | system user | | NULL |
|
Killed | 87833 | Waiting for work from SQL thread
|
| NULL | 0.000 |
|
| 17 | system user | | NULL |
|
Killed | 50883 | Reading event from the relay log
|
| NULL | 0.000 |
|
| 94 | root | master:50113 | NULL |
|
Query | 8711 | Filling schema table
|
| SHOW /*innotop*/ GLOBAL STATUS | 0.000 |
|
| 95 | root | localhost | db |
|
Killed | 8437 | init
|
| show slave status | 0.000 |
|
| 96 | root | localhost | db |
|
Killed | 8789 | Killing slave
|
| stop slave | 0.000 |
|
| 97 | root | localhost | NULL |
|
Killed | 8480 | Filling schema table
|
| show status | 0.000 |
|
| 100 | root | localhost | NULL |
|
Killed | 8411 | Filling schema table
|
| show global status | 0.000 |
|
| 103 | root | localhost | NULL |
|
Killed | 7729 | Filling schema table
|
| show status | 0.000 |
|
| 106 | root | localhost | db |
|
Killed | 6145 | Filling schema table
|
| show global variables | 0.000 |
|
| 111 | root | localhost | db |
|
Killed | 6027 | Filling schema table
|
| show global status | 0.000 |
|
| 115 | root | localhost | NULL |
|
Query | 0 | init
|
| show processlist | 0.000 |
|
| 116 | replication |master:55614 | NULL |
|
Binlog Dump | 772 | Master has sent all binlog to slave; waiting for
|
binlog to be updated | NULL | 0.000 |
|
| 117 | root | localhost | NULL |
|
Query | 64 | init
|
| stop slave | 0.000 |
|
+-----+-----------------+--------------------------------+-------------+---
|
----------+--------+-------------------------------------------------------
|
----------------+--------------------------------+----------+
|