Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Versions: 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Description
A read-only slave with slave_parallel_threads=10 deadlocked when mariabackup executed FTWRL (FLUSH TABLES WITH READ LOCK) and tried to copy the non-InnoDB files and the remaining part of the redo log. Neither replication, nor that mariabackup run, nor later mariabackup calls could proceed.
In the processlist we see:
...
| 10 | system user | | NULL | Slave_IO | 1626750 | Waiting for master to send event | NULL | 0.000 |
| 13 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 12 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 14 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 15 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 16 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 17 | system user | | NULL | Slave_worker | 184973 | Waiting for global read lock | NULL | 0.000 |
| 18 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 19 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 20 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 21 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 11 | system user | | NULL | Slave_SQL | 187284 | Waiting for room in worker thread event queue | NULL | 0.000 |
| 271217 | dbcleaner | some_host:51410 | sa2 | Query | 0 | Waiting in MASTER_GTID_WAIT() (primary waiter) | SELECT MASTER_GTID_WAIT('0-11-5381276620', 4) | 0.000 |
...
| 402858 | bkpuser | localhost | NULL | Sleep | 184970 | | NULL | 0.000 |
| 427762 | bkpuser | localhost | NULL | Query | 98674 | Waiting for worker threads to pause for global read lock | FLUSH TABLES WITH READ LOCK | 0.000 |
| 452639 | bkpuser | localhost | NULL | Query | 12636 | Waiting while replication worker thread pool is busy | FLUSH TABLES WITH READ LOCK | 0.000 |
| 469510 | dbcleaner | some_host:63978 | sa2 | Sleep | 19 | | NULL | 0.000 |
| 469541 | dbcleaner | localhost | NULL | Query | 0 | init | show processlist | 0.000 |
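For anyone checking whether another server is stuck in the same state, the pattern in the processlist above can be matched mechanically. The following is a minimal Python sketch, not part of the original report: the function names `parse_rows` and `looks_like_ftwrl_deadlock` are hypothetical, the column order is assumed to be the usual MariaDB SHOW PROCESSLIST layout (Id, User, Host, db, Command, Time, State, Info, Progress), and the check is only a heuristic for the two states seen here (a Slave_worker waiting for the global read lock while an FTWRL session waits for the workers to pause). It does not prove that a deadlock exists.

```python
# Abridged rows copied from the processlist in this report.
PROCESSLIST = """\
| 13 | system user | | NULL | Slave_worker | 184973 | Waiting for prior transaction to start commit before starting next transaction | NULL | 0.000 |
| 17 | system user | | NULL | Slave_worker | 184973 | Waiting for global read lock | NULL | 0.000 |
| 11 | system user | | NULL | Slave_SQL | 187284 | Waiting for room in worker thread event queue | NULL | 0.000 |
| 427762 | bkpuser | localhost | NULL | Query | 98674 | Waiting for worker threads to pause for global read lock | FLUSH TABLES WITH READ LOCK | 0.000 |
"""

# Assumed column order of SHOW PROCESSLIST output on MariaDB.
COLUMNS = ("id", "user", "host", "db", "command", "time", "state", "info", "progress")


def parse_rows(text):
    """Split pipe-delimited processlist rows into dicts.

    Rows whose Info column itself contains '|' are skipped, because a
    plain split cannot tell those pipes from column separators.
    """
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not (line.startswith("|") and line.endswith("|")):
            continue
        cells = [cell.strip() for cell in line[1:-1].split("|")]
        if len(cells) != len(COLUMNS) or not cells[0].isdigit():
            continue
        rows.append(dict(zip(COLUMNS, cells)))
    return rows


def looks_like_ftwrl_deadlock(rows):
    """Heuristic: a worker waits for the GRL while FTWRL waits for workers."""
    worker_blocked = any(
        r["command"] == "Slave_worker"
        and r["state"] == "Waiting for global read lock"
        for r in rows
    )
    ftwrl_blocked = any(
        r["info"].upper() == "FLUSH TABLES WITH READ LOCK"
        and r["state"].startswith("Waiting for worker threads to pause")
        for r in rows
    )
    return worker_blocked and ftwrl_blocked


if __name__ == "__main__":
    print(looks_like_ftwrl_deadlock(parse_rows(PROCESSLIST)))
```

Both conditions are required on purpose: either state alone can appear briefly during a normal backup, so the Time column should also be checked by hand before concluding that the server is hung.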
See backtrace of all threads attached.
The oldest (3rd) mariabackup session (working as bkpuser) hangs with these last messages in the log:
...
[01] 2019-12-07 05:55:36 ...done
[01] 2019-12-07 05:55:36 Copying ./performance_schema/db.opt to /nfs/backup/2019-12-06/FULL/performance_schema/db.opt
[01] 2019-12-07 05:55:36 ...done
[00] 2019-12-07 05:55:36 Finished backing up non-InnoDB tables and files
[01] 2019-12-07 05:55:36 Copying aria_log_control to /nfs/backup/2019-12-06/FULL/aria_log_control
[01] 2019-12-07 05:55:36 ...done
[01] 2019-12-07 05:55:36 Copying aria_log.00000001 to /nfs/backup/2019-12-06/FULL/aria_log.00000001
[01] 2019-12-07 05:55:36 ...done
[00] 2019-12-07 05:55:36 Waiting for log copy thread to read lsn 78996210453049
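To spot a backup that stopped at the same point, the tail of the mariabackup log can be checked for this final message. The sketch below is an illustration only: `pending_lsn` is a hypothetical helper, and the line format (`[thread] date time message`) is assumed from the excerpt above rather than taken from any documented log specification.

```python
import re

# Line shape assumed from the excerpt: "[NN] YYYY-MM-DD HH:MM:SS message".
LOG_LINE = re.compile(
    r"^\[(?P<thread>\d+)\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msg>.*)$"
)
WAIT_LSN = re.compile(r"Waiting for log copy thread to read lsn (\d+)")

# Last two lines of the log quoted in this report.
SAMPLE_LOG = """\
[01] 2019-12-07 05:55:36 ...done
[00] 2019-12-07 05:55:36 Waiting for log copy thread to read lsn 78996210453049
"""


def pending_lsn(log_text):
    """Return the LSN named in the last log line, or None.

    None means the log does not end with the
    'Waiting for log copy thread to read lsn' message.
    """
    last = None
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match:
            last = match
    if last is None:
        return None
    waiting = WAIT_LSN.search(last.group("msg"))
    return int(waiting.group(1)) if waiting else None


if __name__ == "__main__":
    print(pending_lsn(SAMPLE_LOG))
```

A non-None result only says where the log ends. Whether the backup is actually hung still has to be confirmed from the processlist and the thread backtraces.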