Details
- Bug
- Status: Closed
- Critical
- Resolution: Fixed
- 10.4.12
- None
Description
It looks like there is a race condition between BACKUP STAGE BLOCK_COMMIT in mariabackup and parallel replication.
Replication got deadlocked and we needed to kill the backup process in order to recover.
SHOW PROCESSLIST and SHOW ENGINE INNODB STATUS output is below.
Rick
+----------+--------------+--------------------+------------------------+--------------+--------+-----------------------------------------------+----------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----------+--------------+--------------------+------------------------+--------------+--------+-----------------------------------------------+----------------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 10 | system user | | NULL | Slave_IO | 913176 | Waiting for master to send event | NULL | 0.000 |
| 12 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 13 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 14 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 15 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 16 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 17 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 18 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 19 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 20 | system user | | NULL | Slave_worker | 31214 | Waiting for backup lock | NULL | 0.000 |
| 21 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | UPDATE `customer_schema`.`heartbeat` SET ts='2020-03-16T01:12:54.009860' WHERE id='1' | 0.000 |
| 22 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | NULL | 0.000 |
| 23 | system user | | NULL | Slave_worker | 31215 | Waiting for prior transaction to commit | NULL | 0.000 |
| 25 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | NULL | 0.000 |
| 26 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | NULL | 0.000 |
| 24 | system user | | NULL | Slave_worker | 31215 | Waiting for prior transaction to commit | NULL | 0.000 |
| 27 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | NULL | 0.000 |
| 11 | system user | | NULL | Slave_SQL | 31224 | Waiting for room in worker thread event queue | NULL | 0.000 |
| 63 | monyogmon | 192.168.4.5:32729 | NULL | Sleep | 0 | | NULL | 0.000 |
| 2473 | newrelic | 127.0.0.1:39359 | NULL | Sleep | 5 | | NULL | 0.000 |
| 5259 | monyogmon | 192.168.4.5:32790 | NULL | Sleep | 18 | | NULL | 0.000 |
| 29385185 | mariabackup | localhost | NULL | Query | 31215 | Waiting for backup lock | BACKUP STAGE BLOCK_COMMIT | 0.000 |
| 29941894 | user | 192.168.4.40:21990 | customer_schema | Sleep | 0 | | NULL | 0.000 |
| 31892219 | root | localhost | mysql | Query | 2552 | Waiting for backup lock | CREATE USER IF NOT EXISTS 'user'@'192.168.4.40' | 0.000 |
| 31994422 | root | localhost | NULL | Sleep | 1282 | | NULL | 0.000 |
| 31994463 | mariadbadmin | localhost | NULL | Query | 0 | Init | show processlist | 0.000 |
+----------+--------------+--------------------+------------------------+--------------+--------+-----------------------------------------------+----------------------------------------------------------------------------------------------+----------+
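To make the processlist easier to read at a glance, the rows can be tallied by Command and State. The following is only a minimal sketch: `parse_rows` and `tally_states` are hypothetical helper names, the sample holds a subset of the rows pasted above, and the parser assumes no cell contains a literal `|`.

```python
from collections import Counter

# A subset of the rows from the processlist above, copied verbatim.
SAMPLE = """\
| Id | User | Host | db | Command | Time | State | Info | Progress |
| 19 | system user | | NULL | Slave_worker | 913176 | Waiting for work from SQL thread | NULL | 0.000 |
| 20 | system user | | NULL | Slave_worker | 31214 | Waiting for backup lock | NULL | 0.000 |
| 22 | system user | | NULL | Slave_worker | 31215 | Waiting for backup lock | NULL | 0.000 |
| 23 | system user | | NULL | Slave_worker | 31215 | Waiting for prior transaction to commit | NULL | 0.000 |
| 24 | system user | | NULL | Slave_worker | 31215 | Waiting for prior transaction to commit | NULL | 0.000 |
| 11 | system user | | NULL | Slave_SQL | 31224 | Waiting for room in worker thread event queue | NULL | 0.000 |
| 29385185 | mariabackup | localhost | NULL | Query | 31215 | Waiting for backup lock | BACKUP STAGE BLOCK_COMMIT | 0.000 |
"""


def parse_rows(text):
    """Parse pipe-delimited processlist rows into dicts.

    Header and border lines are skipped because their first cell is
    not a numeric thread id. Assumes no cell contains a '|' character.
    """
    rows = []
    for line in text.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 9 or not cells[0].isdigit():
            continue
        rows.append({
            "id": int(cells[0]),
            "user": cells[1],
            "command": cells[4],
            "time": cells[5],   # kept as text: daemon rows show NULL
            "state": cells[6],
            "info": cells[7],
        })
    return rows


def tally_states(rows):
    """Count rows per (Command, State) pair."""
    return Counter((r["command"], r["state"]) for r in rows)


if __name__ == "__main__":
    for (command, state), n in sorted(tally_states(parse_rows(SAMPLE)).items()):
        print(f"{n:3d}  {command:<13} {state}")
```

Run against the full listing, this kind of tally separates the idle workers ("Waiting for work from SQL thread") from the ones stuck behind the backup lock or behind a prior commit.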
---TRANSACTION 99732954594, ACTIVE (PREPARED) 31143 sec
6 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 2
MySQL thread id 24, OS thread handle 139941516424960, query id 2210114407 Waiting for prior transaction to commit
---TRANSACTION 421513941321144, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 99732954650, ACTIVE 31143 sec
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 26, OS thread handle 139941516629760, query id 2210114591 Waiting for backup lock
---TRANSACTION 99732954630, ACTIVE 31143 sec
3 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 3
MySQL thread id 25, OS thread handle 139941516220160, query id 2210114496 Waiting for backup lock
---TRANSACTION 99732954619, ACTIVE 31143 sec
3 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 3
MySQL thread id 22, OS thread handle 139941517039360, query id 2210114460 Waiting for backup lock
---TRANSACTION 99732954598, ACTIVE (PREPARED) 31143 sec
2 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 2
MySQL thread id 23, OS thread handle 139941516834560, query id 2210114412 Waiting for prior transaction to commit
---TRANSACTION 99732954623, ACTIVE 31143 sec
3 lock struct(s), heap size 1136, 1 row lock(s), undo log entries 2
MySQL thread id 27, OS thread handle 139941516015360, query id 2210114468 Waiting for backup lock
---TRANSACTION 99732954626, ACTIVE 31143 sec
2 lock struct(s), heap size 1136, 0 row lock(s), undo log entries 4451
MySQL thread id 20, OS thread handle 139941517448960, query id 2210114475 Waiting for backup lock
---TRANSACTION 421513941291352, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421513941287096, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
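One possible reading of the wait states above is a three-way cycle, sketched here as a small wait-for graph. This is an illustration under stated assumptions, not a confirmed analysis of the server internals: the node names are hypothetical groupings, and the edges assume that the PREPARED workers (threads 23 and 24) already hold a commit-level backup lock that BACKUP STAGE BLOCK_COMMIT must wait for, while the transaction they are waiting on is one of the workers queued behind the pending backup lock request. The output does not say which worker that is.

```python
def find_cycle(waits_for):
    """Return one cycle in a wait-for graph as a list of nodes, or None.

    waits_for maps each waiter to the nodes it is waiting on.
    Plain depth-first search; the returned list starts and ends
    with the same node.
    """
    path, finished = [], set()

    def visit(node):
        if node in finished:
            return None
        if node in path:
            return path[path.index(node):] + [node]
        path.append(node)
        for blocker in waits_for.get(node, ()):
            cycle = visit(blocker)
            if cycle:
                return cycle
        path.pop()
        finished.add(node)
        return None

    for start in waits_for:
        cycle = visit(start)
        if cycle:
            return cycle
    return None


# ASSUMED edges, inferred from the State column only.
WAITS_FOR = {
    # BLOCK_COMMIT waits for holders of the commit-level lock (assumption).
    "mariabackup": ["prepared_worker"],
    # Threads 23/24: "Waiting for prior transaction to commit".
    "prepared_worker": ["prior_worker"],
    # "Waiting for backup lock", assumed queued behind BLOCK_COMMIT.
    "prior_worker": ["mariabackup"],
}

if __name__ == "__main__":
    print(find_cycle(WAITS_FOR))
    # Removing the backup connection from the graph breaks the cycle.
    without_backup = {
        waiter: [b for b in blockers if b != "mariabackup"]
        for waiter, blockers in WAITS_FOR.items()
        if waiter != "mariabackup"
    }
    print(find_cycle(without_backup))
```

Under these assumptions, dropping the `mariabackup` node leaves no cycle, which is consistent with the report that killing the backup process let replication recover.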
Issue Links
- is duplicated by
  - MDEV-33798 ROW base optimistic deadlock with concurrent writes on same table row and multi domain (Closed)
- relates to
  - MDEV-23586 Mariabackup: GTID saved for replication in 10.4.14 is wrong (Closed)
  - MDEV-30423 Deadlock on Replica during BACKUP STAGE BLOCK_COMMIT on XA transactions (Closed)
While merging this to 10.5, I omitted the changes to sql_class.cc:
diff --git a/sql/sql_class.cc b/sql/sql_class.cc
index 40e606425c5..15088148e02 100644
--- a/sql/sql_class.cc
+++ b/sql/sql_class.cc
@@ -1383,7 +1383,11 @@ void THD::update_all_stats()
void THD::init_for_queries()
{
set_time();
- ha_enable_transaction(this,TRUE);
+ /*
+ We don't need to call ha_enable_transaction() as we can't have
+ any active transactions that has to be commited
+ */
+ transaction.on= TRUE;
reset_root_defaults(mem_root, variables.query_alloc_block_size,
With the above change (or transaction->on instead of transaction.on), replication XA tests would crash in 10.5. I believed that the change is not wanted in 10.5 due to MDEV-22531 and related changes. All tests passed with that omission.