[MDEV-7830] parallel replication lock contention with DDL/DML on slave Created: 2015-03-25  Updated: 2022-09-12  Resolved: 2022-09-12

Status: Closed
Project: MariaDB Server
Component/s: OTHER
Affects Version/s: 10.0.17
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Kristian Nielsen
Resolution: Won't Fix Votes: 0
Labels: parallelslave


 Description   

This slave wasn't resolving its locks:

Trx total rec lock wait time 12700 SEC on some of the parallel threads:

MariaDB [hotspot]> CREATE TABLE UserExpiry SELECT Location, Expire FROM UserInfo;
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction

MariaDB [hotspot]> SHOW ENGINE INNODB STATUS\G
*************************** 1. row ***************************
  Type: InnoDB
  Name: 
Status: 
=====================================
2015-03-25 13:36:21 7f96d3bf4700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 1 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 588997 srv_active, 0 srv_shutdown, 3166 srv_idle
srv_master_thread log flush and writes: 592163
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 15337153
OS WAIT ARRAY INFO: signal count 13206239
Mutex spin waits 42778016, rounds 498035174, OS waits 12195640
RW-shared spins 3104395, rounds 65795945, OS waits 1890737
RW-excl spins 3332601, rounds 52029014, OS waits 744942
Spin rounds per wait: 11.64 mutex, 21.19 RW-shared, 15.61 RW-excl
------------
TRANSACTIONS
------------
Trx id counter 2374048291
Purge done for trx's n:o < 2374048291 undo n:o < 0 state: running but idle
History list length 1224
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 2374048272, not started
MySQL thread id 377135, OS thread handle 0x7f9776c6b700, query id 117572064 223.27.66.13 overseer cleaning up
---TRANSACTION 2374048178, not started
MySQL thread id 373677, OS thread handle 0x7f96e346b700, query id 117571821 223.27.66.13 root cleaning up
---TRANSACTION 0, not started
MySQL thread id 371257, OS thread handle 0x7f96d3bf4700, query id 117572115 223.27.66.13 root init
SHOW ENGINE INNODB STATUS
---TRANSACTION 0, not started
MySQL thread id 371180, OS thread handle 0x7f971446b700, query id 117571987 223.27.66.13 root cleaning up
---TRANSACTION 2332704790, not started
MySQL thread id 97948, OS thread handle 0x7f96c53ab700, query id 0 Waiting for master to send event
---TRANSACTION 2374048234, not started
MySQL thread id 7, OS thread handle 0x7f96c52d0700, query id 117571901 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374048203, not started
MySQL thread id 4, OS thread handle 0x7f96c51f5700, query id 117571860 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374048246, not started
MySQL thread id 19, OS thread handle 0x7f96c5163700, query id 117571919 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374048238, not started
MySQL thread id 11, OS thread handle 0x7f96c51ac700, query id 117571905 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374048247, not started
MySQL thread id 9, OS thread handle 0x7f96c5088700, query id 117571921 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374048245, not started
MySQL thread id 21, OS thread handle 0x7f980a86b700, query id 117571917 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2374047856, not started
MySQL thread id 18, OS thread handle 0x7f983bc48700, query id 117571289 Waiting for prior transaction to start commit before starting next transaction
---TRANSACTION 2318180099, not started
MySQL thread id 1, OS thread handle 0x7f96bc9fe700, query id 0 Waiting for background binlog tasks
---TRANSACTION 2374048288, ACTIVE (PREPARED) 0 sec
4 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 2
MySQL thread id 22, OS thread handle 0x7f9872d03700, query id 117572113 Waiting for prior transaction to commit
Trx #rec lock waits 159407 #table lock waits 142
Trx total rec lock wait time 12567 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."UserInfo" trx id 2374048288 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 15794 page no 1620 n bits 480 index "UserName" of table "hotspot"."UserInfo" trx table locks 2 total table locks 5  trx id 2374048288 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 15794 page no 1198 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 2 total table locks 5  trx id 2374048288 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048288 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048285, ACTIVE (PREPARED) 0 sec
6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
MySQL thread id 3, OS thread handle 0x7f96c523e700, query id 117572111 Waiting for prior transaction to commit
Trx #rec lock waits 160163 #table lock waits 146
Trx total rec lock wait time 12561 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radusage" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."UserInfo" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 15794 page no 1193 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048285 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."radacct" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11949 page no 138621 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048285 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048285 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048284, ACTIVE 0 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 2 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
MySQL thread id 14, OS thread handle 0x7f9872dde700, query id 117572106 update
REPLACE INTO radcheck (username, attribute, op, value) VALUES ('NOTREALUSER', 'password', ':=', 'NOTREALPASSWORD')
Trx #rec lock waits 159198 #table lock waits 137
Trx total rec lock wait time 12568 SEC
Trx total table lock wait time 0 SEC
------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
------------------
TABLE LOCK table "hotspot"."radcheck" trx id 2374048284 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048283, ACTIVE (PREPARED) 0 sec
3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 2
MySQL thread id 15, OS thread handle 0x7f9872d4c700, query id 117572104 Waiting for prior transaction to commit
Trx #rec lock waits 160406 #table lock waits 150
Trx total rec lock wait time 12689 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."active_sessions" trx id 2374048283 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 152 page no 10110 n bits 80 index "PRIMARY" of table "hotspot"."active_sessions" trx table locks 2 total table locks 1  trx id 2374048283 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048283 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048282, ACTIVE (PREPARED) 0 sec
5 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
MySQL thread id 12, OS thread handle 0x7f96c5287700, query id 117572102 Waiting for prior transaction to commit
Trx #rec lock waits 159927 #table lock waits 139
Trx total rec lock wait time 12700 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radcheck" trx id 2374048282 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 199 page no 533 n bits 664 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 199 page no 1179 n bits 160 index "PRIMARY" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 199 page no 533 n bits 664 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048282 lock_mode X locks gap before rec lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048282 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048281, ACTIVE (PREPARED) 0 sec
5 lock struct(s), heap size 1184, 7 row lock(s), undo log entries 4
MySQL thread id 20, OS thread handle 0x7f986f86f700, query id 117572100 closing tables
Trx #rec lock waits 158976 #table lock waits 157
Trx total rec lock wait time 12524 SEC
Trx total table lock wait time 1 SEC
TABLE LOCK table `hotspot`.`radreply` trx id 2374048281 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11812 page no 599 n bits 752 index `UserName` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11812 page no 138 n bits 304 index `PRIMARY` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11812 page no 599 n bits 752 index `UserName` of table `hotspot`.`radreply` trx table locks 2 total table locks 2  trx id 2374048281 lock_mode X locks gap before rec lock hold time 0 wait time before grant 0 
TABLE LOCK table `mysql`.`gtid_slave_pos` trx id 2374048281 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048280, ACTIVE (PREPARED) 0 sec
7 lock struct(s), heap size 1184, 5 row lock(s), undo log entries 7
MySQL thread id 5, OS thread handle 0x7f96c5362700, query id 117572098 closing tables
Trx #rec lock waits 158864 #table lock waits 153
Trx total rec lock wait time 12568 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radusage" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."UserInfo" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 15794 page no 1148 n bits 112 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."radacct" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11949 page no 135350 n bits 144 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048280 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 6112 page no 3 n bits 336 index "PRIMARY" of table "mysql"."gtid_slave_pos" trx table locks 4 total table locks 12  trx id 2374048280 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048276, ACTIVE (PREPARED) 0 sec
6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
MySQL thread id 13, OS thread handle 0x7f9872d95700, query id 117572093 Waiting for prior transaction to commit
Trx #rec lock waits 159427 #table lock waits 145
Trx total rec lock wait time 12551 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radusage" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."UserInfo" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 15794 page no 509 n bits 104 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048276 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "hotspot"."radacct" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
RECORD LOCKS space id 11949 page no 138781 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048276 lock_mode X locks rec but not gap lock hold time 0 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048276 lock mode IX lock hold time 0 wait time before grant 0 
---TRANSACTION 2374048253, ACTIVE (PREPARED) 51 sec
3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 2
MySQL thread id 10, OS thread handle 0x7f9872e27700, query id 117571943 Waiting for prior transaction to commit
Trx #rec lock waits 159074 #table lock waits 147
Trx total rec lock wait time 12529 SEC
Trx total table lock wait time 1 SEC
TABLE LOCK table "hotspot"."radacct" trx id 2374048253 lock mode IX lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 11949 page no 127327 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 2 total table locks 5  trx id 2374048253 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048253 lock mode IX lock hold time 51 wait time before grant 0 
---TRANSACTION 2374048252, ACTIVE (PREPARED) 51 sec
3 lock struct(s), heap size 1184, 1 row lock(s), undo log entries 1
MySQL thread id 17, OS thread handle 0x7f9872cba700, query id 117571938 Waiting for prior transaction to commit
COMMIT
Trx #rec lock waits 159652 #table lock waits 145
Trx total rec lock wait time 12876 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."badauth_counts" trx id 2374048252 lock mode IX lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 2864 page no 5 n bits 160 index "username" of table "hotspot"."badauth_counts" trx table locks 2 total table locks 1  trx id 2374048252 lock_mode X locks gap before rec lock hold time 51 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048252 lock mode IX lock hold time 51 wait time before grant 0 
---TRANSACTION 2374048251, ACTIVE (PREPARED) 51 sec
6 lock struct(s), heap size 1184, 2 row lock(s), undo log entries 4
MySQL thread id 6, OS thread handle 0x7f96c511a700, query id 117571935 Waiting for prior transaction to commit
Trx #rec lock waits 159091 #table lock waits 152
Trx total rec lock wait time 12655 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radusage" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
TABLE LOCK table "hotspot"."UserInfo" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 15794 page no 1216 n bits 104 index "PRIMARY" of table "hotspot"."UserInfo" trx table locks 4 total table locks 5  trx id 2374048251 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
TABLE LOCK table "hotspot"."radacct" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 11949 page no 138773 n bits 136 index "PRIMARY" of table "hotspot"."radacct" trx table locks 4 total table locks 5  trx id 2374048251 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048251 lock mode IX lock hold time 51 wait time before grant 0 
---TRANSACTION 2374048249, ACTIVE (PREPARED) 51 sec
2 lock struct(s), heap size 360, 0 row lock(s), undo log entries 2
MySQL thread id 8, OS thread handle 0x7f96c50d1700, query id 117571928 Waiting for prior transaction to commit
Trx #rec lock waits 159554 #table lock waits 148
Trx total rec lock wait time 12522 SEC
Trx total table lock wait time 1 SEC
TABLE LOCK table "hotspot"."radreply" trx id 2374048249 lock mode IX lock hold time 51 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048249 lock mode IX lock hold time 51 wait time before grant 0 
---TRANSACTION 2374048248, ACTIVE (PREPARED) 51 sec
5 lock struct(s), heap size 1184, 3 row lock(s), undo log entries 2
MySQL thread id 16, OS thread handle 0x7f96c5319700, query id 117571926 Waiting for prior transaction to commit
Trx #rec lock waits 159396 #table lock waits 168
Trx total rec lock wait time 12611 SEC
Trx total table lock wait time 0 SEC
TABLE LOCK table "hotspot"."radcheck" trx id 2374048248 lock mode IX lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 199 page no 1179 n bits 160 index "PRIMARY" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X locks rec but not gap lock hold time 51 wait time before grant 0 
RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 2 total table locks 3  trx id 2374048248 lock_mode X locks gap before rec lock hold time 51 wait time before grant 0 
TABLE LOCK table "mysql"."gtid_slave_pos" trx id 2374048248 lock mode IX lock hold time 51 wait time before grant 0 
--------
FILE I/O
--------

binlog and more detailed information in private upload (two files)



 Comments   
Comment by Daniel Black [ 2015-03-25 ]

looks like the only active transaction, 2374048284, is waiting on the lock (199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck) held by 2374048248, ACTIVE (PREPARED)

Comment by Elena Stepanova [ 2015-03-30 ]

knielsen,

Could you please take a look at the provided info to check if you see a clear bug in there, and if you do, if you need any more data to investigate it?

Thanks.

Comment by Kristian Nielsen [ 2015-04-28 ]

So we see in the output 20 parallel replication worker threads.
1 in update (waiting for row lock?)
2 in "closing tables"
rest in waiting for prior transactions.

It is not clear to me that this is not simply a long-running transaction
that takes more time than lock wait timeout?

------- TRX HAS BEEN WAITING 0 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 199 page no 868 n bits 832 index "UserName" of table "hotspot"."radcheck" trx table locks 1 total table locks 3  trx id 2374048284 lock_mode X locks gap before rec insert intention waiting lock hold time 0 wait time before grant 0 

If the transaction has been waiting only "0 seconds" then it suggests that
the system is actively progressing. Or alternatively, I am misunderstanding
what InnoDB output means (i am not familiar with it).

It is also possible that the parallel replication is hanging? There were a
number of parallel replication deadlocks fixed recently. It is not possible
to see from the provided information if any of these might be the cause (or
indeed even that there is a deadlock).

Comment by Kristian Nielsen [ 2015-04-28 ]

Hm, another thought: This CREATE TABLE UserExpiry SELECT ... is run as a
direct query on the slave (not replicated) ?

It is possible that this statement conflicts with locks needed by replicated
transactions, causing a deadlock. Such deadlocks may not be detected.

Say the direct transaction is U, and T1, T2 are being replicated in
parallel. All access table UserExpiry.

If T1 is waiting for a row lock held by U, and U is waiting for another row
lock held by T2, then this might cause a deadlock that is not automatically
detected (except by timeout). Because T2 is waiting for T1 to commit first.

This is kind of a general restriction with parallel replication, that the
DBA/application is responsible for not running any conflicting direct
transactions on the slave.

Though merely reading a table is not obviously something that should
conflict with replication. It might be worth investigating further if such
conflicts are actually possible (I did not test). Conflicts are only
possible for statements that keep row locks for the duration of the
transaction. It might be interesting to check if the conflict also occurs if
the CREATE is run in READ COMMITTED and with sql_log_bin=0.

Comment by Daniel Black [ 2015-04-29 ]

It is also possible that the parallel replication is hanging?

sounds plausible.

There were a number of parallel replication deadlocks fixed recently. It is not possible to see from the provided information if any of these might be the cause

Looking forward to the next release then.

(or indeed even that there is a deadlock).

So a MDEV-7802 back port could be useful.

This CREATE TABLE UserExpiry SELECT ... is run as a direct query on the slave (not replicated) ?

yes

If T1 is waiting for a row lock held by U, and U is waiting for another row lock held by T2, then this might cause a deadlock that is not automatically detected (except by timeout). Because T2 is waiting for T1 to commit first.

sounds plausible

This is kind of a general restriction with parallel replication, that the DBA/application is responsible for not running any conflicting direct

transactions on the slave.

sounds like a worth while addition to the documentation

It might be interesting to check if the conflict also occurs if the CREATE is run in READ COMMITTED and with sql_log_bin=0.

Ok. I will test this out.

Comment by Sergei Golubchik [ 2022-09-12 ]

10.0 was EOLed in March 2019

Generated at Thu Feb 08 07:22:35 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.