Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Not a Bug
-
10.6.7, 10.7(EOL)
-
None
Description
Here is the test scenario user provide:
Following situation:
- 3 connections (1x master, 2x worker)
- autocommit = 0
- tx_isolation = read-committed on worker
DB layout:
CREATE TABLE `jobs` ( |
id int(11) NOT NULL AUTO_INCREMENT, |
state varchar(25) DEFAULT NULL, |
created timestamp NOT NULL DEFAULT current_timestamp(), |
PRIMARY KEY ("id") |
);
|
The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections.
master is doing 5 insert statements
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> commit;
|
worker 1 takes all unlocked rows from table and locks them for update
mysql (test)> begin;
|
mysql (test)> select * from jobs for update skip locked;
|
+----+---------+---------------------+
|
| id | state | created |
|
+----+---------+---------------------+
|
| 1 | pending | 2022-03-03 17:11:33 |
|
...
|
| 5 | pending | 2022-03-03 17:11:37 |
|
+----+---------+---------------------+
|
5 rows in set (0.000 sec)
|
master does more inserts
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
mysql (test)> insert into jobs (state) value('pending');
|
worker 2 takes unlocked rows
mysql (test)> select * from jobs for update skip locked;
|
+----+---------+---------------------+
|
| id | state | created |
|
+----+---------+---------------------+
|
| 6 | pending | 2022-03-03 17:18:22 |
|
...
|
| 10 | pending | 2022-03-03 17:18:24 |
|
+----+---------+---------------------+
|
5 rows in set (0.001 sec)
|
worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single where in () clause it will wait for worker 2, even though both shouldn't have any dependencies.
mysql (test)> delete from jobs where id in (1,2,3,4,5);
|
... waiting
|
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
|
If deleting only one row at once (no matter which one) it will work.
mysql (test)> delete from jobs where id = 1;
|
Query OK, 1 row affected (0.001 sec)
|
It will even work when some rows are deleted at once:
mysql (test)> delete from jobs where id in (1,2,3);
|
Query OK, 2 rows affected (0.001 sec)
|
However, in this example row 4 and 5 cannot be deleted together with other rows:
mysql (test)> delete from jobs where id in (1,2,3,5);
|
...
|
ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
|
Except for only row 4 and 5 are deleted alone
mysql (test)> delete from jobs where id in (4,5);
|
Query OK, 2 rows affected (0.001 sec)
|
Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems.
Another thing:
Worker 1 tries to delete rows
mysql (test)> delete from jobs where id in (1,2,3,4,5);
|
... waiting
|
Worker 2 tries to delete rows concurrently to Worker 1
mysql (test)> delete from jobs where id in (6,7,8,9,10);
|
ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
|
Worker 1 succeeds
...
|
Query OK, 5 rows affected (48.561 sec)
|
Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded.
Testcase included, test.py. It requires python3, pymysql, threading
Attachments
Issue Links
- relates to
-
MDEV-13115 Implement SELECT [FOR UPDATE|LOCK IN SHARED MODE] SKIP LOCKED
- Closed
-
MDEV-16402 Support Index Condition Pushdown for clustered PK scans
- Confirmed
Activity
Field | Original Value | New Value |
---|---|---|
Fix Version/s | 10.6 [ 24028 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Priority | Major [ 3 ] | Critical [ 2 ] |
Link |
This issue relates to |
Link | This issue relates to MDEV-16402 [ MDEV-16402 ] |
Component/s | Locking [ 10900 ] | |
Component/s | Optimizer [ 10200 ] | |
Component/s | Storage Engine - InnoDB [ 10129 ] | |
Fix Version/s | 10.7 [ 24805 ] | |
Fix Version/s | 10.8 [ 26121 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Affects Version/s | 10.7 [ 24805 ] | |
Assignee | Daniel Black [ danblack ] | Sergei Petrunia [ psergey ] |
Assignee | Sergei Petrunia [ psergey ] | Rex Johnston [ JIRAUSER52533 ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Fix Version/s | 10.7 [ 24805 ] |
Fix Version/s | 10.8 [ 26121 ] |
Description |
Here is the test scenario user provide:
bq. Following situation: * 3 connections (1x master, 2x worker) * autocommit = 0 * tx_isolation = read-committed on worker DB layout: CREATE TABLE `jobs` ( id int(11) NOT NULL AUTO_INCREMENT, state varchar(25) DEFAULT NULL, created timestamp NOT NULL DEFAULT current_timestamp(), PRIMARY KEY ("id") ); The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections. master is doing 5 insert statements {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> commit; {code} worker 1 takes all unlocked rows from table and locks them for update {code} mysql (test)> begin; mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 1 | pending | 2022-03-03 17:11:33 | ... | 5 | pending | 2022-03-03 17:11:37 | +----+---------+---------------------+ 5 rows in set (0.000 sec) {code} master does more inserts {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); {code} worker 2 takes unlocked rows {code} mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 6 | pending | 2022-03-03 17:18:22 | ... | 10 | pending | 2022-03-03 17:18:24 | +----+---------+---------------------+ 5 rows in set (0.001 sec) {code} worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single {{where in ()}} clause it will wait for worker 2, even though both shouldn't have any dependencies. {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} If deleting only one row at once (no matter which one) it will work. {code} mysql (test)> delete from jobs where id = 1; Query OK, 1 row affected (0.001 sec) {code} It will even work when some rows are deleted at once: {code} mysql (test)> delete from jobs where id in (1,2,3); Query OK, 2 rows affected (0.001 sec) {code} However, in this example row 4 and 5 cannot be deleted together with other rows: {code} mysql (test)> delete from jobs where id in (1,2,3,5); ... ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} Except for only row 4 and 5 are deleted alone {code} mysql (test)> delete from jobs where id in (4,5); Query OK, 2 rows affected (0.001 sec) {code} Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems. Another thing: Worker 1 tries to delete rows {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting {code} Worker 2 tries to delete rows concurrently to Worker 1 {code} mysql (test)> delete from jobs where id in (6,7,8,9,10); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction {code} Worker 1 succeeds {code} ... Query OK, 5 rows affected (48.561 sec) {code} Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded. Testcase included, test.py. It requires python3, pymysql, threading |
Here is the test scenario user provide:
Following situation: * 3 connections (1x master, 2x worker) * autocommit = 0 * tx_isolation = read-committed on worker DB layout: CREATE TABLE `jobs` ( id int(11) NOT NULL AUTO_INCREMENT, state varchar(25) DEFAULT NULL, created timestamp NOT NULL DEFAULT current_timestamp(), PRIMARY KEY ("id") ); The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections. master is doing 5 insert statements {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> commit; {code} worker 1 takes all unlocked rows from table and locks them for update {code} mysql (test)> begin; mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 1 | pending | 2022-03-03 17:11:33 | ... | 5 | pending | 2022-03-03 17:11:37 | +----+---------+---------------------+ 5 rows in set (0.000 sec) {code} master does more inserts {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); {code} worker 2 takes unlocked rows {code} mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 6 | pending | 2022-03-03 17:18:22 | ... | 10 | pending | 2022-03-03 17:18:24 | +----+---------+---------------------+ 5 rows in set (0.001 sec) {code} worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single {{where in ()}} clause it will wait for worker 2, even though both shouldn't have any dependencies. {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} If deleting only one row at once (no matter which one) it will work. {code} mysql (test)> delete from jobs where id = 1; Query OK, 1 row affected (0.001 sec) {code} It will even work when some rows are deleted at once: {code} mysql (test)> delete from jobs where id in (1,2,3); Query OK, 2 rows affected (0.001 sec) {code} However, in this example row 4 and 5 cannot be deleted together with other rows: {code} mysql (test)> delete from jobs where id in (1,2,3,5); ... ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} Except for only row 4 and 5 are deleted alone {code} mysql (test)> delete from jobs where id in (4,5); Query OK, 2 rows affected (0.001 sec) {code} Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems. Another thing: Worker 1 tries to delete rows {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting {code} Worker 2 tries to delete rows concurrently to Worker 1 {code} mysql (test)> delete from jobs where id in (6,7,8,9,10); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction {code} Worker 1 succeeds {code} ... Query OK, 5 rows affected (48.561 sec) {code} Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded. Testcase included, test.py. It requires python3, pymysql, threading |
Description |
Here is the test scenario user provide:
Following situation: * 3 connections (1x master, 2x worker) * autocommit = 0 * tx_isolation = read-committed on worker DB layout: CREATE TABLE `jobs` ( id int(11) NOT NULL AUTO_INCREMENT, state varchar(25) DEFAULT NULL, created timestamp NOT NULL DEFAULT current_timestamp(), PRIMARY KEY ("id") ); The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections. master is doing 5 insert statements {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> commit; {code} worker 1 takes all unlocked rows from table and locks them for update {code} mysql (test)> begin; mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 1 | pending | 2022-03-03 17:11:33 | ... | 5 | pending | 2022-03-03 17:11:37 | +----+---------+---------------------+ 5 rows in set (0.000 sec) {code} master does more inserts {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); {code} worker 2 takes unlocked rows {code} mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 6 | pending | 2022-03-03 17:18:22 | ... | 10 | pending | 2022-03-03 17:18:24 | +----+---------+---------------------+ 5 rows in set (0.001 sec) {code} worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single {{where in ()}} clause it will wait for worker 2, even though both shouldn't have any dependencies. {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} If deleting only one row at once (no matter which one) it will work. {code} mysql (test)> delete from jobs where id = 1; Query OK, 1 row affected (0.001 sec) {code} It will even work when some rows are deleted at once: {code} mysql (test)> delete from jobs where id in (1,2,3); Query OK, 2 rows affected (0.001 sec) {code} However, in this example row 4 and 5 cannot be deleted together with other rows: {code} mysql (test)> delete from jobs where id in (1,2,3,5); ... ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} Except for only row 4 and 5 are deleted alone {code} mysql (test)> delete from jobs where id in (4,5); Query OK, 2 rows affected (0.001 sec) {code} Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems. Another thing: Worker 1 tries to delete rows {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting {code} Worker 2 tries to delete rows concurrently to Worker 1 {code} mysql (test)> delete from jobs where id in (6,7,8,9,10); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction {code} Worker 1 succeeds {code} ... Query OK, 5 rows affected (48.561 sec) {code} Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded. Testcase included, test.py. It requires python3, pymysql, threading |
Here is the test scenario user provide:
Following situation: * 3 connections (1x master, 2x worker) * autocommit = 0 * tx_isolation = read-committed on worker DB layout:{code:sql} CREATE TABLE `jobs` ( id int(11) NOT NULL AUTO_INCREMENT, state varchar(25) DEFAULT NULL, created timestamp NOT NULL DEFAULT current_timestamp(), PRIMARY KEY ("id") );{code} The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections. master is doing 5 insert statements {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> commit; {code} worker 1 takes all unlocked rows from table and locks them for update {code} mysql (test)> begin; mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 1 | pending | 2022-03-03 17:11:33 | ... | 5 | pending | 2022-03-03 17:11:37 | +----+---------+---------------------+ 5 rows in set (0.000 sec) {code} master does more inserts {code} mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); mysql (test)> insert into jobs (state) value('pending'); {code} worker 2 takes unlocked rows {code} mysql (test)> select * from jobs for update skip locked; +----+---------+---------------------+ | id | state | created | +----+---------+---------------------+ | 6 | pending | 2022-03-03 17:18:22 | ... | 10 | pending | 2022-03-03 17:18:24 | +----+---------+---------------------+ 5 rows in set (0.001 sec) {code} worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single {{where in ()}} clause it will wait for worker 2, even though both shouldn't have any dependencies. {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} If deleting only one row at once (no matter which one) it will work. {code} mysql (test)> delete from jobs where id = 1; Query OK, 1 row affected (0.001 sec) {code} It will even work when some rows are deleted at once: {code} mysql (test)> delete from jobs where id in (1,2,3); Query OK, 2 rows affected (0.001 sec) {code} However, in this example row 4 and 5 cannot be deleted together with other rows: {code} mysql (test)> delete from jobs where id in (1,2,3,5); ... ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction {code} Except for only row 4 and 5 are deleted alone {code} mysql (test)> delete from jobs where id in (4,5); Query OK, 2 rows affected (0.001 sec) {code} Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems. Another thing: Worker 1 tries to delete rows {code} mysql (test)> delete from jobs where id in (1,2,3,4,5); ... waiting {code} Worker 2 tries to delete rows concurrently to Worker 1 {code} mysql (test)> delete from jobs where id in (6,7,8,9,10); ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction {code} Worker 1 succeeds {code} ... Query OK, 5 rows affected (48.561 sec) {code} Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded. Testcase included, test.py. It requires python3, pymysql, threading |
Fix Version/s | 10.11 [ 27614 ] |
Attachment | test-1.py [ 70749 ] |
Fix Version/s | 10.6.14 [ 28914 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Fix Version/s | 10.9 [ 26905 ] | |
Fix Version/s | 10.10 [ 27530 ] | |
Fix Version/s | 10.11 [ 27614 ] | |
Resolution | Not a Bug [ 6 ] | |
Status | In Progress [ 3 ] | Closed [ 6 ] |
Zendesk Related Tickets | 178763 |
I checked the conflicting lock with the help of https://rr-project.org and an empty database that was bootstrapped by mtr as follows:
cd mysql-test
./mtr innodb.innodb
mkdir /dev/shm/rr
_RR_TRACE_DIR=/dev/shm/rr rr record ../sql/mariadbd --datadir $(pwd)/mysql-test/var/install.db --innodb-log-file-size=10m --innodb-buffer-pool-size=10m &
python3 test.py
I had to add user='root' to the connect clauses and remove " from the CREATE TABLE statement so that it would not throw an error. I also replaced the table name jobs in case it is a reserved word in 10.7 (the version that I happened to use), but in the output below I replaced the original name jobs again.
The script instantly reported a lock wait timeout. I then shut down the server by ctrl-\ (SIGQUIT) and checked the trace:
rr replay /dev/shm/rr/latest-trace
break lock_wait
continue
next
next
…
watch -l wait_lock.trx
reverse-continue
watch -l c_lock.trx
reverse-continue
That is, I followed the execution backwards, using to find where the waiting lock request and the conflicting lock were created.
I see nothing wrong in the execution that I captured:
I am not sure, but it may be that fixing MDEV-16402 would avoid the unnecessary locking of the out-of-range record id=16.