MariaDB Server
MDEV-27999

SELECT ... FOR UPDATE SKIP LOCKED locks unnecessary record

Details

    Description

      Here is the test scenario the user provided:

      The following situation:

      • 3 connections (1x master, 2x worker)
      • autocommit = 0
      • tx_isolation = read-committed on worker

      DB layout:

      CREATE TABLE `jobs` (
      id int(11) NOT NULL AUTO_INCREMENT,
      state varchar(25) DEFAULT NULL,
      created timestamp NOT NULL DEFAULT current_timestamp(),
      PRIMARY KEY ("id")
      );
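
      Note: the double-quoted "id" in PRIMARY KEY ("id") is only accepted when sql_mode includes ANSI_QUOTES; under the default sql_mode the statement fails with a syntax error (in the analysis below, the quotes had to be removed before the script would run). A form that parses under default settings:

      CREATE TABLE `jobs` (
        `id` int(11) NOT NULL AUTO_INCREMENT,
        `state` varchar(25) DEFAULT NULL,
        `created` timestamp NOT NULL DEFAULT current_timestamp(),
        PRIMARY KEY (`id`)
      );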

      The master writes jobs into the table, and worker nodes fetch jobs, do some processing in the application, and finally delete the jobs from the table. The jobs table acts as a task queue and does not receive any updates in this simplified test. The test is also done in a controlled environment without any other concurrent connections.

      master runs 5 insert statements

      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> commit;
      

      worker 1 takes all unlocked rows from the table and locks them for update

      mysql (test)> begin;
      mysql (test)> select * from jobs for update skip locked;
      +----+---------+---------------------+
      | id | state   | created             |
      +----+---------+---------------------+
      |  1 | pending | 2022-03-03 17:11:33 |
      ...
      |  5 | pending | 2022-03-03 17:11:37 |
      +----+---------+---------------------+
      5 rows in set (0.000 sec)
      

      master does more inserts

      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      mysql (test)> insert into jobs (state) value('pending');
      

      worker 2 takes the remaining unlocked rows

      mysql (test)> select * from jobs for update skip locked;
      +----+---------+---------------------+
      | id | state   | created             |
      +----+---------+---------------------+
      |  6 | pending | 2022-03-03 17:18:22 |
      ...
      | 10 | pending | 2022-03-03 17:18:24 |
      +----+---------+---------------------+
      5 rows in set (0.001 sec)
      

      worker 1 has finished its jobs and deletes its locked rows. This is where it starts to get weird: when deleting all of its rows with a single WHERE ... IN () clause, the statement waits on worker 2, even though the two transactions should have no dependencies.

      mysql (test)> delete from jobs where id in (1,2,3,4,5);
      ... waiting
      ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
      
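      While the DELETE sits in the lock wait, a third connection can inspect which transaction is blocking which. A sketch, assuming the InnoDB lock tables that MariaDB ships in information_schema (MySQL 8.0 moved this information to performance_schema.data_locks):

      mysql (test)> SELECT w.requesting_trx_id, r.trx_query AS waiting_query,
          ->        w.blocking_trx_id, b.trx_query AS blocking_query
          -> FROM information_schema.INNODB_LOCK_WAITS w
          -> JOIN information_schema.INNODB_TRX r ON r.trx_id = w.requesting_trx_id
          -> JOIN information_schema.INNODB_TRX b ON b.trx_id = w.blocking_trx_id;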

      Deleting only one row at a time (no matter which one) works.

      mysql (test)> delete from jobs where id = 1;
      Query OK, 1 row affected (0.001 sec)
      

      It even works when some rows are deleted at once:

      mysql (test)> delete from jobs where id in (1,2,3);
      Query OK, 2 rows affected (0.001 sec)
      

      However, in this example rows 4 and 5 cannot be deleted together with other rows:

      mysql (test)> delete from jobs where id in (1,2,3,5);
      ...
      ERROR 1205 (HY000): Lock wait timeout exceeded; try restarting transaction
      

      except when rows 4 and 5 are deleted on their own:

      mysql (test)> delete from jobs where id in (4,5);
      Query OK, 2 rows affected (0.001 sec)
      

      Why can some rows be deleted together while others cannot? When deleting all rows one by one, it works without any problems.
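
      One way to see what the DELETE is actually doing is EXPLAIN, which MariaDB also supports for DELETE statements; as the analysis below explains, on the affected versions the plan is "ALL" (a full table scan that touches every row, locked or not) rather than a range over the PRIMARY KEY:

      mysql (test)> explain delete from jobs where id in (1,2,3,4,5);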

      Another thing:
      Worker 1 tries to delete rows

      mysql (test)> delete from jobs where id in (1,2,3,4,5);
      ... waiting
      

      Worker 2 tries to delete its rows concurrently with Worker 1

      mysql (test)> delete from jobs where id in (6,7,8,9,10);
      ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
      

      Worker 1 succeeds

      ...
      Query OK, 5 rows affected (48.561 sec)
      

      Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks, as each worker locked only its own rows. Out of curiosity I ran the same test against MySQL 8.0.28, where it succeeded.
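
      InnoDB keeps a record of the most recent deadlock, so the cycle behind the ERROR 1213 can be inspected right after it happens; the LATEST DETECTED DEADLOCK section of the output lists both transactions, the record locks they hold, and the lock each one is waiting for:

      mysql (test)> show engine innodb status\G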
      A test case is included (test.py). It requires python3 and pymysql, and uses the standard threading module.

      Attachments

        1. test.py
          5 kB
        2. test-1.py
          6 kB

        Issue Links

          Activity

            marko Marko Mäkelä added a comment -

            I checked the conflicting lock with the help of https://rr-project.org and an empty database that was bootstrapped by mtr as follows:

            cd mysql-test
            ./mtr innodb.innodb
            mkdir /dev/shm/rr
            _RR_TRACE_DIR=/dev/shm/rr rr record ../sql/mariadbd --datadir $(pwd)/mysql-test/var/install.db --innodb-log-file-size=10m --innodb-buffer-pool-size=10m &
            python3 test.py
            

            I had to add user='root' to the connect calls and remove the double quotes from the CREATE TABLE statement so that it would not throw an error. I also replaced the table name jobs in case it is a reserved word in 10.7 (the version that I happened to use), but in the output below I restored the original name jobs.
            The script instantly reported a lock wait timeout. I then shut down the server by ctrl-\ (SIGQUIT) and checked the trace:

            rr replay /dev/shm/rr/latest-trace
            break lock_wait
            continue
            next
            next
            …
            watch -l wait_lock.trx
            reverse-continue
            watch -l c_lock.trx
            reverse-continue
            

            That is, I followed the execution backwards, using watchpoints to find where the waiting lock request and the conflicting lock were created.

            I see nothing wrong in the execution that I captured:

            1. select id from jobs for update skip locked acquires an exclusive lock on a previously unlocked record
            2. delete from jobs where id in (1,2,3,4,5,6,7,8,9,10,11,12,13,14,15) (without any SKIP LOCKED) will time out because the record with id=16 was already locked.

            I am not sure, but it may be that fixing MDEV-16402 would avoid the unnecessary locking of the out-of-range record id=16.

            serg Sergei Golubchik added a comment - - edited

            This is fixed in 11.0: there, DELETE uses a different access method, "range", whereas before 11.0 it is "ALL" (a full table scan).

            I think that if DELETE is doing a full table scan, then it is natural to expect it to wait on the lock: it will try to read all rows, so eventually it will hit one of those that is locked by another worker.

            It will likely be impossible to backport the 11.0 fix to 10.6; it rests on a lot of huge and very intrusive changes related to optimizer costs, and those cannot be backported.

            But you can force "range" in 10.6 too by

            • using FORCE INDEX, and
            • adding a second condition on any other column to suppress the "using index" access method

            For example

            delete jobs.* from jobs force index(primary) where id in (1,2,3,4,5) and created>'2000-01-01';
            
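            To confirm that the rewritten statement switches from the full scan to a range over the PRIMARY KEY, EXPLAIN can be run on the workaround itself (a sketch; the exact EXPLAIN columns vary between versions):

            EXPLAIN DELETE jobs.* FROM jobs FORCE INDEX(PRIMARY)
            WHERE id IN (1,2,3,4,5) AND created > '2000-01-01';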

            Johnston Rex Johnston added a comment -

            This issue does not affect versions 11.0 and later, which choose a range scan.
            Versions prior to 11.0 will choose a full table scan for small tables like the one in this example.

            A full table scan currently locks the whole table.

            A simple workaround is

            set sql_safe_updates = 1

            in each of the worker threads.
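
            A sketch of a worker session with this workaround applied (assuming, per the comment above, that safe-update mode keeps the DELETE on the PRIMARY KEY instead of letting it scan, and therefore lock, the whole table):

            SET SESSION sql_safe_updates = 1;
            DELETE FROM jobs WHERE id IN (1,2,3,4,5);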

            Johnston Rex Johnston added a comment -

            The attached test-1.py is an altered version of the original test that sidesteps the issue.


            People

              Assignee: Johnston Rex Johnston
              Reporter: allen.lee@mariadb.com Allen Lee (Inactive)
              Votes: 0
              Watchers: 7

