[MDEV-27999] select~for update skip locked locks unnecessary record Created: 2022-03-04 Updated: 2023-07-04 Resolved: 2023-06-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Locking, Optimizer, Storage Engine - InnoDB |
| Affects Version/s: | 10.6.7, 10.7 |
| Fix Version/s: | 10.6.14 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Allen Lee (Inactive) | Assignee: | Rex Johnston |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||||||
| Issue Links: |
|
||||||||||||
| Description |
|
Here is the test scenario user provide: Following situation:
DB layout:
The master writes jobs in the table and then worker nodes fetch jobs, do some processing in the application and in the end delete the job from the table. The jobs table acts as a task queue and doesn't get any updates in this simplified test. Also this test is done in a controlled environment without any concurrent connections. master is doing 5 insert statements
worker 1 takes all unlocked rows from table and locks them for update
master does more inserts
worker 2 takes unlocked rows
worker 1 finished working on his jobs and deletes his locked rows. this is where it starts to get weird. If deleting all rows in a single where in () clause it will wait for worker 2, even though both shouldn't have any dependencies.
If deleting only one row at once (no matter which one) it will work.
It will even work when some rows are deleted at once:
However, in this example row 4 and 5 cannot be deleted together with other rows:
Except for only row 4 and 5 are deleted alone
Why can it happen that some rows can be deleted together? When deleting all rows one by one it works without any problems. Another thing:
Worker 2 tries to delete rows concurrently to Worker 1
Worker 1 succeeds
Why is Worker 2 rolled back with a deadlock? There should be no overlapping locks as both locked only some rows. Out of curiosity I ran the same test against MySQL 8.0.28 where it succeded. |
| Comments |
| Comment by Marko Mäkelä [ 2022-06-22 ] | |||||||||||||||
|
I checked the conflicting lock with the help of https://rr-project.org and an empty database that was bootstrapped by mtr as follows:
I had to add user='root' to the connect clauses and remove " from the CREATE TABLE statement so that it would not throw an error. I also replaced the table name jobs in case it is a reserved word in 10.7 (the version that I happened to use), but in the output below I replaced the original name jobs again.
That is, I followed the execution backwards, using to find where the waiting lock request and the conflicting lock were created. I see nothing wrong in the execution that I captured:
I am not sure, but it may be that fixing MDEV-16402 would avoid the unnecessary locking of the out-of-range record id=16. | |||||||||||||||
| Comment by Sergei Golubchik [ 2023-06-27 ] | |||||||||||||||
|
This is fixed in 11.0. That is, in 11.0 DELETE uses a different access method. It's "range" in 11.0 and "ALL" (full table scan) before 11.0. I think that if DELETE is doing full table scan than it's very natural to expect it to wait on the lock, it'll try to read all rows, so eventually it'll hit one of those that is locked by another worker. It'll likely be impossible to backport 11.0 fix to 10.6, there were lots of huge and very intrusive changes related to optimizer costs, they cannot be backported. But you can force "range" in 10.6 too by
For example
| |||||||||||||||
| Comment by Rex Johnston [ 2023-06-28 ] | |||||||||||||||
|
This issue does not affect version 11+, these versions choosing a range scan. A full table scan currently locks the whole table. A simple workaround is
in each of the worker threads. | |||||||||||||||
| Comment by Rex Johnston [ 2023-06-28 ] | |||||||||||||||
|
Attached, test-1.py is an altered version of the original, sidestepping the issue. |