MariaDB Server / MDEV-29081

trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()

Details

    Description

      I used my fuzzing tool to test MariaDB and found a transaction-related bug that can cause the server to abort.

      MariaDB installation:
      1) cd mariadb-10.8.3
      2) mkdir build; cd build
      3) cmake .. -DCMAKE_BUILD_TYPE=Debug
      4) make -j12 && sudo make install

      Test driver compilation:
      Note: "mariadb_reproduce" sets up several transactions and execute SQL statements according to /tmp/mysql_bk.sql, /tmp/stmts.sql and /tmp/tid.txt
      1) g++ -I/usr/local/mysql/include/ mariadb_reproduce.cc -o mariadb_reproduce -lmysqlclient -g
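      For orientation, a driver of this kind is essentially a small loop over the MySQL C client API. The following is a minimal, hypothetical sketch, not the attached mariadb_reproduce.cc: host, credentials, database and the statement list are placeholders, and the real driver distributes statements over several concurrent connections according to /tmp/tid.txt.

      // Minimal, hypothetical driver sketch using the MySQL C API; NOT the
      // attached mariadb_reproduce.cc. Connection parameters and statements
      // are placeholders.
      #include <mysql.h>
      #include <cstdio>

      int main() {
        MYSQL *conn = mysql_init(nullptr);
        if (!mysql_real_connect(conn, "127.0.0.1", "root", "", "test", 3306,
                                nullptr, 0)) {
          std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
          return 1;
        }
        // In the real driver the statements come from /tmp/stmts.sql and are
        // routed to several connections according to /tmp/tid.txt.
        const char *stmts[] = {"BEGIN", "UPDATE t SET c = c + 1 WHERE id = 1",
                               "COMMIT"};
        for (const char *stmt : stmts)
          if (mysql_query(conn, stmt))
            std::fprintf(stderr, "'%s' failed: %s\n", stmt, mysql_error(conn));
        mysql_close(conn);
        return 0;
      }

      It compiles with the same g++ command shown in step 1 above.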

      Reproduce the bug:
      1) cp mysql_bk.sql /tmp; cp stmts.sql /tmp; cp tid.txt /tmp
      2) export ASAN_OPTIONS=detect_leaks=0
      3) /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql &
      4) bash reproduce.sh # "reproduce.sh" repeatedly executes "mariadb_reproduce" 100 times. "reproduce.sh" and "mariadb_reproduce" should be in the same directory

      I have simplified the content of fuzz.sql, and I hope fuzz.sql can help you reproduce and fix the bug. In addition, I attached the failure report (which contains the stack trace).

      This bug seems related to concurrency and cannot be reproduced stably. However, it did occur and crashed the server in my test. To trigger it more reliably, I use "reproduce.sh" to execute "mariadb_reproduce" 100 times. I hope you can reproduce the bug successfully.

      Attachments

        1. bug_report.txt
          6 kB
        2. mariadb_reproduce.cc
          14 kB
        3. MDEV-29081.test
          4 kB
        4. mysql_bk.sql
          7 kB
        5. reproduce.sh
          0.1 kB
        6. stmts.sql
          18 kB
        7. tid.txt
          0.1 kB


          Activity

            vlad.lesin Vladislav Lesin added a comment (edited)

            I can't backport the test on 10.5 because of two reasons:

            1. trx->lock.was_chosen_as_deadlock_victim is not reset from the functions which cancel and release locks in 10.5; in 10.6 it is,

            2. lock_trx_handle_wait() cancels waiting in 10.5, while in 10.6 it just checks lock state, and the thread will be suspended in row_search_mvcc() if the lock is still in "waiting" state.

            I think 10.5 is not affected because of (1).

            UPD: I was wrong about (1). 10.5 resets trx->lock.was_chosen_as_deadlock_victim in lock_wait_release_thread_if_suspended(). So currently I am not sure whether 10.5 is affected or not. The current test will not work anyway due to (2), so 10.5 requires a separate test case.


            vlad.lesin Vladislav Lesin added a comment

            There are two ways to fix it:

            1. Reset trx->lock.was_chosen_as_deadlock_victim only on rollback (full or to a savepoint). In particular, don't reset it in lock_wait_end(). It can be set during deadlock checking by another transaction, but if it is set, the transaction should have some way to check that it was chosen as a deadlock victim. We should be careful here, as trx->lock.was_chosen_as_deadlock_victim is used not only to notify about a deadlock, but also to notify the transaction that it is a victim of a Galera abort (see trx_lock_t::set_wsrep_victim() for details).

            2. There is also trx->error_state, which is set to DB_DEADLOCK. But it's not atomic and not protected. The corresponding comment says that "ONLY the thread doing the transaction is allowed to set this field". In 10.5, trx->error_state was set to DB_DEADLOCK only before trx->lock.was_chosen_as_deadlock_victim was reset. In 10.6 the latter became atomic (commit 3e45f8e36a80), and the corresponding bit is reset before trx->error_state=DB_DEADLOCK in lock_wait_end(), which can be executed by another thread. So trx->error_state can be changed by the thread which executes some other transaction. And there is a window between resetting trx->lock.was_chosen_as_deadlock_victim and setting trx->error_state=DB_DEADLOCK. But, anyway, I think this will not cause a race condition, because trx->error_state is set under the lock_sys.wait_mutex lock in lock_wait_end(), and we could check trx->error_state in lock_trx_handle_wait() under the lock_sys.wait_mutex lock too.
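            To make the race pattern in option (2) concrete, here is a standalone C++ sketch. It is not MariaDB source code: trx_sim, wake_victim() and was_aborted() are stand-ins for trx_t, lock_wait_end() and lock_trx_handle_wait(), and the mutex plays the role of lock_sys.wait_mutex. The point it illustrates is that reading the non-atomic error code under the same mutex that the waking thread holds keeps the deadlock notification observable even after the atomic flag has been cleared.

            // Standalone illustration only; NOT MariaDB source code.
            // "victim" ~ trx->lock.was_chosen_as_deadlock_victim,
            // "error_state" ~ trx->error_state, "wait_mutex" ~ lock_sys.wait_mutex.
            #include <atomic>
            #include <cassert>
            #include <mutex>
            #include <thread>

            enum dberr { DB_SUCCESS, DB_DEADLOCK };

            struct trx_sim {
              std::atomic<bool> victim{false};  // set by the deadlock checker
              dberr error_state{DB_SUCCESS};    // plain field, not atomic
            };

            std::mutex wait_mutex;

            // Thread that resolves the deadlock (cf. lock_wait_end()):
            // the flag is cleared first, the error code is set afterwards.
            void wake_victim(trx_sim &trx) {
              std::lock_guard<std::mutex> g(wait_mutex);
              trx.victim.store(false, std::memory_order_relaxed);
              trx.error_state = DB_DEADLOCK;
            }

            // Victim's own thread (cf. lock_trx_handle_wait()): checking only the
            // atomic flag could miss the notification, but reading error_state
            // under the same mutex closes the window between the two writes above.
            bool was_aborted(trx_sim &trx) {
              std::lock_guard<std::mutex> g(wait_mutex);
              return trx.victim.load(std::memory_order_relaxed) ||
                     trx.error_state == DB_DEADLOCK;
            }

            int main() {
              trx_sim trx;
              trx.victim.store(true);            // deadlock checker chose this trx
              std::thread t([&] { wake_victim(trx); });
              t.join();
              assert(was_aborted(trx));          // the deadlock is still visible
            }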

            We could also check both trx->lock.was_chosen_as_deadlock_victim and trx->error_state, because a Galera transaction abort sets the second bit of trx->lock.was_chosen_as_deadlock_victim and does set trx->error_state = DB_DEADLOCK.

            But the question is still open to me: why do we need to reset trx->lock.was_chosen_as_deadlock_victim anywhere other than on rollback? marko, maybe you know?

            Another concern is that currently we have two different ways to notify a transaction that it was chosen as a deadlock victim. And this ambiguity is quite confusing, because it's not clear what to check in which cases.


            marko Marko Mäkelä added a comment

            vlad.lesin, my intuition says that it should suffice to reset trx->lock.was_chosen_as_deadlock_victim at the end of trx_t::commit_in_memory(). A deadlock should be a hard error; the only way to "resume" the transaction is to re-execute the statements in a new transaction. An example of a soft error would be a lock wait timeout. There is an option to trigger a rollback on timeout, but the user could also keep executing further statements after one statement encountered a lock wait timeout and was rolled back.

            In MDEV-24671 and MDEV-24738 I cleaned up the code quite a bit, but I can’t remember why I did not remove the trx_lock_t::clear_deadlock_victim(). Maybe I just wanted to limit the amount of changes to reduce the risk of breaking things.

            It now seems to me that the separate flag for the Galera set_wsrep_victim() is unnecessary if we just remove the trx_lock_t::clear_deadlock_victim() and calls to it. Then the trx_lock_t::was_chosen_as_deadlock_victim could be a simple atomic Boolean flag.
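            A rough, hypothetical sketch of that simplification, again as standalone illustrative code rather than actual MariaDB source (trx_sim only loosely mirrors trx_t and trx_t::commit_in_memory()):

            // Hypothetical simplification sketched from the comment above; NOT
            // MariaDB source. The victim flag becomes a plain atomic Boolean that
            // is set by deadlock resolution (or a Galera abort) and cleared in
            // exactly one place: at the end of commit/rollback, never in
            // lock_wait_end().
            #include <atomic>

            struct trx_sim {
              std::atomic<bool> was_chosen_as_deadlock_victim{false};

              // Called by the deadlock checker or by a wsrep abort.
              void mark_victim() { was_chosen_as_deadlock_victim.store(true); }

              // Polled by the transaction's own thread while it waits for a lock.
              bool is_victim() const { return was_chosen_as_deadlock_victim.load(); }

              // Analogue of trx_t::commit_in_memory(): the only place the flag is
              // reset, so a pending notification cannot be lost before the victim
              // thread has seen it.
              void commit_in_memory() {
                // ... release locks, update transaction state ...
                was_chosen_as_deadlock_victim.store(false);
              }
            };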

            mleich Matthias Leich added a comment (edited)

            Intermediate result of testing:
            origin/bb-10.6-MDEV-29081-deadlock-victim fd75a1b755d9216d731d524d28da072b8b27ebde 2022-08-18T16:11:03+03:00
            performed well in the RQGstandard test battery. No new bad effects.
            But a slightly modified test battery seems to find new crashes.
            mysqld: storage/innobase/lock/lock0lock.cc:1493: dberr_t lock_rec_lock(bool, unsigned int, const buf_block_t*, ulint, dict_index_t*, que_thr_t*): Assertion `trx->is_wsrep() || !trx->lock.was_chosen_as_deadlock_victim' failed.
            All tests hitting that assert have a server start with transaction-isolation=READ-COMMITTED or READ-UNCOMMITTED.
            pluto:/data/results/1660910976/TBR-1585$ _RR_TRACE_DIR=./1/rr/ rr replay
            


            mleich Matthias Leich added a comment

            origin/bb-10.6-MDEV-29081-deadlock-victim f473354c5ca617bfde7306d571052ac4aa9c5a86 2022-08-22T15:13:26+03:00
            performed well in RQG testing. The bug mentioned in the previous comment is fixed.
            Please be aware that the capabilities to check the correctness of result sets or error messages are quite limited when using RQG.
            The new experimental RQG validator "DmlStability" (TODO-3508) did not find problems.


            People

              Assignee: vlad.lesin Vladislav Lesin
              Reporter: Zuming Jiang
              Votes: 0
              Watchers: 7

