MariaDB Server: MDEV-29081

trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()

Details

    Description

      I used my fuzzing tool to test MariaDB and found a transaction-related bug that can cause the server to abort.

      MariaDB installation:
      1) cd mariadb-10.8.3
      2) mkdir build; cd build
      3) cmake .. -DCMAKE_BUILD_TYPE=Debug
      4) make -j12 && sudo make install

      Test driver compilation:
      Note: "mariadb_reproduce" sets up several transactions and executes SQL statements according to /tmp/mysql_bk.sql, /tmp/stmts.sql and /tmp/tid.txt
      1) g++ -I/usr/local/mysql/include/ mariadb_reproduce.cc -o mariadb_reproduce -lmysqlclient -g

      Reproduce the bug:
      1) cp mysql_bk.sql /tmp; cp stmts.sql /tmp; cp tid.txt /tmp
      2) export ASAN_OPTIONS=detect_leaks=0
      3) /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql &
      4) bash reproduce.sh # "reproduce.sh" repeatedly executes "mariadb_reproduce" 100 times. "reproduce.sh" and "mariadb_reproduce" should be in the same directory

      I have simplified the content of fuzz.sql, and I hope it can help you reproduce and fix the bug. In addition, I attached the failure report (which includes the stack trace).

      This bug seems to be related to concurrency and cannot be reproduced reliably. However, it did occur and crashed the server in my tests. To trigger it more reliably, I use "reproduce.sh" to execute "mariadb_reproduce" 100 times. I hope you can reproduce the bug successfully.

      Attachments

        1. bug_report.txt
          6 kB
          Zuming Jiang
        2. mariadb_reproduce.cc
          14 kB
          Zuming Jiang
        3. MDEV-29081.test
          4 kB
          Vladislav Lesin
        4. mysql_bk.sql
          7 kB
          Zuming Jiang
        5. reproduce.sh
          0.1 kB
          Zuming Jiang
        6. stmts.sql
          18 kB
          Zuming Jiang
        7. tid.txt
          0.1 kB
          Zuming Jiang

        Issue Links

          Activity

            Zuming Jiang Zuming Jiang created issue -
            Zuming Jiang Zuming Jiang made changes -
            Field Original Value New Value
            Attachment bug_report.txt [ 64577 ]
            alice Alice Sherepa added a comment -

            from the log (bug_report.txt):

            mysqld: /home/mysql/mariadb/storage/innobase/lock/lock0lock.cc:5994: bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&,
            page_id_t, ulint): Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed.
            220711 14:08:02 [ERROR] mysqld got signal 6 ;
             
            Server version: 10.8.3-MariaDB-debug
             
            mysys/stacktrace.c:212(my_print_stacktrace)[0x558f7165af7e]
            sql/signal_handler.cc:226(handle_fatal_signal)[0x558f70da8bd0]
            ??:0(__restore_rt)[0x7f91f7ba63c0]
            ??:0(gsignal)[0x7f91f768f03b]
            ??:0(abort)[0x7f91f766e859]
            /lib/x86_64-linux-gnu/libc.so.6(+0x22729)[0x7f91f766e729]
            ??:0(__assert_fail)[0x7f91f7680006]
            lock/lock0lock.cc:5993(lock_trx_has_expl_x_lock(trx_t const&, dict_table_t const&, page_id_t, unsigned long))[0x558f7126d6ad]
            row/row0upd.cc:2661(row_upd_clust_step(upd_node_t*, que_thr_t*))[0x558f713c33d5]
            row/row0upd.cc:2790(row_upd(upd_node_t*, que_thr_t*))[0x558f713c3b10]
            row/row0upd.cc:2935(row_upd_step(que_thr_t*))[0x558f713c410a]
            row/row0mysql.cc:1702(row_update_for_mysql(row_prebuilt_t*))[0x558f7137028a]
            handler/ha_innodb.cc:8582(ha_innobase::update_row(unsigned char const*, unsigned char const*))[0x558f711aca56]
            sql/handler.cc:7602(handler::ha_update_row(unsigned char const*, unsigned char const*))[0x558f70dc2dc5]
            sql/sql_update.cc:1088(mysql_update(THD*, TABLE_LIST*, List<Item>&, List<Item>&, Item*, unsigned int, st_order*, unsigned long long, bool, unsigned long long*, unsigned long long*))[0x558f70b3ccb7]
            sql/sql_parse.cc:4405(mysql_execute_command(THD*, bool))[0x558f709e954e]
            sql/sql_parse.cc:8027(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x558f709f57c2]
            sql/sql_parse.cc:1896(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x558f709e1cae]
            sql/sql_parse.cc:1407(do_command(THD*, bool))[0x558f709e068f]
            sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x558f70bb9df1]
            sql/sql_connect.cc:1314(handle_one_connection)[0x558f70bb9a81]
            perfschema/pfs.cc:2203(pfs_spawn_thread)[0x558f710ca123]
            ??:0(start_thread)[0x7f91f7b9a609]
            ??:0(clone)[0x7f91f776b163]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x7f91ac0140d0): update t_khn17c set 
              wkey = 158, 
              c_ci0ked = PI(), 
              c_qdo5gb = t_khn17c.pkey
            where t_khn17c.pkey = t_khn17c.pkey
             
            Connection ID (thread ID): 377
            

            it seems to be the same bug as MDEV-26809

            alice Alice Sherepa made changes -
            serg Sergei Golubchik made changes -
            Fix Version/s N/A [ 14700 ]
            Resolution Duplicate [ 3 ]
            Status Open [ 1 ] Closed [ 6 ]

            marko Marko Mäkelä added a comment -

            I think that this is best kept open until a fix has been developed and tested.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ]
            Resolution Duplicate [ 3 ]
            Status Closed [ 6 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Fix Version/s N/A [ 14700 ]
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            marko Marko Mäkelä added a comment -

            Because the trace in MDEV-26809 seems to have been invalidated by changes to shared libraries, I am unable to debug it.

            To be able to compile mariadb_reproduce.cc with the libmariadb-dev that is installed on my Debian system, I had to change the name of the first header file:

            #include <mariadb/mysql.h>
            

            After that, the executable can be built with GCC 12.1.0:

            g++ mariadb_reproduce.cc -o mariadb_reproduce -lmariadb
            

            I also had to replace the hard-coded path /usr/local/mysql/bin/mysql with the path of a mariadb client executable, and I had to create a symlink from /run/mysqld/mysqld.sock to the socket that my server is using.
            With those changes, on my first try of

            while ./mariadb_reproduce; do :; done
            

            I was able to reproduce the crash in less than 100 attempts. The crash that I got was:

            10.6 772e3f61eb2956143bbdb1d85490bd5aad73d43c

            Version: '10.6.9-MariaDB-debug'  socket: '/dev/shm/mariadb.sock'  port: 3306  Source distribution
            mariadbd: /mariadb/10.6/storage/innobase/lock/lock0lock.cc:6023: bool lock_trx_has_expl_x_lock(const trx_t&, const dict_table_t&, page_id_t, ulint): Assertion `lock_rec_has_expl(LOCK_X | 1024U, g.cell(), id, heap_no, &trx)' failed.
            

            My subsequent attempts to reproduce this crash failed. I will keep trying, also by running the server under https://rr-project.org so that I will get something debuggable.

            marko Marko Mäkelä added a comment -

            After I rebuilt the server executable using cmake -DPLUGIN_PERFSCHEMA=NO so that rr record would have to deal with fewer conditional branches, the test program seems to have become much slower, as if it were waiting for innodb_lock_wait_timeout very often. After I restarted the server with innodb_lock_wait_timeout=0, the test program became fast again, but failed to reproduce anything.

            There does not seem to be much hope to get an rr replay trace from this test case. I will try to produce a core dump again and then try to figure it out from there.

            marko Marko Mäkelä added a comment -

            I was able to reproduce this a few more times while running the server inside GDB. I had to recompile the server with GCC 11, because the type information emitted by GCC 12.1.0 would often cause GDB 12.1 to crash.

            Often, depending on the server build options, the test program would end up frequently waiting for innodb_lock_wait_timeout. In such cases, I restarted the server from scratch to try again. Altogether, I was only able to reproduce the failure a few times over several hours. For the last-but-one crash, the while loop around the script had been running for about 7 minutes. The very first and the last crash that I checked were instantaneous.

            I only debugged the last crash. The assertion in row_upd_clust_step() that invoked lock_trx_has_expl_x_lock() would pass heap_no=49 to that call, corresponding to the record pkey=153000 in the clustered index, which was a single page. The record is exclusively locked by another transaction, which at the time of the assertion failure is waiting in lock_wait() for a lock on heap_no=2 (pkey=28000), on which the crashing thread holds an exclusive lock.

            I think that this must be reproduced under rr record so that we can determine what is actually going on. While I was typing this, ramesh wrote that he just succeeded at that. I will debug that trace shortly.

            marko Marko Mäkelä added a comment -

            This appears to be a race condition that involves 2 threads that are concurrently blocked in lock_wait() while a 3rd thread is creating an explicit record lock for a record that is implicitly locked by one of the lock_wait() threads. The rr replay trace that I got is very similar to the GDB session that I analyzed.

            I debugged it this far:

            ssh galapq
            rr replay /test/mtest/10.10/mariadb-10.10.0-linux-x86_64/rr/latest-trace
            continue
            set $l=(ib_lock_t *) 0x1538dfb8be00
            watch -l *(size_t*)&$l.hash.hash[1]
            reverse-continue

            The record in question is heap_no=50 (pkey=154000,DB_TRX_ID=0x188a). In my GDB session, it was heap_no=49 and pkey=153000. In the rr replay session, this transaction was associated with Thread 10, which is one of the threads that was blocked in lock_wait() when the implicit-to-explicit conversion occurred.

            I think that we must check whether those 2 lock_wait() calls are waiting for a lock on the same record, and what exactly happens to the record between that time and the crash.

            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Vladislav Lesin [ vlad.lesin ]
            vlad.lesin Vladislav Lesin added a comment - - edited

            The scenario is the following:

            1. Trx 6287 does a semi-consistent read for UPDATE. See the sel_set_rec_lock() call from row_search_mvcc() just after the no_gap_lock label. Then sel_set_rec_lock() returns DB_LOCK_WAIT, and row_search_mvcc() reads offsets for the committed version of the record in the row_sel_build_committed_vers_for_mysql() call. Then it checks what is going on with the lock it set on the record via the lock_trx_handle_wait() call, which, in turn, waits for lock_sys.wait_mutex.

            2. Trx 6290 does a read for INSERT. It has just been woken up after waiting for some lock to be released in lock_wait() and holds lock_sys.wait_mutex, which trx 6287 is waiting for. It does deadlock checking, chooses trx 6287 as a victim, and then cancels the wait on the lock created in step 1:
            2a. It sets victim->lock.was_chosen_as_deadlock_victim= true in Deadlock::report(),
            2b. It dequeues the lock from the page, resets the wait state, the transaction's waiting lock, etc., and then invokes lock_wait_end(); see lock_cancel_waiting_and_release() for details,
            2c. lock_wait_end() resets trx->lock.was_chosen_as_deadlock_victim and sets trx->error_state= DB_DEADLOCK,
            2d. lock_sys.wait_mutex is released.

            3. Trx 6287 acquires lock_sys.wait_mutex in lock_trx_handle_wait(), checks trx->lock.was_chosen_as_deadlock_victim, and, as it is not set, returns DB_SUCCESS to row_search_mvcc() instead of DB_DEADLOCK. row_search_mvcc() continues execution as if the lock had been granted.

            4. Trx 6287 aborts execution in row_upd_clust_step(), because the X-lock on the record was not set in row_search_mvcc() in step 3.

            Suggested fix:
            If trx->lock.was_chosen_as_deadlock_victim is reset in lock_wait_end(), then, maybe, lock_trx_handle_wait() should check trx->error_state instead of trx->lock.was_chosen_as_deadlock_victim to decide whether to return DB_DEADLOCK?
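
            To make the interleaving easier to follow, here is a minimal stand-alone C++ sketch of the race described in steps 1-4. The names and types (trx_sketch, wait_mutex, dberr_sketch, cancel_victim, handle_wait) are simplified stand-ins chosen for illustration only; they are not the actual InnoDB definitions.

            // Simplified sketch of the race, not the real InnoDB code.
            #include <atomic>
            #include <cstdio>
            #include <mutex>

            enum dberr_sketch { DB_SUCCESS, DB_DEADLOCK };

            struct trx_sketch {
              std::atomic<bool> was_chosen_as_deadlock_victim{false};
              const void *wait_lock= nullptr;        // the waiting lock created in step 1
              dberr_sketch error_state= DB_SUCCESS;
            };

            std::mutex wait_mutex;                   // stands in for lock_sys.wait_mutex

            // What trx 6290 effectively does in steps 2a-2c while holding wait_mutex.
            void cancel_victim(trx_sketch &victim)
            {
              std::lock_guard<std::mutex> g(wait_mutex);
              victim.was_chosen_as_deadlock_victim= true;   // 2a: Deadlock::report()
              victim.wait_lock= nullptr;                    // 2b: waiting lock dequeued
              victim.was_chosen_as_deadlock_victim= false;  // 2c: reset in lock_wait_end()
              victim.error_state= DB_DEADLOCK;              // 2c: error recorded instead
            }

            // What trx 6287 does in step 3: by now the flag is gone, so it sees success.
            dberr_sketch handle_wait(trx_sketch &trx)
            {
              std::lock_guard<std::mutex> g(wait_mutex);
              if (trx.was_chosen_as_deadlock_victim)
                return DB_DEADLOCK;                         // never taken in this scenario
              if (!trx.wait_lock)
                return DB_SUCCESS;                          // the cancelled wait looks "granted"
              return DB_SUCCESS;                            // real code would handle a pending wait here
            }

            int main()
            {
              trx_sketch trx6287;
              trx6287.wait_lock= &trx6287;                  // step 1: a waiting lock exists
              cancel_victim(trx6287);                       // trx 6290 runs first
              // step 3: trx 6287 sees DB_SUCCESS although it was chosen as the victim
              std::printf("handle_wait() = %d (0 = DB_SUCCESS)\n", handle_wait(trx6287));
              return 0;
            }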

            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-29081.test [ 65015 ]
            vlad.lesin Vladislav Lesin added a comment - - edited

            At last I have developed a test case for it: MDEV-29081.test.

            The following two new syncpoints are necessary for the test:

            --- a/storage/innobase/lock/lock0lock.cc
            +++ b/storage/innobase/lock/lock0lock.cc
            @@ -1849,6 +1849,7 @@ dberr_t lock_wait(que_thr_t *thr)
               thd_wait_end(trx->mysql_thd);
             
               trx->error_state= error_state;
            +  DEBUG_SYNC_C("lock_wait_suspend_thread_exit");
               return error_state;
             }
             
            @@ -5815,6 +5816,7 @@ while holding a clustered index leaf page latch.
                                  lock request was released */
             dberr_t lock_trx_handle_wait(trx_t *trx)
             {
            +  DEBUG_SYNC_C("lock_trx_handle_wait_enter");
               if (trx->lock.was_chosen_as_deadlock_victim)
                 return DB_DEADLOCK;
               if (!trx->lock.wait_lock)
            

            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-29081.test [ 65015 ]
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-29081.test [ 65016 ]
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-29081.test [ 65016 ]
            vlad.lesin Vladislav Lesin made changes -
            Attachment MDEV-29081.test [ 65026 ]

            vlad.lesin Vladislav Lesin added a comment -

            The bug affects 10.6+. I am checking 10.5; it may take some time, as the related code is different.
            vlad.lesin Vladislav Lesin added a comment - - edited

            I can't backport the test to 10.5 for two reasons:

            1. trx->lock.was_chosen_as_deadlock_victim is not reset from the functions which cancel and release locks in 10.5; in 10.6 it is,

            2. lock_trx_handle_wait() cancels waiting in 10.5, while in 10.6 it just checks lock state, and the thread will be suspended in row_search_mvcc() if the lock is still in "waiting" state.

            I think 10.5 is not affected because of (1).

            UPD: I was wrong about (1). 10.5 resets trx->lock.was_chosen_as_deadlock_victim in lock_wait_release_thread_if_suspended(). So currently I am not sure whether 10.5 is affected or not. The current test will not work anyway due to (2), so 10.5 requires a separate test case.

            vlad.lesin Vladislav Lesin added a comment -

            There are two ways to fix it:

            1. Reset trx->lock.was_chosen_as_deadlock_victim only on rollback (full or to a savepoint). In particular, don't reset it in lock_wait_end(). It can be set during deadlock checking by another transaction, but if it is set, the transaction should have some way to check that it was chosen as a deadlock victim. We should be careful here, as trx->lock.was_chosen_as_deadlock_victim is used not only to notify about a deadlock, but also to notify a transaction that it is a victim of a Galera abort (see trx_lock_t::set_wsrep_victim() for details).

            2. There is also trx->error_state, which is set to DB_DEADLOCK. But it's not atomic and not protected. The corresponding comment says that "ONLY the thread doing the transaction is allowed to set this field". In 10.5 trx->error_state was set to DB_DEADLOCK only before trx->lock.was_chosen_as_deadlock_victim was reset. In 10.6 the latter became atomic (commit 3e45f8e36a80), and the corresponding bit is reset before trx->error_state=DB_DEADLOCK is set in lock_wait_end(), which can be executed by another thread. So trx->error_state can be changed by the thread executing some other transaction. And there is a window between resetting trx->lock.was_chosen_as_deadlock_victim and setting trx->error_state=DB_DEADLOCK. But, anyway, I think this will not cause a race condition, because trx->error_state is set under the lock_sys.wait_mutex lock in lock_wait_end(), and we could check trx->error_state in lock_trx_handle_wait() under the lock_sys.wait_mutex lock too.

            We could also check both trx->lock.was_chosen_as_deadlock_victim and trx->error_state, because a Galera transaction abort sets the second bit of trx->lock.was_chosen_as_deadlock_victim and also sets trx->error_state = DB_DEADLOCK.

            But the question is still open to me: why do we need to reset trx->lock.was_chosen_as_deadlock_victim anywhere other than on rollback? marko, maybe you know?

            Another concern is that we currently have two different ways to notify a transaction that it was chosen as a deadlock victim. This ambiguity is quite confusing, because it is not clear what to check in which cases.
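
            A hedged sketch of the second option, reusing the simplified stand-in types from the sketch in the scenario comment above (illustrative only, not the actual patch): consult trx->error_state as well, under the same mutex that lock_wait_end() holds when it writes it.

            // Check error_state under wait_mutex in addition to the flag.
            dberr_sketch handle_wait_option2(trx_sketch &trx)
            {
              std::lock_guard<std::mutex> g(wait_mutex);
              if (trx.was_chosen_as_deadlock_victim)   // covers "flag set, not yet reset"
                return DB_DEADLOCK;
              if (trx.error_state == DB_DEADLOCK)      // covers "flag reset, error recorded" (step 2c)
                return DB_DEADLOCK;
              if (!trx.wait_lock)
                return DB_SUCCESS;
              return DB_SUCCESS;                       // real code would handle a still-pending wait here
            }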

            marko Marko Mäkelä added a comment -

            vlad.lesin, my intuition says that it should suffice to reset trx->lock.was_chosen_as_deadlock_victim at the end of trx_t::commit_in_memory(). A deadlock should be a hard error; the only way to "resume" the transaction is to re-execute the statements in a new transaction. An example of a soft error would be a lock wait timeout. There is an option to trigger a roll back on timeout, but the user could also keep executing further statements after one statement encountered a lock wait timeout and was rolled back.

            In MDEV-24671 and MDEV-24738 I cleaned up the code quite a bit, but I can’t remember why I did not remove the trx_lock_t::clear_deadlock_victim(). Maybe I just wanted to limit the amount of changes to reduce the risk of breaking things.

            It now seems to me that the separate flag for the Galera set_wsrep_victim() is unnecessary if we just remove the trx_lock_t::clear_deadlock_victim() and calls to it. Then the trx_lock_t::was_chosen_as_deadlock_victim could be a simple atomic Boolean flag.
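
            A hedged sketch of that direction, with purely illustrative names (trx_lock_sketch, commit_in_memory_sketch): the flag becomes a plain atomic Boolean that is only cleared when the transaction finishes, so a concurrent lock_wait_end() can no longer hide the deadlock from the victim.

            #include <atomic>

            struct trx_lock_sketch {
              std::atomic<bool> was_chosen_as_deadlock_victim{false};
            };

            // Cleared only here, at the end of commit/rollback, never from lock_wait_end().
            void commit_in_memory_sketch(trx_lock_sketch &lock)
            {
              // ... locks released, transaction state cleaned up ...
              lock.was_chosen_as_deadlock_victim.store(false, std::memory_order_relaxed);
            }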

            vlad.lesin Vladislav Lesin made changes -
            Summary Transaction-related assertions in lock0lock.cc:5994 trx_t::lock.was_chosen_as_deadlock_victim race
            vlad.lesin Vladislav Lesin made changes -
            Summary trx_t::lock.was_chosen_as_deadlock_victim race trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()
            marko Marko Mäkelä made changes -
            mleich Matthias Leich added a comment - - edited

            Intermediate result of testing:
            origin/bb-10.6-MDEV-29081-deadlock-victim fd75a1b755d9216d731d524d28da072b8b27ebde 2022-08-18T16:11:03+03:00
            performed well in the RQG standard test battery. No new bad effects.
            But a slightly modified test battery seems to find new crashes.
            mysqld: storage/innobase/lock/lock0lock.cc:1493: dberr_t lock_rec_lock(bool, unsigned int, const buf_block_t*, ulint, dict_index_t*, que_thr_t*): Assertion `trx->is_wsrep() || !trx->lock.was_chosen_as_deadlock_victim' failed.
            All tests hitting that assert start the server with transaction-isolation=READ-COMMITTED or READ-UNCOMMITTED.
             
            pluto:/data/results/1660910976/TBR-1585$ _RR_TRACE_DIR=./1/rr/ rr replay
            

            mleich Matthias Leich added a comment -

            origin/bb-10.6-MDEV-29081-deadlock-victim f473354c5ca617bfde7306d571052ac4aa9c5a86 2022-08-22T15:13:26+03:00
            performed well in RQG testing. The bug mentioned in the previous comment is fixed.
            Please be aware that the capabilities to check the correctness of result sets or error messages are quite limited when using RQG.
            The new experimental RQG validator "DmlStability" (TODO-3508) did not find problems.

            vlad.lesin Vladislav Lesin made changes -
            Component/s Storage Engine - InnoDB [ 10129 ]
            Fix Version/s 10.6.10 [ 28407 ]
            Fix Version/s 10.7.6 [ 28408 ]
            Fix Version/s 10.8.5 [ 28308 ]
            Fix Version/s 10.9.3 [ 28409 ]
            Fix Version/s 10.10.2 [ 28410 ]
            Fix Version/s 10.11.0 [ 28411 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.7 [ 24805 ]
            Fix Version/s 10.8 [ 26121 ]
            Fix Version/s 10.9 [ 26905 ]
            Fix Version/s 10.10 [ 27530 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]

            People

              Assignee: Vladislav Lesin (vlad.lesin)
              Reporter: Zuming Jiang
              Votes: 0
              Watchers: 7

