MariaDB Server / MDEV-29081

trx_t::lock.was_chosen_as_deadlock_victim race in lock_wait_end()

Details

    Description

      I used my fuzzing tool to test MariaDB and found a transaction-related bug that can cause the server to abort.

      MariaDB installation:
      1) cd mariadb-10.8.3
      2) mkdir build; cd build
      3) cmake .. -DCMAKE_BUILD_TYPE=Debug
      4) make -j12 && sudo make install

      Test driver compilation:
      Note: "mariadb_reproduce" sets up several transactions and execute SQL statements according to /tmp/mysql_bk.sql, /tmp/stmts.sql and /tmp/tid.txt
      1) g++ -I/usr/local/mysql/include/ mariadb_reproduce.cc -o mariadb_reproduce -lmysqlclient -g
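      For orientation, a driver of this kind is essentially a small loop over the MySQL C client API. The following is a minimal, hypothetical sketch, not the attached mariadb_reproduce.cc: host, credentials, database and the statement list are placeholders, and the real driver distributes statements over several concurrent connections according to /tmp/tid.txt.

      // Minimal, hypothetical driver sketch using the MySQL C API; NOT the
      // attached mariadb_reproduce.cc. Connection parameters and statements
      // are placeholders.
      #include <mysql.h>
      #include <cstdio>

      int main() {
        MYSQL *conn = mysql_init(nullptr);
        if (!mysql_real_connect(conn, "127.0.0.1", "root", "", "test", 3306,
                                nullptr, 0)) {
          std::fprintf(stderr, "connect failed: %s\n", mysql_error(conn));
          return 1;
        }
        // In the real driver the statements come from /tmp/stmts.sql and are
        // routed to several connections according to /tmp/tid.txt.
        const char *stmts[] = {"BEGIN", "UPDATE t SET c = c + 1 WHERE id = 1",
                               "COMMIT"};
        for (const char *stmt : stmts)
          if (mysql_query(conn, stmt))
            std::fprintf(stderr, "'%s' failed: %s\n", stmt, mysql_error(conn));
        mysql_close(conn);
        return 0;
      }

      It compiles with the same g++ command shown in step 1 above.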

      Reproduce the bug:
      1) cp mysql_bk.sql /tmp; cp stmts.sql /tmp; cp tid.txt /tmp
      2) export ASAN_OPTIONS=detect_leaks=0
      3) /usr/local/mysql/bin/mysqld --basedir=/usr/local/mysql --datadir=/usr/local/mysql/data --plugin-dir=/usr/local/mysql/lib/plugin --user=mysql &
      4) bash reproduce.sh # "reproduce.sh" repeatedly executes "mariadb_reproduce" 100 times. "reproduce.sh" and "mariadb_reproduce" should be in the same directory

      I have simplified the content of fuzz.sql, and I hope fuzz.sql can help you reproduce and fix the bug. In addition, I attached the failure report (which contains the stack trace).

      This bug seems related to concurrency and cannot be reproduced stably. However, it did occur and crashed the server in my test. To trigger it more reliably, I use "reproduce.sh" to execute "mariadb_reproduce" 100 times. I hope you can reproduce the bug successfully.

      Attachments

        1. bug_report.txt
          6 kB
        2. mariadb_reproduce.cc
          14 kB
        3. MDEV-29081.test
          4 kB
        4. mysql_bk.sql
          7 kB
        5. reproduce.sh
          0.1 kB
        6. stmts.sql
          18 kB
        7. tid.txt
          0.1 kB


          Activity

            vlad.lesin Vladislav Lesin added a comment (edited)

            I can't backport the test on 10.5 because of two reasons:

            1. trx->lock.was_chosen_as_deadlock_victim is not reset from the functions which cancel and release locks in 10.5; in 10.6 it is,

            2. lock_trx_handle_wait() cancels waiting in 10.5, while in 10.6 it just checks lock state, and the thread will be suspended in row_search_mvcc() if the lock is still in "waiting" state.

            I think 10.5 is not affected because of (1).

            UPD: I was wrong about (1). 10.5 resets trx->lock.was_chosen_as_deadlock_victim in lock_wait_release_thread_if_suspended(). So currently I am not sure whether 10.5 is affected or not. The current test will not work anyway due to (2), so 10.5 requires a separate test case.


            vlad.lesin Vladislav Lesin added a comment

            There are two ways to fix it:

            1. Reset trx->lock.was_chosen_as_deadlock_victim only on rollback (full or to a savepoint). In particular, don't reset it in lock_wait_end(). It can be set during deadlock checking by another transaction, but if it is set, the transaction should have some way to check that it was chosen as a deadlock victim. We should be careful here, as trx->lock.was_chosen_as_deadlock_victim is used not only to notify about a deadlock, but also to notify the transaction that it is a victim of a Galera abort (see trx_lock_t::set_wsrep_victim() for details).

            2. There is also trx->error_state, which is set to DB_DEADLOCK. But it's not atomic and not protected. The corresponding comment says that "ONLY the thread doing the transaction is allowed to set this field". In 10.5, trx->error_state was set to DB_DEADLOCK only before trx->lock.was_chosen_as_deadlock_victim was reset. In 10.6 the latter became atomic (commit 3e45f8e36a80), and the corresponding bit is reset before trx->error_state=DB_DEADLOCK in lock_wait_end(), which can be executed by another thread. So trx->error_state can be changed by the thread which executes some other transaction. And there is a window between resetting trx->lock.was_chosen_as_deadlock_victim and setting trx->error_state=DB_DEADLOCK. But, anyway, I think this will not cause a race condition, because trx->error_state is set under the lock_sys.wait_mutex lock in lock_wait_end(), and we could check trx->error_state in lock_trx_handle_wait() under the lock_sys.wait_mutex lock too.
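            To make the race pattern in option (2) concrete, here is a standalone C++ sketch. It is not MariaDB source code: trx_sim, wake_victim() and was_aborted() are stand-ins for trx_t, lock_wait_end() and lock_trx_handle_wait(), and the mutex plays the role of lock_sys.wait_mutex. The point it illustrates is that reading the non-atomic error code under the same mutex that the waking thread holds keeps the deadlock notification observable even after the atomic flag has been cleared.

            // Standalone illustration only; NOT MariaDB source code.
            // "victim" ~ trx->lock.was_chosen_as_deadlock_victim,
            // "error_state" ~ trx->error_state, "wait_mutex" ~ lock_sys.wait_mutex.
            #include <atomic>
            #include <cassert>
            #include <mutex>
            #include <thread>

            enum dberr { DB_SUCCESS, DB_DEADLOCK };

            struct trx_sim {
              std::atomic<bool> victim{false};  // set by the deadlock checker
              dberr error_state{DB_SUCCESS};    // plain field, not atomic
            };

            std::mutex wait_mutex;

            // Thread that resolves the deadlock (cf. lock_wait_end()):
            // the flag is cleared first, the error code is set afterwards.
            void wake_victim(trx_sim &trx) {
              std::lock_guard<std::mutex> g(wait_mutex);
              trx.victim.store(false, std::memory_order_relaxed);
              trx.error_state = DB_DEADLOCK;
            }

            // Victim's own thread (cf. lock_trx_handle_wait()): checking only the
            // atomic flag could miss the notification, but reading error_state
            // under the same mutex closes the window between the two writes above.
            bool was_aborted(trx_sim &trx) {
              std::lock_guard<std::mutex> g(wait_mutex);
              return trx.victim.load(std::memory_order_relaxed) ||
                     trx.error_state == DB_DEADLOCK;
            }

            int main() {
              trx_sim trx;
              trx.victim.store(true);            // deadlock checker chose this trx
              std::thread t([&] { wake_victim(trx); });
              t.join();
              assert(was_aborted(trx));          // the deadlock is still visible
            }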

            We could also check both trx->lock.was_chosen_as_deadlock_victim and trx->error_state, because a Galera transaction abort sets the second bit of trx->lock.was_chosen_as_deadlock_victim and does set trx->error_state = DB_DEADLOCK.

            But the question is still open to me: why do we need to reset trx->lock.was_chosen_as_deadlock_victim anywhere other than on rollback? marko, maybe you know?

            Another concern is that currently we have two different ways to notify a transaction that it was chosen as a deadlock victim. And this ambiguity is quite confusing, because it's not clear what to check in which cases.


            marko Marko Mäkelä added a comment

            vlad.lesin, my intuition says that it should suffice to reset trx->lock.was_chosen_as_deadlock_victim at the end of trx_t::commit_in_memory(). A deadlock should be a hard error; the only way to "resume" the transaction is to re-execute the statements in a new transaction. An example of a soft error would be a lock wait timeout. There is an option to trigger a rollback on timeout, but the user could also keep executing further statements after one statement encountered a lock wait timeout and was rolled back.

            In MDEV-24671 and MDEV-24738 I cleaned up the code quite a bit, but I can’t remember why I did not remove the trx_lock_t::clear_deadlock_victim(). Maybe I just wanted to limit the amount of changes to reduce the risk of breaking things.

            It now seems to me that the separate flag for the Galera set_wsrep_victim() is unnecessary if we just remove the trx_lock_t::clear_deadlock_victim() and calls to it. Then the trx_lock_t::was_chosen_as_deadlock_victim could be a simple atomic Boolean flag.
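            A rough, hypothetical sketch of that simplification, again as standalone illustrative code rather than actual MariaDB source (trx_sim only loosely mirrors trx_t and trx_t::commit_in_memory()):

            // Hypothetical simplification sketched from the comment above; NOT
            // MariaDB source. The victim flag becomes a plain atomic Boolean that
            // is set by deadlock resolution (or a Galera abort) and cleared in
            // exactly one place: at the end of commit/rollback, never in
            // lock_wait_end().
            #include <atomic>

            struct trx_sim {
              std::atomic<bool> was_chosen_as_deadlock_victim{false};

              // Called by the deadlock checker or by a wsrep abort.
              void mark_victim() { was_chosen_as_deadlock_victim.store(true); }

              // Polled by the transaction's own thread while it waits for a lock.
              bool is_victim() const { return was_chosen_as_deadlock_victim.load(); }

              // Analogue of trx_t::commit_in_memory(): the only place the flag is
              // reset, so a pending notification cannot be lost before the victim
              // thread has seen it.
              void commit_in_memory() {
                // ... release locks, update transaction state ...
                was_chosen_as_deadlock_victim.store(false);
              }
            };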

            mleich Matthias Leich added a comment (edited)

            Intermediate result of testing:
            origin/bb-10.6-MDEV-29081-deadlock-victim fd75a1b755d9216d731d524d28da072b8b27ebde 2022-08-18T16:11:03+03:00
            performed well in the RQGstandard test battery. No new bad effects.
            But a slightly modified test battery seems to find new crashes.
            mysqld: storage/innobase/lock/lock0lock.cc:1493: dberr_t lock_rec_lock(bool, unsigned int, const buf_block_t*, ulint, dict_index_t*, que_thr_t*): Assertion `trx->is_wsrep() || !trx->lock.was_chosen_as_deadlock_victim' failed.
            All tests hitting that assert have a server start with transaction-isolation=READ-COMMITTED or READ-UNCOMMITTED.
            pluto:/data/results/1660910976/TBR-1585$ _RR_TRACE_DIR=./1/rr/ rr replay
            


            mleich Matthias Leich added a comment

            origin/bb-10.6-MDEV-29081-deadlock-victim f473354c5ca617bfde7306d571052ac4aa9c5a86 2022-08-22T15:13:26+03:00
            performed well in RQG testing. The bug mentioned in the previous comment is fixed.
            Please be aware that the capabilities to check the correctness of result sets or error messages are quite limited when using RQG.
            The new experimental RQG validator "DmlStability" (TODO-3508) did not find problems.


            People

              Assignee: vlad.lesin Vladislav Lesin
              Reporter: Zuming Jiang
              Votes: 0
              Watchers: 7

