Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29622

Wrong assertions in lock_cancel_waiting_and_release() for deadlock resolving caller

Details

    Description

      The scenario is the following:

      1. trx 1 does semi-consistent read for UPDATE, creates waiting lock.

      2. trx 2 executes UPDATE, does deadlock check in lock_wait(), sets trx->lock.was_chosen_as_deadlock_victim for trx 1 in Deadlock::report() just before lock_cancel_waiting_and_release() call.

      3. trx 1 checks trx->lock.was_chosen_as_deadlock_victim in lock_trx_handle_wait(), and as it's set, does rollback from row_mysql_handle_errors(). trx_commit_or_rollback_prepare() zeroes out trx->lock.wait_thr.

      4. trx 2 executes lock_cancel_waiting_and_release(). lock_wait_end() aborts on checking trx->lock.wait_thr.

      The comment of trx->lock.wait_thr says:

              que_thr_t*      wait_thr;       /*!< query thread belonging to this      
                                              trx that is in waiting                  
                                              state. For threads suspended in a       
                                              lock wait, this is protected by         
                                              lock_sys.latch. Otherwise, this may     
                                              only be modified by the thread that is  
                                              serving the running transaction. */
      

      And lock_wait() acquires lock_sys.wait_mutex before deadlock check:

        mysql_mutex_lock(&lock_sys.wait_mutex);                                       
        if (trx->lock.wait_lock)                                                      
        {                                                                             
          if (Deadlock::check_and_resolve(trx))                                       
          {                                                                           
            ut_ad(!trx->lock.wait_lock);                                              
            trx->error_state= DB_DEADLOCK;                                            
            goto end_wait;                                                            
          }                                                                           
        } 
      

      But lock_sys.wait_mutex is not acquired in the following call stack:

      0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)       
          at ./storage/innobase/trx/trx0trx.cc:1507                                   
      1507                    trx->lock.wait_thr = NULL;                              
      (rr) bt                                                                         
      #0  0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)   
          at ./storage/innobase/trx/trx0trx.cc:1507                                   
      #1  0x00005562571e7c38 in trx_rollback_step (thr=0x6160085ce218) at ./storage/innobase/trx/trx0roll.cc:915
      #2  0x0000556256ffe0d9 in que_thr_step (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:659
      #3  0x0000556256ffe430 in que_run_threads_low (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:709
      #4  0x0000556256ffe5d2 in que_run_threads (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:729
      #5  0x00005562571e9102 in trx_t::rollback_low (this=0x4a6d5ba30040, savept=0x0) 
          at ./storage/innobase/trx/trx0roll.cc:124                                   
      #6  0x00005562571e3593 in trx_t::rollback (this=0x4a6d5ba30040, savept=0x0)     
          at ./storage/innobase/trx/trx0roll.cc:176                                   
      #7  0x00005562570c0f1b in row_mysql_handle_errors (new_err=0x640000aae430, trx=0x4a6d5ba30040, thr=0x620000241848, savept=0x0)
          at ./storage/innobase/row/row0mysql.cc:696 
      

      Theoretically, trx_t::mutex could protect trx->lock.wait_thr from racing. It's acquired in trx_rollback_step() before trx_commit_or_rollback_prepare() call, and it's also acquired in lock_cancel_waiting_and_release(). But there is some room between victim->lock.was_chosen_as_deadlock_victim= true assigning and trx->mutex_lock() in lock_cancel_waiting_and_release() call in Deadlock::report(), and this gives the ability to rollback the transaction in parallel thread before transaction mutex is acquired in lock_cancel_waiting_and_release().

      By the same reason acquiring lock_sys.mutex in trx_commit_or_rollback_prepare() before trx->lock.wait_thr zeroing out can't solve the issue.

      Attachments

        Issue Links

          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            I have checked only 10.6 version, but I suppose 10.6+ are affected also.

            vlad.lesin Vladislav Lesin added a comment - - edited I have checked only 10.6 version, but I suppose 10.6+ are affected also.
            vlad.lesin Vladislav Lesin added a comment - Test case: MDEV-29622.test.diff .

            Currently I don't see any way to fix it except moving victim->lock.was_chosen_as_deadlock_victim= true assignment from Deadlock::report() to lock_cancel_waiting_and_release() under trx->mutex_lock() protection, then finishing lock_cancel_waiting_and_release() if trx->lock.wait_thr == 0. trx->lock.wait_thr comparison must also be protected with trx->mutex_lock(). The reason of such solution is that both trx_commit_or_rollback_prepare() and lock_cancel_waiting_and_release() are executed under trx->mutex_lock() protection.

            vlad.lesin Vladislav Lesin added a comment - Currently I don't see any way to fix it except moving victim->lock.was_chosen_as_deadlock_victim= true assignment from Deadlock::report() to lock_cancel_waiting_and_release() under trx->mutex_lock() protection, then finishing lock_cancel_waiting_and_release() if trx->lock.wait_thr == 0. trx->lock.wait_thr comparison must also be protected with trx->mutex_lock(). The reason of such solution is that both trx_commit_or_rollback_prepare() and lock_cancel_waiting_and_release() are executed under trx->mutex_lock() protection.
            vlad.lesin Vladislav Lesin added a comment - - edited

            The previous comment is wrong. The first, there is no need to push victim->lock.was_chosen_as_deadlock_victim set under trx_t::mutex_lock() protection. The second, finishing lock_cancel_waiting_and_release() if trx->lock.wait_thr == 0 is wrong too. The explanation is the following.

            Suppose we have two transactions, trx 1 and trx 2 like in the issue description. trx 2 does deadlock resolving from lock_wait(), it set victim->lock.was_chosen_as_deadlock_victim= true for trx 1, but has not yet invoked lock_cancel_waiting_and_release().

            trx 1 checked the flag in lock_trx_handle_wait(), and started rollback from row_mysql_handle_errors(). It can change trx->lock.wait_thr and trx->state as it holds trx_t::mutex, but trx 2 has not yet requested it, as lock_cancel_waiting_and_release() has not yet been called. Moreover, I suppose trx 1 intentionally resets trx->lock.wait_thr in trx_commit_or_rollback_prepare(), as there is some logic related to it in que_run_threads().
            After that trx 1 tries to release locks in trx_t::rollback_low(), invoking trx_t::rollback_finish(). lock_release() is blocked on try to acquire lock_sys.rd_lock(SRW_LOCK_CALL) in lock_release_try(), as lock_sys is blocked by trx 2, as deadlock resolution works under lock_sys.wr_lock(SRW_LOCK_CALL), see Deadlock::report() for details.

            trx 2 executes lock_cancel_waiting_and_release() for deadlock victim, i. e. for trx 1. lock_cancel_waiting_and_release() contains some trx->lock.wait_thr and trx->state assertions, which will fail, because trx 1 has changed them during rollback execution. But, suppose we commented out the assertions, and lock_cancel_waiting_and_release() continues execution without crash. What does it actually do? It dequeues victim waiting lock from queue in lock_rec_dequeue_from_page(), i. e. it removes it from queue, corresponding to (space,page), and then it traverses the queue. For each element in the queue it tries to find another conflicting element in the queue, and, if it is found, makes the first element to be waiting for the found element, i.e. it rebuilds waiting graph.

            The most interesting thing in the above logic is that victim transaction can has another lock, conflicting, with the transaction which does deadlock resolving. And the procedure of waiting queue rebuilding will make the deadlock resolving transaction to be waiting for deadlock victim again. This is exactly what is happening in the test case: MDEV-29622.test.diff.

            I.e. (20,20) record was locked by one transaction with SELECT FOR UPDATE, then (10,10) was locked by UPDATE of another transaction, but then the UPDATE created waiting lock for (20,20), as (20,20) was locked with conflicting lock by the first transaction. Then the first transaction executes SELECT FOR UPDATE again, tries to lock (10,10), detects deadlock and releases waiting lock of the UPDATE for (20,20). But the UPDATE still holds lock in (10,10), and the procedure of waiting queue rebuilding make the first transaction to be waiting for the UPDATE again.

            So, trx 2 has just executed lock_cancel_waiting_and_release() and returned to lock_wait(). As trx 2 is still waiting for trx 1, trx 2 is suspended.

            trx 1 acquires lock_sys mutex, releases its locks and wakes trx 2 up.

            So, according to the above scenario, it's legal to have trx->lock.wait_thr==0 and trx->state!=TRX_STATE_ACTIVE in lock_cancel_waiting_and_release(), if it was invoked from Deadlock::report(), and the fix could be just in the assertions condition changing.

            But why do we need to invoke lock_cancel_waiting_and_release(victim->lock.wait_lock) in Deadlock::report()? I suppose it's because the victim can be suspended in lock_wait(), and we need to wake it up to start rollback. Ok then, we can just wake it up with condition variable signalling(lock_wait_end(trx)), but we don't need to release the lock and rebuild waiting graph, it will be released in rollback, called from row_mysql_handle_errors().

            vlad.lesin Vladislav Lesin added a comment - - edited The previous comment is wrong. The first, there is no need to push victim->lock.was_chosen_as_deadlock_victim set under trx_t::mutex_lock() protection. The second, finishing lock_cancel_waiting_and_release() if trx->lock.wait_thr == 0 is wrong too. The explanation is the following. Suppose we have two transactions, trx 1 and trx 2 like in the issue description. trx 2 does deadlock resolving from lock_wait(), it set victim->lock.was_chosen_as_deadlock_victim= true for trx 1, but has not yet invoked lock_cancel_waiting_and_release(). trx 1 checked the flag in lock_trx_handle_wait(), and started rollback from row_mysql_handle_errors(). It can change trx->lock.wait_thr and trx->state as it holds trx_t::mutex, but trx 2 has not yet requested it, as lock_cancel_waiting_and_release() has not yet been called. Moreover, I suppose trx 1 intentionally resets trx->lock.wait_thr in trx_commit_or_rollback_prepare(), as there is some logic related to it in que_run_threads(). After that trx 1 tries to release locks in trx_t::rollback_low(), invoking trx_t::rollback_finish(). lock_release() is blocked on try to acquire lock_sys.rd_lock(SRW_LOCK_CALL) in lock_release_try(), as lock_sys is blocked by trx 2, as deadlock resolution works under lock_sys.wr_lock(SRW_LOCK_CALL), see Deadlock::report() for details. trx 2 executes lock_cancel_waiting_and_release() for deadlock victim, i. e. for trx 1. lock_cancel_waiting_and_release() contains some trx->lock.wait_thr and trx->state assertions, which will fail, because trx 1 has changed them during rollback execution. But, suppose we commented out the assertions, and lock_cancel_waiting_and_release() continues execution without crash. What does it actually do? It dequeues victim waiting lock from queue in lock_rec_dequeue_from_page(), i. e. it removes it from queue, corresponding to (space,page), and then it traverses the queue. For each element in the queue it tries to find another conflicting element in the queue, and, if it is found, makes the first element to be waiting for the found element, i.e. it rebuilds waiting graph. The most interesting thing in the above logic is that victim transaction can has another lock, conflicting, with the transaction which does deadlock resolving. And the procedure of waiting queue rebuilding will make the deadlock resolving transaction to be waiting for deadlock victim again. This is exactly what is happening in the test case: MDEV-29622.test.diff . I.e. (20,20) record was locked by one transaction with SELECT FOR UPDATE, then (10,10) was locked by UPDATE of another transaction, but then the UPDATE created waiting lock for (20,20), as (20,20) was locked with conflicting lock by the first transaction. Then the first transaction executes SELECT FOR UPDATE again, tries to lock (10,10), detects deadlock and releases waiting lock of the UPDATE for (20,20). But the UPDATE still holds lock in (10,10), and the procedure of waiting queue rebuilding make the first transaction to be waiting for the UPDATE again. So, trx 2 has just executed lock_cancel_waiting_and_release() and returned to lock_wait(). As trx 2 is still waiting for trx 1, trx 2 is suspended. trx 1 acquires lock_sys mutex, releases its locks and wakes trx 2 up. So, according to the above scenario, it's legal to have trx->lock.wait_thr==0 and trx->state!=TRX_STATE_ACTIVE in lock_cancel_waiting_and_release(), if it was invoked from Deadlock::report(), and the fix could be just in the assertions condition changing. But why do we need to invoke lock_cancel_waiting_and_release(victim->lock.wait_lock) in Deadlock::report()? I suppose it's because the victim can be suspended in lock_wait(), and we need to wake it up to start rollback. Ok then, we can just wake it up with condition variable signalling(lock_wait_end(trx)), but we don't need to release the lock and rebuild waiting graph, it will be released in rollback, called from row_mysql_handle_errors().

            I agree that the rollback initiation in Deadlock::report() can and should be simplified. As far as I can tell, we’ll only need to set the victim flag and signal the condition variable. An extra check for the flag would be needed after the wait for trx->lock.cond in lock_wait(), so that the rollback would be initiated. That is, we’d better let the transaction owner thread itself roll back the transaction and release all locks.

            Similarly, lock_wait_wsrep_kill() could just wake up the victim, instead of invoking lock_sys.cancel_lock_wait_for_trx(vtrx).

            All other invocations of lock_cancel_waiting_and_release(), including those via lock_sys_t::cancel() and lock_trx_handle_wait(), should be in the thread that is executing the transaction.

            marko Marko Mäkelä added a comment - I agree that the rollback initiation in Deadlock::report() can and should be simplified. As far as I can tell, we’ll only need to set the victim flag and signal the condition variable. An extra check for the flag would be needed after the wait for trx->lock.cond in lock_wait() , so that the rollback would be initiated. That is, we’d better let the transaction owner thread itself roll back the transaction and release all locks. Similarly, lock_wait_wsrep_kill() could just wake up the victim, instead of invoking lock_sys.cancel_lock_wait_for_trx(vtrx) . All other invocations of lock_cancel_waiting_and_release() , including those via lock_sys_t::cancel() and lock_trx_handle_wait() , should be in the thread that is executing the transaction.

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.