Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29635

race on trx->lock.wait_lock in deadlock resolution

Details

    Description

      1. trx 1 does semi-consistent read for UPDATE, checks its was_chosen_as_deadlock_victim in lock_trx_handle_wait(), it's not set, and the execution goes to the trx->lock.wait_lock check.

      2. trx 2 executes DELETE and does deadlock resolution, it sets victim->lock.was_chosen_as_deadlock_victim= true for trx 1 and executes lock_cancel_waiting_and_release(), which, in turns, calls lock_reset_lock_and_trx_wait() where trx->lock.wait_lock is reset for trx 2.

      3. trx 1 loads trx->lock.wait_lock value, as it was reset by trx 1, the value is nullptr, and lock_trx_handle_wait() returns success. As it returns success, row_search_mvcc() tries to lock the next record and lock_rec_lock() aborts execution as the condition ut_ad(!(trx->lock.was_chosen_as_deadlock_victim & 1)) is not true;

      The logic which is supposed to be is the following. Trx 1 created waiting lock and either already suspended in lock_wait() or is moving to it. Trx 2 does deadlock resolution, chooses trx 1 as a victim and releases and cancels waiting lock. Trx 1 checks trx->lock.was_chosen_as_deadlock_victim before or after suspending in lock_wait(), and rolls back itself. And there must not be any new locks created by Trx 1. That's why MDEV-29081 made sense.

      This logic does not work in the case of semi-consistent read. As in this case trx 1 checks lock state in lock_trx_handle_wait() after creating waiting lock and trying to read committed version of the record. And if lock_trx_handle_wait() returns "success", lock_wait() will not be executed, and trx->lock.was_chosen_as_deadlock_victim will not be checked, what allows trx 1 to continue execution and create new locks even if trx->lock.was_chosen_as_deadlock_victim was set by some other transaction after lock_trx_handle_wait() call.

      Attachments

        Issue Links

          Activity

            Test case is added: MDEV-29635.test.diff .

            Note the test crashes not in the lock_rec_lock(), but in lock_trx_has_expl_x_lock(), like MDEV-29711(I even assume it's a duplicate). But it isn't so important, because the test shows the root reason of the bug, and repeating the crash in the same location, as it was found during our testing, would require additional effort. And we can't create test case, which would fail before the fix and be successful after the fix, as after the fix the debug sync points will be blocked.

            vlad.lesin Vladislav Lesin added a comment - Test case is added: MDEV-29635.test.diff . Note the test crashes not in the lock_rec_lock(), but in lock_trx_has_expl_x_lock(), like MDEV-29711 (I even assume it's a duplicate). But it isn't so important, because the test shows the root reason of the bug, and repeating the crash in the same location, as it was found during our testing, would require additional effort. And we can't create test case, which would fail before the fix and be successful after the fix, as after the fix the debug sync points will be blocked.
            vlad.lesin Vladislav Lesin added a comment - - edited

            I think that the following optimization:

            dberr_t lock_trx_handle_wait(trx_t *trx)                                        
            {                                                                               
              DEBUG_SYNC_C("lock_trx_handle_wait_enter");                                   
              if (trx->lock.was_chosen_as_deadlock_victim)                                  
                return DB_DEADLOCK;                                                         
              if (!trx->lock.wait_lock)                                                     
                return DB_SUCCESS;                                                          
              dberr_t err= DB_SUCCESS;                                                      
              mysql_mutex_lock(&lock_sys.wait_mutex);                                       
              if (trx->lock.was_chosen_as_deadlock_victim)                                  
                err= DB_DEADLOCK;                                                           
              else if (lock_t *wait_lock= trx->lock.wait_lock)                              
                err= lock_sys_t::cancel<true>(trx, wait_lock);                              
              lock_sys.deadlock_check();                                                    
              mysql_mutex_unlock(&lock_sys.wait_mutex);                                     
              return err;                                                                   
            } 
            

            for trx->lock.wait_lock is wrong.

            Because even if trx->lock.was_chosen_as_deadlock_victim was not set before the first check, it can be set after the check, and trx->lock.wait_lock can be reset by another thread from lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even the transaction was marked as deadlock victim, and continue execution instead of rolling back.

            We can leave the first check for trx->lock.was_chosen_as_deadlock_victim and remove the first check for trx->lock.wait_lock. In this case after lock_trx_handle_wait() is executed, the lock will be either acquired or cancelled, there will not be waiting lock for the transaction, thus the transaction will not be selected as deadlock victim after lock_trx_handle_wait() call and before the next waiting record lock is created.

            Note the above claim is true for 10.6+, but needs to be checked for 10.3, and, if it's wrong, it can be the reason of MDEV-29711.

            vlad.lesin Vladislav Lesin added a comment - - edited I think that the following optimization: dberr_t lock_trx_handle_wait(trx_t *trx) { DEBUG_SYNC_C( "lock_trx_handle_wait_enter" ); if (trx->lock.was_chosen_as_deadlock_victim) return DB_DEADLOCK; if (!trx->lock.wait_lock) return DB_SUCCESS; dberr_t err= DB_SUCCESS; mysql_mutex_lock(&lock_sys.wait_mutex); if (trx->lock.was_chosen_as_deadlock_victim) err= DB_DEADLOCK; else if (lock_t *wait_lock= trx->lock.wait_lock) err= lock_sys_t::cancel< true >(trx, wait_lock); lock_sys.deadlock_check(); mysql_mutex_unlock(&lock_sys.wait_mutex); return err; } for trx->lock.wait_lock is wrong. Because even if trx->lock.was_chosen_as_deadlock_victim was not set before the first check, it can be set after the check, and trx->lock.wait_lock can be reset by another thread from lock_reset_lock_and_trx_wait() if the transaction was chosen as deadlock victim. In this case lock_trx_handle_wait() will return DB_SUCCESS even the transaction was marked as deadlock victim, and continue execution instead of rolling back. We can leave the first check for trx->lock.was_chosen_as_deadlock_victim and remove the first check for trx->lock.wait_lock. In this case after lock_trx_handle_wait() is executed, the lock will be either acquired or cancelled, there will not be waiting lock for the transaction, thus the transaction will not be selected as deadlock victim after lock_trx_handle_wait() call and before the next waiting record lock is created. Note the above claim is true for 10.6+, but needs to be checked for 10.3, and, if it's wrong, it can be the reason of MDEV-29711 .

            BTW, the above test will even work for the fix, proposed above.

            vlad.lesin Vladislav Lesin added a comment - BTW, the above test will even work for the fix, proposed above.

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.