MariaDB Server / MDEV-29622

Wrong assertions in lock_cancel_waiting_and_release() for deadlock resolving caller


Details

    Description

      The scenario is the following:

      1. trx 1 performs a semi-consistent read for an UPDATE and creates a waiting lock.

      2. trx 2 executes an UPDATE, performs the deadlock check in lock_wait(), and sets trx->lock.was_chosen_as_deadlock_victim for trx 1 in Deadlock::report(), just before the call to lock_cancel_waiting_and_release().

      3. trx 1 checks trx->lock.was_chosen_as_deadlock_victim in lock_trx_handle_wait() and, because it is set, rolls back from row_mysql_handle_errors(). trx_commit_or_rollback_prepare() zeroes out trx->lock.wait_thr.

      4. trx 2 executes lock_cancel_waiting_and_release(). lock_wait_end() aborts when it checks trx->lock.wait_thr.
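
      The four steps above can be replayed in a small single-threaded model. This is only a sketch: ModelTrx, ModelThr and every function name below are hypothetical stand-ins invented for illustration, not the InnoDB types or functions, and the real code runs steps 2-4 in two different threads.

```cpp
#include <cassert>

// Hypothetical, simplified stand-ins for illustration; not the InnoDB types.
struct ModelThr {};

struct ModelTrx {
  bool was_chosen_as_deadlock_victim = false;  // models trx->lock.was_chosen_as_deadlock_victim
  ModelThr* wait_thr = nullptr;                // models trx->lock.wait_thr
};

// Step 1: trx 1 creates a waiting lock and records its waiting query thread.
void start_lock_wait(ModelTrx& trx, ModelThr& thr) { trx.wait_thr = &thr; }

// Step 2: trx 2 marks trx 1 as the deadlock victim.
void mark_victim(ModelTrx& victim) { victim.was_chosen_as_deadlock_victim = true; }

// Step 3: trx 1 sees the flag and rolls back, which clears wait_thr.
// Returns true if a rollback happened.
bool rollback_if_victim(ModelTrx& trx) {
  if (!trx.was_chosen_as_deadlock_victim) return false;
  trx.wait_thr = nullptr;
  return true;
}

// Step 4: trx 2 cancels the victim's wait. Returns whether the condition
// the caller expects ("wait_thr is still set") holds at this point.
bool cancel_wait_precondition_holds(const ModelTrx& victim) {
  return victim.wait_thr != nullptr;
}

// Replays steps 1-4 in the order given in the scenario.
bool replay_scenario() {
  ModelThr thr;
  ModelTrx trx1;
  start_lock_wait(trx1, thr);   // step 1
  mark_victim(trx1);            // step 2
  rollback_if_victim(trx1);     // step 3
  return cancel_wait_precondition_holds(trx1);  // step 4
}
```

      In this model replay_scenario() returns false: by the time step 4 runs, step 3 has already cleared wait_thr, which corresponds to the failing check in lock_wait_end().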

      The comment on trx->lock.wait_thr says:

              que_thr_t*      wait_thr;       /*!< query thread belonging to this      
                                              trx that is in waiting                  
                                              state. For threads suspended in a       
                                              lock wait, this is protected by         
                                              lock_sys.latch. Otherwise, this may     
                                              only be modified by the thread that is  
                                              serving the running transaction. */
      

      And lock_wait() acquires lock_sys.wait_mutex before the deadlock check:

        mysql_mutex_lock(&lock_sys.wait_mutex);                                       
        if (trx->lock.wait_lock)                                                      
        {                                                                             
          if (Deadlock::check_and_resolve(trx))                                       
          {                                                                           
            ut_ad(!trx->lock.wait_lock);                                              
            trx->error_state= DB_DEADLOCK;                                            
            goto end_wait;                                                            
          }                                                                           
        } 
      

      But lock_sys.wait_mutex is not acquired in the following call stack:

      0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)       
          at ./storage/innobase/trx/trx0trx.cc:1507                                   
      1507                    trx->lock.wait_thr = NULL;                              
      (rr) bt                                                                         
      #0  0x00005562572076ca in trx_commit_or_rollback_prepare (trx=0x4a6d5ba30040)   
          at ./storage/innobase/trx/trx0trx.cc:1507                                   
      #1  0x00005562571e7c38 in trx_rollback_step (thr=0x6160085ce218) at ./storage/innobase/trx/trx0roll.cc:915
      #2  0x0000556256ffe0d9 in que_thr_step (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:659
      #3  0x0000556256ffe430 in que_run_threads_low (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:709
      #4  0x0000556256ffe5d2 in que_run_threads (thr=0x6160085ce218) at ./storage/innobase/que/que0que.cc:729
      #5  0x00005562571e9102 in trx_t::rollback_low (this=0x4a6d5ba30040, savept=0x0) 
          at ./storage/innobase/trx/trx0roll.cc:124                                   
      #6  0x00005562571e3593 in trx_t::rollback (this=0x4a6d5ba30040, savept=0x0)     
          at ./storage/innobase/trx/trx0roll.cc:176                                   
      #7  0x00005562570c0f1b in row_mysql_handle_errors (new_err=0x640000aae430, trx=0x4a6d5ba30040, thr=0x620000241848, savept=0x0)
          at ./storage/innobase/row/row0mysql.cc:696 
      

      In theory, trx_t::mutex could protect trx->lock.wait_thr from this race: it is acquired in trx_rollback_step() before the call to trx_commit_or_rollback_prepare(), and it is also acquired in lock_cancel_waiting_and_release(). However, Deadlock::report() assigns victim->lock.was_chosen_as_deadlock_victim= true before lock_cancel_waiting_and_release() calls trx->mutex_lock(). In the window between those two points, another thread can roll back the transaction before lock_cancel_waiting_and_release() acquires the transaction mutex.

      For the same reason, acquiring lock_sys.mutex in trx_commit_or_rollback_prepare() before zeroing out trx->lock.wait_thr cannot solve the issue.
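
      The window can be illustrated with a minimal model in which both sides do take the same mutex, yet the victim flag is written before the canceller acquires it. Everything here is an assumption made for illustration: GapModelTrx, victim_rollback() and report_and_cancel() are hypothetical names, the boolean has_wait_thr stands in for trx->lock.wait_thr, and the "other thread" is simulated by a plain function call placed inside the gap so that the interleaving is deterministic.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical model; not the InnoDB trx_t.
struct GapModelTrx {
  std::mutex mutex;          // plays the role of trx_t::mutex
  bool victim_flag = false;  // plays the role of was_chosen_as_deadlock_victim
  bool has_wait_thr = true;  // true while the modelled wait_thr is non-null
};

// Victim side: if flagged, roll back and clear wait_thr under the mutex.
void victim_rollback(GapModelTrx& trx) {
  if (!trx.victim_flag) return;
  std::lock_guard<std::mutex> guard(trx.mutex);
  trx.has_wait_thr = false;
}

// Canceller side: set the flag first, take the mutex afterwards.
// victim_runs_in_gap simulates the victim's thread being scheduled
// between those two points. Returns what the canceller observes for
// wait_thr once it holds the mutex.
bool report_and_cancel(GapModelTrx& victim, bool victim_runs_in_gap) {
  victim.victim_flag = true;  // written outside any critical section
  if (victim_runs_in_gap) {
    victim_rollback(victim);  // the other thread, running inside the gap
  }
  std::lock_guard<std::mutex> guard(victim.mutex);
  return victim.has_wait_thr;
}
```

      With victim_runs_in_gap set to false the canceller still sees wait_thr; with it set to true the canceller sees it cleared, even though each access to has_wait_thr happened under the mutex. The two critical sections are serialized correctly, just in an order the canceller does not expect.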

            People

              vlad.lesin Vladislav Lesin