MariaDB Server / MDEV-28043

Race condition between mtr_t::commit() and checkpoint

Details

    Description

      When MDEV-27774 replaced log_sys.mutex with log_sys.latch, it introduced a race condition in mtr_t::do_write():

          if (!ex)
          {
            log_sys.latch.rd_unlock();
            log_sys.latch.wr_lock(SRW_LOCK_CALL);
            if (UNIV_LIKELY(!m_user_space->max_lsn))
              name_write();
            std::pair<lsn_t,mtr_t::page_flush_ahead> p{finish_write(len, true)};
            log_sys.latch.wr_unlock();
            log_sys.latch.rd_lock(SRW_LOCK_CALL);
            return p;
          }
      

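      It is not safe to release the exclusive log_sys.latch between finish_write() and ReleaseBlocks: while no latch is held, a concurrent checkpoint can advance past the LSN that was just written, before the modified blocks have been added to buf_pool.flush_list. Because we have no portable operation that would downgrade the latch from exclusive to shared, we must retain the exclusive latch until the end of the critical section in mtr_t::commit().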
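The two latch protocols can be contrasted in a small single-threaded model. This is only a sketch under stated assumptions: `ToyLatch`, `commit_release_and_reacquire()` and `commit_hold_exclusive()` are hypothetical names invented for illustration and are not InnoDB code. The toy latch merely counts how often it was fully released, so that the step standing in for ReleaseBlocks can tell whether the latch was held without interruption since the step standing in for finish_write().

```cpp
#include <cassert>

// Hypothetical single-threaded stand-in for log_sys.latch: it only records
// the current mode and counts how often the latch was fully released.
class ToyLatch {
public:
  enum class Mode { none, shared, exclusive };

  void rd_lock()   { assert(mode_ == Mode::none);      mode_ = Mode::shared; }
  void rd_unlock() { assert(mode_ == Mode::shared);    release(); }
  void wr_lock()   { assert(mode_ == Mode::none);      mode_ = Mode::exclusive; }
  void wr_unlock() { assert(mode_ == Mode::exclusive); release(); }

  unsigned releases() const { return releases_; }
  Mode mode() const { return mode_; }

private:
  void release() { mode_ = Mode::none; ++releases_; }

  Mode mode_ = Mode::none;
  unsigned releases_ = 0;
};

// Shape of the reported code path: write under the exclusive latch, release
// it, reacquire a shared latch, and only then register the dirty blocks.
// Returns true if the latch was held continuously between the two steps.
bool commit_release_and_reacquire(ToyLatch &latch) {
  latch.rd_lock();                              // caller enters with a shared latch
  latch.rd_unlock();
  latch.wr_lock();                              // "upgrade" by release + reacquire
  const unsigned at_write = latch.releases();   // finish_write() would run here
  latch.wr_unlock();                            // window: no latch is held at all
  latch.rd_lock();
  const bool continuous = latch.releases() == at_write;  // ReleaseBlocks would run here
  latch.rd_unlock();
  return continuous;
}

// Shape of the proposed rule: keep the exclusive latch until the dirty
// blocks have been registered.
bool commit_hold_exclusive(ToyLatch &latch) {
  latch.rd_lock();
  latch.rd_unlock();                            // this gap precedes the log write, so it is harmless
  latch.wr_lock();
  const unsigned at_write = latch.releases();   // finish_write() would run here
  const bool continuous = latch.releases() == at_write;  // ReleaseBlocks would run here
  latch.wr_unlock();
  return continuous;
}
```

In this model `commit_release_and_reacquire()` returns false and `commit_hold_exclusive()` returns true. The price of the second shape is that the registration step runs under the exclusive latch, which is the trade-off implied by the lack of a portable downgrade.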

      I debugged an rr replay trace of this:

      ssh pluto
      rr replay /data/results/1647008467/TBR-1420/dev/shm/rqg/1647008467/53/1/rr/latest-trace
      

      continue
      watch -l log_sys.last_checkpoint_lsn.m._M_i
      watch -l buf_pool.flush_list.count
      reverse-continue
      reverse-continue
      reverse-continue
      thread apply 24 backtrace
      

      Working backwards from the end of the trace, we have Thread 3 hitting an assertion failure:

      mysqld: /data/Server/bb-10.9-MDEV-26603-async-redo-writeB/storage/innobase/buf/buf0flu.cc:1877: bool log_checkpoint_low(lsn_t, lsn_t): Assertion `oldest_lsn > log_sys.last_checkpoint_lsn' failed.
      

      Before that, Thread 24 inserted an unexpectedly old block into buf_pool.flush_list, and before that, Thread 3 advanced the checkpoint LSN to a value that was too new.
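The order of events in the trace can be replayed with made-up numbers. The following is a minimal sketch, not InnoDB code: `ToyState`, `oldest_modification()`, `checkpoint()` and the LSN values 50, 100 and 110 are assumptions chosen for illustration, and the rule that an empty flush list lets the checkpoint advance to the end of the log is a simplification of this toy model. `checkpoint()` returns false at the point where the real code would fail the assertion `oldest_lsn > log_sys.last_checkpoint_lsn`.

```cpp
#include <cstdint>
#include <set>

using lsn_t = std::uint64_t;

// Hypothetical toy state; the field names echo the report only for readability.
struct ToyState {
  lsn_t lsn = 100;                  // current end of the log
  lsn_t last_checkpoint_lsn = 50;   // latest checkpoint
  std::multiset<lsn_t> flush_list;  // oldest-modification LSN of each dirty block
};

// Oldest LSN still needed for recovery: the smallest LSN in the flush list,
// or the end of the log if no block is dirty (toy-model assumption).
lsn_t oldest_modification(const ToyState &s) {
  return s.flush_list.empty() ? s.lsn : *s.flush_list.begin();
}

// Advances the checkpoint; returns false if the checkpoint would have to
// move backwards, which is where the reported assertion fires.
bool checkpoint(ToyState &s) {
  const lsn_t oldest = oldest_modification(s);
  if (oldest == s.last_checkpoint_lsn)
    return true;                    // nothing to do
  if (!(oldest > s.last_checkpoint_lsn))
    return false;                   // invariant violated
  s.last_checkpoint_lsn = oldest;
  return true;
}

// Order seen in the trace: log write, checkpoint, late flush-list insert.
bool replay_race() {
  ToyState s;
  const lsn_t start = s.lsn;        // "Thread 24" writes 10 bytes of log at LSN 100
  s.lsn += 10;                      // ... and then holds no latch for a moment
  const bool first = checkpoint(s); // "Thread 3": flush list is empty, checkpoint moves to 110
  s.flush_list.insert(start);       // "Thread 24" registers its block with LSN 100
  return first && checkpoint(s);    // "Thread 3": oldest is 100, checkpoint is 110
}

// Same events, with the block registered before any checkpoint can run.
bool replay_fixed() {
  ToyState s;
  const lsn_t start = s.lsn;
  s.lsn += 10;
  s.flush_list.insert(start);       // registration is not separated from the log write
  const bool first = checkpoint(s); // checkpoint moves to 100 only
  return first && checkpoint(s);    // oldest is still 100: nothing to do
}
```

Here `replay_race()` returns false and `replay_fixed()` returns true. The only difference between the two is whether a checkpoint can run between the log write and the flush-list insertion.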

            People

              marko Marko Mäkelä