Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-36886

log_t::get_lsn_approx() not always returns a lower bound

    XMLWordPrintable

Details

    • Not for Release Notes

    Description

      While testing MDEV-36301, which depends on MDEV-21923, saahil encountered the following assertion failure:

      MDEV-36301 2ca4fefe874fccfd48f505a782562a1d991cef45

      mariadbd: /data/Server/MDEV-36301C/storage/innobase/buf/buf0flu.cc:2355: ulint page_cleaner_flush_pages_recommendation(ulint, lsn_t, double, ulint, double): Assertion `oldest_lsn <= cur_lsn' failed.
       
      Thread 3 received signal SIGABRT, Aborted.
      (rr) backtrace
      ...
      #7  0x000063e774c424da in page_cleaner_flush_pages_recommendation (last_pages_in=last_pages_in@entry=0x0, oldest_lsn=oldest_lsn@entry=0x9d0b02, pct_lwm=pct_lwm@entry=0, dirty_blocks=<optimized out>, 
          dirty_blocks@entry=0x2b, dirty_pct=dirty_pct@entry=0.26520291106451216) at /data/Server/MDEV-36301C/storage/innobase/buf/buf0flu.cc:2355
      #8  0x000063e774c496fc in buf_flush_page_cleaner () at /data/Server/MDEV-36301C/storage/innobase/buf/buf0flu.cc:2707
      ...
      (rr) thread apply 22 backtrace
       
      Thread 22 (Thread 3475872.3477959 (ib_tpool_worker)):
      #0  0x000063e774a49a26 in Atomic_relaxed<unsigned long>::store (this=this@entry=0x63e7761b5940 <log_sys>, i=i@entry=0x0, o=o@entry=std::memory_order_relaxed) at /data/Server/MDEV-36301C/include/my_atomic_wrapper.h:47
      #1  0x000063e774ab2b3f in log_t::persist (this=this@entry=0x63e7761b5940 <log_sys>, lsn=lsn@entry=0x9d15e8) at /data/Server/MDEV-36301C/storage/innobase/log/log0log.cc:1228
      #2  0x000063e774ab38c8 in log_write_persist (lsn=lsn@entry=0x9d15e8) at /data/Server/MDEV-36301C/storage/innobase/log/log0log.cc:1239
      #3  0x000063e774ab3a6f in log_write_up_to (lsn=0x9d15e8, durable=durable@entry=0x1, callback=callback@entry=0x0) at /data/Server/MDEV-36301C/storage/innobase/log/log0log.cc:1466
      #4  0x000063e774ab3d74 in log_buffer_flush_to_disk (durable=durable@entry=0x1) at /data/Server/MDEV-36301C/storage/innobase/log/log0log.cc:1539
      

      It turns out that exactly at the moment where log_sys.get_lsn_approx() is executing, another thread is about to update log_sys.base_lsn:

      log_t::persist()

        uint64_t offset{write_lsn_offset};
        const lsn_t new_base_lsn= base_lsn.load(std::memory_order_relaxed) +
          (offset & (WRITE_BACKOFF - 1));
        ut_ad(new_base_lsn >= lsn);
        write_to_buf+= size_t(offset >> WRITE_TO_BUF_SHIFT);
        /* This synchronizes with get_lsn_approx();
        we must store write_lsn_offset before base_lsn. */
        write_lsn_offset.store(0, std::memory_order_relaxed);
        base_lsn.store(new_base_lsn, std::memory_order_release);
        flushed_to_disk_lsn.store(lsn, std::memory_order_relaxed);
        log_flush_notify(lsn);
      

      At this point, log_sys.write_lsn_offset is 0, but new_base_lsn has not been written back to log_sys.base_lsn yet.

      For the current situation, the function comment would be misleading:

      diff --git a/storage/innobase/include/log0log.h b/storage/innobase/include/log0log.h
      index 55ffbc0df99..ea6e5edf632 100644
      --- a/storage/innobase/include/log0log.h
      +++ b/storage/innobase/include/log0log.h
      @@ -406,9 +406,9 @@ struct log_t
         @param encrypted  whether the log is encrypted */
         static void header_write(byte *buf, lsn_t lsn, bool encrypted) noexcept;
       
      -  /** @return a lower bound estimate of get_lsn(),
      +  /** @return an estimate of get_lsn(),
         using acquire-release ordering with write_buf() or persist();
      -  this is exact unless append_prepare_wait() is pending */
      +  this is exact unless said functions or append_prepare_wait() are pending */
         lsn_t get_lsn_approx() const noexcept
         {
           /* acquire-release ordering with write_buf() and persist() */
      

      This would actually be an upper bound estimate in this case. In case an overshoot is happening and log_t::append_prepare_wait() is pending, this function would return a lower bound estimate. So, "an estimate" is the best we can claim.

      There are only two callers of get_lsn_approx(), both in the buf_flush_page_cleaner thread: af_needed_for_redo() and page_cleaner_flush_pages_recommendation(). Both are comparing the approximate count to buf_pool.get_oldest_modification(0), and they expect that the approximate LSN never is smaller than that. We must carefully evaluate what happens when the assumption does not hold.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Marko Mäkelä Marko Mäkelä
              Debarun Banerjee Debarun Banerjee (Inactive)
              Saahil Alam Saahil Alam
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.