MariaDB Server / MDEV-34307

Recovery or mariadb-backup --prepare debug failure [FATAL] InnoDB: Page ... still fixed or dirty


Details

    Description

      In MDEV-34062, mleich reported an assertion failure in mariadb-backup --prepare:

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      2024-05-31 13:24:00 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=622] still fixed or dirty
      

      Based on an analysis of the rr replay trace, this initially looked rather harmless: the data page had actually been written back, and we were waiting for that write-back by waiting for buf_pool.flush_list to be emptied and for the checkpoint LSN to be advanced.

      ssh pluto
      rr replay /data/results/1717145612/MB-2134/1_clone/rr/mariadb-backup-1
      

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      Continuing.
       
      Thread 4 hit Hardware watchpoint 1: -location block.page.zip.fix.m._M_i
       
      Old value = 2684354560
      New value = 536870912
      buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
      295	  lock.u_unlock(true);
      (rr) bt
      #0  buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560)
          at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
      #1  0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
      #2  0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
      #3  0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
      #4  0x00005654f239a7a0 in tpool::task_group::execute (this=0x5654f4b6d2f0, t=t@entry=0x5654f4b6d5e0) at /data/Server/10.11-MDEV-34062/tpool/task_group.cc:70
      (rr) thread apply 1 bt
      #14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
      #15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
      #16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
      #17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
      #18 0x00005654f2118199 in recv_sys_t::parse<recv_buf, true> (this=this@entry=0x5654f3098080 <recv_sys>, l=..., if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:2944
      #19 0x00005654f211894e in recv_sys_t::parse_mtr<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3112
      #20 0x00005654f211aea2 in recv_sys_t::parse_mmap<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3123
      #21 0x00005654f21029cc in recv_scan_log (last_phase=last_phase@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4161
      #22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
      #23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
      #24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
      #25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
      #26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
      #27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
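
      In the trace above, thread 1 is blocked in buf_flush_wait(); conceptually, it sleeps on a condition variable until the write-back activity has advanced the flushed-to LSN past the requested target. Below is a minimal, self-contained sketch of that waiting pattern. It is not the InnoDB implementation; the names flush_state, wait_for_flush() and note_write_back() are made up for illustration.

      #include <condition_variable>
      #include <cstdint>
      #include <mutex>

      struct flush_state
      {
        std::mutex mtx;                /* stands in for the flush_list mutex */
        std::condition_variable done;  /* signalled when a write-back completes */
        uint64_t flushed_to_lsn= 0;    /* highest LSN known to be written back */
      };

      /* Analogous to the buf_flush_wait(lsn) in frame #15 above:
      block until everything up to the target LSN has been written back. */
      void wait_for_flush(flush_state &s, uint64_t lsn)
      {
        std::unique_lock<std::mutex> lk(s.mtx);
        s.done.wait(lk, [&]{ return s.flushed_to_lsn >= lsn; });
      }

      /* Invoked from a write-completion callback once a page write has finished. */
      void note_write_back(flush_state &s, uint64_t lsn)
      {
        {
          std::lock_guard<std::mutex> lk(s.mtx);
          if (lsn > s.flushed_to_lsn)
            s.flushed_to_lsn= lsn;
        }
        s.done.notify_all();
      }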
      

      The state of the block had been changed to UNFIXED, with no buffer-fix or io-fix. Thread 1 above is the thread in which the assertion fails a little later. The block is returned by buf_pool_t::chunk_t::not_freed() because its page latch had not yet been released (a simplified sketch of this race follows the trace below):

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      #0  buf_pool_t::chunk_t::not_freed (this=this@entry=0x5654f4b77810) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:1163
      #1  0x00005654f1f21381 in buf_pool_t::assert_all_freed (this=this@entry=0x5654f308c680 <buf_pool>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:3985
      #2  0x00005654f1f21470 in buf_pool_invalidate () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:4009
      #3  0x00005654f20fd594 in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3997
      (rr) thread apply 4 bt
      Thread 4 (Thread 3423413.3423967 (mariadb-backup)):
      #0  buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
      #1  0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
      #2  0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
      #3  0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
      (rr) p write_slots.m_cache.m_pos
      $6 = 1
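
      To make the window concrete: buf_page_t::write_complete() first makes the page UNFIXED and only afterwards releases the page latch, while the debug scan treats a block whose latch it cannot acquire as still in use. Below is a minimal, self-contained sketch of that interleaving. It is not InnoDB code; the struct, the state values and the 10 ms sleep are artificial and only serve to widen the gap.

      #include <atomic>
      #include <chrono>
      #include <iostream>
      #include <shared_mutex>
      #include <thread>

      struct page_block
      {
        std::atomic<unsigned> state{2};  /* 2 ~ "write-fixed", 1 ~ "unfixed" (made-up values) */
        std::shared_mutex latch;         /* stands in for the page latch (block lock) */
      };

      int main()
      {
        page_block block;
        std::atomic<bool> unfixed{false};

        std::thread write_completion([&]
        {
          /* In InnoDB the latch was acquired when the write was initiated;
          here the same thread takes it to keep std::shared_mutex happy. */
          block.latch.lock();
          /* ~ buf_page_t::write_complete(): clear the fix first ... */
          block.state.store(1, std::memory_order_release);
          unfixed.store(true, std::memory_order_release);
          std::this_thread::sleep_for(std::chrono::milliseconds(10));
          /* ... and release the page latch only afterwards.  The check below
          can run in exactly this gap. */
          block.latch.unlock();
        });

        while (!unfixed.load(std::memory_order_acquire)) {}

        /* ~ buf_pool_t::chunk_t::not_freed(): a block whose latch cannot be
        acquired counts as still in use, even though it is already unfixed. */
        if (!block.latch.try_lock())
          std::cout << "page still fixed or dirty (latch still held)\n";
        else
          block.latch.unlock();

        write_completion.join();
        return 0;
      }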
      

      The following should fix this:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index 49f73105f99..2d997d4c750 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -3999,15 +3999,13 @@ void buf_refresh_io_stats()
       All pages must be in a replaceable state (not modified or latched). */
       void buf_pool_invalidate()
       {
      -	mysql_mutex_lock(&buf_pool.mutex);
      -
       	/* It is possible that a write batch that has been posted
       	earlier is still not complete. For buffer pool invalidation to
       	proceed we must ensure there is NO write activity happening. */
       
      -	ut_d(mysql_mutex_unlock(&buf_pool.mutex));
      +	os_aio_wait_until_no_pending_writes(false);
       	ut_d(buf_pool.assert_all_freed());
      -	ut_d(mysql_mutex_lock(&buf_pool.mutex));
      +	mysql_mutex_lock(&buf_pool.mutex);
       
       	while (UT_LIST_GET_LEN(buf_pool.LRU)) {
       		buf_LRU_scan_and_free_block();
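
      The essential point is the ordering: first let every pending asynchronous page write, including its completion callback (which releases the page latch), finish; then run the debug check; and only then acquire buf_pool.mutex. Below is a minimal sketch of such a "no pending writes" gate, under the assumption that os_aio_wait_until_no_pending_writes() amounts to waiting for an outstanding-write counter to reach zero; it is an illustration, not the actual tpool/os_aio code.

      #include <condition_variable>
      #include <mutex>

      class pending_writes
      {
        std::mutex mtx;
        std::condition_variable cv;
        unsigned count= 0;

      public:
        /* Called when an asynchronous page write is submitted. */
        void submitted()
        {
          std::lock_guard<std::mutex> lk(mtx);
          count++;
        }

        /* Called once the write-completion callback has fully finished,
        that is, after the page latch has already been released. */
        void completed()
        {
          std::lock_guard<std::mutex> lk(mtx);
          if (--count == 0)
            cv.notify_all();
        }

        /* Analogous to the call that the patch adds before assert_all_freed():
        once this returns, no completion callback can still be sitting between
        "page made UNFIXED" and "page latch released". */
        void wait_until_none()
        {
          std::unique_lock<std::mutex> lk(mtx);
          cv.wait(lk, [&]{ return count == 0; });
        }
      };

      With that guarantee in place, assert_all_freed() can no longer observe a block whose page latch is still held by an already-completed write.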
      

      It is possible that enabling the memory-mapped log writes on HDD storage (MDEV-34062) made it much easier to reproduce this race condition. The above fix should be applicable to 10.6 as well. In 10.5 the code is different; it had been refactored in MDEV-26827.

      I was worried that this could cause a hang in a non-debug build, because buf_LRU_free_page(), which would be invoked from the while loop in the above patch snippet, is not guaranteed to release and re-acquire buf_pool.mutex. But it looks like we are safe. The very last thing that buf_page_t::write_complete() does is release the page latch, and that is exactly where it was blocked in the above rr replay trace. It is called from the following code:

        if (UNIV_UNLIKELY(!persistent) && UNIV_LIKELY(!error))
        {
          /* We must hold buf_pool.mutex while releasing the block, so that
          no other thread can access it before we have freed it. */
          mysql_mutex_lock(&buf_pool.mutex);
          bpage->write_complete(persistent, error, state);
          buf_LRU_free_page(bpage, true);
          mysql_mutex_unlock(&buf_pool.mutex);
        }
        else
        {
          bpage->write_complete(persistent, error, state);
          if (state < buf_page_t::WRITE_FIX_REINIT &&
              request.node->space->use_doublewrite())
          {
            ut_ad(persistent);
            buf_dblwr.write_completed();
          }
        }
      

      Obviously, the problematic call must be the one in the else branch, because the if branch holds buf_pool.mutex and would therefore block buf_pool_invalidate() from executing. The call into buf_dblwr does not access anything in buf_pool; it would basically just release a further doublewrite batch (which is impossible in this scenario).

      The function buf_pool_invalidate() is also invoked from buf_dblwr::create() during database creation, so the crash could theoretically occur at that point as well. Normally, it would occur during recovery or mariadb-backup --prepare when multiple recovery batches are needed.

      This bug seems to affect debug builds only. Nevertheless, I would add the call to os_aio_wait_until_no_pending_writes() in all builds.

            People

              Assignee: marko (Marko Mäkelä)
              Reporter: marko (Marko Mäkelä)

