MariaDB Server
MDEV-34307

Recovery or mariadb-backup --prepare debug failure [FATAL] InnoDB: Page ... still fixed or dirty

Details

    Description

      In MDEV-34062, mleich reported an assertion failure in mariadb-backup --prepare:

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      2024-05-31 13:24:00 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=622] still fixed or dirty
      

      Based on analyzing the rr replay trace, this initially looked rather harmless: the data page had actually been written back, and we were waiting for that write-back to complete, that is, for buf_pool.flush_list to be emptied and the checkpoint LSN to be advanced.
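      For orientation, the wait in question looks roughly like this (a simplified sketch reconstructed from the backtraces below and the page cleaner session in the comments; the real buf_flush_wait() in buf0flu.cc carries additional bookkeeping around buf_flush_sync_lsn):

        /* sketch of buf_flush_wait(lsn); buf_pool.flush_list_mutex is held */
        while (buf_pool.get_oldest_modification(lsn) < lsn)
        {
          /* sleep until buf_flush_page_cleaner() has written back everything
          up to lsn and broadcasts buf_pool.done_flush_list; this wait is the
          safe_cond_wait() frame visible in the backtrace below */
          my_cond_wait(&buf_pool.done_flush_list,
                       &buf_pool.flush_list_mutex.m_mutex);
        }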

      ssh pluto
      rr replay /data/results/1717145612/MB-2134/1_clone/rr/mariadb-backup-1
      

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      Continuing.
       
      Thread 4 hit Hardware watchpoint 1: -location block.page.zip.fix.m._M_i
       
      Old value = 2684354560
      New value = 536870912
      buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
      295	  lock.u_unlock(true);
      (rr) bt
      #0  buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560)
          at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
      #1  0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
      #2  0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
      #3  0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
      #4  0x00005654f239a7a0 in tpool::task_group::execute (this=0x5654f4b6d2f0, t=t@entry=0x5654f4b6d5e0) at /data/Server/10.11-MDEV-34062/tpool/task_group.cc:70
      (rr) thread apply 1 bt
      #14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
      #15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
      #16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
      #17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
      #18 0x00005654f2118199 in recv_sys_t::parse<recv_buf, true> (this=this@entry=0x5654f3098080 <recv_sys>, l=..., if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:2944
      #19 0x00005654f211894e in recv_sys_t::parse_mtr<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3112
      #20 0x00005654f211aea2 in recv_sys_t::parse_mmap<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3123
      #21 0x00005654f21029cc in recv_scan_log (last_phase=last_phase@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4161
      #22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
      #23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
      #24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
      #25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
      #26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
      #27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
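
      The watchpoint transition above can be decoded as follows (a sketch; the interpretation follows the write-fix release at buf0flu.cc:292, and the numeric values are taken from the watchpoint output, not from buf0buf.h):

        #include <cstdint>
        #include <cstdio>

        int main()
        {
          /* values from the hardware watchpoint on block.page.zip.fix */
          const uint32_t old_state= 2684354560; /* 0xA0000000: write-fixed */
          const uint32_t new_state= 536870912;  /* 0x20000000: UNFIXED, with
                                                no buffer-fix or io-fix */
          /* buf_page_t::write_complete() releases the write fix with one
          fetch_sub() of exactly this difference */
          std::printf("fetch_sub operand: 0x%08x\n", old_state - new_state);
          return 0;
        }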
      

      The state of the block had been changed to UNFIXED, with no buffer-fix or io-fix. Thread 1 above is the thread in which the assertion fails a little later. The block is returned by buf_pool_t::chunk_t::not_freed() because its page latch had not yet been released:

      10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c

      #0  buf_pool_t::chunk_t::not_freed (this=this@entry=0x5654f4b77810) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:1163
      #1  0x00005654f1f21381 in buf_pool_t::assert_all_freed (this=this@entry=0x5654f308c680 <buf_pool>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:3985
      #2  0x00005654f1f21470 in buf_pool_invalidate () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:4009
      #3  0x00005654f20fd594 in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3997
      (rr) thread apply 4 bt
      Thread 4 (Thread 3423413.3423967 (mariadb-backup)):
      #0  buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
      #1  0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
      #2  0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
      #3  0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
      (rr) p write_slots.m_cache.m_pos
      $6 = 1
      

      The following should fix this:

      diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
      index 49f73105f99..2d997d4c750 100644
      --- a/storage/innobase/buf/buf0buf.cc
      +++ b/storage/innobase/buf/buf0buf.cc
      @@ -3999,15 +3999,13 @@ void buf_refresh_io_stats()
       All pages must be in a replaceable state (not modified or latched). */
       void buf_pool_invalidate()
       {
      -	mysql_mutex_lock(&buf_pool.mutex);
      -
       	/* It is possible that a write batch that has been posted
       	earlier is still not complete. For buffer pool invalidation to
       	proceed we must ensure there is NO write activity happening. */
       
      -	ut_d(mysql_mutex_unlock(&buf_pool.mutex));
      +	os_aio_wait_until_no_pending_writes(false);
       	ut_d(buf_pool.assert_all_freed());
      -	ut_d(mysql_mutex_lock(&buf_pool.mutex));
      +	mysql_mutex_lock(&buf_pool.mutex);
       
       	while (UT_LIST_GET_LEN(buf_pool.LRU)) {
       		buf_LRU_scan_and_free_block();
      

      It is possible that enabling memory-mapped log writes on HDD storage (MDEV-34062) made this race condition much easier to reproduce. The above fix should be applicable to 10.6 as well. In 10.5 the code is different; it was refactored in MDEV-26827.

      I was worried that this could cause a hang in a non-debug build, because buf_LRU_free_page(), which would be invoked from the while loop in the above patch snippet, is not guaranteed to release and re-acquire buf_pool.mutex. But it looks like we are safe. The very last thing that buf_page_t::write_complete() does is release the page latch, and it was blocked exactly there in the above rr replay trace. It is called from the following code:

        if (UNIV_UNLIKELY(!persistent) && UNIV_LIKELY(!error))
        {
          /* We must hold buf_pool.mutex while releasing the block, so that
          no other thread can access it before we have freed it. */
          mysql_mutex_lock(&buf_pool.mutex);
          bpage->write_complete(persistent, error, state);
          buf_LRU_free_page(bpage, true);
          mysql_mutex_unlock(&buf_pool.mutex);
        }
        else
        {
          bpage->write_complete(persistent, error, state);
          if (state < buf_page_t::WRITE_FIX_REINIT &&
              request.node->space->use_doublewrite())
          {
            ut_ad(persistent);
            buf_dblwr.write_completed();
          }
        }
      

      Obviously, the problematic call must be the one in the else branch, because the if branch holds buf_pool.mutex and would therefore block buf_pool_invalidate() from executing. The call into buf_dblwr does not access anything in buf_pool; it would basically just release a further doublewrite batch (which is impossible in this scenario).
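
      To make the window concrete, here is a minimal stand-alone illustration of the pattern (all names are invented for the illustration; this is not InnoDB code): the write completion publishes the UNFIXED state first and releases the latch only afterwards, so a concurrent checker can observe exactly the combination that buf_pool_t::chunk_t::not_freed() flags.

        #include <atomic>
        #include <cstdint>
        #include <cstdio>
        #include <shared_mutex>
        #include <thread>

        int main()
        {
          std::atomic<uint32_t> state{0xA0000000}; /* "write-fixed" */
          std::shared_timed_mutex latch;  /* stands in for the page latch */

          std::thread writer([&] {
            latch.lock();                 /* held while the page is written */
            state.store(0x20000000, std::memory_order_release); /* "UNFIXED" */
            /* window: the page already looks replaceable to other threads */
            latch.unlock();               /* the very last step, as in
                                          buf_page_t::write_complete() */
          });

          /* a checker in the spirit of buf_pool_t::chunk_t::not_freed():
          once the state word says UNFIXED, the latch may still be held */
          while (state.load(std::memory_order_acquire) != 0x20000000) {}
          if (!latch.try_lock())
            std::puts("UNFIXED, but the page latch is still held");
          else
            latch.unlock();

          writer.join();
          return 0;
        }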

      The function buf_pool_invalidate() is also invoked from buf_dblwr::create() during database creation, so the crash could theoretically occur at that point as well. Normally, it would occur during recovery or mariadb-backup --prepare when multiple recovery batches are needed.

      This bug seems to affect debug builds only. Nevertheless, I would add the call to os_aio_wait_until_no_pending_writes() in all builds.
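
      After the patch, the beginning of buf_pool_invalidate() reads as follows (assembled by applying the above hunk to its context lines; the tail of the function, elided here, is unchanged):

        void buf_pool_invalidate()
        {
        	/* It is possible that a write batch that has been posted
        	earlier is still not complete. For buffer pool invalidation to
        	proceed we must ensure there is NO write activity happening. */

        	os_aio_wait_until_no_pending_writes(false);
        	ut_d(buf_pool.assert_all_freed());
        	mysql_mutex_lock(&buf_pool.mutex);

        	while (UT_LIST_GET_LEN(buf_pool.LRU)) {
        		buf_LRU_scan_and_free_block();
        	/* ... */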

          Activity

            marko Marko Mäkelä added a comment:

            The following shows that there is no bug in buf_flush_wait(). Right before buf_page_t::write_complete() released the write fix on the block descriptor, it had set the buf_page_t::oldest_modification_ field to 1, indicating that the block is actually clean but still resides in buf_pool.flush_list, because we want to avoid too frequent operations on buf_pool.flush_list_mutex:

            buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
            292	  zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
            (rr) rn
            0x00005654f1f4b046	292	  zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
            (rr) 
            292	  zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
            (rr) 
            290	    oldest_modification_.store(persistent, std::memory_order_release);
            (rr) thread apply 1 bt
            …
            #14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
            #15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
            #16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
            #17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
            …
            #22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
            #23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
            #24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
            #25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
            #26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
            #27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
            (rr) thread apply 3 bt
            …
            #14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f4b6d240, mp=0x5654f4b6d190, file=0x5654f28d3068 "/data/Server/10.11-MDEV-34062/tpool/tpool_structs.h", line=140) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
            #15 0x00005654f213e542 in tpool::cache<tpool::aiocb>::get (this=0x5654f4b6d190) at /data/Server/10.11-MDEV-34062/tpool/tpool_structs.h:140
            #16 io_slots::acquire (this=0x5654f4b6d190) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:93
            #17 os_aio (type=..., buf=buf@entry=0x285575d48000, offset=offset@entry=20512768, n=n@entry=32768) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3655
            #18 0x00005654f1fa6ca9 in fil_space_t::io (this=this@entry=0x5654f4b9c190, type=..., offset=20512768, len=32768, buf=buf@entry=0x285575d48000, bpage=bpage@entry=0x285575b28700) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2826
            #19 0x00005654f1f4ce3e in buf_page_t::flush (this=this@entry=0x285575b28700, space=space@entry=0x5654f4b9c190) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:851
            #20 0x00005654f1f4fa40 in buf_do_flush_list_batch (max_n=max_n@entry=2000, lsn=lsn@entry=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:1474
            #21 0x00005654f1f5030f in buf_flush_list_holding_mutex (max_n=max_n@entry=2000, lsn=lsn@entry=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:1548
            #22 0x00005654f1f52d5c in buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2618
            (rr) break buf_pool_t::get_oldest_modification
            Breakpoint 2 at 0x5654f1714538: file /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h, line 1393.
            (rr) continue
            …
            Continuing.
            [Switching to Thread 3423413.3423967]
             
            Thread 4 hit Hardware watchpoint 1: -location $1.page.zip.fix.m._M_i
             
            Old value = 2684354560
            New value = 536870912
            buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
            295	  lock.u_unlock(true);
            (rr) 
            Continuing.
            [Switching to Thread 3423413.3423425]
             
            Thread 3 hit Breakpoint 2, buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
            1393	  lsn_t get_oldest_modification(lsn_t empty_lsn)
            (rr) bt
            #0  buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
            #1  0x00005654f1f52775 in buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2548
            …
            (rr) finish
            Run till exit from #0  buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
            buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2549
            2549	      if (!oldest_lsn)
            Value returned is $3 = 0
            (rr) n
            2482	      buf_flush_sync_lsn= 0;
            (rr) 
            2484	      buf_pool.page_cleaner_set_idle(true);
            (rr) 
            2486	      pthread_cond_broadcast(&buf_pool.done_flush_LRU);
            (rr) 
            2487	      pthread_cond_broadcast(&buf_pool.done_flush_list);
            

            At this point, buf_pool.flush_list had been cleared of the ‘garbage’ (blocks that are actually clean), and buf_flush_page_cleaner will mark itself as idle and wake up buf_flush_wait() in Thread 1, which was waiting on buf_pool.done_flush_list. A little later, the assertion in Thread 1 would fail, because the page latch had not yet been released. There should be no actual issue; this is just an overzealous debug check.
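
            For reference, the oldest_modification_ convention that this relies on, visible in the oldest_modification_.store(persistent, ...) at buf0flu.cc:290 above, can be summarized like this (an explanatory sketch, not the actual declaration in buf0buf.h; the real field also has special handling for pages of the temporary tablespace):

              /* buf_page_t::oldest_modification_, simplified:
                 0      clean, and not in buf_pool.flush_list
                 1      already written back, but still in buf_pool.flush_list,
                        so that buf_pool.flush_list_mutex need not be acquired
                        on every write completion
                 LSN>1  dirty; the LSN of the oldest not-yet-written change
              write_complete() stores `persistent` (1 for a persistent page),
              which is why the flush list counts as clear here even though
              the page latch is still held. */
              std::atomic<lsn_t> oldest_modification_; /* std::atomic here
                                                       only for the sketch */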


            debarun Debarun Banerjee added a comment:

            Thanks marko for explaining. I see that the atomic operations during I/O completion mark the LSN and let buf_flush_wait() return while the other actions in the I/O completion callback are still in progress. This should be fine as far as buf_flush_wait() is concerned, because the page has already been written. The design has changed somewhat from the earlier days, when we would remove the page from the flush list.

