Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.6, 10.11, 11.4
Description
In MDEV-34062, mleich reported an assertion failure in mariadb-backup --prepare:
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
2024-05-31 13:24:00 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=622] still fixed or dirty
|
Based on analyzing the rr replay trace, it initially looked rather harmless, because the data page had actually been written back, and we were waiting for that write-back, by waiting for the buf_pool.flush_list to be cleared and the checkpoint LSN to be advanced.
ssh pluto
|
rr replay /data/results/1717145612/MB-2134/1_clone/rr/mariadb-backup-1
|
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
Continuing.
|
|
Thread 4 hit Hardware watchpoint 1: -location block.page.zip.fix.m._M_i
|
|
Old value = 2684354560
|
New value = 536870912
|
buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
|
295 lock.u_unlock(true);
|
(rr) bt
|
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560)
|
at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
|
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
|
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
|
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
|
#4 0x00005654f239a7a0 in tpool::task_group::execute (this=0x5654f4b6d2f0, t=t@entry=0x5654f4b6d5e0) at /data/Server/10.11-MDEV-34062/tpool/task_group.cc:70
|
(rr) thread apply 1 bt
|
#14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
|
#15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
|
#16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
|
#17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
|
#18 0x00005654f2118199 in recv_sys_t::parse<recv_buf, true> (this=this@entry=0x5654f3098080 <recv_sys>, l=..., if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:2944
|
#19 0x00005654f211894e in recv_sys_t::parse_mtr<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3112
|
#20 0x00005654f211aea2 in recv_sys_t::parse_mmap<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3123
|
#21 0x00005654f21029cc in recv_scan_log (last_phase=last_phase@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4161
|
#22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
|
#23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
|
#24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
|
#25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
|
#26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
|
#27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
|
The state of the block was changed to UNFIXED, with no buffer-fix or io-fix. Thread 1 above would be the thread where the assertion fails a little later. This block is returned by buf_pool_t::chunk_t::not_freed() because the page latch had not been released:
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
#0 buf_pool_t::chunk_t::not_freed (this=this@entry=0x5654f4b77810) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:1163
|
#1 0x00005654f1f21381 in buf_pool_t::assert_all_freed (this=this@entry=0x5654f308c680 <buf_pool>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:3985
|
#2 0x00005654f1f21470 in buf_pool_invalidate () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:4009
|
#3 0x00005654f20fd594 in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3997
|
(rr) thread apply 4 bt
|
Thread 4 (Thread 3423413.3423967 (mariadb-backup)):
|
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
|
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
|
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
|
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
|
(rr) p write_slots.m_cache.m_pos
|
$6 = 1
|
The following should fix this:
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
|
index 49f73105f99..2d997d4c750 100644
|
--- a/storage/innobase/buf/buf0buf.cc
|
+++ b/storage/innobase/buf/buf0buf.cc
|
@@ -3999,15 +3999,13 @@ void buf_refresh_io_stats()
|
All pages must be in a replaceable state (not modified or latched). */
|
void buf_pool_invalidate()
|
{
|
- mysql_mutex_lock(&buf_pool.mutex);
|
-
|
/* It is possible that a write batch that has been posted
|
earlier is still not complete. For buffer pool invalidation to
|
proceed we must ensure there is NO write activity happening. */
|
|
- ut_d(mysql_mutex_unlock(&buf_pool.mutex));
|
+ os_aio_wait_until_no_pending_writes(false);
|
ut_d(buf_pool.assert_all_freed());
|
- ut_d(mysql_mutex_lock(&buf_pool.mutex));
|
+ mysql_mutex_lock(&buf_pool.mutex);
|
|
while (UT_LIST_GET_LEN(buf_pool.LRU)) {
|
buf_LRU_scan_and_free_block(); |
It is possible that enabling the memory-mapped log writes on HDD storage (MDEV-34062) made it much easier to reproduce this race condition. The above fix should be applicable to 10.6 as well. In 10.5 the code is different; it had been refactored in MDEV-26827.
I was worried if this could cause a hang in a non-debug build, because buf_LRU_free_page(), which would be invoked in the while loop in the above patch snippet, is not guaranteed to release and re-acquire buf_pool.mutex. But, it looks like we are safe. The very last thing that buf_page_t::write_complete() does is releasing the page latch, and it was blocked exactly there in the above rr replay trace. It is called by the following:
if (UNIV_UNLIKELY(!persistent) && UNIV_LIKELY(!error)) |
{
|
/* We must hold buf_pool.mutex while releasing the block, so that |
no other thread can access it before we have freed it. */
|
mysql_mutex_lock(&buf_pool.mutex);
|
bpage->write_complete(persistent, error, state);
|
buf_LRU_free_page(bpage, true); |
mysql_mutex_unlock(&buf_pool.mutex);
|
}
|
else |
{
|
bpage->write_complete(persistent, error, state);
|
if (state < buf_page_t::WRITE_FIX_REINIT && |
request.node->space->use_doublewrite())
|
{
|
ut_ad(persistent);
|
buf_dblwr.write_completed();
|
}
|
}
|
Obviously, the problematic call must be the else branch, because the branch above that is holding buf_pool.mutex and therefore would block buf_pool_invalidate() from executing. The call to buf_dblwr is not accessing anything in buf_pool; it would basically just release a further doublewrite batch (which are impossible in this scenario).
The function buf_pool_invalidate() is also being invoked from buf_dblwr::create() during database creation, so the crash could theoretically also occur at that point. Normally, it would occur on recovery or mariadb-backup --prepare when multiple recovery batches are needed.
This bug seems to affect debug builds only. Nevertheless, I would add the call to os_aio_wait_until_no_pending_writes() in all builds.
Attachments
Issue Links
- is caused by
-
MDEV-26827 Make page flushing even faster
-
- Closed
-
- relates to
-
MDEV-34062 mariadb-backup --backup is extremely slow at copying ib_logfile0
-
- Closed
-
The following shows that there is no bug in buf_flush_wait(). Right before buf_page_t::write_complete() released the write fix on the block descriptor, it had cleared the buf_page_t::oldest_modification_ field to 1, indicating that the block is actually clean but residing in the buf_pool.flush_list because we want to avoid too frequent operations on buf_pool.flush_list_mutex:
buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
292 zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
(rr) rn
0x00005654f1f4b046 292 zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
(rr)
292 zip.fix.fetch_sub((state >= WRITE_FIX_REINIT)
(rr)
290 oldest_modification_.store(persistent, std::memory_order_release);
(rr) thread apply 1 bt
…
#14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
#15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
#16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
#17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
…
#22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
#23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
#24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
#25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
#26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
#27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
(rr) thread apply 3 bt
…
#14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f4b6d240, mp=0x5654f4b6d190, file=0x5654f28d3068 "/data/Server/10.11-MDEV-34062/tpool/tpool_structs.h", line=140) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
#15 0x00005654f213e542 in tpool::cache<tpool::aiocb>::get (this=0x5654f4b6d190) at /data/Server/10.11-MDEV-34062/tpool/tpool_structs.h:140
#16 io_slots::acquire (this=0x5654f4b6d190) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:93
#17 os_aio (type=..., buf=buf@entry=0x285575d48000, offset=offset@entry=20512768, n=n@entry=32768) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3655
#18 0x00005654f1fa6ca9 in fil_space_t::io (this=this@entry=0x5654f4b9c190, type=..., offset=20512768, len=32768, buf=buf@entry=0x285575d48000, bpage=bpage@entry=0x285575b28700) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2826
#19 0x00005654f1f4ce3e in buf_page_t::flush (this=this@entry=0x285575b28700, space=space@entry=0x5654f4b9c190) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:851
#20 0x00005654f1f4fa40 in buf_do_flush_list_batch (max_n=max_n@entry=2000, lsn=lsn@entry=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:1474
#21 0x00005654f1f5030f in buf_flush_list_holding_mutex (max_n=max_n@entry=2000, lsn=lsn@entry=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:1548
#22 0x00005654f1f52d5c in buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2618
(rr) break buf_pool_t::get_oldest_modification
Breakpoint 2 at 0x5654f1714538: file /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h, line 1393.
(rr) continue
…
Continuing.
[Switching to Thread 3423413.3423967]
Thread 4 hit Hardware watchpoint 1: -location $1.page.zip.fix.m._M_i
Old value = 2684354560
New value = 536870912
buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
295 lock.u_unlock(true);
(rr)
Continuing.
[Switching to Thread 3423413.3423425]
Thread 3 hit Breakpoint 2, buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
1393 lsn_t get_oldest_modification(lsn_t empty_lsn)
(rr) bt
#0 buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
#1 0x00005654f1f52775 in buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2548
…
(rr) finish
Run till exit from #0 buf_pool_t::get_oldest_modification (this=0x5654f308c680 <buf_pool>, empty_lsn=0) at /data/Server/10.11-MDEV-34062/storage/innobase/include/buf0buf.h:1393
buf_flush_page_cleaner () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2549
2549 if (!oldest_lsn)
Value returned is $3 = 0
(rr) n
2482 buf_flush_sync_lsn= 0;
(rr)
2484 buf_pool.page_cleaner_set_idle(true);
(rr)
2486 pthread_cond_broadcast(&buf_pool.done_flush_LRU);
(rr)
2487 pthread_cond_broadcast(&buf_pool.done_flush_list);
At this point, the buf_pool.flush_list was cleared of the ‘garbage’ (blocks that are actually clean) and the buf_flush_page_cleaner will mark itself as idle and wake up buf_flush_wait() in Thread 1, which was waiting for buf_pool.done_flush_list. A little later, the assertion failure in Thread 1 would trip, because the page latch had not yet been released. There should be no actual issue; this should be just an overzealous debug check.