Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.6, 10.11, 11.4
Description
In MDEV-34062, mleich reported an assertion failure in mariadb-backup --prepare:
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
2024-05-31 13:24:00 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=622] still fixed or dirty
|
Based on analyzing the rr replay trace, it initially looked rather harmless, because the data page had actually been written back, and we were waiting for that write-back, by waiting for the buf_pool.flush_list to be cleared and the checkpoint LSN to be advanced.
ssh pluto
|
rr replay /data/results/1717145612/MB-2134/1_clone/rr/mariadb-backup-1
|
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
Continuing.
|
|
Thread 4 hit Hardware watchpoint 1: -location block.page.zip.fix.m._M_i
|
|
Old value = 2684354560
|
New value = 536870912
|
buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
|
295 lock.u_unlock(true);
|
(rr) bt
|
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560)
|
at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
|
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
|
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
|
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
|
#4 0x00005654f239a7a0 in tpool::task_group::execute (this=0x5654f4b6d2f0, t=t@entry=0x5654f4b6d5e0) at /data/Server/10.11-MDEV-34062/tpool/task_group.cc:70
|
(rr) thread apply 1 bt
|
#14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
|
#15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
|
#16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
|
#17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
|
#18 0x00005654f2118199 in recv_sys_t::parse<recv_buf, true> (this=this@entry=0x5654f3098080 <recv_sys>, l=..., if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:2944
|
#19 0x00005654f211894e in recv_sys_t::parse_mtr<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3112
|
#20 0x00005654f211aea2 in recv_sys_t::parse_mmap<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3123
|
#21 0x00005654f21029cc in recv_scan_log (last_phase=last_phase@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4161
|
#22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
|
#23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
|
#24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
|
#25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
|
#26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
|
#27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538
|
The state of the block was changed to UNFIXED, with no buffer-fix or io-fix. Thread 1 above would be the thread where the assertion fails a little later. This block is returned by buf_pool_t::chunk_t::not_freed() because the page latch had not been released:
10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c |
#0 buf_pool_t::chunk_t::not_freed (this=this@entry=0x5654f4b77810) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:1163
|
#1 0x00005654f1f21381 in buf_pool_t::assert_all_freed (this=this@entry=0x5654f308c680 <buf_pool>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:3985
|
#2 0x00005654f1f21470 in buf_pool_invalidate () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:4009
|
#3 0x00005654f20fd594 in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3997
|
(rr) thread apply 4 bt
|
Thread 4 (Thread 3423413.3423967 (mariadb-backup)):
|
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
|
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
|
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
|
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
|
(rr) p write_slots.m_cache.m_pos
|
$6 = 1
|
The following should fix this:
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
|
index 49f73105f99..2d997d4c750 100644
|
--- a/storage/innobase/buf/buf0buf.cc
|
+++ b/storage/innobase/buf/buf0buf.cc
|
@@ -3999,15 +3999,13 @@ void buf_refresh_io_stats()
|
All pages must be in a replaceable state (not modified or latched). */
|
void buf_pool_invalidate()
|
{
|
- mysql_mutex_lock(&buf_pool.mutex);
|
-
|
/* It is possible that a write batch that has been posted
|
earlier is still not complete. For buffer pool invalidation to
|
proceed we must ensure there is NO write activity happening. */
|
|
- ut_d(mysql_mutex_unlock(&buf_pool.mutex));
|
+ os_aio_wait_until_no_pending_writes(false);
|
ut_d(buf_pool.assert_all_freed());
|
- ut_d(mysql_mutex_lock(&buf_pool.mutex));
|
+ mysql_mutex_lock(&buf_pool.mutex);
|
|
while (UT_LIST_GET_LEN(buf_pool.LRU)) {
|
buf_LRU_scan_and_free_block(); |
It is possible that enabling the memory-mapped log writes on HDD storage (MDEV-34062) made it much easier to reproduce this race condition. The above fix should be applicable to 10.6 as well. In 10.5 the code is different; it had been refactored in MDEV-26827.
I was worried if this could cause a hang in a non-debug build, because buf_LRU_free_page(), which would be invoked in the while loop in the above patch snippet, is not guaranteed to release and re-acquire buf_pool.mutex. But, it looks like we are safe. The very last thing that buf_page_t::write_complete() does is releasing the page latch, and it was blocked exactly there in the above rr replay trace. It is called by the following:
if (UNIV_UNLIKELY(!persistent) && UNIV_LIKELY(!error)) |
{
|
/* We must hold buf_pool.mutex while releasing the block, so that |
no other thread can access it before we have freed it. */
|
mysql_mutex_lock(&buf_pool.mutex);
|
bpage->write_complete(persistent, error, state);
|
buf_LRU_free_page(bpage, true); |
mysql_mutex_unlock(&buf_pool.mutex);
|
}
|
else |
{
|
bpage->write_complete(persistent, error, state);
|
if (state < buf_page_t::WRITE_FIX_REINIT && |
request.node->space->use_doublewrite())
|
{
|
ut_ad(persistent);
|
buf_dblwr.write_completed();
|
}
|
}
|
Obviously, the problematic call must be the else branch, because the branch above that is holding buf_pool.mutex and therefore would block buf_pool_invalidate() from executing. The call to buf_dblwr is not accessing anything in buf_pool; it would basically just release a further doublewrite batch (which are impossible in this scenario).
The function buf_pool_invalidate() is also being invoked from buf_dblwr::create() during database creation, so the crash could theoretically also occur at that point. Normally, it would occur on recovery or mariadb-backup --prepare when multiple recovery batches are needed.
This bug seems to affect debug builds only. Nevertheless, I would add the call to os_aio_wait_until_no_pending_writes() in all builds.
Attachments
Issue Links
- is caused by
-
MDEV-26827 Make page flushing even faster
- Closed
- relates to
-
MDEV-34062 mariadb-backup --backup is extremely slow at copying ib_logfile0
- Closed