[MDEV-34307] Recovery or mariadb-backup --prepare debug failure [FATAL] InnoDB: Page ... still fixed or dirty - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 11.4
Fix Version/s: 10.6.19, 10.11.9, 11.1.6, 11.2.5, 11.4.3
Component/s: Backup, Storage Engine - InnoDB
Labels:

Description

In ~~MDEV-34062~~, mleich reported an assertion failure in mariadb-backup --prepare:

10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c
2024-05-31 13:24:00 0 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=622] still fixed or dirty

Based on analyzing the rr replay trace, it initially looked rather harmless, because the data page had actually been written back, and we were waiting for that write-back, by waiting for the buf_pool.flush_list to be cleared and the checkpoint LSN to be advanced.

ssh pluto

rr replay /data/results/1717145612/MB-2134/1_clone/rr/mariadb-backup-1

10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c
Continuing.

Thread 4 hit Hardware watchpoint 1: -location block.page.zip.fix.m._M_i

Old value = 2684354560
New value = 536870912
buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
295 lock.u_unlock(true);
(rr) bt
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560)
at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:292
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
#4 0x00005654f239a7a0 in tpool::task_group::execute (this=0x5654f4b6d2f0, t=t@entry=0x5654f4b6d5e0) at /data/Server/10.11-MDEV-34062/tpool/task_group.cc:70
(rr) thread apply 1 bt
#14 0x00005654f2407d1b in safe_cond_wait (cond=0x5654f3090af0 <buf_pool+17520>, mp=0x5654f3090980 <buf_pool+17152>, file=0x5654f28719a0 "/data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc", line=2060) at /data/Server/10.11-MDEV-34062/mysys/thr_mutex.c:494
#15 0x00005654f1f4a8b2 in buf_flush_wait (lsn=38717368) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2060
#16 0x00005654f1f4aa68 in buf_flush_sync_batch (lsn=<optimized out>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:2736
#17 0x00005654f20fd58f in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3996
#18 0x00005654f2118199 in recv_sys_t::parse<recv_buf, true> (this=this@entry=0x5654f3098080 <recv_sys>, l=..., if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:2944
#19 0x00005654f211894e in recv_sys_t::parse_mtr<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3112
#20 0x00005654f211aea2 in recv_sys_t::parse_mmap<true> (if_exists=if_exists@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3123
#21 0x00005654f21029cc in recv_scan_log (last_phase=last_phase@entry=true) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4161
#22 0x00005654f21037e0 in recv_recovery_from_checkpoint_start () at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:4678
#23 0x00005654f2270fb2 in srv_start (create_new_db=create_new_db@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/srv/srv0start.cc:1458
#24 0x00005654f16f9935 in innodb_init () at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:2627
#25 0x00005654f170b768 in xtrabackup_prepare_func (argv=argv@entry=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:6819
#26 0x00005654f17135c6 in main_low (argv=0x5654f4a7ab98) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7754
#27 0x00005654f1713819 in main (argc=10, argv=0x7ffdabaf0688) at /data/Server/10.11-MDEV-34062/extra/mariabackup/xtrabackup.cc:7538

The state of the block was changed to UNFIXED, with no buffer-fix or io-fix. Thread 1 above would be the thread where the assertion fails a little later. This block is returned by buf_pool_t::chunk_t::not_freed() because the page latch had not been released:

10.11-MDEV-34062 3d2ec55fd92078b4023ec39af50fcd30aaf8667c
#0 buf_pool_t::chunk_t::not_freed (this=this@entry=0x5654f4b77810) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:1163
#1 0x00005654f1f21381 in buf_pool_t::assert_all_freed (this=this@entry=0x5654f308c680 <buf_pool>) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:3985
#2 0x00005654f1f21470 in buf_pool_invalidate () at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0buf.cc:4009
#3 0x00005654f20fd594 in recv_sys_t::apply (this=this@entry=0x5654f3098080 <recv_sys>, last_batch=last_batch@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/log/log0recv.cc:3997
(rr) thread apply 4 bt
Thread 4 (Thread 3423413.3423967 (mariadb-backup)):
#0 buf_page_t::write_complete (this=this@entry=0x285575b28a40, persistent=persistent@entry=true, error=error@entry=false, state=state@entry=2684354560) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:295
#1 0x00005654f1f4b590 in buf_page_write_complete (request=..., error=error@entry=false) at /data/Server/10.11-MDEV-34062/storage/innobase/buf/buf0flu.cc:352
#2 0x00005654f1fa7128 in IORequest::write_complete (this=this@entry=0x5654f4b6d600, io_error=0) at /data/Server/10.11-MDEV-34062/storage/innobase/fil/fil0fil.cc:2865
#3 0x00005654f213efa0 in write_io_callback (c=0x5654f4b6d588) at /data/Server/10.11-MDEV-34062/storage/innobase/os/os0file.cc:3267
(rr) p write_slots.m_cache.m_pos
$6 = 1

The following should fix this:

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc

index 49f73105f99..2d997d4c750 100644

--- a/storage/innobase/buf/buf0buf.cc

+++ b/storage/innobase/buf/buf0buf.cc

@@ -3999,15 +3999,13 @@ void buf_refresh_io_stats()

 All pages must be in a replaceable state (not modified or latched). */

 void buf_pool_invalidate()

-	mysql_mutex_lock(&buf_pool.mutex);

 	/* It is possible that a write batch that has been posted

 	earlier is still not complete. For buffer pool invalidation to

 	proceed we must ensure there is NO write activity happening. */

-	ut_d(mysql_mutex_unlock(&buf_pool.mutex));

+	os_aio_wait_until_no_pending_writes(false);

 	ut_d(buf_pool.assert_all_freed());

-	ut_d(mysql_mutex_lock(&buf_pool.mutex));

+	mysql_mutex_lock(&buf_pool.mutex);

 	while (UT_LIST_GET_LEN(buf_pool.LRU)) {

 		buf_LRU_scan_and_free_block();

It is possible that enabling the memory-mapped log writes on HDD storage (~~MDEV-34062~~) made it much easier to reproduce this race condition. The above fix should be applicable to 10.6 as well. In 10.5 the code is different; it had been refactored in ~~MDEV-26827~~.

I was worried if this could cause a hang in a non-debug build, because buf_LRU_free_page(), which would be invoked in the while loop in the above patch snippet, is not guaranteed to release and re-acquire buf_pool.mutex. But, it looks like we are safe. The very last thing that buf_page_t::write_complete() does is releasing the page latch, and it was blocked exactly there in the above rr replay trace. It is called by the following:

  if (UNIV_UNLIKELY(!persistent) && UNIV_LIKELY(!error))

    /* We must hold buf_pool.mutex while releasing the block, so that

    no other thread can access it before we have freed it. */

    mysql_mutex_lock(&buf_pool.mutex);

    bpage->write_complete(persistent, error, state);

    buf_LRU_free_page(bpage, true);

    mysql_mutex_unlock(&buf_pool.mutex);

  else

    bpage->write_complete(persistent, error, state);

    if (state < buf_page_t::WRITE_FIX_REINIT &&

        request.node->space->use_doublewrite())

      ut_ad(persistent);

      buf_dblwr.write_completed();

Obviously, the problematic call must be the else branch, because the branch above that is holding buf_pool.mutex and therefore would block buf_pool_invalidate() from executing. The call to buf_dblwr is not accessing anything in buf_pool; it would basically just release a further doublewrite batch (which are impossible in this scenario).

The function buf_pool_invalidate() is also being invoked from buf_dblwr::create() during database creation, so the crash could theoretically also occur at that point. Normally, it would occur on recovery or mariadb-backup --prepare when multiple recovery batches are needed.

This bug seems to affect debug builds only. Nevertheless, I would add the call to os_aio_wait_until_no_pending_writes() in all builds.

Attachments

Issue Links

is caused by

MDEV-26827 Make page flushing even faster

Closed

relates to

MDEV-34062 mariadb-backup --backup is extremely slow at copying ib_logfile0

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2024-06-05 12:35

Updated:: 2024-06-06 07:57

Resolved:: 2024-06-06 07:57