[MDEV-32511] Race condition between page write completion and log checkpoint - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.5, 10.6, 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3, 11.2.2
Component/s: Backup, Storage Engine - InnoDB
Labels:
- MSAN
- corruption
- race

Description

~~MDEV-26827~~ removed duplicated bookkeeping for io_slots::pending_io_count(). The merge of that to 10.8 introduced a debug assertion in mariadb-backup, which would fail rather often in a MemorySanitizer build.

For some reason, I did not manage to get a core dump for this problem, even after I removed calls to sigaction(2). Finally, I was able to figure out the cause by applying the following patch:

diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc

index a9891444eaf..3329be7a6df 100644

--- a/extra/mariabackup/xtrabackup.cc

+++ b/extra/mariabackup/xtrabackup.cc

@@ -2396,7 +2396,7 @@ static bool innodb_init()

   buf_flush_sync();

   recv_sys.debug_free();

   ut_ad(!os_aio_pending_reads());

-  ut_ad(!os_aio_pending_writes());

+  if (os_aio_pending_writes()) kill(0, SIGSTOP);

   ut_d(mysql_mutex_lock(&buf_pool.flush_list_mutex));

   ut_ad(!buf_pool.get_oldest_modification(0));

   ut_d(mysql_mutex_unlock(&buf_pool.flush_list_mutex));

With this patch and the following invocation:

LD_LIBRARY_PATH=/home/marko/libmsan-15 MSAN_OPTIONS=abort_on_error=1 MSAN_SYMBOLIZER_PATH=/home/marko/bin/llvm-symbolizer-msan nice ./mtr --repeat=10 --parallel=90 mariabackup.alter_copy_race{,,,,,,,,,}{,,,,,,,,}

I got 4 hung mariabackup processes near the end of the run. I attached gdb to one of them. Sure enough, one of the threads was executing the kill(). We had write_slots.m_cache.m_pos=1, corresponding to the following thread:

10.11 2ecc0443ec62b64cdbf9867e560ac3b40341a570
Thread 5 (Thread 0x7fc442bec6c0 (LWP 1295375) "mariabackup"):
#0 0x000055d34d0ee935 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
#1 0x000055d34d0f09f9 in __sanitizer::MmapFixedImpl(unsigned long, unsigned long, bool, char const*) ()
#2 0x000055d34d110144 in __sanitizer::SizeClassAllocator64<__msan::AP64>::PopulateFreeArray(__sanitizer::AllocatorStats, unsigned long, __sanitizer::SizeClassAllocator64<__msan::AP64>::RegionInfo, unsigned long) ()
#3 0x000055d34d10ff81 in __sanitizer::SizeClassAllocator64<__msan::AP64>::GetFromAllocator(__sanitizer::AllocatorStats, unsigned long, unsigned int, unsigned long) ()
#4 0x000055d34d10fb1a in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::PerClass, __sanitizer::SizeClassAllocator64<__msan::AP64>, unsigned long) ()
#5 0x000055d34d10f7f8 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__msan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__s--Type <RET> for more, q to quit, c to continue without paging--c
4<__msan::AP64> >*, unsigned long, unsigned long) ()
#6 0x000055d34d10ebf9 in __msan::MsanAllocate(__sanitizer::StackTrace*, unsigned long, unsigned long, bool) ()
#7 0x000055d34d10eb5a in __msan::msan_malloc(unsigned long, __sanitizer::StackTrace*) ()
#8 0x000055d34d119023 in malloc ()
#9 0x000055d3505522b7 in DbugMalloc (size=568) at /mariadb/10.11/dbug/dbug.c:2074
#10 PushState (cs=0x704000008000) at /mariadb/10.11/dbug/dbug.c:1599
#11 0x000055d350555989 in _db_push_ (control=0x55d34c788b7d "") at /mariadb/10.11/dbug/dbug.c:867
#12 0x000055d3504e02ef in safe_mutex_lock (mp=0x55d3533a8fd8 <fil_system+40>, my_flags=0, file=0x55d34c784541 "/mariadb/10.11/storage/innobase/fil/fil0fil.cc", line=2682) at /mariadb/10.11/mysys/thr_mutex.c:236
#13 0x000055d34f76acc6 in inline_mysql_mutex_lock (src_line=2682, that=<optimized out>, src_file=<optimized out>) at /mariadb/10.11/include/mysql/psi/mysql_thread.h:750
#14 fil_node_t::complete_write (this=0x707000000328) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2682
#15 0x000055d34f75917f in IORequest::write_complete (this=<optimized out>, io_error=<optimized out>) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2819
#16 0x000055d34fcfb791 in write_io_callback (c=0x7fc44d450000) at /mariadb/10.11/storage/innobase/os/os0file.cc:3465
#17 0x000055d350267d46 in tpool::task_group::execute (this=0x7130000002e0, t=0x7fc44d450048) at /mariadb/10.11/tpool/task_group.cc:70
#18 0x000055d3502580ef in tpool::thread_pool_generic::worker_main (this=0x718000000000, thread_var=0x730000010080) at /mariadb/10.11/tpool/tpool_generic.cc:583

This thread had already released the page latch, which allowed the buf_flush_page_cleaner() to finish. It had not yet invoked the following:

  node->complete_write();

  node->space->release();

This could mean that during a normal operation, a log checkpoint would fail to invoke fdatasync() or fsync() to make durable the last writes on a file. The two calls should probably be made part of buf_page_write_complete() right before the page latch release.

The assertion expression would fail until write_slots->release(cb) was called in write_io_callback(). Perhaps we should replace the debug assertions with some wait loops.

Attachments

Issue Links

relates to

MDEV-26827 Make page flushing even faster

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Dates

Created:: 2023-10-18 12:46

Updated:: 2024-08-07 13:26

Resolved:: 2023-10-18 15:34

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server