Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32511

Race condition between page write completion and log checkpoint

    XMLWordPrintable

Details

    Description

      MDEV-26827 removed duplicated bookkeeping for io_slots::pending_io_count(). The merge of that to 10.8 introduced a debug assertion in mariadb-backup, which would fail rather often in a MemorySanitizer build.

      For some reason, I did not manage to get a core dump for this problem, even after I removed calls to sigaction(2). Finally, I was able to figure out the cause by applying the following patch:

      diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
      index a9891444eaf..3329be7a6df 100644
      --- a/extra/mariabackup/xtrabackup.cc
      +++ b/extra/mariabackup/xtrabackup.cc
      @@ -2396,7 +2396,7 @@ static bool innodb_init()
         buf_flush_sync();
         recv_sys.debug_free();
         ut_ad(!os_aio_pending_reads());
      -  ut_ad(!os_aio_pending_writes());
      +  if (os_aio_pending_writes()) kill(0, SIGSTOP);
         ut_d(mysql_mutex_lock(&buf_pool.flush_list_mutex));
         ut_ad(!buf_pool.get_oldest_modification(0));
         ut_d(mysql_mutex_unlock(&buf_pool.flush_list_mutex));
      

      With this patch and the following invocation:

      LD_LIBRARY_PATH=/home/marko/libmsan-15 MSAN_OPTIONS=abort_on_error=1 MSAN_SYMBOLIZER_PATH=/home/marko/bin/llvm-symbolizer-msan nice ./mtr --repeat=10 --parallel=90 mariabackup.alter_copy_race{,,,,,,,,,}{,,,,,,,,}
      

      I got 4 hung mariabackup processes near the end of the run. I attached gdb to one of them. Sure enough, one of the threads was executing the kill(). We had write_slots.m_cache.m_pos=1, corresponding to the following thread:

      10.11 2ecc0443ec62b64cdbf9867e560ac3b40341a570

      Thread 5 (Thread 0x7fc442bec6c0 (LWP 1295375) "mariabackup"):
      #0  0x000055d34d0ee935 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
      #1  0x000055d34d0f09f9 in __sanitizer::MmapFixedImpl(unsigned long, unsigned long, bool, char const*) ()
      #2  0x000055d34d110144 in __sanitizer::SizeClassAllocator64<__msan::AP64>::PopulateFreeArray(__sanitizer::AllocatorStats*, unsigned long, __sanitizer::SizeClassAllocator64<__msan::AP64>::RegionInfo*, unsigned long) ()
      #3  0x000055d34d10ff81 in __sanitizer::SizeClassAllocator64<__msan::AP64>::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) ()
      #4  0x000055d34d10fb1a in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::PerClass*, __sanitizer::SizeClassAllocator64<__msan::AP64>*, unsigned long) ()
      #5  0x000055d34d10f7f8 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__msan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__s--Type <RET> for more, q to quit, c to continue without paging--c
      4<__msan::AP64> >*, unsigned long, unsigned long) ()
      #6  0x000055d34d10ebf9 in __msan::MsanAllocate(__sanitizer::StackTrace*, unsigned long, unsigned long, bool) ()
      #7  0x000055d34d10eb5a in __msan::msan_malloc(unsigned long, __sanitizer::StackTrace*) ()
      #8  0x000055d34d119023 in malloc ()
      #9  0x000055d3505522b7 in DbugMalloc (size=568) at /mariadb/10.11/dbug/dbug.c:2074
      #10 PushState (cs=0x704000008000) at /mariadb/10.11/dbug/dbug.c:1599
      #11 0x000055d350555989 in _db_push_ (control=0x55d34c788b7d "") at /mariadb/10.11/dbug/dbug.c:867
      #12 0x000055d3504e02ef in safe_mutex_lock (mp=0x55d3533a8fd8 <fil_system+40>, my_flags=0, file=0x55d34c784541 "/mariadb/10.11/storage/innobase/fil/fil0fil.cc", line=2682) at /mariadb/10.11/mysys/thr_mutex.c:236
      #13 0x000055d34f76acc6 in inline_mysql_mutex_lock (src_line=2682, that=<optimized out>, src_file=<optimized out>) at /mariadb/10.11/include/mysql/psi/mysql_thread.h:750
      #14 fil_node_t::complete_write (this=0x707000000328) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2682
      #15 0x000055d34f75917f in IORequest::write_complete (this=<optimized out>, io_error=<optimized out>) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2819
      #16 0x000055d34fcfb791 in write_io_callback (c=0x7fc44d450000) at /mariadb/10.11/storage/innobase/os/os0file.cc:3465
      #17 0x000055d350267d46 in tpool::task_group::execute (this=0x7130000002e0, t=0x7fc44d450048) at /mariadb/10.11/tpool/task_group.cc:70
      #18 0x000055d3502580ef in tpool::thread_pool_generic::worker_main (this=0x718000000000, thread_var=0x730000010080) at /mariadb/10.11/tpool/tpool_generic.cc:583
      

      This thread had already released the page latch, which allowed the buf_flush_page_cleaner() to finish. It had not yet invoked the following:

        node->complete_write();
        node->space->release();
      

      This could mean that during a normal operation, a log checkpoint would fail to invoke fdatasync() or fsync() to make durable the last writes on a file. The two calls should probably be made part of buf_page_write_complete() right before the page latch release.

      The assertion expression would fail until write_slots->release(cb) was called in write_io_callback(). Perhaps we should replace the debug assertions with some wait loops.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.