Type:
Bug
Priority:
Blocker
Resolution:
Fixed
Affects Version/s:
10.5 , 10.6 , 10.8(EOL) , 10.9(EOL) , 10.10(EOL) , 10.11 , 11.0(EOL) , 11.1(EOL) , 11.2(EOL)
MDEV-26827 removed duplicated bookkeeping for io_slots::pending_io_count() . The merge of that to 10.8 introduced a debug assertion in mariadb-backup , which would fail rather often in a MemorySanitizer build.
For some reason, I did not manage to get a core dump for this problem, even after I removed calls to sigaction(2) . Finally, I was able to figure out the cause by applying the following patch:
diff --git a/extra/mariabackup/xtrabackup.cc b/extra/mariabackup/xtrabackup.cc
index a9891444eaf..3329be7a6df 100644
--- a/extra/mariabackup/xtrabackup.cc
+++ b/extra/mariabackup/xtrabackup.cc
@@ -2396,7 +2396,7 @@ static bool innodb_init()
buf_flush_sync();
recv_sys.debug_free();
ut_ad(!os_aio_pending_reads());
- ut_ad(!os_aio_pending_writes());
+ if (os_aio_pending_writes()) kill(0, SIGSTOP);
ut_d(mysql_mutex_lock(&buf_pool.flush_list_mutex));
ut_ad(!buf_pool.get_oldest_modification(0));
ut_d(mysql_mutex_unlock(&buf_pool.flush_list_mutex));
With this patch and the following invocation:
LD_LIBRARY_PATH= /home/marko/libmsan-15 MSAN_OPTIONS=abort_on_error=1 MSAN_SYMBOLIZER_PATH= /home/marko/bin/llvm-symbolizer-msan nice . /mtr --repeat=10 --parallel=90 mariabackup.alter_copy_race{,,,,,,,,,}{,,,,,,,,}
I got 4 hung mariabackup processes near the end of the run. I attached gdb to one of them. Sure enough, one of the threads was executing the kill() . We had write_slots.m_cache.m_pos=1 , corresponding to the following thread:
10.11 2ecc0443ec62b64cdbf9867e560ac3b40341a570
Thread 5 (Thread 0x7fc442bec6c0 (LWP 1295375) "mariabackup"):
#0 0x000055d34d0ee935 in __sanitizer::internal_mmap(void*, unsigned long, int, int, int, unsigned long long) ()
#1 0x000055d34d0f09f9 in __sanitizer::MmapFixedImpl(unsigned long, unsigned long, bool, char const*) ()
#2 0x000055d34d110144 in __sanitizer::SizeClassAllocator64<__msan::AP64>::PopulateFreeArray(__sanitizer::AllocatorStats*, unsigned long, __sanitizer::SizeClassAllocator64<__msan::AP64>::RegionInfo*, unsigned long) ()
#3 0x000055d34d10ff81 in __sanitizer::SizeClassAllocator64<__msan::AP64>::GetFromAllocator(__sanitizer::AllocatorStats*, unsigned long, unsigned int*, unsigned long) ()
#4 0x000055d34d10fb1a in __sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::Refill(__sanitizer::SizeClassAllocator64LocalCache<__sanitizer::SizeClassAllocator64<__msan::AP64> >::PerClass*, __sanitizer::SizeClassAllocator64<__msan::AP64>*, unsigned long) ()
#5 0x000055d34d10f7f8 in __sanitizer::CombinedAllocator<__sanitizer::SizeClassAllocator64<__msan::AP64>, __sanitizer::LargeMmapAllocatorPtrArrayDynamic>::Allocate(__sanitizer::SizeClassAllocator64LocalCache<__s--Type <RET> for more, q to quit, c to continue without paging--c
4<__msan::AP64> >*, unsigned long, unsigned long) ()
#6 0x000055d34d10ebf9 in __msan::MsanAllocate(__sanitizer::StackTrace*, unsigned long, unsigned long, bool) ()
#7 0x000055d34d10eb5a in __msan::msan_malloc(unsigned long, __sanitizer::StackTrace*) ()
#8 0x000055d34d119023 in malloc ()
#9 0x000055d3505522b7 in DbugMalloc (size=568) at /mariadb/10.11/dbug/dbug.c:2074
#10 PushState (cs=0x704000008000) at /mariadb/10.11/dbug/dbug.c:1599
#11 0x000055d350555989 in _db_push_ (control=0x55d34c788b7d "") at /mariadb/10.11/dbug/dbug.c:867
#12 0x000055d3504e02ef in safe_mutex_lock (mp=0x55d3533a8fd8 <fil_system+40>, my_flags=0, file=0x55d34c784541 "/mariadb/10.11/storage/innobase/fil/fil0fil.cc", line=2682) at /mariadb/10.11/mysys/thr_mutex.c:236
#13 0x000055d34f76acc6 in inline_mysql_mutex_lock (src_line=2682, that=<optimized out>, src_file=<optimized out>) at /mariadb/10.11/include/mysql/psi/mysql_thread.h:750
#14 fil_node_t::complete_write (this=0x707000000328) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2682
#15 0x000055d34f75917f in IORequest::write_complete (this=<optimized out>, io_error=<optimized out>) at /mariadb/10.11/storage/innobase/fil/fil0fil.cc:2819
#16 0x000055d34fcfb791 in write_io_callback (c=0x7fc44d450000) at /mariadb/10.11/storage/innobase/os/os0file.cc:3465
#17 0x000055d350267d46 in tpool::task_group::execute (this=0x7130000002e0, t=0x7fc44d450048) at /mariadb/10.11/tpool/task_group.cc:70
#18 0x000055d3502580ef in tpool::thread_pool_generic::worker_main (this=0x718000000000, thread_var=0x730000010080) at /mariadb/10.11/tpool/tpool_generic.cc:583
This thread had already released the page latch, which allowed the buf_flush_page_cleaner() to finish. It had not yet invoked the following:
node->complete_write();
node->space->release();
This could mean that during a normal operation, a log checkpoint would fail to invoke fdatasync() or fsync() to make durable the last writes on a file. The two calls should probably be made part of buf_page_write_complete() right before the page latch release.
The assertion expression would fail until write_slots->release(cb) was called in write_io_callback() . Perhaps we should replace the debug assertions with some wait loops.
relates to
MDEV-26827
Make page flushing even faster
Closed
{"report":{"fcp":1203.6000000238419,"ttfb":470.5,"pageVisibility":"visible","entityId":125846,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"9fdffbb6-0e05-4906-a828-e92359758d4f","navigationType":0,"readyForUser":1296.2000000476837,"redirectCount":0,"resourceLoadedEnd":1416.1000000238419,"resourceLoadedStart":477.2000000476837,"resourceTiming":[{"duration":136.5,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":477.2000000476837,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":477.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":613.7000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":136.80000007152557,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":477.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":477.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":614.3000000715256,"responseStart":0,"secureConnectionStart":0},{"duration":193.60000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":477.7000000476837,"connectEnd":477.7000000476837,"connectStart":477.7000000476837,"domainLookupEnd":477.7000000476837,"domainLookupStart":477.7000000476837,"fetchStart":477.7000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":477.7000000476837,"responseEnd":671.3000000715256,"responseStart":671.3000000715256,"secureConnectionStart":477.7000000476837},{"duration":296.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":477.89999997615814,"connectEnd":477.89999997615814,"connectStart":477.89999997615814,"domainLookupEnd":477.89999997615814,"domainLookupStart":477.89999997615814,"fetchStart":477.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":477.89999997615814,"responseEnd":774.3000000715256,"responseStart":774.3000000715256,"secureConnectionStart":477.89999997615814},{"duration":300.2000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":478.10000002384186,"connectEnd":478.10000002384186,"connectStart":478.10000002384186,"domainLookupEnd":478.10000002384186,"domainLookupStart":478.10000002384186,"fetchStart":478.10000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":478.10000002384186,"responseEnd":778.3000000715256,"responseStart":778.2000000476837,"secureConnectionStart":478.10000002384186},{"duration":300.8000000715256,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":478.39999997615814,"connectEnd":478.39999997615814,"connectStart":478.39999997615814,"domainLookupEnd":478.39999997615814,"domainLookupStart":478.39999997615814,"fetchStart":478.39999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":478.39999997615814,"responseEnd":779.2000000476837,"responseStart":779.2000000476837,"secureConnectionStart":478.39999997615814},{"duration":301.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":478.60000002384186,"connectEnd":478.60000002384186,"connectStart":478.60000002384186,"domainLookupEnd":478.60000002384186,"domainLookupStart":478.60000002384186,"fetchStart":478.60000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":478.60000002384186,"responseEnd":780.1000000238419,"responseStart":780,"secureConnectionStart":478.60000002384186},{"duration":364.39999997615814,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":478.7000000476837,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":478.7000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":843.1000000238419,"responseStart":0,"secureConnectionStart":0},{"duration":301.60000002384186,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":478.89999997615814,"connectEnd":478.89999997615814,"connectStart":478.89999997615814,"domainLookupEnd":478.89999997615814,"domainLookupStart":478.89999997615814,"fetchStart":478.89999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":478.89999997615814,"responseEnd":780.5,"responseStart":780.5,"secureConnectionStart":478.89999997615814},{"duration":364.10000002384186,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":479.10000002384186,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":479.10000002384186,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":843.2000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":301.89999997615814,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":479.3000000715256,"connectEnd":479.3000000715256,"connectStart":479.3000000715256,"domainLookupEnd":479.3000000715256,"domainLookupStart":479.3000000715256,"fetchStart":479.3000000715256,"redirectEnd":0,"redirectStart":0,"requestStart":479.3000000715256,"responseEnd":781.2000000476837,"responseStart":781.2000000476837,"secureConnectionStart":479.3000000715256},{"duration":675.1999999284744,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":480.3000000715256,"connectEnd":480.3000000715256,"connectStart":480.3000000715256,"domainLookupEnd":480.3000000715256,"domainLookupStart":480.3000000715256,"fetchStart":480.3000000715256,"redirectEnd":0,"redirectStart":0,"requestStart":480.3000000715256,"responseEnd":1155.5,"responseStart":1155.5,"secureConnectionStart":480.3000000715256},{"duration":935.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":480.39999997615814,"connectEnd":480.39999997615814,"connectStart":480.39999997615814,"domainLookupEnd":480.39999997615814,"domainLookupStart":480.39999997615814,"fetchStart":480.39999997615814,"redirectEnd":0,"redirectStart":0,"requestStart":480.39999997615814,"responseEnd":1416.1000000238419,"responseStart":1416.1000000238419,"secureConnectionStart":480.39999997615814},{"duration":285.8000000715256,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":877.5,"connectEnd":877.5,"connectStart":877.5,"domainLookupEnd":877.5,"domainLookupStart":877.5,"fetchStart":877.5,"redirectEnd":0,"redirectStart":0,"requestStart":877.5,"responseEnd":1163.3000000715256,"responseStart":1163.3000000715256,"secureConnectionStart":877.5}],"fetchStart":1,"domainLookupStart":1,"domainLookupEnd":1,"connectStart":1,"connectEnd":1,"requestStart":305,"responseStart":471,"responseEnd":474,"domLoading":475,"domInteractive":1447,"domContentLoadedEventStart":1447,"domContentLoadedEventEnd":1499,"domComplete":3074,"loadEventStart":3074,"loadEventEnd":3076,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1418.7000000476837},{"name":"bigPipe.sidebar-id.end","time":1419.6000000238419},{"name":"bigPipe.activity-panel-pipe-id.start","time":1419.7000000476837},{"name":"bigPipe.activity-panel-pipe-id.end","time":1422.2000000476837},{"name":"activityTabFullyLoaded","time":1509.1000000238419}],"measures":[],"correlationId":"eda06cee2ade00","effectiveType":"4g","downlink":9.4,"rtt":0,"serverDuration":100,"dbReadsTimeInMs":12,"dbConnsTimeInMs":21,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}