2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/tt14.ibd' with space ID 100. Another data file called ./test/tt9.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p0.ibd' with space ID 105. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p0.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p1.ibd' with space ID 106. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p1.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/tt14.ibd' with space ID 100. Another data file called ./test/tt9.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p0.ibd' with space ID 105. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p0.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p1.ibd' with space ID 106. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p1.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/tt14.ibd' with space ID 100. Another data file called ./test/tt9.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p0.ibd' with space ID 105. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p0.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p1.ibd' with space ID 106. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p1.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Ignoring data file './test/#sql-alter-afc9-23#P#p0.ibd' with space ID 105. Another data file called ./test/table1000_innodb_varchar_378_not_null_key_pk_parts_2#P#p0.ibd exists with the same space ID.
2021-05-11 18:19:17 0 [Note] InnoDB: Starting a batch to recover 4665 pages from redo log.
2021-05-11 18:19:18 0 [ERROR] InnoDB: Cannot apply log to [page id: space=110, page number=1] of corrupted file './test/t15.ibd'
double free or corruption (fasttop)
#1 0x00001e360fcdc921 in __GI_abort () at abort.c:79
#2 0x00001e360fd25967 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x1e360fe52b0d "%s\n") at ../sysdeps/posix/libc_fatal.c:181
#3 0x00001e360fd2c9da in malloc_printerr (str=str@entry=0x1e360fe547d0 "double free or corruption (fasttop)") at malloc.c:5342
#4 0x00001e360fd340f4 in _int_free (have_lock=0, p=0x560ad9a5da90, av=0x1e3610087c40 <main_arena>) at malloc.c:4260
#5 __GI___libc_free (mem=0x560ad9a5daa0) at malloc.c:3134
#13 0x0000560ad5ffb06b in recv_sys_t::apply (this=0x560ad6ffc8c0 <recv_sys>, last_batch=false) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:2980
#14 0x0000560ad5ffd00c in recv_group_scan_log_recs (checkpoint_lsn=544370707, contiguous_lsn=0x7ffcfba516d8, last_phase=true) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:3485
#15 0x0000560ad5fff138 in recv_recovery_from_checkpoint_start (flush_lsn=54297) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:3939
#16 0x0000560ad6148c9c in srv_start (create_new_db=false) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/srv/srv0start.cc:1426
#17 0x0000560ad5f23b71 in innodb_init (p=0x560ad9930478) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/handler/ha_innodb.cc:3824
#18 0x0000560ad5b7d394 in ha_initialize_handlerton (plugin=0x560ad990df90) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/handler.cc:660
#19 0x0000560ad5844d8b in plugin_initialize (tmp_root=0x7ffcfba523a0, plugin=0x560ad990df90, argc=0x560ad71c6e88 <remaining_argc>, argv=0x560ad98c8988, options_only=false)
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/sql_plugin.cc:1463
#20 0x0000560ad5845ac9 in plugin_init (argc=0x560ad71c6e88 <remaining_argc>, argv=0x560ad98c8988, flags=0) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/sql_plugin.cc:1756
#21 0x0000560ad56f5c5b in init_server_components () at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/mysqld.cc:5041
#22 0x0000560ad56f6c13 in mysqld_main (argc=31, argv=0x560ad98c8988) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/mysqld.cc:5649
#23 0x0000560ad56ed84a in main (argc=30, argv=0x7ffcfba54708) at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/main.cc:34
The test was running a mix of DDL, locking statements and XA statements in 8 threads when it was sigkilled. The above happened upon crash recovery.
The logs, data directories and rr profiles are available.
This was a one-time occurrence so far in tests on the above-mentioned revision.
Attachments
Issue Links
relates to
MDEV-24626Remove synchronous write of page0 and flushing file during file creation
The root cause is the same as in MDEV-25656: I did not realize that it->flags are not always valid. Hence, recv_sys_t::recover_deferred() must not insist that flags == it->second.flags hold.
Marko Mäkelä
added a comment - The root cause is the same as in MDEV-25656 : I did not realize that it->flags are not always valid. Hence, recv_sys_t::recover_deferred() must not insist that flags == it->second.flags hold.
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:750
#10 0x0000560ad5ffafef in recv_sys_t::apply (this=0x560ad6ffc8c0 <recv_sys>, last_batch=false)
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:2972
When I updated MDEV-24626 yesterday, I overlooked the fact that recv_sys_t::recover_low() will advance the iterator and free the records. There would also have been a heap-use-after-free at the end of recv_sys_t::recover_deferred(), for dereferencing the potentially freed iterator p.
I think that we must pass the iterator to recv_sys_t::recover_deferred(), so that recv_sys_t::apply() will avoid the double-freeing when discarding all buffered records for that tablespace.
However, the root cause of the failure needs to be addressed as well: Why did we fail to recover in the first place? As far as I can tell, the doublewrite buffer was enabled before the server was intentionally killed and the recovery failed.
Marko Mäkelä
added a comment - The first freeing occurs here:
#8 0x0000560ad60090ef in recv_sys_t::recover_low (this=0x560ad6ffc8c0 <recv_sys>, page_id=..., p=
{first = {m_id = 472446402561}, second = {state = page_recv_t::RECV_WILL_NOT_READ, last_offset = 1, log = {head = 0x3b0b23f474e0, tail = 0x3b0b23f47500}}}, mtr=..., b=0x3b0b23867b30)
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:2862
#9 0x0000560ad5ff2271 in recv_sys_t::recover_deferred (this=0x560ad6ffc8c0 <recv_sys>, space_id=110,
name="./test/t15.ibd", free_block=@0x7ffcfba50df0: 0x3b0b23867b30)
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:750
#10 0x0000560ad5ffafef in recv_sys_t::apply (this=0x560ad6ffc8c0 <recv_sys>, last_batch=false)
at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/log/log0recv.cc:2972
When I updated MDEV-24626 yesterday, I overlooked the fact that recv_sys_t::recover_low() will advance the iterator and free the records. There would also have been a heap-use-after-free at the end of recv_sys_t::recover_deferred() , for dereferencing the potentially freed iterator p .
I think that we must pass the iterator to recv_sys_t::recover_deferred() , so that recv_sys_t::apply() will avoid the double-freeing when discarding all buffered records for that tablespace.
However, the root cause of the failure needs to be addressed as well: Why did we fail to recover in the first place? As far as I can tell, the doublewrite buffer was enabled before the server was intentionally killed and the recovery failed.
I will try to diagnose the memory corruption first, using rr replay on the non-ASAN build that reported it.
Marko Mäkelä
added a comment - A local ASAN build repeats the recovery failure but does not trigger any ASAN message:
ASAN_OPTIONS=abort_on_error=1 ../sql/mariadbd --innodb-log-file-size=96m --datadir /dev/shm/data_before_recovery
I will try to diagnose the memory corruption first, using rr replay on the non-ASAN build that reported it.
People
Marko Mäkelä
Elena Stepanova
Votes:
0Vote for this issue
Watchers:
2Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
{"report":{"fcp":1152.5,"ttfb":313.40000009536743,"pageVisibility":"visible","entityId":99571,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"c58a1d31-5d13-4c3f-858b-bafd645613f6","navigationType":0,"readyForUser":1237.5,"redirectCount":0,"resourceLoadedEnd":1433.8000001907349,"resourceLoadedStart":320.1000003814697,"resourceTiming":[{"duration":329.09999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":320.1000003814697,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":320.1000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":649.2000002861023,"responseStart":0,"secureConnectionStart":0},{"duration":329.09999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":320.40000009536743,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":320.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":649.5,"responseStart":0,"secureConnectionStart":0},{"duration":338.3999996185303,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":320.6000003814697,"connectEnd":320.6000003814697,"connectStart":320.6000003814697,"domainLookupEnd":320.6000003814697,"domainLookupStart":320.6000003814697,"fetchStart":320.6000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":320.6000003814697,"responseEnd":659,"responseStart":659,"secureConnectionStart":320.6000003814697},{"duration":372.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":320.80000019073486,"connectEnd":320.80000019073486,"connectStart":320.80000019073486,"domainLookupEnd":320.80000019073486,"domainLookupStart":320.80000019073486,"fetchStart":320.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":320.80000019073486,"responseEnd":693.2000002861023,"responseStart":693.2000002861023,"secureConnectionStart":320.80000019073486},{"duration":376.2000002861023,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":321,"connectEnd":321,"connectStart":321,"domainLookupEnd":321,"domainLookupStart":321,"fetchStart":321,"redirectEnd":0,"redirectStart":0,"requestStart":321,"responseEnd":697.2000002861023,"responseStart":697.2000002861023,"secureConnectionStart":321},{"duration":376.7999997138977,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":321.1000003814697,"connectEnd":321.1000003814697,"connectStart":321.1000003814697,"domainLookupEnd":321.1000003814697,"domainLookupStart":321.1000003814697,"fetchStart":321.1000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":321.1000003814697,"responseEnd":697.9000000953674,"responseStart":697.9000000953674,"secureConnectionStart":321.1000003814697},{"duration":376.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":321.40000009536743,"connectEnd":321.40000009536743,"connectStart":321.40000009536743,"domainLookupEnd":321.40000009536743,"domainLookupStart":321.40000009536743,"fetchStart":321.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":321.40000009536743,"responseEnd":698.3000001907349,"responseStart":698.3000001907349,"secureConnectionStart":321.40000009536743},{"duration":489,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":321.6000003814697,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":321.6000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":810.6000003814697,"responseStart":0,"secureConnectionStart":0},{"duration":377.59999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":321.7000002861023,"connectEnd":321.7000002861023,"connectStart":321.7000002861023,"domainLookupEnd":321.7000002861023,"domainLookupStart":321.7000002861023,"fetchStart":321.7000002861023,"redirectEnd":0,"redirectStart":0,"requestStart":321.7000002861023,"responseEnd":699.3000001907349,"responseStart":699.3000001907349,"secureConnectionStart":321.7000002861023},{"duration":488.90000009536743,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":321.90000009536743,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":321.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":810.8000001907349,"responseStart":0,"secureConnectionStart":0},{"duration":377.8999996185303,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":322.1000003814697,"connectEnd":322.1000003814697,"connectStart":322.1000003814697,"domainLookupEnd":322.1000003814697,"domainLookupStart":322.1000003814697,"fetchStart":322.1000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":322.1000003814697,"responseEnd":700,"responseStart":700,"secureConnectionStart":322.1000003814697},{"duration":601.0999999046326,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":328.1000003814697,"connectEnd":328.1000003814697,"connectStart":328.1000003814697,"domainLookupEnd":328.1000003814697,"domainLookupStart":328.1000003814697,"fetchStart":328.1000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":328.1000003814697,"responseEnd":929.2000002861023,"responseStart":929.2000002861023,"secureConnectionStart":328.1000003814697},{"duration":1105.6999998092651,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":328.1000003814697,"connectEnd":328.1000003814697,"connectStart":328.1000003814697,"domainLookupEnd":328.1000003814697,"domainLookupStart":328.1000003814697,"fetchStart":328.1000003814697,"redirectEnd":0,"redirectStart":0,"requestStart":328.1000003814697,"responseEnd":1433.8000001907349,"responseStart":1433.8000001907349,"secureConnectionStart":328.1000003814697},{"duration":164.59999990463257,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":828.7000002861023,"connectEnd":828.7000002861023,"connectStart":828.7000002861023,"domainLookupEnd":828.7000002861023,"domainLookupStart":828.7000002861023,"fetchStart":828.7000002861023,"redirectEnd":0,"redirectStart":0,"requestStart":828.7000002861023,"responseEnd":993.3000001907349,"responseStart":993.3000001907349,"secureConnectionStart":828.7000002861023},{"duration":451.59999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/e65b778d185daf5aee24936755b43da6/_/download/contextbatch/js/browser-metrics-plugin.contrib,-_super,-project.issue.navigator,-jira.view.issue,-atl.general/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":1084.4000000953674,"connectEnd":1084.4000000953674,"connectStart":1084.4000000953674,"domainLookupEnd":1084.4000000953674,"domainLookupStart":1084.4000000953674,"fetchStart":1084.4000000953674,"redirectEnd":0,"redirectStart":0,"requestStart":1084.4000000953674,"responseEnd":1536,"responseStart":1536,"secureConnectionStart":1084.4000000953674}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":130,"responseStart":313,"responseEnd":323,"domLoading":318,"domInteractive":1528,"domContentLoadedEventStart":1528,"domContentLoadedEventEnd":1591,"domComplete":2101,"loadEventStart":2101,"loadEventEnd":2101,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1480.6000003814697},{"name":"bigPipe.sidebar-id.end","time":1481.5},{"name":"bigPipe.activity-panel-pipe-id.start","time":1481.7000002861023},{"name":"bigPipe.activity-panel-pipe-id.end","time":1484.1000003814697},{"name":"activityTabFullyLoaded","time":1606.5}],"measures":[],"correlationId":"c1661848b19907","effectiveType":"4g","downlink":9.5,"rtt":0,"serverDuration":117,"dbReadsTimeInMs":20,"dbConnsTimeInMs":31,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
This error was introduced by my refactoring yesterday in an
MDEV-24626development branch.