Index corruption found during RQG testing
|
Some CHECK TABLE t2 EXTENDED harvested
|
test.t2 check Warning InnoDB: Index 'Marvão_idx3' contains 36 entries, should be 37.
|
And the server error log contains a
|
[rr 76197 241886]2020-06-16 18:31:54 33 [ERROR] InnoDB: Index `Marvão_idx3` of table `test`.`t2` is corrupted
|
[rr 76197 241890]2020-06-16 18:31:54 33 [Note] InnoDB: Load corrupted index `Marvão_idx3` of table `test`.`t2`
|
|
Work flow:
|
1. Start the server and generate some initial data
|
2. Several concurrent sessions runs a DDL/DML mix
|
and sometimes CHECK TABLE EXTENDED
|
Please note critical events like killing the server process followed by Restart
|
with CrashRecovery are not involved..
|
|
I have some "rr" trace for the tree
|
origin/bb-10.5-marko 8cad9bc9805ff058e8493d92d4915ca437d16b08
|
which is based on actual 10.5 but contains some non final changes of Marko.
|
Per Marko the failure cannot be caused by his changes.
|
|
My test machinery has seen the same problem also on
|
10.3.24 origin/10.3 294ac1fbab306e97dad3588a00e3527f2056e7e4 2020-05-19T15:27:22+00:00
|
but the corresponding details are lost.
|
And so its unknown if that was during concurrent load or after crash with recovery.
|
|
RQG
|
git clone https://github.com/mleich1/rqg --branch experimental RQG
|
origin/experimental 611a31e38a3bc157392c0748c95991b6a248bb3b 2020-06-09T15:31:46+02:00
|
|
perl rqg.pl \
|
--grammar=conf/mariadb/table_stress_innodb_nocopy.yy \
|
--gendata=conf/mariadb/table_stress.zz \
|
--gendata_sql=conf/mariadb/table_stress.sql \
|
--mysqld=--innodb_use_native_aio=1 \
|
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
|
--mysqld=--loose-idle_write_transaction_timeout=0 \
|
--mysqld=--loose-idle_transaction_timeout=0 \
|
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
|
--mysqld=--connect_timeout=60 \
|
--mysqld=--interactive_timeout=28800 \
|
--mysqld=--slave_net_timeout=60 \
|
--mysqld=--net_read_timeout=30 \
|
--mysqld=--net_write_timeout=60 \
|
--mysqld=--loose-table_lock_wait_timeout=50 \
|
--mysqld=--wait_timeout=28800 \
|
--mysqld=--lock-wait-timeout=86400 \
|
--mysqld=--innodb-lock-wait-timeout=50 \
|
--no-mask \
|
--queries=10000000 \
|
--seed=random \
|
--reporters=Backtrace \
|
--reporters=ErrorLog \
|
--reporters=Deadlock1 \
|
--validators=None \
|
--mysqld=--log_output=none \
|
--mysqld=--log-bin \
|
--mysqld=--log_bin_trust_function_creators=1 \
|
--mysqld=--loose-max-statement-time=30 \
|
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
|
--engine=InnoDB \
|
--restart_timeout=120 \
|
--duration=300 \
|
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
|
--mysqld=--innodb_stats_persistent=off \
|
--threads=33 \
|
--mysqld=--innodb_page_size=4K \
|
--mysqld=--innodb-buffer-pool-size=256M \
|
--duration=300 \
|
--no_mask \
|
... certain local settings ...
|
|
If looking into the attached RQG protocol 002082.log and the RQG call mentioned there please be aware that the "--reporters=CrashRecovery1" just tells
|
that running a work flow invoking kill+restart was planned.
|
But the data corruption was hit before the RQG reporter "CrashRecovery1"
|
would kill the server process.
|
{"report":{"fcp":2346.5999999046326,"ttfb":777.1999998092651,"pageVisibility":"visible","entityId":88434,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"71a4ffb1-8cff-41e4-a72b-1a716075e020","navigationType":0,"readyForUser":2399.7999997138977,"redirectCount":0,"resourceLoadedEnd":2148.5999999046326,"resourceLoadedStart":867.7999997138977,"resourceTiming":[{"duration":227.09999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":867.7999997138977,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":867.7999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1094.8999996185303,"responseStart":0,"secureConnectionStart":0},{"duration":227.2999997138977,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":868,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":868,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1095.2999997138977,"responseStart":0,"secureConnectionStart":0},{"duration":816.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":868.1999998092651,"connectEnd":868.1999998092651,"connectStart":868.1999998092651,"domainLookupEnd":868.1999998092651,"domainLookupStart":868.1999998092651,"fetchStart":868.1999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":1100.1999998092651,"responseEnd":1684.6999998092651,"responseStart":1225.5,"secureConnectionStart":868.1999998092651},{"duration":1055.1000003814697,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":868.3999996185303,"connectEnd":868.3999996185303,"connectStart":868.3999996185303,"domainLookupEnd":868.3999996185303,"domainLookupStart":868.3999996185303,"fetchStart":868.3999996185303,"redirectEnd":0,"redirectStart":0,"requestStart":1100.2999997138977,"responseEnd":1923.5,"responseStart":1218.0999999046326,"secureConnectionStart":868.3999996185303},{"duration":383.90000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":868.5999999046326,"connectEnd":868.5999999046326,"connectStart":868.5999999046326,"domainLookupEnd":868.5999999046326,"domainLookupStart":868.5999999046326,"fetchStart":868.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1100.3999996185303,"responseEnd":1252.5,"responseStart":1249,"secureConnectionStart":868.5999999046326},{"duration":379.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":868.6999998092651,"connectEnd":868.6999998092651,"connectStart":868.6999998092651,"domainLookupEnd":868.6999998092651,"domainLookupStart":868.6999998092651,"fetchStart":868.6999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":1100.3999996185303,"responseEnd":1248.1999998092651,"responseStart":1224,"secureConnectionStart":868.6999998092651},{"duration":383.7000002861023,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":868.8999996185303,"connectEnd":868.8999996185303,"connectStart":868.8999996185303,"domainLookupEnd":868.8999996185303,"domainLookupStart":868.8999996185303,"fetchStart":868.8999996185303,"redirectEnd":0,"redirectStart":0,"requestStart":1100.5,"responseEnd":1252.5999999046326,"responseStart":1249.5,"secureConnectionStart":868.8999996185303},{"duration":227.2999997138977,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":869,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":869,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1096.2999997138977,"responseStart":0,"secureConnectionStart":0},{"duration":383.5,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":869.2999997138977,"connectEnd":869.2999997138977,"connectStart":869.2999997138977,"domainLookupEnd":869.2999997138977,"domainLookupStart":869.2999997138977,"fetchStart":869.2999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":1100.6999998092651,"responseEnd":1252.7999997138977,"responseStart":1249.8999996185303,"secureConnectionStart":869.2999997138977},{"duration":226.89999961853027,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":869.5,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":869.5,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":1096.3999996185303,"responseStart":0,"secureConnectionStart":0},{"duration":458.59999990463257,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":869.5999999046326,"connectEnd":869.5999999046326,"connectStart":869.5999999046326,"domainLookupEnd":869.5999999046326,"domainLookupStart":869.5999999046326,"fetchStart":869.5999999046326,"redirectEnd":0,"redirectStart":0,"requestStart":1100.6999998092651,"responseEnd":1328.1999998092651,"responseStart":1250.2999997138977,"secureConnectionStart":869.5999999046326},{"duration":1265.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":882.6999998092651,"connectEnd":882.6999998092651,"connectStart":882.6999998092651,"domainLookupEnd":882.6999998092651,"domainLookupStart":882.6999998092651,"fetchStart":882.6999998092651,"redirectEnd":0,"redirectStart":0,"requestStart":1458.3999996185303,"responseEnd":2148.199999809265,"responseStart":2141.199999809265,"secureConnectionStart":882.6999998092651},{"duration":1265.8000001907349,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":882.7999997138977,"connectEnd":882.7999997138977,"connectStart":882.7999997138977,"domainLookupEnd":882.7999997138977,"domainLookupStart":882.7999997138977,"fetchStart":882.7999997138977,"redirectEnd":0,"redirectStart":0,"requestStart":1595.5,"responseEnd":2148.5999999046326,"responseStart":2141.8999996185303,"secureConnectionStart":882.7999997138977},{"duration":316.19999980926514,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":1889.5,"connectEnd":1889.5,"connectStart":1889.5,"domainLookupEnd":1889.5,"domainLookupStart":1889.5,"fetchStart":1889.5,"redirectEnd":0,"redirectStart":0,"requestStart":2147.2999997138977,"responseEnd":2205.699999809265,"responseStart":2183.8999996185303,"secureConnectionStart":1889.5}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":68,"responseStart":777,"responseEnd":868,"domLoading":816,"domInteractive":2542,"domContentLoadedEventStart":2542,"domContentLoadedEventEnd":2625,"domComplete":3770,"loadEventStart":3770,"loadEventEnd":3771,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":2457.5999999046326},{"name":"bigPipe.sidebar-id.end","time":2458.3999996185303},{"name":"bigPipe.activity-panel-pipe-id.start","time":2458.699999809265},{"name":"bigPipe.activity-panel-pipe-id.end","time":2462.5999999046326},{"name":"activityTabFullyLoaded","time":2646.5}],"measures":[],"correlationId":"f260c3dba99ace","effectiveType":"4g","downlink":9.1,"rtt":0,"serverDuration":142,"dbReadsTimeInMs":28,"dbConnsTimeInMs":39,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
All records in the clustered index page carry DB_TRX_ID=2009=0x7d9. Also the PAGE_MAX_TRX_ID in the secondary index is at that value. The next available transaction identifier (trx_sys.m_max_trx_id) is 2012. The read view of the CHECK TABLE excludes transaction 2009.
Transaction 2009 modified all 6 rows of the table by executing the statement
The transaction was started at the event time (when) 4387662 and committed at 4257184. At the time the transaction started, the clustered index leaf pages were 13,14,10,11.
The CHECK TABLE had started executing on the primary key at when 4255465. At the time the UPDATE transaction is committing, it was executing inside btr_validate_index(), trying to access our secondary index page 5.
It looks like we must ignore the 6 first records in the secondary index page (which must be excluded from the read view) and concentrate on the delete-marked ones, to see which record is missing and why. Here are all 15 secondary index records that were present when CHECK TABLE counted the records:
KEY(col_longtext_ucs2_key)
(col_longtext_ucs2_key,pk)=
page 5:
('',1),('',2),('',3),('',4),('',5),('',6),
delete-marked(repeat(0x744,242),1),
delete-marked(repeat(0x744,242),2),
delete-marked(repeat(0x744,242),3),
delete-marked(repeat(0x744,242),4),
delete-marked(repeat(0x744,242),5),
delete-marked(repeat(0x744,242),6),
delete-marked(repeat(0xf76,85),1),
delete-marked(repeat(0xf76,85),3),
delete-marked(repeat(0xf76,85),6),
I noticed that also purge is running concurrently with the CHECK TABLE. That should explain why there exist only 3 keys with the value 0xf76.
I hope to complete the analysis of this trace next week. It is rather challenging due to the amount of concurrency, including page splits and merges in the clustered index.