We noticed that after switching parallel replication mode of our replicas from conservative to optimistic, the restore almost always fails with the infamous error:
Found 5 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
|
This only happens when the replica we take a backup from has optimistic replication enabled. Needless to say we aren't using XA transactions, we do however use prepared statements on the upstairs master.
I was able to reproduce this more or less consistently (meaning it will pass about only one out of 10 times I try, while normally failing with the above error).
How to reproduce:
On one node, install MariaDB and sysbench. Use sysbench cmd line below.
sysbench --threads=32 --db-ps-mode=auto --tables=8 --time=3600 --mysql-host='localhost' --mysql-user='root' /usr/share/sysbench/oltp_read_write.lua run
|
Master's non-default MariaDB config :
transaction_isolation='READ-COMMITTED'
|
On another node, set up a replica.
Non-default config:
slave_parallel_threads=16
|
transaction_isolation='READ-COMMITTED'
|
log_bin='binlog'
|
log_slave_updates
|
While sysbench runs on master, take a streaming backup from the replica to a 3rd server where you want to restore:
mariabackup --backup --stream mbstream --slave-info --parallel 8 | ssh -q 10.211.55.17 mbstream -x -v -C /var/lib/mysql
|
On the target server, try to restore the backup:
mariabackup --prepare --use-memory=400M --target-dir=/var/lib/mysql
|
chown -R mysql:mysql /var/lib/mysql/
|
systemctl start mariadb.service
|
Startup fails.
Example:
2025-02-04 19:34:56 0 [Note] Starting MariaDB 10.6.20-MariaDB source revision f00711bba2cd383825d0be1867f7d7d7f641c9e4 server_uid wv/6js4vd2mPJW/WB5ScVJ/uod0= as process 5714
|
2025-02-04 19:34:56 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
|
2025-02-04 19:34:56 0 [Note] InnoDB: Number of pools: 1
|
2025-02-04 19:34:56 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
|
2025-02-04 19:34:56 0 [Note] InnoDB: Using Linux native AIO
|
2025-02-04 19:34:56 0 [Note] InnoDB: Initializing buffer pool, total size = 134217728, chunk size = 134217728
|
2025-02-04 19:34:56 0 [Note] InnoDB: Completed initialization of buffer pool
|
2025-02-04 19:34:56 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
|
2025-02-04 19:34:56 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
|
2025-02-04 19:34:56 0 [Note] InnoDB: New log file created, LSN=52293475656
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471403 was in the XA prepared state.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471419 was in the XA prepared state.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471422 was in the XA prepared state.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471423 was in the XA prepared state.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471426 was in the XA prepared state.
|
2025-02-04 19:34:56 0 [Note] InnoDB: 14 transaction(s) which must be rolled back or cleaned up in total 45 row operations to undo
|
2025-02-04 19:34:56 0 [Note] InnoDB: Trx id counter is 25471435
|
2025-02-04 19:34:56 0 [Note] InnoDB: 128 rollback segments are active.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2025-02-04 19:34:56 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2025-02-04 19:34:56 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
|
2025-02-04 19:34:56 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2025-02-04 19:34:56 0 [Note] InnoDB: 10.6.20 started; log sequence number 0; transaction id 25471436
|
2025-02-04 19:34:56 0 [Note] Plugin 'FEEDBACK' is disabled.
|
2025-02-04 19:34:56 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
2025-02-04 19:34:56 0 [Note] InnoDB: Cannot open '/var/lib/mysql/ib_buffer_pool' for reading: No such file or directory
|
2025-02-04 19:34:56 0 [Note] InnoDB: Rolled back recovered transaction 25471404
|
2025-02-04 19:34:56 0 [Note] InnoDB: Rolled back recovered transaction 25471420
|
2025-02-04 19:34:56 0 [Note] InnoDB: Starting recovery for XA transactions...
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471403 in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction contains changes to 5 rows
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471423 in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction contains changes to 5 rows
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471419 in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction contains changes to 5 rows
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471426 in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction contains changes to 5 rows
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction 25471422 in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] InnoDB: Transaction contains changes to 5 rows
|
2025-02-04 19:34:56 0 [Note] InnoDB: 5 transactions in prepared state after recovery
|
2025-02-04 19:34:56 0 [Note] Found 5 prepared transaction(s) in InnoDB
|
2025-02-04 19:34:56 0 [ERROR] Found 5 prepared transactions! It means that server was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start server with --tc-heuristic-recover switch to commit or rollback pending transactions.
|
2025-02-04 19:34:56 0 [ERROR] Aborting
|
{"report":{"fcp":949.9000000953674,"ttfb":247,"pageVisibility":"visible","entityId":132700,"key":"jira.project.issue.view-issue","isInitial":true,"threshold":1000,"elementTimings":{},"userDeviceMemory":8,"userDeviceProcessors":64,"apdex":0.5,"journeyId":"97c3837a-030f-4b14-8574-0e6729d8b6f0","navigationType":0,"readyForUser":1032.8000001907349,"redirectCount":0,"resourceLoadedEnd":773.2000000476837,"resourceLoadedStart":254.40000009536743,"resourceTiming":[{"duration":226.40000009536743,"initiatorType":"link","name":"https://jira.mariadb.org/s/2c21342762a6a02add1c328bed317ffd-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/css/_super/batch.css","startTime":254.40000009536743,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":254.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":480.80000019073486,"responseStart":0,"secureConnectionStart":0},{"duration":226.59999990463257,"initiatorType":"link","name":"https://jira.mariadb.org/s/7ebd35e77e471bc30ff0eba799ebc151-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/css/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.css?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&slack-enabled=true&whisper-enabled=true","startTime":254.60000014305115,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":254.60000014305115,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":481.2000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":235.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/0917945aaa57108d00c5076fea35e069-CDN/lu2cib/820016/12ta74/0a8bac35585be7fc6c9cc5a0464cd4cf/_/download/contextbatch/js/_super/batch.js?locale=en","startTime":254.80000019073486,"connectEnd":254.80000019073486,"connectStart":254.80000019073486,"domainLookupEnd":254.80000019073486,"domainLookupStart":254.80000019073486,"fetchStart":254.80000019073486,"redirectEnd":0,"redirectStart":0,"requestStart":254.80000019073486,"responseEnd":490.30000019073486,"responseStart":490.30000019073486,"secureConnectionStart":254.80000019073486},{"duration":283.5,"initiatorType":"script","name":"https://jira.mariadb.org/s/2d8175ec2fa4c816e8023260bd8c1786-CDN/lu2cib/820016/12ta74/494e4c556ecbb29f90a3d3b4f09cb99c/_/download/contextbatch/js/jira.browse.project,project.issue.navigator,jira.view.issue,jira.general,jira.global,atl.general,-_super/batch.js?agile_global_admin_condition=true&jag=true&jira.create.linked.issue=true&locale=en&slack-enabled=true&whisper-enabled=true","startTime":255,"connectEnd":255,"connectStart":255,"domainLookupEnd":255,"domainLookupStart":255,"fetchStart":255,"redirectEnd":0,"redirectStart":0,"requestStart":255,"responseEnd":538.5,"responseStart":538.5,"secureConnectionStart":255},{"duration":287.10000014305115,"initiatorType":"script","name":"https://jira.mariadb.org/s/a9324d6758d385eb45c462685ad88f1d-CDN/lu2cib/820016/12ta74/c92c0caa9a024ae85b0ebdbed7fb4bd7/_/download/contextbatch/js/atl.global,-_super/batch.js?locale=en","startTime":255.20000004768372,"connectEnd":255.20000004768372,"connectStart":255.20000004768372,"domainLookupEnd":255.20000004768372,"domainLookupStart":255.20000004768372,"fetchStart":255.20000004768372,"redirectEnd":0,"redirectStart":0,"requestStart":255.20000004768372,"responseEnd":542.3000001907349,"responseStart":542.3000001907349,"secureConnectionStart":255.20000004768372},{"duration":287.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-en/jira.webresources:calendar-en.js","startTime":255.40000009536743,"connectEnd":255.40000009536743,"connectStart":255.40000009536743,"domainLookupEnd":255.40000009536743,"domainLookupStart":255.40000009536743,"fetchStart":255.40000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":255.40000009536743,"responseEnd":542.8000001907349,"responseStart":542.8000001907349,"secureConnectionStart":255.40000009536743},{"duration":287.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:calendar-localisation-moment/jira.webresources:calendar-localisation-moment.js","startTime":255.5,"connectEnd":255.5,"connectStart":255.5,"domainLookupEnd":255.5,"domainLookupStart":255.5,"fetchStart":255.5,"redirectEnd":0,"redirectStart":0,"requestStart":255.5,"responseEnd":543.2000000476837,"responseStart":543.2000000476837,"secureConnectionStart":255.5},{"duration":376.39999985694885,"initiatorType":"link","name":"https://jira.mariadb.org/s/b04b06a02d1959df322d9cded3aeecc1-CDN/lu2cib/820016/12ta74/a2ff6aa845ffc9a1d22fe23d9ee791fc/_/download/contextbatch/css/jira.global.look-and-feel,-_super/batch.css","startTime":255.60000014305115,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":255.60000014305115,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":632,"responseStart":0,"secureConnectionStart":0},{"duration":287.7000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/rest/api/1.0/shortcuts/820016/47140b6e0a9bc2e4913da06536125810/shortcuts.js?context=issuenavigation&context=issueaction","startTime":255.90000009536743,"connectEnd":255.90000009536743,"connectStart":255.90000009536743,"domainLookupEnd":255.90000009536743,"domainLookupStart":255.90000009536743,"fetchStart":255.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":255.90000009536743,"responseEnd":543.6000001430511,"responseStart":543.6000001430511,"secureConnectionStart":255.90000009536743},{"duration":376.2000000476837,"initiatorType":"link","name":"https://jira.mariadb.org/s/3ac36323ba5e4eb0af2aa7ac7211b4bb-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/css/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.css?jira.create.linked.issue=true","startTime":256,"connectEnd":0,"connectStart":0,"domainLookupEnd":0,"domainLookupStart":0,"fetchStart":256,"redirectEnd":0,"redirectStart":0,"requestStart":0,"responseEnd":632.2000000476837,"responseStart":0,"secureConnectionStart":0},{"duration":288,"initiatorType":"script","name":"https://jira.mariadb.org/s/5d5e8fe91fbc506585e83ea3b62ccc4b-CDN/lu2cib/820016/12ta74/d176f0986478cc64f24226b3d20c140d/_/download/contextbatch/js/com.atlassian.jira.projects.sidebar.init,-_super,-project.issue.navigator,-jira.view.issue/batch.js?jira.create.linked.issue=true&locale=en","startTime":256.2000000476837,"connectEnd":256.2000000476837,"connectStart":256.2000000476837,"domainLookupEnd":256.2000000476837,"domainLookupStart":256.2000000476837,"fetchStart":256.2000000476837,"redirectEnd":0,"redirectStart":0,"requestStart":256.2000000476837,"responseEnd":544.2000000476837,"responseStart":544.2000000476837,"secureConnectionStart":256.2000000476837},{"duration":410.40000009536743,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-js/jira.webresources:bigpipe-js.js","startTime":256.90000009536743,"connectEnd":256.90000009536743,"connectStart":256.90000009536743,"domainLookupEnd":256.90000009536743,"domainLookupStart":256.90000009536743,"fetchStart":256.90000009536743,"redirectEnd":0,"redirectStart":0,"requestStart":256.90000009536743,"responseEnd":667.3000001907349,"responseStart":667.3000001907349,"secureConnectionStart":256.90000009536743},{"duration":516.2000000476837,"initiatorType":"script","name":"https://jira.mariadb.org/s/d41d8cd98f00b204e9800998ecf8427e-CDN/lu2cib/820016/12ta74/1.0/_/download/batch/jira.webresources:bigpipe-init/jira.webresources:bigpipe-init.js","startTime":257,"connectEnd":257,"connectStart":257,"domainLookupEnd":257,"domainLookupStart":257,"fetchStart":257,"redirectEnd":0,"redirectStart":0,"requestStart":257,"responseEnd":773.2000000476837,"responseStart":773.2000000476837,"secureConnectionStart":257},{"duration":112.59999990463257,"initiatorType":"xmlhttprequest","name":"https://jira.mariadb.org/rest/webResources/1.0/resources","startTime":660.1000001430511,"connectEnd":660.1000001430511,"connectStart":660.1000001430511,"domainLookupEnd":660.1000001430511,"domainLookupStart":660.1000001430511,"fetchStart":660.1000001430511,"redirectEnd":0,"redirectStart":0,"requestStart":660.1000001430511,"responseEnd":772.7000000476837,"responseStart":772.7000000476837,"secureConnectionStart":660.1000001430511}],"fetchStart":0,"domainLookupStart":0,"domainLookupEnd":0,"connectStart":0,"connectEnd":0,"requestStart":67,"responseStart":247,"responseEnd":251,"domLoading":251,"domInteractive":1138,"domContentLoadedEventStart":1138,"domContentLoadedEventEnd":1195,"domComplete":1583,"loadEventStart":1583,"loadEventEnd":1584,"userAgent":"Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; ClaudeBot/1.0; +claudebot@anthropic.com)","marks":[{"name":"bigPipe.sidebar-id.start","time":1072.8000001907349},{"name":"bigPipe.sidebar-id.end","time":1073.6000001430511},{"name":"bigPipe.activity-panel-pipe-id.start","time":1073.7000000476837},{"name":"bigPipe.activity-panel-pipe-id.end","time":1075.9000000953674},{"name":"activityTabFullyLoaded","time":1210.8000001907349}],"measures":[],"correlationId":"c582443e45a5c","effectiveType":"4g","downlink":9.3,"rtt":0,"serverDuration":113,"dbReadsTimeInMs":14,"dbConnsTimeInMs":23,"applicationHash":"9d11dbea5f4be3d4cc21f03a88dd11d8c8687422","experiments":[]}}
It is normal to have prepared transactions pending inside InnoDB when the binlog is enabled. This happens because the server uses internal XA transactions to do two-phase commit between InnoDB and the binlog to ensure consistency in case of a server crash.
Thus, this is probably not directly related to optimistic parallel replication, except that optimistic parallel replication runs a lot more transactions in parallel and can thus have more transactions in the "prepared" state at the time of a crash (or mariabackup snapshot)
During normal crash recovery, the binlog is consulted to decide how to handle these transactions. But since the binlogs are not included in the mariabackup, the server cannot use this method and gives the described error instead.
A work-around is to start the restored server with --tc-heuristic-recover=rollback (on first start only).
There used to be an option --rollback-xa for mariabackup that could be used to handle this, but it is removed since 10.5. Which version of mariabackup are you using to take the backup?
marko, wlad, does one of you know if mariabackup is supposed to handle this automatically (ie. roll back any internal XA transactions in InnoDB during --prepare), or how else the issue of pending internal XA transactions are supposed to be handled in the restored server which is missing the binlogs to do crash recovery?