Aria crash recovery failures (MDEV-19813)

[MDEV-19980] Aria crash recovery fails with "Got error 22 when executing undo undo_row_insert" Created: 2019-07-07  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Encryption, Storage Engine - Aria
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.4

Type: Technical task
Priority: Major
Reporter: Elena Stepanova
Assignee: Michael Widenius
Resolution: Unresolved
Votes: 1
Labels: None


 Description   

10.2 c17b0b73

2019-07-08  1:39:49 139741663909696 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (1.2 seconds); transactions to roll back: 1
Got error 22 when executing undo undo_row_insert
2019-07-08  1:39:51 139741663909696 [ERROR] mysqld: Aria engine: Undo phase failed
tables to flush: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
2019-07-08  1:39:51 139741663909696 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
2019-07-08  1:39:51 139741663909696 [ERROR] Plugin 'Aria' init function returned error.
2019-07-08  1:39:51 139741663909696 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
2019-07-08  1:39:51 139741663909696 [Note] Plugin 'FEEDBACK' is disabled.
2019-07-08  1:39:51 139741663909696 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
2019-07-08  1:39:51 139741663909696 [ERROR] Aborting
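
When triaging many such logs, it can help to pull the error number and the failing undo record type out of the log text automatically. Below is a minimal sketch in Python. It assumes the failure is always reported in the same "Got error N when executing undo <type>" form as in the log above; whether the other recovery failures use that wording is not confirmed here. The name find_undo_failures is hypothetical and is not part of MariaDB or of the test.

```python
import re

# Pattern modelled on the single failure line seen in the log above.
# Assumption: other undo failures are reported with the same wording.
UNDO_FAILURE = re.compile(r"Got error (\d+) when executing undo (\w+)")


def find_undo_failures(log_text):
    """Return a list of (error_number, undo_type) tuples found in log_text."""
    return [(int(num), undo) for num, undo in UNDO_FAILURE.findall(log_text)]
```

Calling find_undo_failures() on the contents of the server error log returns one tuple per matching line, or an empty list if recovery did not fail in the undo phase.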

The test below hits error 192 in roughly 90% of cases; error 126 and error 22 (the subject of this report) are rarer.
Error 22 is currently easiest to reproduce on 10.2, but it has been observed on all 10.2+ versions.

To run the test:

git clone https://github.com/MariaDB/randgen --branch mdev19980 rqg-mdev19980
cd rqg-mdev19980
. ./mdev19980.cmd <your basedir> <your vardir>

Important note: The command above re-runs the same test 10 times regardless of the outcome, and for each failure it creates a <your vardir>_trialX directory, which you can inspect later. If you want it to stop on the first failure, remove -force from the command line inside mdev19980.cmd, but keep in mind, as noted above, that error 22 accounts for only a minority of the failures.
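
Since error 22 is only a minority of the failures, it may be convenient to tally which error each trial hit before inspecting the directories by hand. The following is a rough sketch, not part of the test itself. It assumes that the server error log inside each <your vardir>_trialX directory is a file with an .err extension and that failures are reported as "Got error N when executing undo ..."; both are assumptions, and tally_trial_errors is a hypothetical helper name.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Assumed message format, taken from the error 22 log in this report.
UNDO_FAILURE = re.compile(r"Got error (\d+) when executing undo")


def tally_trial_errors(vardir):
    """Map each <vardir>_trialX directory name to a Counter of error numbers.

    Assumption: error logs are files ending in .err somewhere below the
    trial directory; adjust the glob pattern if the layout differs.
    """
    base = Path(vardir)
    results = {}
    for trial in sorted(base.parent.glob(base.name + "_trial*")):
        if not trial.is_dir():
            continue
        counts = Counter()
        for log in trial.rglob("*.err"):
            text = log.read_text(errors="replace")
            counts.update(int(n) for n in UNDO_FAILURE.findall(text))
        results[trial.name] = counts
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: pass the same <your vardir> that was given to mdev19980.cmd.
    for name, counts in tally_trial_errors(sys.argv[1]).items():
        print(name, dict(counts) or "no undo failure found")
```

Trials whose Counter contains 22 are the ones relevant to this report; the others most likely hit one of the more common errors.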

I think that using an on-disk vardir location gives a better chance of reproducing any of the errors, but this is unproven; I don't have statistics for it.


Generated at Thu Feb 08 08:55:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.