Aria crash recovery failures (MDEV-19813)

[MDEV-17912] Aria with encryption fails upon crash recovery: Redo phase failed, "Got error 192" Created: 2018-12-05  Updated: 2023-12-07

Status: Stalled
Project: MariaDB Server
Component/s: Encryption, Storage Engine - Aria
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4, 10.5, 10.6

Type: Technical task Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Unresolved Votes: 0
Labels: affects-tests

Issue Links:
Relates
relates to MDEV-17864 Run regular crash recovery tests for ... Closed
relates to MDEV-17913 Encrypted transactional Aria tables r... Closed

 Description   

10.4 470e9a9fb6

2018-12-05 18:38:27 0 [Note] /home/travis/server/10.4/bin/mysqld (mysqld 10.4.1-MariaDB-debug-log) starting as process 21315 ...
2018-12-05 18:38:27 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 9% 19% 29%
2018-12-05 18:38:33 0 [ERROR] mysqld: failed to decrypt './test/oltp46'  rc: -1  dstlen: 0  size: 8172
 
Got error 192 when executing record redo_new_row_head
2018-12-05 18:38:33 0 [ERROR] mysqld: Aria engine: Redo phase failed
tables to flush: 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
2018-12-05 18:38:33 0 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
2018-12-05 18:38:33 0 [ERROR] Plugin 'Aria' init function returned error.
2018-12-05 18:38:33 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
2018-12-05 18:38:33 0 [Note] InnoDB: Using Linux native AIO
2018-12-05 18:38:33 0 [Note] InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2018-12-05 18:38:33 0 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2018-12-05 18:38:33 0 [Note] InnoDB: Uses event mutexes
2018-12-05 18:38:33 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2018-12-05 18:38:33 0 [Note] InnoDB: Number of pools: 1
2018-12-05 18:38:33 0 [Note] InnoDB: Using SSE2 crc32 instructions
2018-12-05 18:38:33 0 [Note] mysqld: O_TMPFILE is not supported on /home/travis/logs/vardir/tmp (disabling future attempts)
2018-12-05 18:38:33 0 [Note] InnoDB: Initializing buffer pool, total size = 128M, instances = 1, chunk size = 128M
2018-12-05 18:38:33 0 [Note] InnoDB: Completed initialization of buffer pool
2018-12-05 18:38:33 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2018-12-05 18:38:33 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=139812
2018-12-05 18:38:34 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
2018-12-05 18:38:34 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2018-12-05 18:38:34 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2018-12-05 18:38:34 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2018-12-05 18:38:34 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2018-12-05 18:38:34 0 [Note] InnoDB: Waiting for purge to start
2018-12-05 18:38:34 0 [Note] InnoDB: 10.4.1 started; log sequence number 139821; transaction id 21
2018-12-05 18:38:34 0 [Note] InnoDB: Loading buffer pool(s) from /dev/shm/vardir/data/ib_buffer_pool
2018-12-05 18:38:34 0 [Note] Plugin 'FEEDBACK' is disabled.
2018-12-05 18:38:34 0 [ERROR] Could not open mysql.plugin table. Some plugins may be not loaded
2018-12-05 18:38:34 0 [ERROR] Failed to initialize plugins.
2018-12-05 18:38:34 0 [ERROR] Aborting

The datadir after the initial crash, before any attempt to recover, is here:
ftp://ftp.askmonty.org/public/mdev17912-data.tar.gz

Unpack and start the server with

--aria-encrypt-tables=1 --plugin-load-add=file_key_management --file-key-management-filename=`pwd`/mysql-test/std_data/keys.txt

Adjust the path to keys.txt if needed.
The general log for the test flow prior to the initial crash is in mysql.log.


The test to run the complete flow (might require several trials):

https://github.com/MariaDB/randgen --branch elenst-jira-refs b1facb0360

perl ./runall-new.pl  --basedir=/data/bld/10.2 --vardir=/dev/shm/vardir --duration=250 --threads=6  --scenario=CrashUpgrade   --redefine=conf/mariadb/bulk_insert.yy  --engine=Aria --grammar=conf/mariadb/generic-dml.yy --gendata=conf/mariadb/oltp-aria.zz --mysqld=--aria-encrypt-tables=1 --grammar=conf/mariadb/generic-dml.yy --gendata=conf/mariadb/oltp-aria.zz --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=/data/bld/keys.txt

RQG branch and revision are important here, you might not have some of the required files in other branches.
Adjust the paths to basedir and keys file on the command line.

Couldn't reproduce on 10.1 with the given test, but possibly there is some difference in the test flow.
Couldn't reproduce so far without encryption.



 Comments   
Comment by Michael Widenius [ 2021-10-25 ]

This looks very like something that is related to MDEV-17912.
The data files has the same kind of corruption that would happen if one did a repair on encrypted tables before MDEV-17912. (Either manual or atomic).

When testing with 10.7, I get the following new warnings, which where added to detect the MDEV-17912 issue:

2021-10-25 19:01:36 0 [ERROR] mariadbd: Table ./test/oltp33.MAI has wrong LSN: (2694473,0x2da478dd) on page: 1

Elenst, any chance you could do a new run of the test case on the latest 10.6 or 10.7 to check if this bug still exists?

Comment by Elena Stepanova [ 2021-10-31 ]

monty,

The same test still fails the same way on current 10.6/10.7. I modified the test a little bit just to avoid false positives (because when the recovery does not fail, it can take longer on 10.6+ than it used to on earlier versions, and the test wouldn't wait that long), but it's just a technicality.
Please note that as the description says, it doesn't fail every time. You don't need to restart it, the command line below runs it up to 5 times (--trials=5) unless it fails earlier, and you can increase the value as needed. It usually fails for me within 2-3 attempts.
IMPORTANT: The command line uses `pwd` which known to cause trouble on certain systems. Replace it by the actual path if needed.
IMPORTANT: The test expects that the server would use the file_key_management plugin, so it must be available in the plugin directory. Unless <basedir>/lib/plugin contains it in your build, point the test/server to it via --mysqld=--plugin-dir=<path>.
IMPORTANT: Remember to change the path to the basedir.

git clone https://github.com/MariaDB/randgen --branch mdev17912 rqg-mdev17912
cd rqg-mdev17912
perl ./runall-trials.pl --trials=5  --basedir=/data/bld/10.6 --vardir=/dev/shm/vardir --duration=250 --threads=6  --scenario=CrashUpgrade   --redefine=conf/mariadb/bulk_insert.yy  --engine=Aria --grammar=conf/mariadb/generic-dml.yy --gendata=conf/mariadb/oltp-aria.zz --mysqld=--aria-encrypt-tables=1 --grammar=conf/mariadb/generic-dml.yy --gendata=conf/mariadb/oltp-aria.zz --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/data/file_key_management_keys.txt 

10.6 fdc4c3a2

2021-10-31 21:11:26 0 [Note] /data/bld/10.6/bin/mysqld (server 10.6.5-MariaDB-debug-log) starting as process 985080 ...
2021-10-31 21:11:26 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 20% 30% 40% 50%
2021-10-31 21:11:47 0 [ERROR] mysqld: failed to decrypt './test/oltp88'  rc: -1  dstlen: 0  size: 8172
 
Got error 192 when executing record redo_new_row_head
2021-10-31 21:11:47 0 [ERROR] mysqld: Aria engine: Redo phase failed
tables to flush: 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
2021-10-31 21:11:47 0 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables (*.MAI) and delete all aria_log.######## files
2021-10-31 21:11:48 0 [ERROR] Plugin 'Aria' init function returned error.

Generated at Thu Feb 08 08:40:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.