Aria crash recovery failures (MDEV-19813)

[MDEV-18461] Aria crash recovery failures on the same data: 2 different assertions, error 176, refusal to start Created: 2019-02-03  Updated: 2022-09-06

Status: Stalled
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 10.4, 10.5, 10.6
Fix Version/s: 10.4, 10.5, 10.6

Type: Technical task Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Unresolved Votes: 1
Labels: None

Attachments: File data.tar.gz     File data2.tar.gz    
Issue Links:
Duplicate
duplicates MDEV-727 LP:625865 - ma_loghandler.c:3723: tra... Closed
Relates
relates to MDEV-19718 Assertion `rownr == 0 && new_page' fa... Closed

 Description   

The attached datadir data.tar.gz was created by intentionally crashing 10.4 server (4b3656a) during concurrent DML.

The server was run with

--aria-encrypt-tables=1 --plugin-load-add=file_key_management --file-key-management-filename=/data/keys.txt

(note that the path to the encryption key file might need to be modified).

While trying to start the server again with the same options on this data directory, various failures have been observed.

Most often this one:

10.4 74eb4fc9

mysqld: /data/src/10.4/storage/maria/ma_loghandler.c:3862: translog_init_with_table: Assertion `sure_page <= last_page' failed.
190203 18:47:13 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fa473202ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055679969ce16 in translog_init_with_table (directory=0x55679a983cc0 <mysql_real_data_home> "/data/bld/10.4/data/", log_file_max_size=1073741824, server_version=100403, server_id=0, pagecache=0x55679b2289c0 <maria_log_pagecache_var>, flags=0, readonly=0 '\000', init_table_func=0x556799693718 <translog_table_init>, no_errors=0 '\000') at /data/src/10.4/storage/maria/ma_loghandler.c:3862
#9  0x000055679968c828 in ha_maria_init (p=0x55679b321700) at /data/src/10.4/storage/maria/ha_maria.cc:3686
#10 0x000055679945e612 in ha_initialize_handlerton (plugin=0x55679b313fb0) at /data/src/10.4/sql/handler.cc:530
#11 0x000055679914881b in plugin_initialize (tmp_root=0x7ffcb6b7a790, plugin=0x55679b313fb0, argc=0x55679a989eb8 <remaining_argc>, argv=0x55679b2c1968, options_only=false) at /data/src/10.4/sql/sql_plugin.cc:1437
#12 0x00005567991494b0 in plugin_init (argc=0x55679a989eb8 <remaining_argc>, argv=0x55679b2c1968, flags=0) at /data/src/10.4/sql/sql_plugin.cc:1720
#13 0x0000556799001b3f in init_server_components () at /data/src/10.4/sql/mysqld.cc:5153
#14 0x0000556799002d25 in mysqld_main (argc=14, argv=0x55679b2c1968) at /data/src/10.4/sql/mysqld.cc:5680
#15 0x0000556798ff74d0 in main (argc=14, argv=0x7ffcb6b7b1d8) at /data/src/10.4/sql/main.cc:25

This failure was previously mentioned in MDEV-727 (description) and MDEV-583 (comments).

Further attempts to start the server after it end the same way.


Another one has been observed once so far:

10.4 4b3656a

recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (7.6 seconds); transactions to roll back: 2mysqld: /home/travis/src/storage/maria/ma_blockrec.c:2879: write_block_record: Assertion `undo_lsn == ((LSN)1) || head_length == row_pos->length' failed.
190130  3:32:01 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fc88c48ec82 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055aa69ad035d in write_block_record (info=0x55aa6c7de8b0, old_record=0x55aa6c7eaea0 "\341\017", record=0x55aa6c7eaec5 "\377\017", row=0x55aa6c7de9e0, bitmap_blocks=0x55aa6c7de920, head_block_is_read=1 '\001', row_pos=0x7ffd72ad8bc0, undo_lsn=4326780525, old_record_checksum=0) at /home/travis/src/storage/maria/ma_blockrec.c:2879
#9  0x000055aa69ad3314 in _ma_update_at_original_place (info=0x55aa6c7de8b0, page=1, rownr=50, length_on_head_page=36, extent_count=0, extent_info=0x55aa6c7182a3 "", oldrec=0x55aa6c7eaea0 "\341\017", record=0x55aa6c7eaec5 "\377\017", undo_lsn=4326780525) at /home/travis/src/storage/maria/ma_blockrec.c:4002
#10 0x000055aa69adc100 in _ma_apply_undo_row_update (info=0x55aa6c7de8b0, undo_lsn=4326780525, header=0x55aa6c7182a5 "\002", header_length=11) at /home/travis/src/storage/maria/ma_blockrec.c:7449
#11 0x000055aa69aa199f in exec_UNDO_LOGREC_UNDO_ROW_UPDATE (rec=0x7ffd72ad8dc0, trn=0x55aa6c71b260) at /home/travis/src/storage/maria/ma_recovery.c:2387
#12 0x000055aa69aa3257 in run_undo_phase (uncommitted=1) at /home/travis/src/storage/maria/ma_recovery.c:2964
#13 0x000055aa69a9cf72 in maria_apply_log (from_lsn=4317903329, end_lsn=0, apply=MARIA_LOG_APPLY, trace_file=0x0, should_run_undo_phase=1 '\001', skip_DDLs_arg=1 '\001', take_checkpoints=1 '\001', warnings_count=0x7ffd72ad92d0) at /home/travis/src/storage/maria/ma_recovery.c:422
#14 0x000055aa69a9ca46 in maria_recovery_from_log () at /home/travis/src/storage/maria/ma_recovery.c:240
#15 0x000055aa69a6eadf in ha_maria_init (p=0x55aa6c6fbe20) at /home/travis/src/storage/maria/ha_maria.cc:3689
#16 0x000055aa69466d0b in ha_initialize_handlerton (plugin=0x55aa6c6eca28) at /home/travis/src/sql/handler.cc:530
#17 0x000055aa69150208 in plugin_initialize (tmp_root=0x7ffd72ad9700, plugin=0x55aa6c6eca28, argc=0x55aa6a974eb8 <remaining_argc>, argv=0x55aa6c69a638, options_only=false) at /home/travis/src/sql/sql_plugin.cc:1437
#18 0x000055aa69150ebc in plugin_init (argc=0x55aa6a974eb8 <remaining_argc>, argv=0x55aa6c69a638, flags=0) at /home/travis/src/sql/sql_plugin.cc:1719
#19 0x000055aa69009015 in init_server_components () at /home/travis/src/sql/mysqld.cc:5151
#20 0x000055aa6900a1f9 in mysqld_main (argc=22, argv=0x55aa6c69a638) at /home/travis/src/sql/mysqld.cc:5680
#21 0x000055aa68ffe950 in main (argc=22, argv=0x7ffd72ada148) at /home/travis/src/sql/main.cc:25

This failure was mentioned in comments to MDEV-15256.

The datadir after this crash is attached as data2.tar.gz

Further attempts to start the server after it end with

10.4

2019-02-03 18:53:47 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0%
Got error 176 when executing record redo_insert_row_head
2019-02-03 18:53:47 0 [ERROR] mysqld: Aria engine: Redo phase failed
tables to flush: 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
2019-02-03 18:53:49 0 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
2019-02-03 18:53:49 0 [ERROR] Plugin 'Aria' init function returned error.
2019-02-03 18:53:49 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.


Finally, non-debug server fails to start on data.tar.gz without any meaningful explanation:

10.4 74eb4fc9

2019-02-03 18:59:16 0 [ERROR] mysqld: Aria engine: log initialization failed
2019-02-03 18:59:16 0 [ERROR] Plugin 'Aria' init function returned error.
2019-02-03 18:59:16 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.

Given that the assertion failures have been seen long time ago, it's very likely that other versions are also affected. But since the attached data directories were created on 10.4, I can't check other versions with it, and thus filing it as a 10.4 bug. Please feel free to adjust affected/fix versions accordingly.



 Comments   
Comment by Michael Widenius [ 2019-06-13 ]

I was able to repeat the error with data.tar.gz. The problem was a miss match between the last checkpoint lsn in
aria_log_control and aria_log.### files.

Unfortunately there is not enough information of what could have caused this. I did check all writes to the aria_log_control file and as far as I could find, the log is always flushed before the control file which should make this thing impossible. I need more data or a way to repeat the issue to be able to find out what is going on.
Being able to get a debug trace file for the crash and for recovery would enable me to find the issue.

For now, I have at changed the code so that instead of an assert we get a readable error message that shows the miss match.

Comment by Michael Widenius [ 2019-06-14 ]

Not much I can do about this bug report as the data uploaded is encrypted and the the keys aren't in the test case.

However, I have found a lot of bugs when trying to use encrypted tables without keys and I have fixed all that I can find. I have also cleaned up the progress reporting in the log file to be more readable.

Comment by Elena Stepanova [ 2019-06-15 ]

Standard MTR keys from mysql-test/std_data/keys.txt are used.

Comment by Marko Mäkelä [ 2019-06-17 ]

I am wonder if this assertion failure on Windows is related:

10.4 bff7cf9df8349a80a337c94ea55ceaa3daf3d72f

CURRENT_TEST: main.delayed
 
 
Server [mysqld.1 - pid: 8408, winpid: 8408, exit: 65280] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2019-06-15 19:31:46 707 [Warning] Aborted connection 707 to db: 'test' user: 'root' host: 'localhost' (Got an error reading communication packets)
2019-06-15 19:31:47 0 [Note] D:/winx64-debug/build/sql//Debug/mysqld.exe (mysqld 10.4.6-MariaDB-debug-log) starting as process 5140 ...
Assertion failed: (uint32) ((scanner->horizon) >> 32) > (uint32) ((scanner->page_addr) >> 32), file D:\winx64-debug\build\src\storage\maria\ma_loghandler.c, line 6898

Comment by Elena Stepanova [ 2019-06-20 ]

main.delayed doesn't perform any server restarts, so I don't know if it can be related to recovery problems.

Comment by Marko Mäkelä [ 2021-11-12 ]

For the record, I tried the recovery of data.tar.gz as follows:

sql/mariadbd --plugin-dir $(pwd)/plugin/file_key_management --plugin-load-add=file_key_management.so --file-key-management-filename /mariadb/10.6/mysql-test/std_data/keys.txt --datadir /dev/shm/data

It failed to recover:

10.6 ba4f8e317de33330c882b5de14979685a0216ce0

2021-11-12 18:25:00 0 [ERROR] mariadbd: Aria engine: log data error
last_log_page:   (1,0x3c000) is less than
checkpoint page: (1,0x1b52000)
2021-11-12 18:25:00 0 [ERROR] mariadbd: Aria engine: log initialization failed
2021-11-12 18:25:00 0 [ERROR] Plugin 'Aria' init function returned error.
2021-11-12 18:25:00 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.

(Note: The data directory also includes an InnoDB log file, and InnoDB recovery is supposed to be refused with MariaDB 10.5 or later.)

Comment by Marko Mäkelä [ 2022-09-06 ]

The test main.grant_kill joins the club:

10.5 c0470caf5a80e69ad7d855a871c62cf72dc03b05

main.grant_kill                          w14 [ fail ]
        Test ended at 2022-09-06 08:42:12
CURRENT_TEST: main.grant_kill
Failed to start mysqld.1
mysqltest failed but provided no output
 - saving '/home/buildbot/amd64-fedora-35/build/mysql-test/var/14/log/main.grant_kill/' to '/home/buildbot/amd64-fedora-35/build/mysql-test/var/log/main.grant_kill/'
Retrying test main.grant_kill, attempt(2/3)...
worker[14] > Restart  - not started
***Warnings generated in error logs during shutdown after running tests: main.grant_kill
2022-09-06  8:42:12 0 [ERROR] mariadbd: Aria recovery failed. Please run aria_chk -r on all Aria tables (*.MAI) and delete all aria_log.######## files
2022-09-06  8:42:12 0 [ERROR] Plugin 'Aria' init function returned error.
2022-09-06  8:42:12 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
2022-09-06  8:42:12 0 [ERROR] Could not open mysql.plugin table: "Unknown storage engine 'Aria'". Some plugins may be not loaded
2022-09-06  8:42:12 0 [ERROR] Failed to initialize plugins.
2022-09-06  8:42:12 0 [ERROR] Aborting
main.func_like                           w9 [ pass ]     16
main.default_storage_engine              w5 [ pass ]    472
main.grant_kill                          w14 [ retry-pass ]      9

Generated at Thu Feb 08 08:44:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.