[MDEV-18187] Aria engine: Redo phase failed with "error 192 when executing record redo_index_new_page" upon startup on a restored datadir - Jira

Details

Type: Technical task
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.4(EOL)
Fix Version/s: 10.4.23, 10.5.14, 10.6.6
Component/s: mariabackup, Storage Engine - Aria
Labels:
None

Description

10.4 30da40bb8c
2019-01-09 16:00:40 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0%
2019-01-09 16:00:40 0 [ERROR] mysqld: failed to decrypt './test/t7' rc: -1 dstlen: 0 size: 4294967275

Got error 192 when executing record redo_index_new_page
2019-01-09 16:00:40 0 [ERROR] mysqld: Aria engine: Redo phase failed
tables to flush: 9 8 7 6 5 4 3 2 1 0
2019-01-09 16:00:40 0 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
2019-01-09 16:00:40 0 [ERROR] Plugin 'Aria' init function returned error.
2019-01-09 16:00:40 0 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.

The backup dir (stored after prepare but before any server startup was attempted) is here:
ftp://ftp.askmonty.org/public/mdev18187-backup.tar.gz

To reproduce the failure, restore it to the datadir and start the server with

--innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-

disk-tables=1 --plugin-load-add=file_key_management --file-key-management --file-key-management-filename=`pwd`/mysql-test/std_data/keys.txt

(modify file-key-management-filename value as needed).

Notes on the origin of the report. Please be aware it's not necessarily reproducible the same way.
https://travis-ci.org/elenst/travis-tests/jobs/477279304

The problem initially happened by running the concurrent test with MariaBackupFull scenario:

elenst-jira-refs 5342873c9, mariadb-toolbox 6ded4b71568

perl ./runall-new.pl --basedir=/home/travis/server --vardir=/home/travis/logs/vardir --duration=350 --threads=4 --seed=1547032472 --reporters=Backtrace,ErrorLog,Deadlock --skip-gendata --gendata-advanced --views --grammar=conf/mariadb/generic-dml.yy --redefine=conf/mariadb/bulk_insert.yy --mysqld=--log_output=FILE --mysqld=--max-statement-time=20 --mysqld=--lock-wait-timeout=10 --mysqld=--loose-innodb-lock-wait-timeout=5 --mysqld=--loose-debug_assert_on_not_freed_memory=0 --filter=/home/travis/mariadb-toolbox/travis/10.4-combo-filter.ff --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--aria-encrypt-tables=1 --mysqld=--encrypt-tmp-disk-tables=1 --mysqld=--file-key-management --mysqld=--file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --mysqld=--plugin-load-add=file_key_management --engine=Aria --scenario=MariaBackupFull

The test performs the following steps:

start the server with

--no-defaults --basedir=/home/travis/server --lc-messages-dir=/home/travis/server/share --character-sets-dir=/home/travis/server/share/charsets --tmpdir=/home/travis/logs/vardir/tmp --core-file --datadir=/home/travis/logs/vardir/data --max-allowed-packet=128Mb --port=19300 --socket=/home/travis/logs/vardir/mysql.sock --pid-file=/home/travis/logs/vardir/mysql.pid --general-log --general-log-file=/home/travis/logs/vardir/mysql.log --log_output=FILE --max-statement-time=20 --lock-wait-timeout=10 --loose-innodb-lock-wait-timeout=5 --loose-debug_assert_on_not_freed_memory=0 --innodb-encrypt-tables --innodb-encrypt-log --innodb-encryption-threads=4 --aria-encrypt-tables=1 --encrypt-tmp-disk-tables=1 --file-key-management --file-key-management-filename=/home/travis/mariadb-toolbox/data/keys.txt --plugin-load-add=file_key_management

run concurrent DML/DDL flow on the server;
after ~100 seconds, take the full backup from the running server (succeeded, the log attached as mbackup_backup.log;
after another ~100 seconds, shut down the server gracefully (succeeded, the error log is attached as mysql.err_orig;
run mariabackup --prepare (succeded, the log attached as mbackup_prepare.log;
cleanup server datadir and restore the backup to it (succeeded, the log attached as mbackup_restore.log;
try to start the server with the same options as before.

I have no information yet whether the problem happens on earlier versions.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

mbackup_backup.log
21 kB
2019-01-09 14:17
mbackup_prepare.log
2 kB
2019-01-09 14:17
mbackup_restore.log
18 kB
2019-01-09 14:17
mysql.err
11 kB
2019-01-09 14:18
mysql.err_orig
4 kB
2019-01-09 14:17

Issue Links

is caused by

MDEV-17913 Encrypted transactional Aria tables remain corrupt after crash recovery, automatic repairment does not work

Closed

Activity

Ascending order - Click to sort in descending order

Alexander added a comment - 2021-08-09 21:43

I made a fix, please review:
https://github.com/MariaDB/server/pull/1888

Alexander added a comment - 2021-08-09 21:43 I made a fix, please review: https://github.com/MariaDB/server/pull/1888

Michael Widenius added a comment - 2021-12-15 13:03

This issue was resolved a long time ago in
81258f14323e1d1ad0203bae93bc55a30d47c1b3

The issue was that the encryption code wrote wrong data on page headers, which made it
impossible for recovery to decrypt the data.
Even after the fix the data in this bug report will not work as the data is already corrupted.

Michael Widenius added a comment - 2021-12-15 13:03 This issue was resolved a long time ago in 81258f14323e1d1ad0203bae93bc55a30d47c1b3 The issue was that the encryption code wrote wrong data on page headers, which made it impossible for recovery to decrypt the data. Even after the fix the data in this bug report will not work as the data is already corrupted.

Michael Widenius added a comment - 2021-12-15 13:57

New test case shows another issue

Michael Widenius added a comment - 2021-12-15 13:57 New test case shows another issue

Michael Widenius added a comment - 2021-12-15 14:02

The fix in https://github.com/MariaDB/server/pull/1888 is slight wrong.
The issue is that when recovery is about to create a new data or index page, it check if the page already exits.
If the page does not exists (file is too short) or contains wrong checksum, then the recovery code will recreate the old page.
The bug was that the code that checked if the page existed didn't take into account encrypted pages.
Adding a check if page could not be encrypted solved the issue.

Michael Widenius added a comment - 2021-12-15 14:02 The fix in https://github.com/MariaDB/server/pull/1888 is slight wrong. The issue is that when recovery is about to create a new data or index page, it check if the page already exits. If the page does not exists (file is too short) or contains wrong checksum, then the recovery code will recreate the old page. The bug was that the code that checked if the page existed didn't take into account encrypted pages. Adding a check if page could not be encrypted solved the issue.

Michael Widenius added a comment - 2021-12-15 14:33

Fix pushed to bb-10.4-monty for testing

Michael Widenius added a comment - 2021-12-15 14:33 Fix pushed to bb-10.4-monty for testing

Elena Stepanova added a comment - 2023-06-08 18:26

There is a patch for this issue in 10.4+, so closing it.

Elena Stepanova added a comment - 2023-06-08 18:26 There is a patch for this issue in 10.4+, so closing it.

MariaDB Server

Aria engine: Redo phase failed with "error 192 when executing record redo_index_new_page" upon startup on a restored datadir

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration