[MDEV-20578] Got error 126 when executing undo undo_key_delete upon Aria crash recovery Created: 2019-09-12  Updated: 2020-05-26  Resolved: 2020-05-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
PartOf
is part of MDEV-19813 Aria crash recovery failures Open
Relates
relates to MDEV-18203 Aria engine: Undo phase failed with "... Closed
relates to MDEV-18310 Aria engine: Undo phase failed with "... Closed

 Description   

The failure looks somewhat similar to MDEV-18203, but this one is reproducible on all 10.2+ and does not require encryption (and it's undo_key_delete instead of undo_key_insert), so I'll file it separately.

10.2 5ec4efb7

2019-09-13  2:23:05 140564631615296 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100% (18.4 seconds); transactions to roll back: 1
Got error 126 when executing undo undo_key_delete
2019-09-13  2:23:23 140564631615296 [ERROR] mysqld: Aria engine: Undo phase failed
tables to flush: 9 8 7 6 5 4 3 2 1 0
2019-09-13  2:23:23 140564631615296 [ERROR] mysqld: Aria recovery failed. Please run aria_chk -r on all Aria tables and delete all aria_log.######## files
2019-09-13  2:23:23 140564631615296 [ERROR] Plugin 'Aria' init function returned error.
2019-09-13  2:23:23 140564631615296 [ERROR] Plugin 'Aria' registration as a STORAGE ENGINE failed.
2019-09-13  2:23:23 140564631615296 [Note] Plugin 'FEEDBACK' is disabled.
2019-09-13  2:23:23 140564631615296 [ERROR] Aria engine is not enabled or did not start. The Aria engine must be enabled to continue as mysqld was configured with --with-aria-tmp-tables
2019-09-13  2:23:23 140564631615296 [ERROR] Aborting

Attention: use the RQG branch as described below, it is tuned for reproducing the failure
Remember to use the correct basedir, and modify vardir if necessary.
The test fails almost every time for me, but sometimes it misses the mark, just re-run it if it didn't fail.

git clone https://github.com/MariaDB/randgen --branch mdev20578 rqg-mdev20578
cd rqg-mdev20578
perl runall-new.pl --basedir=<basedir> --grammar=./mdev20578.yy --skip-gendata --gendata-advanced --duration=60 --threads=1 --seed=1 --reporter=CrashRecovery --vardir=/dev/shm/mdev20578



 Comments   
Comment by Elena Stepanova [ 2019-09-12 ]

On 10.5 built with S3, in addition to the above (which is still reproducible), the following assertion failure happens after the failed recovery:

10.5 46553c25

mysqld: /data/src/10.5/storage/maria/ha_s3.cc:681: int ha_s3_init(void*): Assertion `maria_hton' failed.
190913  2:48:15 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f18dcf4ff12 in __GI___assert_fail (assertion=0x55eae119b75b "maria_hton", file=0x55eae119b678 "/data/src/10.5/storage/maria/ha_s3.cc", line=681, function=0x55eae119b850 <ha_s3_init(void*)::__PRETTY_FUNCTION__> "int ha_s3_init(void*)") at assert.c:101
#8  0x000055eae0c78976 in ha_s3_init (p=0x55eae2fef500) at /data/src/10.5/storage/maria/ha_s3.cc:681
#9  0x000055eae04e6633 in ha_initialize_handlerton (plugin=0x55eae2f23988) at /data/src/10.5/sql/handler.cc:550
#10 0x000055eae01e2417 in plugin_initialize (tmp_root=0x7ffd0ca4dda0, plugin=0x55eae2f23988, argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, options_only=false) at /data/src/10.5/sql/sql_plugin.cc:1437
#11 0x000055eae01e30bf in plugin_init (argc=0x55eae1a75318 <remaining_argc>, argv=0x55eae2ebaae8, flags=0) at /data/src/10.5/sql/sql_plugin.cc:1719
#12 0x000055eae0098a28 in init_server_components () at /data/src/10.5/sql/mysqld.cc:5106
#13 0x000055eae0099bf3 in mysqld_main (argc=14, argv=0x55eae2ebaae8) at /data/src/10.5/sql/mysqld.cc:5629
#14 0x000055eae008e7d0 in main (argc=14, argv=0x7ffd0ca4e7e8) at /data/src/10.5/sql/main.cc:25

Comment by Michael Widenius [ 2020-05-22 ]

The issue with S3 is fixed a long time ago (just checked).

Comment by Elena Stepanova [ 2020-05-22 ]

An example of the datadir stored after the server was killed, before a recovery attempt was made: ftp://ftp.askmonty.org/public/mdev20578_data.tar.gz
It's a big archive, almost 1Gb.

Comment by Michael Widenius [ 2020-05-26 ]

The crash happens in this scenario:

  • Table with unique keys and non unique keys
  • Batch insert (LOAD DATA or INSERT ... SELECT) with REPLACE
  • Some insert succeeds followed by duplicate key error

In the above scenario the table gets corrupted.

The bug was that we don't generate any undo entry for the
failed insert as the whole insert can be ignored by undo.
The code did however not take into account that when bulk
insert is used, we would write cached keys to the file on
failure and undo would wrongly ignore these.

Fixed by moving the writing of the cache keys after we write
the aborted-insert event to the log.

Generated at Thu Feb 08 09:00:33 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.