[MDEV-600] LP:982872 - Aria recovery crash, or assertion `!new_page' failure in _ma_apply_redo_insert_row_head_or_tail, or assertion `page_offset >= keypage_header && page_offset <= page_length' failure in _ma_apply_redo_index Created: 2012-04-16  Updated: 2019-06-18  Resolved: 2019-06-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Aria
Affects Version/s: 5.2.14, 5.3.12, 5.5, 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: N/A

Type: Bug Priority: Minor
Reporter: Dreas van Donselaar (Inactive) Assignee: Michael Widenius
Resolution: Cannot Reproduce Votes: 0
Labels: Launchpad

Attachments: XML File LPexportBug982872.xml    
Issue Links:
Relates
relates to MDEV-19718 Assertion `rownr == 0 && new_page' fa... Closed

 Description   

One of our servers that was recently upgraded to MariaDB 5.3 crashed at startup:

Apr 16 10:12:04 server1 mysqld_safe: Starting mysqld daemon with databases from /var/lib/mysql
Apr 16 10:12:04 server1 mysqld: 120416 10:12:04 [Warning] 'master-host' is deprecated and will be removed in a future release. Please use ''CHANGE MASTER'' instead.
Apr 16 10:12:04 server1 mysqld: 120416 10:12:04 [Note] Plugin 'BLACKHOLE' is disabled.
Apr 16 10:12:04 server1 mysqld: 120416 10:12:04 [Note] Plugin 'FEDERATED' is disabled.
Apr 16 10:12:04 server1 mysqld: 120416 10:12:04 [Note] mysqld: Aria engine: starting recovery
Apr 16 10:12:09 server1 mysqld: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Apr 16 10:12:09 server1 mysqld: This could be because you hit a bug. It is also possible that this binary
Apr 16 10:12:09 server1 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Apr 16 10:12:09 server1 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 16 10:12:09 server1 mysqld: 
Apr 16 10:12:09 server1 mysqld: To report this bug, see http://kb.askmonty.org/en/reporting-bugs
Apr 16 10:12:09 server1 mysqld: 
Apr 16 10:12:09 server1 mysqld: We will try our best to scrape up some info that will hopefully help
Apr 16 10:12:09 server1 mysqld: diagnose the problem, but since we have already crashed, 
Apr 16 10:12:09 server1 mysqld: something is definitely wrong and this may fail.
Apr 16 10:12:09 server1 mysqld: 
Apr 16 10:12:09 server1 mysqld: Server version: 5.3.5-MariaDB-ga-mariadb113~squeeze-log
Apr 16 10:12:09 server1 mysqld: key_buffer_size=268435456
Apr 16 10:12:09 server1 mysqld: read_buffer_size=131072
Apr 16 10:12:09 server1 mysqld: max_used_connections=0
Apr 16 10:12:09 server1 mysqld: max_threads=251
Apr 16 10:12:09 server1 mysqld: thread_count=0
Apr 16 10:12:09 server1 mysqld: connection_count=0
Apr 16 10:12:09 server1 mysqld: It is possible that mysqld could use up to 
Apr 16 10:12:09 server1 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 811573 K  bytes of memory
Apr 16 10:12:09 server1 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Apr 16 10:12:09 server1 mysqld: 
Apr 16 10:12:09 server1 mysqld: Thread pointer: 0x0
Apr 16 10:12:09 server1 mysqld: Attempting backtrace. You can use the following information to find out
Apr 16 10:12:09 server1 mysqld: where mysqld died. If you see no messages after this, something went
Apr 16 10:12:09 server1 mysqld: terribly wrong...
Apr 16 10:12:09 server1 mysqld: stack_bottom = 0 thread_stack 0x40000
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e) [0x7f5dc446cd4e]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x494) [0x7f5dc4197e14]
Apr 16 10:12:09 server1 mysqld: /lib/libpthread.so.0(+0xeff0) [0x7f5dc25dcff0]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(bmove_upp+0x10) [0x7f5dc447cc00]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(_ma_apply_redo_index+0x5b9) [0x7f5dc42bf069]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x63fd9d) [0x7f5dc42ccd9d]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x63bde1) [0x7f5dc42c8de1]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x63f415) [0x7f5dc42cc415]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(maria_apply_log+0x1b4) [0x7f5dc42cdbf4]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(maria_recovery_from_log+0x4e) [0x7f5dc42ce5de]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x5f4b46) [0x7f5dc4281b46]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(ha_initialize_handlerton(st_plugin_int*)+0x38) [0x7f5dc418a1c8]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x58bbec) [0x7f5dc4218bec]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(plugin_init(int*, char**, int)+0x52d) [0x7f5dc421af0d]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x3e098a) [0x7f5dc406d98a]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(main+0x1ee) [0x7f5dc40706ae]
Apr 16 10:12:09 server1 mysqld: /lib/libc.so.6(__libc_start_main+0xfd) [0x7f5dc1aecc8d]
Apr 16 10:12:09 server1 mysqld: /usr/sbin/mysqld(+0x2fe669) [0x7f5dc3f8b669]
Apr 16 10:12:09 server1 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Apr 16 10:12:09 server1 mysqld: information that should help you find out what is causing the crash.
Apr 16 10:12:09 server1 mysqld: Writing a core file
Apr 16 10:12:09 server1 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
Apr 16 10:12:18 server1 /etc/init.d/mysql[3844]: 0 processes alive and '/usr/bin/mysqladmin --defaults-file=/etc/mysql/debian.cnf ping' resulted in
Apr 16 10:12:18 server1 /etc/init.d/mysql[3844]: #007/usr/bin/mysqladmin: connect to server at 'localhost' failed
Apr 16 10:12:18 server1 /etc/init.d/mysql[3844]: error: 'Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)'
Apr 16 10:12:18 server1 /etc/init.d/mysql[3844]: Check that mysqld is running and that the socket: '/var/run/mysqld/mysqld.sock' exists!
Apr 16 10:12:18 server1 /etc/init.d/mysql[3844]: 

Removing /var/lib/mysql/aria* "fixed" this. I'll upload the aria* files to FTP so this can be analyzed.



 Comments   
Comment by Elena Stepanova [ 2012-04-17 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Hi Dreas,

I've tried to "recover" with 5.3.5 and your Aria logs on top of an empty database, and didn't get the crash so far. Could you please provide the schema (for starters, at least DDL, if not the data), or, if you already did it earlier in previous bug reports, point at one that we can use?

Thanks.

Comment by Dreas van Donselaar (Inactive) [ 2012-04-17 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
I uploaded the scheme (launchpad_982872_by_Dreas_scheme.sql.gz). Does that help?

Comment by Elena Stepanova [ 2012-04-18 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Hi Dreas,

The SQL you provided relies on a default engine for all tables. Do you have default engine Aria?

When I load the scheme with default engine MyISAM and then initiate Aria recovery, it goes smoothly (not surprisingly as there is no single Aria table in the database). When I load it with default engine Aria and then attempt recovery, it fails, but in a different way – no crash, it just aborts right away on 0%, prints an error message, and the server shuts down.

Comment by Dreas van Donselaar (Inactive) [ 2012-04-26 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
I've uploaded a complete database dump (including data) to FTP. Hopefully that allows to reproduce this.

Comment by Elena Stepanova [ 2012-04-26 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Hi Dreas,

I am getting the crash on the new data.
I will try to create a test case or at least something of a reasonable size to reproduce it reliably, and will pass forward.

Thank you.

Comment by Elena Stepanova [ 2012-04-26 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Reproducible on maria-5.2, 5.3, 5.5.

I removed unnecessary tables, but that's about all simplification I was able to perform. The data and aria logs are still big.
3 archive files are uploaded to FTP (in addition to what Dreas provided before):

982872_datadir.tar.gz - already created and populated table and offending aria logs placed into the datadir. To reproduce the problem, decompress the archive and start MariaDB server on the extracted 'data' as a datadir (or run aria_read_log -a inside the datadir).

982872_aria_logs.tar.gz - separately archived aria logs.
982872_sql.tar.gz - the dump of the table (structure and data).
To reproduce the problem using the dump, create a new empty datadir, start MariaDB server, load the dump, shutdown the server, place aria logs into the resulting datadir, then either start the server or run aria_read_log -a.

On release server, it crashes as described above.
On 5.2 debug server it aborts with the failing assertion:

[Note] mysqld: Aria engine: starting recovery
recovered pages: 0%mysqld: ma_blockrec.c:6355: _ma_apply_redo_insert_row_head_or_tail: Assertion `!new_page' failed.

#6 0x00007fe752c3bd4d in _GI__assert_fail (assertion=0xdb8d72 "!new_page", file=<optimized out>, line=6355,
function=<optimized out>) at assert.c:81
#7 0x0000000000a0f72a in _ma_apply_redo_insert_row_head_or_tail (info=0x25f0708, lsn=42966603572, page_type=1,
new_page=1 '\001', header=0x25e1cda "\236", <incomplete sequence \372>, data=0x25e1ce0 "", data_length=34)
at ma_blockrec.c:6355
#8 0x00000000009c9ba8 in exec_REDO_LOGREC_REDO_INSERT_ROW_HEAD (rec=0x7fff100855f0) at ma_recovery.c:1537
#9 0x00000000009c7619 in display_and_apply_record (log_desc=0x1a662b8, rec=0x7fff100855f0) at ma_recovery.c:587
#10 0x00000000009ccd30 in run_redo_phase (lsn=42966567214, lsn_end=0, apply=MARIA_LOG_APPLY) at ma_recovery.c:2695
#11 0x00000000009c6ec6 in maria_apply_log (from_lsn=42966567214, end_lsn=0, apply=MARIA_LOG_APPLY, trace_file=0x25e1a00,
should_run_undo_phase=1 '\001', skip_DDLs_arg=1 '\001', take_checkpoints=1 '\001', warnings_count=0x7fff10085b08)
at ma_recovery.c:349
#12 0x00000000009c6b44 in maria_recovery_from_log () at ma_recovery.c:239
#13 0x0000000000956b2f in ha_maria_init (p=0x25c7718) at ha_maria.cc:3366
#14 0x00000000007d2209 in ha_initialize_handlerton (plugin=0x25c6cf0) at handler.cc:432
#15 0x00000000008a361a in plugin_initialize (plugin=0x25c6cf0) at sql_plugin.cc:1263
#16 0x00000000008a3c88 in plugin_init (argc=0x1246a78, argv=0x25988e0, flags=0) at sql_plugin.cc:1451
#17 0x0000000000680146 in init_server_components () at mysqld.cc:4089
#18 0x0000000000680cb9 in main (argc=8, argv=0x7fff100863f8) at mysqld.cc:4560

aria_read_log -a (debug version) also aborts with an assertion:

aria_read_log: ma_key_recover.c:988: _ma_apply_redo_index: Assertion `page_offset >= keypage_header && page_offset <= page_length' failed.

A release version of aria_read_log -a crashes with segmentation fault.

Comment by Elena Stepanova [ 2012-04-26 ]

Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ;
Since the crash is reproducible on MariaDB 5.2.10, apparently it is not related to the upgrade, but either caused by a corrupted Aria log, or some tricky workflow which produced the unreadable log. Unfortunately, without the server logs it's hard to guess what kind of workflow it could be – if the contents of Aria logs gives any hints, please let me know, and I'll try to generate something.

Comment by Rasmus Johansson (Inactive) [ 2012-04-30 ]

Launchpad bug id: 982872

Comment by Elena Stepanova [ 2019-05-01 ]

Still reproducible on 10.4 0cbc9306:

2019-05-02  2:13:42 0 [Note] mysqld: Aria engine: starting recovery
recovered pages: 0%mysqld: /data/src/10.4/storage/maria/ma_blockrec.c:6437: _ma_apply_redo_insert_row_head_or_tail: Assertion `!new_page' failed.
190502  2:13:43 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f04c5ccbee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055f4a8fc2742 in _ma_apply_redo_insert_row_head_or_tail (info=0x55f4aad40260, lsn=42966603572, page_type=1, new_page=1 '\001', header=0x55f4aacf5e52 "\236", <incomplete sequence \372>, data=0x55f4aacf5e58 "", data_length=34) at /data/src/10.4/storage/maria/ma_blockrec.c:6437
#9  0x000055f4a8f88fa3 in exec_REDO_LOGREC_REDO_INSERT_ROW_HEAD (rec=0x7ffd5c4a9940) at /data/src/10.4/storage/maria/ma_recovery.c:1566
#10 0x000055f4a8f86b2d in display_and_apply_record (log_desc=0x55f4aa6e5ed8 <log_record_type_descriptor+216>, rec=0x7ffd5c4a9940) at /data/src/10.4/storage/maria/ma_recovery.c:595
#11 0x000055f4a8f8bd87 in run_redo_phase (lsn=42966567214, lsn_end=0, apply=MARIA_LOG_APPLY) at /data/src/10.4/storage/maria/ma_recovery.c:2723
#12 0x000055f4a8f8633d in maria_apply_log (from_lsn=42966567214, end_lsn=0, apply=MARIA_LOG_APPLY, trace_file=0x0, should_run_undo_phase=1 '\001', skip_DDLs_arg=1 '\001', take_checkpoints=1 '\001', warnings_count=0x7ffd5c4adec0) at /data/src/10.4/storage/maria/ma_recovery.c:350
#13 0x000055f4a8f85fab in maria_recovery_from_log () at /data/src/10.4/storage/maria/ma_recovery.c:240
#14 0x000055f4a8f5802f in ha_maria_init (p=0x55f4aad1dad0) at /data/src/10.4/storage/maria/ha_maria.cc:3689
#15 0x000055f4a89043be in ha_initialize_handlerton (plugin=0x55f4aaceb370) at /data/src/10.4/sql/handler.cc:531
#16 0x000055f4a85e6573 in plugin_initialize (tmp_root=0x7ffd5c4ae2f0, plugin=0x55f4aaceb370, argc=0x55f4a9e460b8 <remaining_argc>, argv=0x55f4aac88d78, options_only=false) at /data/src/10.4/sql/sql_plugin.cc:1437
#17 0x000055f4a85e7208 in plugin_init (argc=0x55f4a9e460b8 <remaining_argc>, argv=0x55f4aac88d78, flags=2) at /data/src/10.4/sql/sql_plugin.cc:1720
#18 0x000055f4a849c42d in init_server_components () at /data/src/10.4/sql/mysqld.cc:5176
#19 0x000055f4a849d613 in mysqld_main (argc=11, argv=0x55f4aac88d78) at /data/src/10.4/sql/mysqld.cc:5703
#20 0x000055f4a8491ca0 in main (argc=11, argv=0x7ffd5c4aed38) at /data/src/10.4/sql/main.cc:25

Comment by Michael Widenius [ 2019-06-18 ]

The problem here is that the aria_log.# files contains wrong information or more likely, something is causing a table to have the wrong skip_redo_lsn. MDEV-19718 had exactly the same issue.

I have examined all related code and can't find anything obvious how this could happen. I have created some new code for 10.4 to harden some code that didn't look fail-proof, which don't think that is the definitive fix.

To find the bug we should try to reproduce this problem with RQG. The problem is very likely to do with
LOAD DATA INFILE or INSERT ... SELECT on empty Aria tables.

Comment by Michael Widenius [ 2019-06-18 ]

Unfortunately we can't reproduce the original problem that caused the aria_log files to contain wrong information. The log files has helped us to find the end result of the problem but not the original cause.

However, the log files have helped us determinate that MDEV-19718 is probably caused by the same problem and this MDEV we have been able to reproduce once. We are now actively working on this one. When that is fixed it's very likely that the cause of this problem is also fixed.

Generated at Thu Feb 08 06:29:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.