[MDEV-600] LP:982872 - Aria recovery crash, or assertion `!new_page' failure in _ma_apply_redo_insert_row_head_or_tail, or assertion `page_offset >= keypage_header && page_offset <= page_length' failure in _ma_apply_redo_index Created: 2012-04-16 Updated: 2019-06-18 Resolved: 2019-06-18 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - Aria |
| Affects Version/s: | 5.2.14, 5.3.12, 5.5, 10.0, 10.1, 10.2, 10.3, 10.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Minor |
| Reporter: | Dreas van Donselaar (Inactive) | Assignee: | Michael Widenius |
| Resolution: | Cannot Reproduce | Votes: | 0 |
| Labels: | Launchpad | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
One of our servers that was recently upgraded to MariaDB 5.3 crashed at startup:
Removing /var/lib/mysql/aria* "fixed" this. I'll upload the aria* files to FTP so this can be analyzed. |
| Comments |
| Comment by Elena Stepanova [ 2012-04-17 ] | ||||||||||||||||||
|
Re: recovered pages: 0% 10% 20% 30% 40% 50% 60% 70% 80% 90%2d2d2d d:2d:2d [ERROR] mysqld got signal 11 ; I've tried to "recover" with 5.3.5 and your Aria logs on top of an empty database, and didn't get the crash so far. Could you please provide the schema (for starters, at least DDL, if not the data), or, if you already did it earlier in previous bug reports, point at one that we can use? Thanks. | ||||||||||||||||||
| Comment by Dreas van Donselaar (Inactive) [ 2012-04-17 ] | ||||||||||||||||||
|
| ||||||||||||||||||
| Comment by Elena Stepanova [ 2012-04-18 ] | ||||||||||||||||||
|
The SQL you provided relies on a default engine for all tables. Is Aria your default engine? When I load the schema with MyISAM as the default engine and then initiate Aria recovery, it goes smoothly (not surprisingly, as there is not a single Aria table in the database). When I load it with Aria as the default engine and then attempt recovery, it fails, but in a different way: no crash, it just aborts right away at 0%, prints an error message, and the server shuts down. | ||||||||||||||||||
| Comment by Dreas van Donselaar (Inactive) [ 2012-04-26 ] | ||||||||||||||||||
|
| ||||||||||||||||||
| Comment by Elena Stepanova [ 2012-04-26 ] | ||||||||||||||||||
|
I am getting the crash on the new data. Thank you. | ||||||||||||||||||
| Comment by Elena Stepanova [ 2012-04-26 ] | ||||||||||||||||||
|
I removed unnecessary tables, but that is about all the simplification I was able to perform. The data and Aria logs are still big.
982872_datadir.tar.gz - the already created and populated table, with the offending Aria logs placed into the datadir. To reproduce the problem, decompress the archive and start MariaDB server on the extracted 'data' as a datadir (or run aria_read_log -a inside the datadir).
982872_aria_logs.tar.gz - the Aria logs, archived separately.
On a release server, it crashes as described above.
[Note] mysqld: Aria engine: starting recovery
#6 0x00007fe752c3bd4d in _GI__assert_fail (assertion=0xdb8d72 "!new_page", file=<optimized out>, line=6355,
aria_read_log -a (debug version) also aborts with an assertion:
aria_read_log: ma_key_recover.c:988: _ma_apply_redo_index: Assertion `page_offset >= keypage_header && page_offset <= page_length' failed.
A release version of aria_read_log -a crashes with a segmentation fault. | ||||||||||||||||||
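The aria_read_log variant of the reproduction steps above could be scripted roughly as follows. This is only a sketch: the helper names (`extract_datadir`, `recovery_command`, `run_recovery`) are made up for illustration, and it assumes that the archive sits in the current directory and that aria_read_log is on the PATH.

```python
import subprocess
import tarfile
from pathlib import Path


def extract_datadir(archive: Path, dest: Path) -> Path:
    """Unpack the archive and return the path of the extracted 'data' directory."""
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
    # The comment above says the archive extracts to a directory named 'data'.
    return dest / "data"


def recovery_command(tool: str = "aria_read_log") -> list[str]:
    """Build the command line from the comment: aria_read_log -a."""
    return [tool, "-a"]


def run_recovery(datadir: Path, tool: str = "aria_read_log") -> int:
    """Run the tool inside the datadir and return its exit status.

    On POSIX, a negative return code means the process was killed by a
    signal (for example, a segmentation fault or an assertion abort).
    """
    result = subprocess.run(recovery_command(tool), cwd=datadir)
    return result.returncode


if __name__ == "__main__":
    # Hypothetical location: the archive is assumed to be in the current directory.
    datadir = extract_datadir(Path("982872_datadir.tar.gz"), Path("."))
    print("aria_read_log exit status:", run_recovery(datadir))
```

Running the tool with the datadir as the working directory mirrors "run aria_read_log -a inside the datadir"; a debug build is expected to abort on the assertion, while a release build is expected to die with a segmentation fault.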
| Comment by Elena Stepanova [ 2012-04-26 ] | ||||||||||||||||||
|
| ||||||||||||||||||
| Comment by Rasmus Johansson (Inactive) [ 2012-04-30 ] | ||||||||||||||||||
|
Launchpad bug id: 982872 | ||||||||||||||||||
| Comment by Elena Stepanova [ 2019-05-01 ] | ||||||||||||||||||
|
Still reproducible on 10.4 0cbc9306:
| ||||||||||||||||||
| Comment by Michael Widenius [ 2019-06-18 ] | ||||||||||||||||||
|
The problem here is that the aria_log.# files contain wrong information or, more likely, something is causing a table to have the wrong skip_redo_lsn. I have examined all the related code and can't find anything obvious that explains how this could happen. I have created some new code for 10.4 to harden some code that didn't look fail-proof, but I don't think that is the definitive fix. To find the bug we should try to reproduce this problem with RQG. The problem is very likely to do with | ||||||||||||||||||
| Comment by Michael Widenius [ 2019-06-18 ] | ||||||||||||||||||
|
Unfortunately we can't reproduce the original problem that caused the aria_log files to contain wrong information. The log files have helped us to find the end result of the problem but not the original cause. However, the log files have helped us determine that |