[MDEV-30397] InnoDB crash due to DB_FAIL reported for a corrupted page
| Created: | 2023-01-12 | Updated: | 2023-03-02 | Resolved: | 2023-02-16 |
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0 |
| Fix Version/s: | 10.11.3, 11.0.1, 10.6.13, 10.7.8, 10.8.8, 10.9.6, 10.10.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Richard Green | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Environment: | Windows 10 Pro 22H2 64-bit (build 19045.2364) |
| Description |
I can connect to MariaDB with HeidiSQL and MySqlConnector, but when I try to view table data (all tables are InnoDB) or run a query, the MariaDB service crashes. I get a popup: "SQL Error (2013): Lost connection to the MySQL server during query." From the .err log:
2023-01-12 17:33:30 3 [ERROR] [FATAL] InnoDB: Unknown error Failed, retry may succeed
| Comments |
| Comment by Marko Mäkelä [ 2023-01-13 ] |
Thank you for the report. There is a nice stack trace in DX-APP06.err. The InnoDB error code is DB_FAIL, and the SELECT statement is also present. Is this reproducible when loading a copy of the table imcleandata.imcleansyncdataadv into a newly initialized database? Can you provide the minimal SQL statements (CREATE TABLE, INSERT, SELECT) for reproducing this crash?
| Comment by Richard Green [ 2023-01-13 ] |
Hi, with DBeaver I was able to see the data in the table briefly before it crashed (the service stopped). With HeidiSQL I cannot get that far; it crashes before showing any data. Any query from MySqlConnector in C# causes the MariaDB service to stop. This would be the minimal statement that triggers it:
| Comment by Marko Mäkelä [ 2023-01-14 ] |
Can you provide the CREATE TABLE statement for this table?
| Comment by Vladislav Vaintroub [ 2023-01-14 ] |
If it crashes with HeidiSQL, a LIMIT might be involved, as in your C# example, which contains LIMIT and "ORDER BY DESC".
| Comment by Marko Mäkelä [ 2023-01-14 ] |
The InnoDB error code DB_FAIL is usually associated with data modifications, not with reads like the one in the stack trace. It is a long shot, but the error could be issued due to a failed change buffer merge (which would have crashed the server until
| Comment by Richard Green [ 2023-01-16 ] |
Here is the CREATE statement:
CREATE TABLE `imcleansyncdataadv` (
| Comment by Richard Green [ 2023-01-16 ] |
I'm afraid I have had to move on; I uninstalled MariaDB from the server. I am now trying MySQL to see whether I run into the same issues.
| Comment by Marko Mäkelä [ 2023-01-17 ] |
Because there is a secondary index on the column id_orig, corruption of the change buffer is a possible cause of this. It is unfortunate that we lost the data set.
| Comment by Marko Mäkelä [ 2023-01-19 ] |
I do not see how the change buffer code could return DB_FAIL up to the caller. The cause of this crash remains a mystery.
| Comment by Marko Mäkelä [ 2023-02-08 ] |
I can reproduce this behaviour on 10.6 by modifying buf_LRU_free_page() so that it attempts to evict modified pages of temporary tables. There is obviously some flaw in that modification, but it reproduces this error in a couple of tests that exercise temporary tables. In other words, we are attempting some operation on a corrupted page, getting DB_FAIL, and not handling it consistently (which, I suppose, would mean initiating a rollback).
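The failure mode described above, where an error code reaches a caller that does not expect it, can be illustrated with a minimal C++ sketch. The error-code names are borrowed from InnoDB, but `describe()`, `handle_read_error()` and `action` are hypothetical and greatly simplified; this is not InnoDB's actual error handling. The sketch assumes the caller treats any unexpected code as fatal, which is consistent with the "[FATAL] InnoDB: Unknown error Failed, retry may succeed" line in the report's log.

```cpp
#include <stdexcept>
#include <string>

// Subset of InnoDB error-code names; the numeric values here are arbitrary.
enum dberr_t { DB_SUCCESS, DB_DEADLOCK, DB_PAGE_CORRUPTED, DB_FAIL };

// Hypothetical error-to-text helper. Only the DB_FAIL text is taken from the
// log in this report; the other strings are placeholders.
const char *describe(dberr_t err) {
  switch (err) {
  case DB_SUCCESS: return "Success";
  case DB_DEADLOCK: return "Deadlock";
  case DB_PAGE_CORRUPTED: return "Page corrupted";
  case DB_FAIL: return "Failed, retry may succeed";
  }
  return "Unknown";
}

// Hypothetical outcome of handling an error on the caller side.
enum class action { proceed, rollback };

// Hypothetical caller: it handles the error codes it expects a read to
// produce and treats anything else as fatal.
action handle_read_error(dberr_t err) {
  switch (err) {
  case DB_SUCCESS:
    return action::proceed;
  case DB_DEADLOCK:
  case DB_PAGE_CORRUPTED:
    // Expected failures: report to the client and roll back the statement.
    return action::rollback;
  default:
    // DB_FAIL lands here. A server would abort; the sketch throws instead.
    throw std::runtime_error(std::string("[FATAL] InnoDB: Unknown error ") +
                             describe(err));
  }
}
```

Handling DB_FAIL consistently would amount to either adding it to the rollback branch or making sure it is translated into an expected code before it reaches such a caller.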
| Comment by Marko Mäkelä [ 2023-02-08 ] |
In my case (caused by a buggy code change that I am working on), we are reading a page that has been filled with NUL bytes and returning DB_FAIL because of that. Other corruption detected by that function results in a different error code:
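The distinction drawn here, between an all-NUL page and other detected corruption, can be sketched as a small classifier. This is hypothetical and simplified, not InnoDB's page validation: the page layout (a 4-byte little-endian trailer) and `toy_checksum()` are placeholders, since InnoDB uses its own page format and checksum algorithms. The comment does not say which other error code is returned; DB_PAGE_CORRUPTED is assumed here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Subset of InnoDB error-code names; the numeric values here are arbitrary.
enum dberr_t { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_FAIL };

// Hypothetical layout: the last 4 bytes hold a checksum of the rest.
constexpr std::size_t kTrailer = 4;

// Toy checksum over the payload; a placeholder, NOT InnoDB's algorithm.
uint32_t toy_checksum(const std::vector<uint8_t> &page) {
  assert(page.size() > kTrailer);
  uint32_t sum = 0;
  for (std::size_t i = 0; i < page.size() - kTrailer; ++i)
    sum = sum * 31 + page[i];
  return sum;
}

// Reads the stored checksum from the little-endian trailer.
uint32_t stored_checksum(const std::vector<uint8_t> &page) {
  std::size_t n = page.size();
  return uint32_t(page[n - 4]) | uint32_t(page[n - 3]) << 8 |
         uint32_t(page[n - 2]) << 16 | uint32_t(page[n - 1]) << 24;
}

// Writes the toy checksum into the trailer (used to build valid pages).
void write_toy_checksum(std::vector<uint8_t> &page) {
  uint32_t c = toy_checksum(page);
  std::size_t n = page.size();
  for (std::size_t k = 0; k < kTrailer; ++k)
    page[n - kTrailer + k] = uint8_t(c >> (8 * k));
}

// Hypothetical classification of a page that has just been read.
dberr_t classify_page_read(const std::vector<uint8_t> &page) {
  assert(page.size() > kTrailer);
  // A page that was never written (for example, unallocated) is all NUL
  // bytes. This check comes first: such a page would trivially pass the
  // toy checksum, because both the computed and stored values are 0.
  if (std::all_of(page.begin(), page.end(), [](uint8_t b) { return b == 0; }))
    return DB_FAIL;
  // Any other inconsistency is reported as corruption.
  if (toy_checksum(page) != stored_checksum(page))
    return DB_PAGE_CORRUPTED;
  return DB_SUCCESS;
}
```

Returning a separate code for the all-NUL case lets a caller decide whether an empty page is an error at all, which matters for speculative reads.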
| Comment by Marko Mäkelä [ 2023-02-08 ] |
The DB_FAIL return value was added to that function in
| Comment by Marko Mäkelä [ 2023-02-09 ] |
The special return value is needed so that fil_aio_callback() can avoid reporting an error when read-ahead covers an unallocated page. For synchronous reads, the error code needs to be mapped in buf_read_page_low().
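The two paths described in this comment can be sketched as below. This is a hypothetical simplification, not the actual fil_aio_callback() or buf_read_page_low() code; `read_kind`, `should_log_error()` and `map_sync_read_result()` are illustrative names. The comment only says that DB_FAIL "needs to be mapped" for synchronous reads; mapping it to DB_PAGE_CORRUPTED is an assumption made for this sketch.

```cpp
// Subset of InnoDB error-code names; the numeric values here are arbitrary.
enum dberr_t { DB_SUCCESS, DB_PAGE_CORRUPTED, DB_FAIL };

// Hypothetical: why the page is being read.
enum class read_kind { synchronous, read_ahead };

// Stand-in for the asynchronous completion path. A read-ahead may
// speculatively cover pages that were never allocated; DB_FAIL for such a
// page is expected and is not logged. Other errors are still reported.
bool should_log_error(dberr_t err, read_kind kind) {
  if (err == DB_SUCCESS)
    return false;
  if (err == DB_FAIL && kind == read_kind::read_ahead)
    return false;
  return true;
}

// Stand-in for the synchronous path. Here the caller needs this exact page,
// so DB_FAIL must not leak upward; it is mapped to a code that callers
// already handle (DB_PAGE_CORRUPTED is this sketch's assumption).
dberr_t map_sync_read_result(dberr_t err) {
  return err == DB_FAIL ? DB_PAGE_CORRUPTED : err;
}
```

The point of the split is that the same low-level result means different things depending on who asked for the page: expected noise for a speculative read-ahead, but a real error for a caller that needs the page.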