[MDEV-27765] MariaDB stopped to work randomly - misery started at "Unable to find a record to delete-mark" Created: 2022-02-07 Updated: 2024-01-16 Resolved: 2024-01-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | None |
| Affects Version/s: | 10.5.13, 10.7.1 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Tristan Kundrat | Assignee: | Marko Mäkelä |
| Resolution: | Incomplete | Votes: | 0 |
| Labels: | crash | ||
| Environment: |
Fedora 34, 64bit |
||
| Description |
|
I was using my Nextcloud instance as normal until I suddenly got a status 500 code in my browser. I went to investigate, as I had not changed anything about my setup.
There are many more errors after that; however, they are all the same (endless attempts to start mariadb):
I already tried upgrading MariaDB, but version 10.7 doesn't work either, so I reverted to 10.5. If any more or different files, logs, etc. are needed, just tell me. |
| Comments |
| Comment by Marko Mäkelä [ 2022-02-08 ] | ||||
|
Does this occur when you run the server with innodb_change_buffering=none (which we plan to set by default, in
Note: You will have to rebuild the affected secondary indexes, by executing DROP INDEX and CREATE INDEX (or ALTER TABLE…ADD INDEX). Disabling the change buffering will only prevent further corruption from being introduced by it, but fix already caused corruption. | ||||
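A minimal sketch of the two steps mentioned in this comment, for a server that can still be started. The table, index, and column names (`mydb.t1`, `idx_example`, `example_column`) are hypothetical placeholders, not taken from the report. If the server cannot start, the same setting would instead go into the option file as `innodb_change_buffering=none` under the `[mysqld]` group.

```sql
-- Stop buffering changes to secondary index pages from now on.
-- This only prevents new change-buffer activity; it does not touch existing pages.
SET GLOBAL innodb_change_buffering = 'none';

-- Rebuild one affected secondary index by dropping and re-creating it.
-- mydb.t1, idx_example and example_column are hypothetical names.
ALTER TABLE mydb.t1 DROP INDEX idx_example;
ALTER TABLE mydb.t1 ADD INDEX idx_example (example_column);
```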
| Comment by Tristan Kundrat [ 2022-02-08 ] | ||||
|
Thanks for your comment.
But it still gives the same error in systemd:
The mariadb.log error is still the same. But it tries to roll back 3 transactions as before; isn't that what you wanted me to disable? | ||||
| Comment by Marko Mäkelä [ 2022-02-08 ] | ||||
|
Sorry, I forgot a "not" in my previous comment. Secondary indexes that are already corrupted will not be automatically fixed by disabling the change buffering. | ||||
| Comment by Tristan Kundrat [ 2022-02-08 ] | ||||
|
But how am I supposed to fix the db when I can't access it? | ||||
| Comment by Tristan Kundrat [ 2022-02-08 ] | ||||
|
Fixed it! | ||||
| Comment by Marko Mäkelä [ 2022-11-10 ] | ||||
|
TriKun, good for you. I think that innodb_force_recovery=3 would have been a safer way to achieve the same. innodb_force_recovery=6 (or deleting ib_logfile0 before
A minimal work-around could have been
and a slightly more "overkill" fix would have been
(rebuilding the entire table, not just the secondary indexes).
| ||||
| Comment by Marko Mäkelä [ 2022-11-10 ] | ||||
|
I realize that the crash occurred on the rollback of recovered incomplete transactions, which was prompting a change buffer merge. innodb_force_recovery=3 prevents the rollback, but the locks held on the table should prevent DROP INDEX or ALTER TABLE from running. You could have taken an SQL dump of the table (with SELECT), shut down the server, deleted the file oc_filecache.ibd, restarted the server, and finally executed DROP TABLE nextcloud.oc_filecache; and restored the SQL dump. I believe that this error is the first step towards the crash of | ||||
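The dump-and-restore sequence described in this comment could be sketched roughly as follows. This is an illustration under assumptions, not the exact commands from the thread: the output path `/tmp/oc_filecache.tsv` is hypothetical, `SELECT ... INTO OUTFILE` needs the FILE privilege and a permitted `secure_file_priv` location, and the server is assumed to have been started with `innodb_force_recovery=3` in the option file.

```sql
-- Save the table definition so that the table can be re-created later.
SHOW CREATE TABLE nextcloud.oc_filecache;

-- Dump the rows with a plain SELECT (hypothetical output path).
SELECT * FROM nextcloud.oc_filecache
  INTO OUTFILE '/tmp/oc_filecache.tsv';

-- Outside SQL: shut down the server, delete oc_filecache.ibd,
-- and start the server again.

-- Remove the now file-less table from the data dictionary.
DROP TABLE nextcloud.oc_filecache;

-- Re-create the table from the saved CREATE TABLE statement, then reload the rows.
LOAD DATA INFILE '/tmp/oc_filecache.tsv'
  INTO TABLE nextcloud.oc_filecache;
```

A `mysqldump` or `mariadb-dump` of the single table would serve the same purpose as the `SELECT ... INTO OUTFILE` and `LOAD DATA` pair and also captures the table definition.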
| Comment by Marko Mäkelä [ 2022-11-23 ] | ||||
|
While analyzing failures from a stress test of the fix
With ROW_FORMAT=COMPRESSED the impact should be more severe, because the estimates of page fullness (a prerequisite to ensure that all buffered inserts will fit into a page) are more pessimistic there, to ensure that no compression overflow will occur. Due to the bogus garbage entries that are being merged, the logic will be broken.
As far as I can tell, all MySQL and MariaDB versions are affected by this. The code changes that were applied in
The bottom line is: When the change buffer is enabled (which it was by default until | ||||
| Comment by Marko Mäkelä [ 2023-12-14 ] | ||||
|
TriKun, can you still reproduce this with a newer version of MariaDB Server? Note that you should rebuild the affected tables (OPTIMIZE TABLE or similar); otherwise you may see the effects of old corruption. For ROW_FORMAT=COMPRESSED tables it is known that ROLLBACK may cause corruption. See MDEV-32174. |
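As a sketch of the rebuild suggested here, using the table named earlier in this thread (substitute any other affected table): for InnoDB, `OPTIMIZE TABLE` is carried out as a table re-creation followed by an analyze, and `ALTER TABLE ... FORCE` is one reading of the "or similar" alternative.

```sql
-- Optional: look for index corruption before rebuilding.
CHECK TABLE nextcloud.oc_filecache;

-- Rebuild the table together with all of its secondary indexes.
OPTIMIZE TABLE nextcloud.oc_filecache;

-- Alternative that also forces a full rebuild of the table.
ALTER TABLE nextcloud.oc_filecache FORCE;
```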