[MDEV-24412] Mariadb 10.5: InnoDB: Upgrade after a crash is not supported Created: 2020-12-15 Updated: 2023-02-22 Resolved: 2022-11-28 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB, Upgrades |
| Affects Version/s: | 10.5.6, 10.5.8 |
| Fix Version/s: | 10.5.19, 10.6.12, 10.7.8 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Xesh | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 1 |
| Labels: | not-10.8, upgrade | ||
| Environment: |
Centos 8.1.1911 |
||
| Description |
|
After upgrading MariaDB from 10.4.17 to 10.5.8, the server did not start; it just crashed.

2020-12-15 14:08:31 0 [Warning] You need to use --log-bin to make --binlog-format work.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
Server version: 10.5.8-MariaDB-log
Thread pointer: 0x0

CONFIG
[mysqld]
bind-address = 0.0.0.0
transaction-isolation = READ-COMMITTED
binlog_format=ROW
innodb_data_file_path=ibdata1:50M;ibdata2:50M:autoextend
sql_mode=NO_ENGINE_SUBSTITUTION
innodb_strict_mode=off
innodb_flush_neighbors=0 |
| Comments |
| Comment by Xesh [ 2020-12-15 ] | |||||||||||||
|
I have just downgraded it to 10.5.4 and it is working without any changes. I just stopped 10.5.8, downgraded to 10.5.4, and started MariaDB. | |||||||||||||
| Comment by Marko Mäkelä [ 2020-12-15 ] | |||||||||||||
|
What happened before the failed restart attempt? Could the database have been corrupted by the bug in 10.5.7 that was fixed in | |||||||||||||
| Comment by Xesh [ 2020-12-16 ] | |||||||||||||
|
That bug was fixed in 10.5.8, right? I did not upgrade to 10.5.7; the upgrade was directly from 10.4.17 to 10.5.8. I restored the database from the backup with version 10.4.15, and it started without a problem. After that, I upgraded to 10.5.6 and it started normally. | |||||||||||||
| Comment by Marko Mäkelä [ 2020-12-16 ] | |||||||||||||
|
I wonder why you are setting innodb_force_recovery at all. Its main purpose is to allow data to be rescued in case the database is corrupted. In particular, setting innodb_force_recovery=6 will cause the write-ahead log to be skipped altogether. If the server was previously killed and not shut down cleanly, any kind of corruption will be possible. | |||||||||||||
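Because the setting can be left in an option file by mistake, a small check before an upgrade may help. The following is a minimal sketch, not part of MariaDB: the function name find_force_recovery and the sample option file are hypothetical, and the scan only looks at the text it is given (it does not follow !include directives or command-line options).

```python
import re

# Matches an active (uncommented) innodb_force_recovery line. Option files
# accept both underscores and dashes in option names, so both are allowed.
_FORCE_RECOVERY = re.compile(
    r"^\s*innodb[-_]force[-_]recovery\s*=\s*(\d+)\s*(?:#.*)?$",
    re.IGNORECASE,
)


def find_force_recovery(config_text):
    """Return (line_number, level) for each active innodb_force_recovery line."""
    hits = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        match = _FORCE_RECOVERY.match(line)
        if match:
            hits.append((number, int(match.group(1))))
    return hits


# Hypothetical option file: one commented-out line, one active line.
sample = "[mysqld]\n#innodb_force_recovery=6\ninnodb_force_recovery = 6\n"
for number, level in find_force_recovery(sample):
    print(f"line {number}: innodb_force_recovery={level} is active")
```

Lines starting with # or ; are ignored because the pattern requires the option name to be the first non-blank text on the line.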
| Comment by Xesh [ 2020-12-17 ] | |||||||||||||
|
It was left by mistake. | |||||||||||||
| Comment by Marko Mäkelä [ 2020-12-17 ] | |||||||||||||
|
I think that any database instance where innodb_force_recovery was set to 6 should be treated as seriously corrupted (only valid for extracting data with mysqldump or similar, and even that data dump should be carefully checked for any corruption). The only exception might be that the previous server runs are known to have ended in a clean shutdown. Also, before
The redo log and the undo log pages are the glue that makes atomic any changes that involve multiple index pages or multiple indexes of a table. If those mechanisms are disabled, anything can happen. For example, imagine what would happen if the server was killed and proper log recovery was disabled in the middle of a B-tree page split or merge. Another sign of this kind of ‘garbage in, garbage out’ situation would be error messages "log sequence number is in the future".
I am sorry, but I do not think that this can be addressed by a code fix (other than converting crashes to nicer error messages, as in | |||||||||||||
| Comment by Xesh [ 2020-12-18 ] | |||||||||||||
|
1. Before I upgraded from 10.4.17 to 10.5.8, 10.4.17 was shut down with only the innodb_fast_shutdown = 0 option. None of the innodb_force_recovery options was used with this upgrade. During the start of 10.5.8, there was a message that ibtmp1 should be deleted with option innodb_fast_shutdown = 0 from the previous message. So 10.4.17 was stopped with this option. Your answer applies to issue "1.", the upgrade, right? | |||||||||||||
| Comment by Marko Mäkelä [ 2020-12-19 ] | |||||||||||||
|
shexphobos, can you provide a full sequence of steps that reproduce the problem? The option innodb_force_recovery is dangerous by design, and it is very well known that it may cause permanent, irrecoverable corruption. I am guessing that you set innodb_force_recovery=6 because the logic that was implemented in | |||||||||||||
| Comment by Xesh [ 2020-12-21 ] | |||||||||||||
|
systemctl stop mariadb (10.4.15)
After 10.5.6 started normally, I uncommented innodb_force_recovery=6, and after the next restart it crashed. Yes, there is a message "[ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with Backup 10.4.15-MariaDB." when you try to start 10.5.6 without the previous shutdown of 10.4.15 without innodb_fast_shutdown=0. | |||||||||||||
| Comment by Xesh [ 2020-12-21 ] | |||||||||||||
| Comment by Marko Mäkelä [ 2020-12-21 ] | |||||||||||||
|
shexphobos, let me reopen this and change the title to something that is closer to the original root cause. I maintain that the consequences of setting innodb_force_recovery=6 are not a bug. You say that MariaDB 10.5.6 and 10.5.8 refused to start up with a log file that was apparently created with Mariabackup 10.4. How did you get into that situation? The normal workflow for restoring a backup should be something like this:
I think that the following shortcut might work for restoring a backup from 10.3 or later to a newer server:
By the way, I think that it is a good idea to always run mariabackup --prepare after mariabackup --backup, not only to save time when restoring the backup, but also to validate the backup (so that in case the backup was corrupted in some way, you will have a chance to get a valid backup). To avoid extra I/O workload on the database server, that command can be run on a separate system. Can you clarify what exactly you were doing, and whether my suggested procedures would work? Even if we had no code bug, our documentation might need some clarification. | |||||||||||||
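The habit of preparing every backup right after taking it can be sketched as follows. This is only an illustration under assumptions: the helper names and the /backups/example directory are hypothetical, connection options such as user and password are omitted, and copying the backup to a separate system before the prepare step is left out. Only the mariabackup --backup, --prepare and --target-dir options are taken as given.

```python
import subprocess


def backup_and_prepare_commands(target_dir):
    """Build the two mariabackup invocations: take the backup, then prepare it."""
    return [
        ["mariabackup", "--backup", f"--target-dir={target_dir}"],
        ["mariabackup", "--prepare", f"--target-dir={target_dir}"],
    ]


def run_backup_and_prepare(target_dir, dry_run=True):
    """Run (or, with dry_run=True, only print) the backup and prepare steps."""
    commands = backup_and_prepare_commands(target_dir)
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing step, so a backup that
            # cannot be prepared is noticed immediately.
            subprocess.run(command, check=True)
    return commands


# Hypothetical target directory; dry_run only prints the commands.
run_backup_and_prepare("/backups/example", dry_run=True)
```

Running the prepare step as part of the same job means a backup that cannot be prepared is detected while the source data is still available for another attempt.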
| Comment by Xesh [ 2020-12-22 ] | |||||||||||||
|
I'll start from the beginning.
| |||||||||||||
| Comment by Marko Mäkelä [ 2021-01-13 ] | |||||||||||||
|
The lines in 10.5.8 crash.txt If I understood correctly, there is no actual problem with the 10.5 server, and the whole chain of events was caused by a crashing bug in 10.4 (which I hope has been filed in a separate ticket). Apparently, the problems in 10.5 were caused by accidentally forgetting to remove the innodb_force_recovery=6 setting from the configuration, and there really is not much that we can improve in the code. shexphobos, do you agree? Theoretically, we could change the startup so that when innodb_force_recovery=6 is specified, a ‘lecture’ will be written to the error log and startup will be refused. The (disastrous) effect of the setting could still be achieved by manually deleting the log files. This is something that we could do in the next development version. I suspect that we have a similar case of incorrectly performed upgrade in | |||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||
|
shexphobos The log you posted above, is it from the first crash? Your submission now includes the logs from when you tried to recover and were unable to do so. But when it first crashed, something else may have happened that corrupted the files and led you to try innodb_force_recovery? If so, please post the journald log from the first crash. | |||||||||||||
| Comment by Otto Kekäläinen [ 2021-01-13 ] | |||||||||||||
|
@Xesh See https://jira.mariadb.org/browse/MDEV-24578. Does your case seem similar? Do you get any results from running `journalctl -u mariadb | grep "InnoDB: Assertion failure"`? | |||||||||||||
| Comment by Marko Mäkelä [ 2022-11-25 ] | |||||||||||||
|
With ib_logfile0_000-7FF.bin
This attempted to parse the 512 bytes read from ib_logfile0 at byte offset 1,734,884,864.
This attempted to parse the 512 bytes read from ib_logfile1 at byte offset 6,029,852,160-5,368,709,120=661,143,040. I checked the file offsets using debugging tools. For a complete test, we must write a fake 512-byte empty log block at the correct offset of the file ib_logfile1. | |||||||||||||
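The offset arithmetic above can be reproduced with a short sketch. It assumes that 5,368,709,120 bytes (5 GiB) is the size of each log file, which is inferred from the subtraction and not stated outright, and it treats the files as simply concatenated, ignoring any per-file header handling. The function name locate_in_log_group is hypothetical.

```python
def locate_in_log_group(offset, file_size):
    """Map an offset in the concatenated log files to (file index, offset in file)."""
    return divmod(offset, file_size)


FILE_SIZE = 5_368_709_120   # assumed size of each log file (5 GiB)
LOG_OFFSET = 6_029_852_160  # offset within the concatenated files, from the comment

file_index, offset_in_file = locate_in_log_group(LOG_OFFSET, FILE_SIZE)
print(f"ib_logfile{file_index} at byte offset {offset_in_file:,}")

# Keeping only the low 32 bits of the same offset gives a different position.
print(f"low 32 bits: {LOG_OFFSET % 2**32:,}")
```

The first line of output is ib_logfile1 at byte offset 661,143,040, matching the figure in the comment. The second value is 1,734,884,864, which equals the ib_logfile0 offset quoted earlier. That coincidence is an arithmetic observation and is consistent with an offset truncated to 32 bits, but the ticket text itself does not spell out the mechanism.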
| Comment by Marko Mäkelä [ 2022-11-25 ] | |||||||||||||
|
The 2 checkpoint blocks in ib_logfile0_000-7FF.bin
I believe that this affects all upgrades where innodb_log_files_in_group×innodb_log_file_size exceeds 4 gigabytes and the current log position is located more than 4 gigabytes from the start of ib_logfile0. Multiple log files (before innodb_log_files_in_group was hard-wired to 1 in MariaDB Server 10.5) were treated as if all files had been catenated together. | |||||||||||||
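The condition described above can be written down as a rough pre-upgrade check. This is a sketch under assumptions: "4 gigabytes" is taken to mean 2^32 bytes, the log position is expressed as a byte distance from the start of ib_logfile0 across the concatenated files, and the function name may_be_affected is hypothetical. It does not read any server state.

```python
FOUR_GIB = 4 * 1024**3  # assumed meaning of "4 gigabytes": 2**32 bytes


def may_be_affected(files_in_group, file_size, log_position):
    """Return True if both conditions from the comment hold.

    files_in_group -- value of innodb_log_files_in_group before the upgrade
    file_size      -- value of innodb_log_file_size in bytes
    log_position   -- byte distance of the current log position from the
                      start of ib_logfile0, treating the files as concatenated
    """
    total_size = files_in_group * file_size
    return total_size > FOUR_GIB and log_position > FOUR_GIB


# Two 5 GiB files (the file count is an assumption) and the offset quoted
# in the earlier comment.
print(may_be_affected(2, 5 * 1024**3, 6_029_852_160))
```

With a total log size of 4 GiB or less, the check is always False, because the log position can never lie more than 4 GiB from the start of ib_logfile0.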
| Comment by Marko Mäkelä [ 2022-11-28 ] | |||||||||||||
|
In MariaDB 10.8, the upgrade check was refactored as part of
| |||||||||||||
| Comment by Delisson Silva [ 2023-02-22 ] | |||||||||||||
|
We just hit this bug and that caused our DB to fail upgrading to 10.6 when we tried to go to 10.6.11. Upgrading to 10.6.12 worked and we were able to reproduce this very easily by, like Marko mentioned, having innodb_log_files_in_group×innodb_log_file_size exceed 4G and have the log position be over 4G past the start of ib_logfile0. Shouldn't there be a warning on 10.6 versions' changelog page prior to 10.6.12 about this issue? It was quite jarring for us and the issue halted our upgrade process. |