[MDEV-23136] InnoDB init fail after upgrade from 10.4 to 10.5 Created: 2020-07-09 Updated: 2022-01-17 Resolved: 2021-09-13 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.5.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Christian Rishøj | Assignee: | Marko Mäkelä |
| Resolution: | Not a Bug | Votes: | 2 |
| Labels: | None | ||
| Environment: |
Ubuntu 20.04 / Linux 5.4.0 |
||
| Description |
|
After an upgrade from 10.4 to 10.5 (following the instructions on https://mariadb.com/kb/en/upgrading-from-mariadb-104-to-mariadb-105/), MariaDB fails to start:
Output from sudo journalctl -u mariadb.service indicates a problem with max_open_files, but this seems to be just a warning:
Downgrading to 10.4 works. InnoDB config:
How can I get further insight into the problem with InnoDB initialization? |
| Comments |
| Comment by Eugene Kosov (Inactive) [ 2020-07-10 ] |
|
It's not clear at all what happened inside InnoDB. As a wild guess, could you try innodb-flush-method = fsync? |
| Comment by Marko Mäkelä [ 2020-07-10 ] |
|
The clue is in the very first error message:
Could it be that the 10.4 server was not shut down normally, for example because the server hung during the shutdown process? Or did you attempt to start up 10.5 on a backup of a 10.4 data directory without first running mariabackup --prepare with the 10.4 executable? There is also a theoretical possibility that the logic for detecting whether the old-format redo log file was cleanly shut down is faulty. |
| Comment by Christian Rishøj [ 2020-07-10 ] |
|
10.4 was shut down cleanly, with innodb_fast_shutdown = 1. Could 0 possibly help? I'm starting 10.5 on the data directory of 10.4 itself, not a backup. |
| Comment by Christian Rishøj [ 2020-07-10 ] |
|
Is innodb_fast_shutdown = 0 and removing ib_logfile* after shutdown worth a try? |
| Comment by Marko Mäkelä [ 2020-07-10 ] |
|
Any value other than innodb_fast_shutdown=2 should be fine. I would like to get a copy of the ib_logfile* files for repeating and analyzing the failure. |
| Comment by Henrik Nicolaisen [ 2020-07-20 ] |
|
I am getting the same error when updating some Docker hosts from 10.4 to 10.5. I ran with the default settings, and I also tried setting innodb_fast_shutdown=0. I tried shutting down correctly on 10.4, but no matter what I do, I get this error when switching to 10.5. If I try removing ib_logfile*, 10.5 boots, but I get errors like this:
|
| Comment by Marko Mäkelä [ 2020-07-20 ] |
|
hmn, can you please show the 10.4 server error log messages for the shutdown? While it is not recommended practice to remove the redo log files under any circumstances, it should be ‘safe’ to remove those files after a clean shutdown of the server up to 10.5. Any messages about the log sequence number being in the future on any data pages suggest that the log had not been shut down properly. (Note: When we finally implement MDEV-11633, there will be no other central place to store the latest log sequence number than the log itself. At that point we must refuse to start up InnoDB if the log files are missing.) One more thing that you could do for troubleshooting is to start up a debug version of the 10.4 server on a backup copy of the data where the ib_logfile* still exist, using the option --debug=d,ib_log. If you see any messages about log records being parsed or applied, then we know that the redo log was not actually logically empty, or in other words, the server had not been shut down cleanly. Even without a debug server, if you see a startup message
then you should know that the server had not been shut down cleanly. |
| Comment by Henrik Nicolaisen [ 2020-07-20 ] |
|
My problem was definitely an unclean shutdown. We are running Docker through Nomad, and it did not send stop signals correctly. After adding a timeout and SIGTERM, the server managed to write "Shutdown complete" to the log. Then I could switch to 10.5, and it started updating the files and booted correctly. |
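The check described in this comment (confirming that the old server logged "Shutdown complete" before switching versions) can be sketched in Python. This is a minimal sketch, not an official tool: the function name `last_shutdown_was_clean` is hypothetical, and the startup marker "ready for connections" is an assumption about the error log wording, which can vary between versions and packaging.

```python
def last_shutdown_was_clean(log_text: str,
                            shutdown_marker: str = "Shutdown complete",
                            startup_marker: str = "ready for connections") -> bool:
    """Return True if the most recent server run in the log ended with a shutdown message.

    Both marker strings are assumptions about the error log wording.
    """
    clean = False
    for line in log_text.splitlines():
        if startup_marker in line:
            # A new server run began; it has not been shut down yet.
            clean = False
        elif shutdown_marker in line:
            # The run that was in progress reported a completed shutdown.
            clean = True
    return clean
```

A log that ends with a startup line and no later shutdown line yields `False`, which is the situation in which the container was killed instead of stopped.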
| Comment by Marko Mäkelä [ 2020-07-20 ] |
|
Thank you, hmn! The 10.5 startup code is working as expected for you then. I do not expect future versions to remove crash-upgrade support from 10.5 any time soon. Also in the past, MariaDB 10.2 removed crash-upgrade support from earlier versions, and MariaDB 10.4 removed crash-upgrade support for the 10.2 innodb_safe_truncate=OFF format that was replaced in On a related note, when upgrading from 10.2 or earlier to 10.3 or later, a slow shutdown is recommended due to undo log format changes that were introduced in crishoj, can you please let us know if you have a genuine case where a log file from a clean 10.4 shutdown was incorrectly identified as a nonempty log file by 10.5 startup? |
| Comment by Christian Rishøj [ 2020-07-27 ] |
|
An unclean shutdown may also have been the culprit in my case. On a second try, the upgrade went smoothly.
2020-07-28 1:36:47 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts) |
| Comment by Olaf Buitelaar [ 2020-11-11 ] |
|
I had a similar issue: I was running 10.4.14, and on some machines it refused to upgrade to 10.5.7 (running in Docker). I was certain the containers did shut down properly, and during startup it didn't report that any redo logs needed to be applied;
shutdown;
startup:
I also deployed all images with innodb-fast-shutdown=0, but to no avail. Interestingly, I tried to upgrade to 10.4.16 first, but the 10.5.7 image reads the redo logs as being from 10.4.14. |
| Comment by Marko Mäkelä [ 2021-04-15 ] |
|
olafbuitelaar, sorry, I did not notice your message until now. Your log excerpt indicates that the 10.4 server was shut down properly, and yet the 10.5 server refused to start up. If you or anyone else encounters this, please provide:
If the old server (such as 10.4) startup indicates that crash recovery was in fact needed, then that must be filed as a different bug, affecting the old server’s shutdown. If the old server startup shows that no recovery was needed, then the 10.5 code that attempts to detect whether the log files are logically empty is wrong. And for that we would need a copy of the ib_logfile* right after the 10.5 startup was refused. A start-up attempt with an older server may ‘ruin’ the evidence that we would need to fix this bug! If you are absolutely sure that the 10.4 or older server was shut down properly, you can delete the files and start up the 10.5 server. If you just blindly do it, you may end up losing all your InnoDB tables, as reported in |
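Preserving a copy of the ib_logfile* files right after a refused 10.5 startup, before any other server touches them, could look like the following. This is a minimal sketch under stated assumptions: the helper name `preserve_redo_logs` and both directory arguments are hypothetical, and the server must not be running while the files are copied.

```python
import shutil
from pathlib import Path


def preserve_redo_logs(datadir: str, dest: str) -> list[str]:
    """Copy every ib_logfile* from datadir into dest and return the copied file names.

    The originals are left in place; only copies are made.
    """
    dest_path = Path(dest)
    dest_path.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(Path(datadir).glob("ib_logfile*")):
        if src.is_file():
            # copy2 also preserves timestamps, which may help later analysis.
            shutil.copy2(src, dest_path / src.name)
            copied.append(src.name)
    return copied
```

Copying rather than moving keeps the data directory unchanged, so the decision about deleting the files can still be made separately.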
| Comment by Olaf Buitelaar [ 2021-04-15 ] |
|
I'm sorry, I don't have the ib_logfile* files any more, and all our instances are upgraded to 10.5, so I cannot easily try to reproduce it anymore. I solved it by changing the innodb_log_file_size setting to force a rewrite of the log file. |
| Comment by Michael Caplan [ 2021-08-25 ] |
|
@marko I have a similar issue to the one described here (version numbers differ). I am going through the process of creating a new slave, including an upgrade from MariaDB 10.2.22 (serverOld) to 10.5.12 on a new server (serverNew) (as inspired by the brighter mind here)
In actual practice, step #5 failed numerous times before I reached what I think is now an okay serverNew.
InnoDB settings on serverOld and serverNew
serverOld
serverNew
The only difference is that the `innodb-log-files-in-group` setting is removed on serverNew, as it has been deprecated and removed in 10.5.
Upgrade after crash?
When attempting to start serverNew, it would fail with the following error:
The log file on serverOld showed no evidence of crashing. Nonetheless, I restarted serverOld and reran the process above. Still the same issue at step #5.
Remove ib_logfile* files
I then tried forcing the recreation of the redo logs by removing them and then starting up serverNew. This resulted in the following scary errors numerous times:
innodb_fast_shutdown = 0
The next attempt started with serverOld being restarted and innodb_fast_shutdown = 0 being set beforehand. I reran the above steps and still got stuck on step #5 with the following error:
This time, however, when I removed the ib_logfile* files, we seemed to have an okay startup:
Running mysql_upgrade completed without issue. I'm not feeling confident that this upgrade is actually okay. Removing the ib_logfile* files seems questionable as a requirement to get the upgrade done. What should I be doing differently? |
| Comment by Wesley Oliver [ 2021-09-04 ] |
|
@Michael Caplan Your fix of removing the ib_logfile* files and setting innodb_fast_shutdown = 0 worked for me too. No data loss as far as I can tell. Thanks for the workaround! |
| Comment by Marko Mäkelä [ 2021-09-13 ] |
|
michaelcaplan, sharpsounds, it is extremely dangerous to ever delete or rename the redo log files. That would remove any guarantee about any changes that may have been written since the latest log checkpoint. Furthermore, using regular copying methods such as rsync while InnoDB is running is not safe, except when you know that the buffer pool does not contain any pending changes and that no changes will be applied while copying. Maybe, if you ran another rsync after the first one and it did not report any changes, it might be somewhat safe, but only if you can guarantee that the redo log file is logically empty. Note that a command like FLUSH TABLES WITH READ LOCK will not suspend any InnoDB internal writes from background threads. These include merging the change buffer, purging transaction history, applying changes to fulltext indexes, rotating encryption keys, and updating persistent statistics. Completing a slow shutdown (shutdown with innodb_fast_shutdown=0) before the upgrade was never a real requirement. Completing a clean shutdown is. If you perform funny steps, you may get funny results, possibly in the form of weird crashes or corruption some time in the future. If CHECK TABLE does not report errors on any table after such an upgrade, you might be safe. If you do not want to shut down the old server before upgrading, you can use our backup tool to safely copy the data. Before starting the new server, be sure to run mariabackup --prepare on the backed up data directory. |
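The backup-then-prepare order recommended in this comment can be sketched as a small Python wrapper. This is an illustration under stated assumptions, not a supported procedure: the helper names are hypothetical, connection options such as user and password are omitted, and both commands are assumed to be run with the mariabackup executable that matches the old server version.

```python
import subprocess


def mariabackup_commands(target_dir: str) -> list[list[str]]:
    """Build the backup and prepare invocations in the order they must run."""
    return [
        # Step 1: copy the data files of the running old server.
        ["mariabackup", "--backup", f"--target-dir={target_dir}"],
        # Step 2: apply the copied redo log so the backup is consistent.
        ["mariabackup", "--prepare", f"--target-dir={target_dir}"],
    ]


def run_backup(target_dir: str) -> None:
    """Run both steps, stopping at the first failure."""
    for cmd in mariabackup_commands(target_dir):
        subprocess.run(cmd, check=True)
```

Only after the prepare step has succeeded would the prepared directory be handed to the new server as its data directory.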
| Comment by Marko Mäkelä [ 2021-12-02 ] |
|
Note: A bug in the 10.5.12 release causes the upgrade wizard for Microsoft Windows to abruptly kill the 10.4 server, instead of shutting it down gracefully. This would then lead to the 10.5 server refusing to start up. |