Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
11.4
-
None
Description
I upgraded a MariaDB 10.11 node to 11.4.
On startup, the node crashed with:
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [ERROR] InnoDB: The change buffer is corrupted
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [Note] InnoDB: Starting shutdown...
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [Note] Plugin 'FEEDBACK' is disabled.
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [Warning] 'innodb-locks-unsafe-for-binlog' was removed. It does nothing now and exists only for compatibility with old my.cnf files.
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [Note] Using encryption key id 1 for temporary files
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [ERROR] Unknown/unsupported storage engine: innodb
Apr 10 20:07:15 http-srv02.ivt.ha.cyberfusion.cloud sh[3683009]: 2025-04-10 20:07:14 0 [ERROR] Aborting'
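For anyone who wants to check their own nodes for the same failure, here is a minimal sketch in Python that scans startup log lines for the change buffer error quoted above. The names `parse_levels` and `change_buffer_corrupted` are hypothetical helpers, not part of any MariaDB tooling, and the regular expression only assumes the `[ERROR]`/`[Warning]`/`[Note]` markers seen in this log.

```python
import re

# Matches the trailing "[Level] message" part of a MariaDB error-log line.
# Only the three levels that appear in the log above are assumed.
LOG_RE = re.compile(r"\[(ERROR|Warning|Note)\]\s+(.*)$")


def parse_levels(lines):
    """Return (level, message) pairs for every line that carries a log level."""
    entries = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            entries.append((match.group(1), match.group(2).strip()))
    return entries


def change_buffer_corrupted(lines):
    """True if any ERROR line reports the corrupted change buffer."""
    return any(
        level == "ERROR" and "change buffer is corrupted" in message
        for level, message in parse_levels(lines)
    )


if __name__ == "__main__":
    sample = [
        "2025-04-10 20:07:14 0 [ERROR] InnoDB: The change buffer is corrupted",
        "2025-04-10 20:07:14 0 [Note] InnoDB: Starting shutdown...",
    ]
    print(change_buffer_corrupted(sample))
```

The lines could come from the journal or the error log file; the sketch deliberately takes an iterable of strings so that the source does not matter.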
The change buffer was removed in https://jira.mariadb.org/browse/MDEV-29694
According to https://jira.mariadb.org/browse/MDEV-29694?focusedCommentId=269097&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269097, if the change buffer was ever enabled in the past, it may need to be flushed before the upgrade by shutting down with innodb_fast_shutdown=0.
Although the aforementioned comment says:
We might choose to revise the upgrade logic in 11.0 to work in a similar fashion. Until or unless that is done, I think that we must document this.
... this behaviour is not documented in the upgrade guide (https://mariadb.com/kb/en/upgrading-from-mariadb-10-11-to-mariadb-11-4/).
I followed the advice as follows:
- Downgrade MariaDB 11.4 to 10.11 (fresh data directory with SST)
- Set innodb_fast_shutdown: set global innodb_fast_shutdown=0;
- Stop MariaDB: systemctl stop mariadb
- Upgrade MariaDB 10.11 to 11.4
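The slow-shutdown part of the steps above can be sketched as a small Python script. This is only an illustration under assumptions: `pre_upgrade_commands` and `run` are hypothetical names, the `mariadb` client is assumed to be on the PATH with credentials already configured, and the extra `SELECT` is my addition to confirm the setting took effect. The package upgrade itself is distribution-specific and left out.

```python
import shlex
import subprocess


def pre_upgrade_commands(client="mariadb", service="mariadb"):
    """Return the argv lists for a slow shutdown before upgrading.

    The client and service names are assumptions; adjust them to the host.
    """
    return [
        # Request a slow shutdown.
        [client, "-e", "SET GLOBAL innodb_fast_shutdown=0"],
        # Read the value back to confirm it (not part of the original steps).
        [client, "-N", "-B", "-e", "SELECT @@GLOBAL.innodb_fast_shutdown"],
        # Stop the server; with the setting above this is a slow shutdown.
        ["systemctl", "stop", service],
    ]


def run(commands, dry_run=True):
    """Print the commands, or execute them in order when dry_run is False."""
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    run(pre_upgrade_commands())  # dry run: only prints the commands
```

The commands are built separately from their execution so they can be reviewed, or run node by node across the cluster, before anything is actually stopped.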
Unfortunately, that led to the same error.
I have also tried setting innodb_fast_shutdown=0 on every node in the cluster pre-upgrade.
Finally, I considered upgrading without skipping versions (i.e. 10.11 -> 11.0 -> 11.1 -> 11.2 -> 11.3 -> 11.4), but only 10.11 and 11.4 (both LTS versions) are available in MariaDB's Debian repos. (Also, skipping versions shouldn't be a problem in general, nor with Galera specifically, according to https://lists.mariadb.org/hyperkitty/list/discuss@lists.mariadb.org/thread/YGI7TEBXCLWW7ESOFRKWEMT6UY6S5S76/.)
Relevant data points to this particular case:
- Galera cluster
- 4 nodes
- Cluster started on MariaDB 10.3 (when the change buffer was enabled by default)
- innodb_change_buffering=none at the time of upgrading (default changed in https://jira.mariadb.org/browse/MDEV-27734)
In summary, there are a few issues at hand:
- In https://jira.mariadb.org/browse/MDEV-29694?focusedCommentId=269097&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269097, @marko mentioned that the innodb_fast_shutdown peculiarity should be documented. That hasn't been followed up AFAICS.
- In https://jira.mariadb.org/browse/MDEV-32044?focusedCommentId=269455&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-269455, @marko expressed the intention to "disregard any buffered changes for pages for which there are no buffered changes according to the change buffer bitmap.", meaning the innodb_fast_shutdown 'trick' shouldn't be needed in most cases IIUC. The issue seems to have been closed with no changes.
At the moment, my upgrade to 11.4 is stuck. Anyone with more knowledge of the change buffer change history? Opportunistically tagging @marko here as he seems to have done a lot of work on this.