[MDEV-28880] Upgrade from MariaDB 10.3.32 to MariaDB 10.6.8 has errors/issues/documentation Created: 2022-06-17  Updated: 2023-01-22

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.32, 10.6.8
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: F K Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved Votes: 0
Labels: Compatibility, innodb
Environment:

Red Hat Enterprise Linux release 8.6 (Ootpa)
Official MariaDB repo


Issue Links:
Relates
relates to MDEV-24412 Mariadb 10.5: InnoDB: Upgrade after a... Closed

 Description   

Hello,
We upgraded 5 slaves and 1 master from MariaDB 10.3.32 to MariaDB 10.6.8
Here are error, issue, documentation suggestions we encountered.

1.
4 of 5 of the slaves had the following error:

[ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.3.32.

preventing startup after upgrade. so mariadb_upgrade could not be run and server could not be started

Therefore we downgraded back to 10.3.32 and ran MySQL commands

SET GLOBAL innodb_fast_shutdown = 0;
SET GLOBAL innodb_buffer_pool_dump_at_shutdown=OFF;

then shutdown then upgraded to 10.6.8 again with the same problem.

We had to delete the ib_logfile0 file, upgrade to 10.6.8 and restart to upgrade.

Here are related logs:

2022-06-15 15:17:56 0 [Note] InnoDB: FTS optimize thread exiting.
2022-06-15 15:17:56 1 [Note] InnoDB: to purge 2 transactions
2022-06-15 15:17:56 0 [Note] InnoDB: Buffer pool(s) load aborted due to user instigated abort at 220615 15:17:56
2022-06-15 15:17:56 0 [Note] InnoDB: Starting shutdown...
2022-06-15 15:18:07 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-06-15 15:18:07 0 [Note] InnoDB: Shutdown completed; log sequence number 30535472345669; transaction id 71002773175
2022-06-15 15:18:07 0 [Note] /usr/sbin/mysqld: Shutdown complete
 
2022-06-15 15:24:17 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-06-15 15:24:17 0 [Note] InnoDB: Using transactional memory
2022-06-15 15:24:17 0 [Note] InnoDB: Number of pools: 1
2022-06-15 15:24:17 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-06-15 15:24:17 0 [Note] InnoDB: Using Linux native AIO
2022-06-15 15:24:17 0 [Note] InnoDB: Initializing buffer pool, total size = 332859965440, chunk size = 134217728
2022-06-15 15:24:18 0 [Note] InnoDB: Completed initialization of buffer pool
2022-06-15 15:24:20 0 [ERROR] InnoDB: Upgrade after a crash is not supported. The redo log was created with MariaDB 10.3.32.
2022-06-15 15:24:20 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
2022-06-15 15:24:20 0 [Note] InnoDB: Starting shutdown...
2022-06-15 15:24:20 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-06-15 15:24:20 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2022-06-15 15:24:20 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-06-15 15:24:20 0 [ERROR] Failed to initialize plugins.
2022-06-15 15:24:20 0 [ERROR] Aborting

Here are related my.cnf settings:

innodb_log_files_in_group      = 1
innodb_log_file_size           = 16G
innodb_flush_log_at_trx_commit = 1

What we did not try: Stop slave replication manually before shutdown and then upgrade.
Other notes: We had major problems in the past before changing innodb_change_buffering to none. Basically restarting slaves broke some index in the database nearly all the time. We ran scripts to check all tables rebuild all the broken table indexes using OPTIMIZE TABLE. On the other hand one of the servers that did not get this error was one rebuild recently from mariadbbackup. However the backup server running the mariadbbackup had the same error upgrading. Just adding in case some recovery we did broke the database before upgrade, but we don't think that is the issue.
Other notes: Mariadb database is about 1.5 TB, with peak about 50,000 selects per second, 50 deletes per second, 500 inserts per second, 500 updates per second on master, with averages over half the peak

2.
On four out of five of the slaves after upgrade, we got

2022-06-16 13:57:57 8 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'

one server had it twenty times in the last month, one server had it four times, one server had it twice, one server had it once.
CHECK TABLE TABLENAMEREDACTED shows no error.
percona checksum showed no data difference.
There was no other slave replication problems.
TABLENAMEREDACTED is the same table on all servers, and it is actually relatively not a big table or busy table.

We performed OPTIMIZE TABLE TABLENAMEREDACTED on the master and slaves to recreate table just in case.
However, we are still gettings errors on slaves with the same table, even after performing OPTIMIZE TABLE TABLENAMEREDACTED.
Also, percona checksum still shows no data difference between servers.
CHECK TABLE TABLENAMEREDACTED still shows no error.
There are still no other slave replication problems.

Other notes: all servers are runing with the following setting which enable parallel replication.

slave_parallel_threads = 8

3.
The documentation in MariaDB suggests running mariadb_upgrade after updating MariaDB.
I would suggest putting in the documentation to restart mariadb after running mariadb_upgrade as that seems to potentially remove some problems that may occur because of startup without the fixes mariadb_upgrade does.



 Comments   
Comment by F K [ 2022-08-10 ]

Regarding issue number 2:
Here is additional information for one of the slaves where it occurred more commonly.
We noticed TABLENAMEREDACTED didn't have a primary key and thought maybe it is a bug related to "slave_parallel_threads = 8" setting.
Therefore we created a primary index on TABLENAMEREDACTED and then the error no longer occurs.
So this bug is possibly related to slave_parallel_threads setting and tables without primary keys.

2022-06-19 0:07:59 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:11:40 292732 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:12:25 292733 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:13:16 292730 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:13:28 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:13:59 292730 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-19 0:23:02 292732 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-21 18:49:49 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-22 23:10:26 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-22 23:12:24 292732 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-24 2:00:27 292727 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-06-26 18:38:07 292727 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-03 22:07:18 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-05 21:46:53 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-05 21:57:19 292732 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-15 22:07:36 292728 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-17 15:06:35 292730 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-17 20:53:22 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-17 20:55:47 292728 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-21 0:58:59 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-21 21:35:37 292730 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-27 23:31:06 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-28 17:34:45 292733 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-28 21:25:53 292729 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-28 21:25:53 292730 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-28 21:25:53 292732 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-07-28 21:25:53 292733 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-08-01 22:58:41 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-08-02 21:30:50 292731 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-08-02 21:32:23 292729 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
2022-08-02 23:38:02 292734 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'

Comment by F K [ 2022-08-20 ]

Follow up:
For issue number 2, after creating primary index on TABLENAMEREDACTED, the error has not shown up for about 20 days, so it probably ocurrs when the table does not have a primary index.

Comment by F K [ 2022-09-16 ]

Follow up:
For issue number 2, after creating primary index on TABLENAMEREDACTED, the error no longer shows up, so it ocurrs when the table does not have a primary index

Comment by Marko Mäkelä [ 2022-11-30 ]

This looks like a possible duplicate of MDEV-24412.

What were the following parameters before the upgrade?

innodb_log_file_size
innodb_log_files_in_group

A work-around for that bug was to change one of these parameters before upgrading, so that the log files were rebuilt. That error occurred when the log block at checkpoint is at least 4 GiB from the start of the log file.

To check if this is a duplicate of MDEV-24412, can you please post the first 2048 bytes of an ib_logfile0 that is claimed to be in crashed state? There is no application data within that byte range: only a format identifier and 2 blocks containing information for the latest 2 checkpoints.

Comment by F K [ 2022-11-30 ]

> What were the following parameters before the upgrade?
> innodb_log_file_size
> innodb_log_files_in_group
Before and after the upgrade they were and are
innodb_log_file_size = 16G
innodb_log_files_in_group is default I haven't set the setting at all manually.

> To check if this is a duplicate of MDEV-24412, can you please post the first 2048 bytes of an ib_logfile0 that is claimed to be in crashed state?
I kept the data for a while for this bug report but have since deleted the data since this ticket is more than 5 months old and no one had replied...

Comment by F K [ 2022-11-30 ]

Is it "ususally" safe the way we did it?
Shutdown 10.3 and delete the ib_logfile0 file, upgrade to 10.6.8 and start mariadb to upgrade

Comment by Marko Mäkelä [ 2022-11-30 ]

No, I can never recommend deleting the ib_logfile*. It is safer to let the server resize the redo log on startup. That is, change one or both the parameters (the minimum innodb_log_files_in_group in MariaDB is 1), start up and shut down the old server, and then upgrade to 10.5 or later.

I think that with 2×16G log files, you may very possibly have hit MDEV-24412. If you still have a backup of a old log file, we could verify that based on the first 2048 bytes.

Comment by F K [ 2022-11-30 ]

Thank you for reply.
Just a note, after this happened to be on the safe side we did run
CHECK TABLE on all tables
and percona checksum on all tables
without error or difference in tables.

Comment by Marko Mäkelä [ 2022-12-01 ]

FK, only if the FIL_PAGE_FILE_FLUSH_LSN in the system tablespace had been updated on shutdown, it was safe to delete the log files. Sometimes, a server shutdown could hang before that happens, and something could silently kill the shut down server. That is why I advise against manually deleting the files.

Nothing can reliably catch all inconsistencies caused by lost logs. The worst thing that can happen is not that some committed transactions written since the last log checkpoint could be lost, but that some data pages (which had been written between the latest checkpoint and the time the server was killed and the log files removed) would be newer than FIL_PAGE_FILE_FLUSH_LSN. The result could be broken links between index pages, or incorrect links to undo pages. MDEV-13542 removed many crashes due to this but it did not fix the corruption itself.

CHECK TABLE (especially after MDEV-24402) is a rather good consistency check, but it is not perfect.

Because of the widespread unsafe practice of deleting ib_logfile*, I implemented MDEV-27199 in MariaDB Server 10.8.

The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails.

Comment by F K [ 2022-12-01 ]

> The feedback that this ticket is waiting for is a copy of the first 2048 bytes of ib_logfile0 for a cleanly shut down data directory for which an upgrade fails.

This is no longer available.

Of note, this ticket has two issues, one it the one you replied too, and one is regarding
2022-06-16 13:57:57 8 [ERROR] mariadbd: Can't find record in 'TABLENAMEREDACTED'
errors when there is no primary key.

Generated at Thu Feb 08 10:04:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.