[MDEV-22918] innodb_init() returned 11 (Generic error) upon incremental backup Created: 2020-06-16  Updated: 2022-10-06  Resolved: 2022-09-14

Status: Closed
Project: MariaDB Server
Component/s: mariabackup
Affects Version/s: 10.4
Fix Version/s: 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None

Attachments: File mbackup_backup_0.log     File mbackup_backup_1.log     File mbackup_backup_2.log     File mbackup_backup_3.log     File mbackup_prepare_0.log     File mbackup_prepare_1.log    
Issue Links:
Duplicate
duplicates MDEV-18589 Assertion `fil_space_t::physical_size... Closed
Relates
relates to MDEV-18550 InnoDB: Failing assertion: success du... Open
relates to MDEV-22913 Error: xtrabackup_apply_delta() upon ... Closed

 Description   

Note: This report is about non-debug 10.4 build. 10.4 debug fails with MDEV-18589 instead, and 10.5 non-debug fails with MDEV-22913.

10.4 non-debug 49ac606a

2020-06-17  1:55:47 0 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
2020-06-17  1:55:47 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=516541629
 
<cut: 11685 lines Ignoring data file '...' with space ID 140296, since the redo log references test/t1.ibd with space ID ...>
 
2020-06-17  1:55:48 0 [Note] InnoDB: Starting a batch to recover 3973 pages from redo log.
2020-06-17  1:55:49 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
[00] FATAL ERROR: 2020-06-17 01:55:49 mariabackup: innodb_init() returned 11 (Generic error).

To reproduce (same test as in MDEV-22913):
To reproduce:

git clone https://github.com/MariaDB/randgen --branch mdev22913 rqg-mdev22913
rqg-mdev22913
perl ./runall-trials.pl --basedir=<basedir> --trials=10 --skip-gendata --duration=150 --threads=4 --scenario=MariaBackupIncremental --grammar=mdev22913.yy --vardir=/dev/shm/var_mdev22918 --output="returned 11"

Remember to set the basedir on the command line.
For reproducing, use a 10.4 non-debug build.

The grammar in the test above only contains single query:

query:
  CREATE OR REPLACE TABLE { 't'.$prng->int(1,9) } (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED;

The command line above runs the test 10 times, until it fails with "returned 11" output. It may take a while, because it can fail with different problems, particularly MDEV-18589 (the non-debug variation "innodb_init() returned 39 (Data structure corruption)").

Other versions/builds fail with different problems, see MDEV-22913 for more details.



 Comments   
Comment by Marko Mäkelä [ 2022-09-14 ]

I suspect that this bug is a duplicate of MDEV-18589, like I confirmed the 10.5 version MDEV-22913 of this to be.

Alas, I failed to reproduce this bug even after reverting the MDEV-18589 fix.

10.4 289105e282fb5f7865bb7ddb6226149ac5bdf2b8

# 2022-09-14T13:22:44 [530307] Backup #3 finished successfully
# 2022-09-14T13:22:44 [530307] #-------------------------#
# 2022-09-14T13:22:44 [530307] #-- Stopping the server --#
# 2022-09-14T13:22:44 [530307] #-------------------------#
# 2022-09-14T13:22:44 [530307] Stopping server on port 19300
# 2022-09-14T13:22:44 [530388][ERROR] Executor::MySQL::execute: Failed to reconnect after getting STATUS_SERVER_CRASHED
# 2022-09-14T13:22:44 [530388] Executor::MySQL::execute: Query: CREATE /* QNO 26556 CON_ID 12 */ OR REPLACE TABLE t1 (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED failed: 2013 Lost connection to server during query (STATUS_SERVER_CRASHED)
# 2022-09-14T13:22:56 [530307] Stopping server on port 19300
# 2022-09-14T13:22:58 [530307] Server has been stopped
# 2022-09-14T13:22:58 [530307] [530307] /mariadb/rqg-mdev22913/runall-new.pl will exit with exit status STATUS_OK (0)
# 2022-09-14T13:22:58 [527891] Trial 10 ended with exit status STATUS_OK (0)
# 2022-09-14T13:22:58 [527891] Output did not match the pattern 'returned 11', result will be ignored
# 2022-09-14T13:22:58 [527891] ./runall-trials.pl will exit with exit status FAILED TO REPRODUCE (0)

I did not find a core dump or even server error log output for the crash. I did not notice any message about --prepare failure in the log.

I will next try the test with the MDEV-18589 fix.

Comment by Marko Mäkelä [ 2022-09-14 ]

I got a similar outcome also with the MDEV-18589 fix:

10.4 3e3cfa893481abe9524a1657c4246fa9f91d4826

# 2022-09-14T13:50:29 [533732] Backup #3 finished successfully
# 2022-09-14T13:50:29 [533732] #-------------------------#
# 2022-09-14T13:50:29 [533732] #-- Stopping the server --#
# 2022-09-14T13:50:29 [533732] #-------------------------#
# 2022-09-14T13:50:29 [533732] Stopping server on port 19300
# 2022-09-14T13:50:29 [533815][ERROR] Executor::MySQL::execute: Failed to reconnect after getting STATUS_SERVER_CRASHED
# 2022-09-14T13:50:29 [533815] Executor::MySQL::execute: Query: CREATE /* QNO 25383 CON_ID 14 */ OR REPLACE TABLE t1 (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED failed: 2013 Lost connection to server during query (STATUS_SERVER_CRASHED)
# 2022-09-14T13:50:29 [533810] ErrorFilter: Query: CREATE /* QNO 25424 CON_ID 13 */ OR REPLACE TABLE t4 (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED failed: 1927 Connection was killed. Further errors of this kind will be suppressed.
# 2022-09-14T13:55:45 [534318] Backup #3 finished successfully
# 2022-09-14T13:55:45 [534318] #-------------------------#
# 2022-09-14T13:55:45 [534318] #-- Stopping the server --#
# 2022-09-14T13:55:45 [534318] #-------------------------#
# 2022-09-14T13:55:45 [534318] Stopping server on port 19300
# 2022-09-14T13:55:45 [534397] ErrorFilter: Query: CREATE /* QNO 25549 CON_ID 14 */ OR REPLACE TABLE t9 (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED failed: 1927 Connection was killed. Further errors of this kind will be suppressed.
# 2022-09-14T13:55:45 [534404][ERROR] Executor::MySQL::execute: Failed to reconnect after getting STATUS_SERVER_CRASHED
# 2022-09-14T13:55:45 [534404] Executor::MySQL::execute: Query: CREATE /* QNO 25582 CON_ID 15 */ OR REPLACE TABLE t3 (b BIT) ENGINE=InnoDB ROW_FORMAT COMPRESSED failed: 2013 Lost connection to server during query (STATUS_SERVER_CRASHED)
# 2022-09-14T13:55:45 [534404] Mixer: Server crash or critical failure 2013 (STATUS_SERVER_CRASHED) reported at dsn dbi:mysql:host=127.0.0.1:port=19300:user=root:database=test:mysql_local_infile=1
# 2022-09-14T13:55:57 [534318] For table `test`.`t9` : status : OK
# 2022-09-14T13:55:57 [534318] #-------------------------#
# 2022-09-14T13:55:57 [534318] #-- Stopping the server --#
# 2022-09-14T13:55:57 [534318] #-------------------------#
# 2022-09-14T13:55:57 [534318] Stopping server on port 19300
# 2022-09-14T13:56:00 [534318] Server has been stopped
# 2022-09-14T13:56:00 [534318] [534318] /mariadb/rqg-mdev22913/runall-new.pl will exit with exit status STATUS_OK (0)
# 2022-09-14T13:56:00 [530851] Trial 10 ended with exit status STATUS_OK (0)
# 2022-09-14T13:56:00 [530851] Output did not match the pattern 'returned 11', result will be ignored
# 2022-09-14T13:56:00 [530851] ./runall-trials.pl will exit with exit status FAILED TO REPRODUCE (0)

Could it simply be that the worker threads of the test harness cannot distinguish a server shutdown from a killed server?

Comment by Marko Mäkelä [ 2022-09-14 ]

With the reported revision, I reproduced something that looks more like a genuine failure:

10.4 49ac606a75d2f4d2898d0d5638354493e12b5dc0

# 2022-09-14T15:15:38 [549486] | 2022-09-14 15:15:37 0 [Note] InnoDB: Starting final batch to recover 17768 pages from redo log.
# 2022-09-14T15:15:38 [549486] | 2022-09-14 15:15:37 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
# 2022-09-14T15:15:38 [549486] | [00] FATAL ERROR: 2022-09-14 15:15:37 mariabackup: innodb_init() returned 39 (Data structure corruption).
# 2022-09-14T15:15:38 [549486] ----------------------------------
# 2022-09-14T15:15:38 [549486] Killed process 549530
# 2022-09-14T15:15:38 [549486] [549486] /mariadb/rqg-mdev22913/runall-new.pl will exit with exit status STATUS_BACKUP_FAILURE (108)
# 2022-09-14T15:15:38 [548052] Trial 10 ended with exit status STATUS_BACKUP_FAILURE (108)
# 2022-09-14T15:15:38 [548052] Output did not match the pattern 'returned 11', result will be ignored
# 2022-09-14T15:15:38 [548052] ./runall-trials.pl will exit with exit status FAILED TO REPRODUCE (0)

Comment by Marko Mäkelä [ 2022-09-14 ]

Finally, I tested the stated revision after applying the fix of MDEV-18589 to it:

git checkout 49ac606a75d2f4d2898d0d5638354493e12b5dc0
git cherry-pick --no-commit 3e3cfa893481abe9524a1657c4246fa9f91d4826

10.4 49ac606a75d2f4d2898d0d5638354493e12b5dc0 + MDEV-18589

# 2022-09-14T15:43:36 [551908] For table `test`.`t9` : status : OK
# 2022-09-14T15:43:36 [551908] #-------------------------#
# 2022-09-14T15:43:36 [551908] #-- Stopping the server --#
# 2022-09-14T15:43:36 [551908] #-------------------------#
# 2022-09-14T15:43:36 [551908] Stopping server on port 19300
# 2022-09-14T15:43:39 [551908] Server has been stopped
# 2022-09-14T15:43:39 [551908] [551908] /mariadb/rqg-mdev22913/runall-new.pl will exit with exit status STATUS_OK (0)
# 2022-09-14T15:43:39 [549818] Trial 10 ended with exit status STATUS_OK (0)
# 2022-09-14T15:43:39 [549818] Output did not match the pattern 'returned (11|39)', result will be ignored
# 2022-09-14T15:43:39 [549818] ./runall-trials.pl will exit with exit status FAILED TO REPRODUCE (0)

Furthermore, I browsed over the the log. There was no Ignoring data file message flood, like there was without the fix. Thus, this bug must be a duplicate of MDEV-18589.

Generated at Thu Feb 08 09:18:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.