[MDEV-25795] INSERT_REUSE_REDUNDANT still happening in 10.5.10 Created: 2021-05-27  Updated: 2021-06-01  Resolved: 2021-06-01

Status: Closed
Project: MariaDB Server
Component/s: Galera SST, mariabackup
Affects Version/s: 10.5.10
Fix Version/s: 10.6.2, 10.5.11

Type: Bug Priority: Major
Reporter: Nicky Gerritsen Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-25745 InnoDB recovery fails with [ERROR] In... Closed

 Description   

We are running into an issue very similar to MDEV-25031, even though we have upgraded all nodes to 10.5.10.

This is the exact error:

May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [Note] InnoDB: Starting a batch to recover 4940 pages from redo log.
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=4]
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Not applying DELETE_ROW_FORMAT_REDUNDANT due to corruption on [page id: space=0, page number=4]
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Not applying INSERT_HEAP_REDUNDANT due to corruption on [page id: space=0, page number=4]
May 27 10:06:45 db1 -innobackupex-apply: 2021-05-27 12:06:45 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
May 27 10:06:47 db1 -innobackupex-apply: 2021-05-27 12:06:47 0 [ERROR] InnoDB: Plugin initialization aborted with error Generic error
May 27 10:06:47 db1 -innobackupex-apply: [00] FATAL ERROR: 2021-05-27 12:06:47 mariabackup: innodb_init() returned 11 (Generic error).
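The log lines above suggest setting innodb_force_recovery=1 to ignore the corruption. As a hedged sketch only: on a normally started server this would be a my.cnf option as below; whether (and how) it can be passed through the Galera SST / mariabackup apply path shown in this log is not established here, and forcing recovery past corruption is a last resort, not a fix.

```ini
# Hedged sketch: server-side option the log suggests.
# 1 = ignore corrupt pages during recovery; remove after recovery.
[mysqld]
innodb_force_recovery = 1
```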

Things I have tried:

  • First of all, upgraded all nodes to 10.5.10.
  • Created a brand-new two-node cluster and set it up as a slave of the old cluster: used mysqldump to create a dump, imported it, set the master file/position correctly, and waited for the slave to catch up. Then destroyed and recreated the old nodes and joined them to the new cluster.

Neither of these made any difference.

This only seems to happen as soon as we try to add a third node. That is to say, the second approach above worked for the two nodes I created, but things broke down when I wanted to add a third node.

I am completely lost as to what to do now, since I would really like a cluster of three or more nodes.

I am more than willing to help debug stuff if that would help.



 Comments   
Comment by Nicky Gerritsen [ 2021-05-27 ]

I wonder if this is indeed a duplicate of MDEV-25745: as I read in MDEV-25031, page 4 is special in that it contains the root page of the change buffer B-tree, whereas MDEV-25745 seems to involve a page further along.

Comment by Nicky Gerritsen [ 2021-05-27 ]

An interesting thing I have found out is that if I do the following:

  • Have a node db3 that contains all data
  • Start db4, wait for SST to complete (coming from db3)
  • Stop db4
  • Start db1, wait for SST to complete (coming from db3 as well)
  • Start db4 again, wait for IST to complete

The three nodes can then communicate. However, if I keep db4 running, the SST of db1 fails, and in that case the SST comes from db4, not from db3.
I tried forcing the SST from db3 (with wsrep_sst_donor="db3,"), but it still does not work. So it does not seem to matter which node the SST comes from, only whether any other nodes are connected.
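For reference, the donor-forcing mentioned above would look like this in the joiner's configuration (a hedged sketch; the node name db3 comes from this comment, everything else is standard Galera syntax):

```ini
# Hedged sketch: prefer db3 as the SST/IST donor.
# The trailing comma lets Galera fall back to any other available donor.
[mysqld]
wsrep_sst_donor="db3,"
```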

Generated at Thu Feb 08 09:40:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.