MariaDB Server / MDEV-25745

InnoDB recovery fails with [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects versions: 10.5.2, 10.5.3, 10.5.4, 10.5.5, 10.5.6, 10.5.7, 10.5.8, 10.5.9, 10.5.10, 10.6.0, 10.6.1
    • Fix versions: 10.5.11, 10.6.2
    • Operating System: Amazon Linux 2 AMI
      Hardware: Amazon AWS t3a.medium
      3-Node Galera Cluster

    Description

      We have several production databases that we migrated in early 2021 from a single-node setup to a 3-node Galera Cluster for high availability. During the migration project we implemented a tool for managing the cluster; one of its features is a "cycle-db-cluster" function that replaces the oldest node of the cluster with a completely new one. Since the migration we have kept the cluster up to date by periodically replacing nodes with this automated process.

      On 2021-05-11 the replace process failed: a new 10.5.10 node could not join our cluster, which consisted of 10.5.9 nodes. In mysqld.log the new node reported [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=14408]. We repeated the process several times, also with MariaDB 10.5.9 and a fresh virtual server each time, but the issue persisted. Our mysqldumps worked just fine, so we scheduled a service break for 2021-05-12 and started a new cluster using MariaDB 10.5.10 and the data from the mysqldumps. An important detail is that we imported the mysqldumps into a 1-node cluster and added the other 2 nodes after the imports had finished – so joining definitely worked shortly after the import. We also replaced the first node of the cluster with the "cycle-db-cluster" functionality about an hour after the service break.

      On 2021-05-20 we tried to replace a node in the cluster for the first time since the day we set it up. Again, joining a new node failed, and mysqld.log on the new joining node contains [ERROR] InnoDB: Not applying INSERT_REUSE_REDUNDANT due to corruption on [page id: space=0, page number=2366]. So now we have an almost-new 3-node 10.5.10 cluster that new nodes cannot join. The cluster itself continues to perform just fine and mysqldumps succeed. To me it seems that the system detects a corruption for some reason while the data inside the databases is actually fine.

      Attachments:

      20210520-cycle-db-cluster.log:

      Log of the automated cycle-db-cluster process, with details of how a new db server was set up from scratch, all the way to the attempt to start the mysqld process.

      Conf.zip:

      Database configuration files that we use for our nodes. These have a few variables that are replaced automatically during the node setup process.

      20210520-clusterjoin-failure-mysqld.log:

      mysqld.log from the node join failure at 2021-05-20.


          Activity

            The following record is the first to be found to be inconsistent with the page contents:

            #0  0x000056264c48b4d4 in log_phys_t::apply (this=this@entry=0x7f155569b3b0, 
                block=
                  @0x7f15441952d0: {page = {id_ = {m_id = 2366},
                at /mariadb/10.5m/storage/innobase/log/log0recv.cc:397
            397	            if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
            (rr) p *this
            $1 = {<log_rec_t> = {next = 0x7f155569f320, lsn = 18962387413}, start_lsn = 18962387336}
            

            The same mini-transaction is modifying this page twice (I guess, inserting the buffered record into the change buffer and maybe updating PAGE_MAX_TRX_ID):

            ?func: ib_log: scan 18962387336: rec 20 len 72 page 0:2366
            ?func: ib_log: scan 18962387336: rec b3 len 4 page 0:2366
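
            (A cross-check of my own, not part of the trace output: the two records plus the 1-byte marker that ends the mini-transaction account for 72 + 4 + 1 = 77 bytes, and 18962387336 + 77 = 18962387413, which is exactly the lsn of the log_rec_t shown above.)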
            

            The reason for the inconsistency is:

                if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
                  goto corrupted;
            

            That is, we are supposed to reuse the space that was occupied by a previously deleted record (the top of the PAGE_FREE stack), but the allocation extends past PAGE_HEAP_TOP, the end of the allocated heap. This looks serious indeed.
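
            To make the failing condition concrete, here is a minimal stand-alone sketch of that invariant (my own illustration with made-up numbers, not InnoDB source code): a record reused from the top of the PAGE_FREE stack must lie entirely below PAGE_HEAP_TOP, because the bytes above the heap top have never been allocated to any record.

                #include <cstdint>
                #include <cstdio>

                // free_rec: page offset of the record taken from the PAGE_FREE stack,
                // data_size: the space the new record needs there,
                // heap_top: the current PAGE_HEAP_TOP value of the page.
                static bool insert_reuse_is_consistent(uint32_t free_rec, uint32_t data_size,
                                                       uint32_t heap_top)
                {
                  return free_rec + data_size <= heap_top; // otherwise: "goto corrupted"
                }

                int main()
                {
                  // Hypothetical numbers: with a stale (too small) PAGE_HEAP_TOP the reuse
                  // appears to overflow the heap, so recovery refuses to apply the record.
                  std::printf("%d\n", insert_reuse_is_consistent(3800, 109, 3914)); // 1: consistent
                  std::printf("%d\n", insert_reuse_is_consistent(3800, 109, 2291)); // 0: "corruption"
                  return 0;
                }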

            mleich, I think that we’d better use innodb_page_size=4k and indexed VARCHAR columns for reproducing this. If we use a large page size or fixed-size records, we’d never hit this. The check would only fail if the last-allocated record in the page was freed and subsequently reallocated. I hope that this corruption is also reproducible with ROW_FORMAT=REDUNDANT tables (and without change buffering).

            I do not think I can do more on this until we have an rr replay trace.

            marko Marko Mäkelä added a comment

            mleich was able to produce an rr replay trace for this:

            ssh pluto
            rr replay /data/Results/1622138407/TBR-1093/dev/shm/vardir/1622138407/123/1/rr/latest-trace
            

            It reports Not applying INSERT_REUSE_REDUNDANT for a change buffer page, due to exactly the same reason as in the data directory that was provided by tomitukiainen.

            Next week, I will get the details from the failed recovery, and then debug the trace of the server run that ended in SIGKILL, to figure out what caused the problem.
            If we are lucky, this could explain MDEV-25783 as well. (For that to be possible, I think that something related to the change buffer merge must skip writing redo log.)

            marko Marko Mäkelä added a comment

            Below, I am documenting my workflow for finding the root cause of this failure. Hopefully it will serve an educational purpose. I believe that with rr replay traces of the killed server and of the misbehaving recovered server, as well as a copy of the data directory before the recovery was started, any recovery bug can be diagnosed.

            When we fail to apply the log record, it is for page 1160 in the system tablespace. We can also determine the LSN:

            #2  0x000055de458007c8 in page_apply_insert_redundant (block=..., reuse=true, prev=2652, enc_hdr=44, hdr_c=11, data_c=10, data=0x25592084fe0d, data_len=88)
                at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:2321
            2321	    ib::error() << (reuse
            (rr) p/x block.frame[16]@8
            $1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xa2, 0xd7, 0x1}
            (rr) up
            #3  0x000055de457b21e1 in log_phys_t::apply (this=0x25592084fde8, block=..., last_offset=@0x606001178b6c: 0) at /data/Server/bb-10.5-marko/storage/innobase/log/log0recv.cc:397
            397	            if (page_apply_insert_redundant(block, subtype & 1, prev_rec,
            (rr) p/x *this
            $2 = {<log_rec_t> = {next = 0x25592084fe68, lsn = 0xa6a6b2}, start_lsn = 0xa6a64a}
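
            For readers following along: frame[16]@8 is the 8-byte big-endian FIL_PAGE_LSN field at byte offset 16 of the page, so the bytes above decode to LSN 0xa2d701. A stand-alone decoder (my own throwaway helper, not server code) for repeating this outside the debugger:

                #include <cstdint>
                #include <cstdio>

                // Big-endian 64-bit read, the encoding InnoDB uses for FIL_PAGE_LSN.
                static uint64_t read_be64(const unsigned char *p)
                {
                  uint64_t v = 0;
                  for (int i = 0; i < 8; i++)
                    v = (v << 8) | p[i];
                  return v;
                }

                int main()
                {
                  // The bytes shown by "p/x block.frame[16]@8" above.
                  const unsigned char fil_page_lsn[8] = {0, 0, 0, 0, 0, 0xa2, 0xd7, 0x01};
                  std::printf("0x%llx\n", (unsigned long long) read_be64(fil_page_lsn)); // 0xa2d701
                  return 0;
                }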
            

            Based on this recovery failure, for the rr replay execution trace of the server that ended in SIGKILL we must check every modification to the page between LSN 0xa2d701 and 0xa6a6b2. The log record that we failed to apply had been written by the following (and I do not yet conclude that there was anything wrong with those writes):

            #7  0x000055c213bccd96 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:442
            #8  0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
            #9  0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
                at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
            #10 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
                at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
            #11 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0, 
                file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
            #12 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0)
                at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
            #13 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
            #14 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
            #15 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
            #16 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
            #17 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
            #18 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
            #19 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
            (rr) frame 21
            #21 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b0000af218, table=table@entry=0x619000bad198, info=info@entry=0x208943903d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
            2107	  else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
            (rr) p thd.query_string
            $5 = {string = {str = 0x62b0000b6238 "INSERT IGNORE INTO `t1` ( `pk`, `col_char` ) VALUES ( NULL, 'f' ) /* E_R Thread1 QNO 1506 CON_ID 16 */", length = 102}, cs = 0x55c21527dc60 <my_charset_latin1>}
            (rr) display
            2: /x log_sys.lsn._M_i = 0xa6a6b2
            (rr) when
            Current event: 623570
            (rr) reverse-continue
            Continuing.
             
            Thread 18 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
             
            Value = 9683160
            0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
            419		return __atomic_load_n(&_M_i, int(__m));
            2: /x log_sys.lsn._M_i = 0xa6a64a
            (rr) bt
            #0  0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
            #1  Atomic_counter<unsigned long>::operator unsigned long (this=0x64000007d1c0) at /data/Server/bb-10.5-marko/include/my_counter.h:45
            #2  0x000055c213b39ac2 in buf_page_t::oldest_modification (this=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/buf0buf.h:936
            #3  0x000055c213b537da in mtr_t::is_block_dirtied (block=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:35
            #4  0x000055c213b5385c in mtr_t::memo_push (this=0x2089438ffd00, object=0x64000007d178, type=MTR_MEMO_PAGE_X_FIX) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:57
            #5  0x000055c213e45123 in buf_page_mtr_lock (block=0x64000007d178, rw_latch=2, mtr=0x2089438ffd00, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370)
                at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:2866
            #6  0x000055c213e461de in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, 
                mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3413
            #7  0x000055c213e4671a in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, 
                mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3482
            #8  0x000055c213ad6899 in btr_block_get_func (index=..., page=1160, mode=2, merge=true, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00)
                at /data/Server/bb-10.5-marko/storage/innobase/include/btr0btr.h:237
            #9  0x000055c213dfd9ff in btr_cur_latch_leaves (block=0x64000007d178, latch_mode=36, cursor=0x2089438ffb50, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:370
            #10 0x000055c213e0333a in btr_cur_search_to_nth_level_func (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, ahi_latch=0x0, 
                file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x2089438ffd00, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1848
            #11 0x000055c213b407c8 in btr_pcur_open_low (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, 
                file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/btr0pcur.ic:441
            #12 0x000055c213b4a525 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0)
                at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3276
            (rr) when
            Current event: 623569
            

            It seems that the same mini-transaction is modifying the change buffer page two times. The recovery failure occurred for one of those 2 log records.

            Let us follow the trace in the forward direction from an earlier point in time:

            (rr) display
            2: /x log_sys.lsn._M_i = 0x890396
            (rr) cond 1 log_sys.lsn._M_i>=0xa2d701
            (rr) continue
            Thread 45 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0
             
            Value = 9683160
            std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419
            419		return __atomic_load_n(&_M_i, int(__m));
            2: /x log_sys.lsn._M_i = 0xa2d701
            (rr) when
            Current event: 575927
            (rr) frame 26
            #26 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b000150218, table=table@entry=0x6190012de598, info=info@entry=0x44a50e105d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107
            2107	  else if (unlikely((error=table->file->ha_write_row(table->record[0]))))
            (rr) p thd.query_string
            $6 = {string = {str = 0x62b000157238 "INSERT IGNORE INTO `t1` ( `pk` ) VALUES ( 10090 ) /* E_R Thread6 QNO 1528 CON_ID 21 */", length = 86}, cs = 0x55c21527dc60 <my_charset_latin1>}
            

            This was the last change for which the change buffer page had been written out. From this point onwards, the actions of the recovery must have diverged from the modifications to the page.

            Because the LSN range is very wide here, we’d better ignore accesses of block->page.oldest_modification and instead check each change of PAGE_HEAP_TOP of the page.

            (rr) break buf_page_read_complete thread 1
            Breakpoint 3 at 0x55de45a541b3: file /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc, line 4100.
            (rr) disable 2
            (rr) reverse-continue
            Continuing.
             
            Thread 1 hit Breakpoint 3, buf_page_read_complete (bpage=0x55de4575e8c8 <mtr_t::memo_push(void*, mtr_memo_type_t)+230>, node=...) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:4100
            4100	{
            (rr) n
            4101	  const page_id_t id(bpage->id());
            (rr) 
            4117	  const byte *frame= bpage->zip.data
            (rr) 
            4118	    ? bpage->zip.data
            (rr) 
            4117	  const byte *frame= bpage->zip.data
            (rr) 
            4123	  if (!buf_page_decrypt_after_read(bpage, node))
            (rr) p/x frame[40]@2
            $6 = {0x8, 0xf3}
            (rr) set $f=frame
            (rr) display/x $f[40]@2
            1: /x $f[40]@2 = {0x8, 0xf3}
            (rr) dump binary memory recovered-page.bin $f $f+srv_page_size
            (rr) enable
            (rr) continue
            

            At the time of the failed recovery, PAGE_HEAP_TOP was 0xf4a (3914). At the time of the page flush (as well as at the start of the recovery), the field was 0x8f3 (2291).
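
            (For reference, a note of my own: the watched offset 40 is PAGE_HEADER (38) + PAGE_HEAP_TOP (2), and the field is stored big-endian, so the bytes {0x8, 0xf3} decode to 0x8f3 = 2291 and {0xf, 0x4a} would be 0xf4a = 3914.)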

            The contents of the page differ only slightly between the start of the recovery and what ought to be the corresponding point of logical time before the SIGKILL: only in the last 8 bytes of the page. The least significant 32 bits of the LSN and the page checksum are not updated until the page is about to be written out from the buffer pool. I saved the srv_page_size=4096 bytes with dump binary memory and compared the files:

            diff -u <(od -Ax -t x1 recover_start.bin) <(od -Ax -t x1 before_kill.bin)
            

            --- /dev/fd/63	2021-05-31 06:10:33.779551414 +0000
            +++ /dev/fd/62	2021-05-31 06:10:33.779551414 +0000
            @@ -109,5 +109,5 @@
             0008f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
            -000ff0 05 b3 03 ff 02 4b 00 65 00 a2 d7 01 8e 48 0a 5d
            +000ff0 05 b3 03 ff 02 4b 00 65 00 00 00 00 00 00 00 00
             001000
            

            So, we should have a sound starting point for examining the divergence.
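
            For repeating that comparison mechanically, a throwaway masking compare can stand in for the od/diff pipeline (a sketch of my own, assuming the recover_start.bin and before_kill.bin dumps from above and srv_page_size=4096; it ignores FIL_PAGE_LSN at bytes 16..23 and the 8-byte page trailer, which are only finalized on write-out):

                #include <cstdio>
                #include <cstring>
                #include <vector>

                // Read one 4096-byte page image; returns an empty vector on failure.
                static std::vector<unsigned char> slurp(const char *name)
                {
                  std::vector<unsigned char> v(4096);
                  FILE *f = fopen(name, "rb");
                  if (!f || fread(v.data(), 1, v.size(), f) != v.size())
                    v.clear();
                  if (f)
                    fclose(f);
                  return v;
                }

                int main()
                {
                  std::vector<unsigned char> a = slurp("recover_start.bin");
                  std::vector<unsigned char> b = slurp("before_kill.bin");
                  if (a.empty() || b.empty())
                    return 1;
                  // Compare everything except FIL_PAGE_LSN (16..23) and the last 8 bytes.
                  bool same = !memcmp(a.data(), b.data(), 16) &&
                              !memcmp(a.data() + 24, b.data() + 24, 4096 - 24 - 8);
                  std::puts(same ? "equivalent" : "diverged");
                  return 0;
                }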

            Interestingly, near the start of the recovery, we have a page_create_low() that will reset the PAGE_HEAP_TOP to 0x7d (125). The PAGE_HEAP_TOP will grow to 0x8ba (2059) until it is reset in another page_create_low(). It will monotonically grow until we seemingly get a glitch: 3878, 3932, 4055, 3932, 4043, 2305, 2414, …, 3803, 3914, 4025, 3914, (failure).

            In the rr replay trace that was terminated by SIGKILL, we can observe the changes of PAGE_HEAP_TOP as follows:

            (rr) set $f=block->frame
            (rr) display/x $f[40]@2
            3: /x $f[40]@2 = {0x8, 0xf3}
            (rr) watch -l $f[40]@2
            Hardware watchpoint 3: -location $f[40]@2
            (rr) c
            Continuing.
            [Switching to Thread 314150.314741]
             
            Thread 18 hit Hardware watchpoint 3: -location $f[40]@2
             
            Old value = <incomplete sequence \363>
            New value = "\000"
            __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
            261	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
            2: /x log_sys.lsn._M_i = 0xa33520
            3: /x $f[40]@2 = {0x0, 0x0}
            (rr) command 3
            Type commands for breakpoint(s) 3, one per line.
            End with a line saying just "end".
            >when
            >c
            >end
            (rr) c
            Continuing.
            (rr) c
            

            Everything seems to be in sync, and also the first monotonicity glitch turns out to be benign. First, the PAGE_HEAP_TOP increases to 4025 because the change buffer tree shrinks due to a change buffer merge during purge (and the records from an adjacent change buffer page are merged to our page). The decrease to 3914 occurs because the very last record of the page is being merged to an index page that is being accessed by dict_stats_analyze_index() (updating persistent statistics).

            The real divergence occurs later. I saved some rr replay output of monitoring changes to PAGE_HEAP_TOP and compared them:

            diff -u <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-recovered|uniq) <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-write|uniq)
            

            @@ -57,6 +57,25 @@
             4055
             3932
             4043
            +234
            +…
            +2196
             2305
             2414
             2523
            @@ -75,3 +94,4 @@
             3914
             4025
             3914
            +4023
            

            For some reason, the recovery trace shows no updates of PAGE_HEAP_TOP from 234 to 2196. The last record must be the one that recovery was supposed to apply, but did not, due to the flagged corruption. Let us check whether redo logging was properly enabled during this page_create_low():

            Thread 45 hit Hardware watchpoint 3: -location $f[40]@2
             
            Old value = "\017\\"
            New value = <incomplete sequence \313>
            mach_write_to_2 (b=0x64000047b028 <incomplete sequence \313>, n=4043) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
            62	}
            2: /x log_sys.lsn._M_i = 0xa4f621
            3: /x $f[40]@2 = {0xf, 0xcb}
            Current event: 598722
            [Switching to Thread 314150.314325]
             
            Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
             
            Old value = <incomplete sequence \313>
            New value = "\000"
            __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261
            261	../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory.
            2: /x log_sys.lsn._M_i = 0xa6230e
            3: /x $f[40]@2 = {0x0, 0x0}
            Current event: 614654
             
            Thread 8 hit Hardware watchpoint 3: -location $f[40]@2
             
            Old value = "\000"
            New value = "\000}"
            page_create_low (block=0x64000007d178, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:311
            311			memcpy(page + PAGE_DATA, infimum_supremum_redundant,
            2: /x log_sys.lsn._M_i = 0xa6230e
            3: /x $f[40]@2 = {0x0, 0x7d}
            Current event: 614654
            

            That call is within a change buffer page reorganize:

            #3  0x000055c213bff4a4 in page_create (block=0x64000007d178, mtr=0x1ff90f650b70, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:331
            #4  0x000055c213dd847e in btr_page_reorganize_low (cursor=0x1ff90f64f740, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1371
            #5  0x000055c213dda7c2 in btr_page_reorganize_block (z_level=6, block=0x64000007d178, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1606
            #6  0x000055c213dea411 in btr_can_merge_with_page (cursor=0x1ff90f650800, page_no=1160, merge_block=0x1ff90f64f940, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:5167
            #7  0x000055c213de1e48 in btr_compress (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:3483
            #8  0x000055c213e100d7 in btr_cur_compress_if_useful (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5468
            #9  0x000055c213e11b57 in btr_cur_pessimistic_delete (err=0x1ff90f650690, has_reserved_extents=1, cursor=0x1ff90f650800, flags=0, rollback=false, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5908
            #10 0x000055c213b4e8e9 in ibuf_delete_rec (page_id=..., pcur=0x1ff90f650800, search_tuple=0x616000874bf0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4125
            #11 0x000055c213b50024 in ibuf_merge_or_delete_for_page (block=0x640000043928, page_id=..., zip_size=0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4471
            

            Starting with MDEV-21725, the page reorganize is being logged in an optimized way, instead of being logged as individual inserts. Before MDEV-12353, the operation was covered by a logical log record.

            Immediately after the page reorganize operation, the page contents between recovery and the original write differ as follows:

            diff -u <(od -Ax -t x1 /dev/shm/r-after-reorg.bin) <(od -Ax -t x1 /dev/shm/w-after-reorg.bin)
            

            --- /dev/fd/63	2021-05-31 08:20:19.561244871 +0000
            +++ /dev/fd/62	2021-05-31 08:20:19.561244871 +0000
            @@ -1,5 +1,5 @@
             000000 00 00 00 00 00 00 04 88 00 00 04 89 00 00 04 87
            -000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
            +000010 00 00 00 00 00 a5 07 15 45 bf 00 00 00 00 00 00
             000020 00 00 00 00 00 00 00 05 08 94 00 15 00 00 00 00
             000030 08 33 00 02 00 12 00 13 00 00 00 00 00 00 21 5b
             000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
            @@ -112,5 +112,5 @@
             0008a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74
            -000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
            +000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
             001000
            

            On recovery, the FIL_PAGE_LSN is 0xa2d701, which was the initial FIL_PAGE_LSN of the page. The field might only be updated by recovery after all log has been applied to the page. We have the LSN and checksum difference at the end of the page as well. It is nothing to worry about.

            Similarly, the page contents immediately before the failed log apply are equivalent:

            @@ -1,5 +1,5 @@
             000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
            -000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
            +000010 00 00 00 00 00 a6 49 aa 45 bf 00 00 00 00 00 00
             000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
             000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
             000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
            @@ -195,5 +195,5 @@
             000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
            -000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
            +000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
             001000
            

            So, the problem is only with the last record. We must look closer at this:

            Current event: 630234
            (rr) bt
            #0  mach_write_to_2 (b=0x64000047b028 "\017\267", n=4023) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62
            #1  0x000055c213bf9f52 in page_mem_alloc_heap<>(buf_block_t *, ulint, ulint *) (block=0x64000007d178, need=109, heap_no=0x1f9a2c2ca1d0) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1071
            #2  0x000055c213bf1062 in page_cur_insert_rec_low (cur=0x1f9a2c2cab58, index=0x616000006ff0, rec=0x6110003ae9fc "", offsets=0x61600066e1f0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1388
            #3  0x000055c213dfb53d in page_cur_tuple_insert (cursor=0x1f9a2c2cab58, tuple=0x61900228cbf0, index=0x616000006ff0, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, n_ext=0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/include/page0cur.ic:285
            #4  0x000055c213e09bfd in btr_cur_optimistic_insert (flags=3, cursor=0x1f9a2c2cab50, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, entry=0x61900228cbf0, rec=0x1f9a2c2caaf0, big_rec=0x1f9a2c2caa70, n_ext=0, thr=0x621004e35de0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:3562
            #5  0x000055c213b4aad5 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3403
            #6  0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
            

            A glaring difference is that reuse=false becomes reuse=true on recovery:

            (rr) step
            1595	      mtr->page_insert(*block, reuse,
            (rr) step
            mtr_t::page_insert (this=0x64000047bf52, block=..., reuse=false, prev_rec=1, info_bits=0 '\000', n_fields_s=94292043288653, hdr_c=5, data_c=8, hdr=0x64000047bf4a "aY\031\t\005\004", hdr_l=1, data=0x6110003aea04 "\200", data_l=89) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1103
            

            The actual mystery is why the log that was written differs from what was buffered by recovery:

            {0x20, 0x55, 0x0, 0x84, 0x8, 0x4, 0x8c, 0xd7, 0x2c, 0x5, 0x8, 0x61, 0x80, 0x0, 0x0, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x62, 0x0 <repeats 69 times>, 0xf8, 0xfd}
            {0x20, 0x53, 0x0, 0x84, 0x8, 0x5, 0x89, 0xdc, 0x2c, 0x5, 0xa, 0x61, 0x2, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x66, 0x0 <repeats 69 times>, 0xf8, 0xfa}
            

            The least significant bit of the first differing byte encodes the reuse flag. Let us check which string was present in the ib_logfile0 before recovery was started. (Note: this will miss records that would span the block boundary.)

            od -Ax -t x1 -w512 ib_logfile0 |grep '20 55 00 84 08 04 8c d7 2c'|cut -d\  -f1
            od -Ax -t x1 -w512 ib_logfile0 |grep '20 53 00 84 08 05 89 dc 2c'|cut -d\  -f1
            

            Surprisingly, we find exactly one match for both. The first match occurs at the end of the 512-byte block that starts at file offset 0xa70800. The last 4 bytes are a log block checksum:

            20 55 00 84 08 04 8c d7 2c 05 08 61 80 00 00 15 84 96 bc
            

            The record continues after the 4-byte checksum and the 12-byte header of the following block:

            00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fd b2 27 e2 00
            

            This looks like an exact match of the record that we wrote. Before the NUL byte that marks the end of the mini-transaction, there is another record: a WRITE to the same page.
            Now, let us look at the other match, which looks like what recovery was reading. It is located in the 512-byte block starting at byte offset 0xa68c00 of ib_logfile0:

            od -Ax -t x1 -w512 -N 99 -j 0xa68c4a ib_logfile0 
            

            a68c4a 20 53 00 84 08 05 89 dc 2c 05 0a 61 02 00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fa
            

            It is an exact match of the record that was read by recovery! The problem appears to be on the write side after all. I must debug further to find out when those bytes were written, and for which mini-transaction.
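
            To repeat the boundary-spanning match above mechanically, the per-block framing can be stripped first. A rough sketch of my own (not server code), using only the layout described above, namely 512-byte log blocks with a 12-byte header and a 4-byte checksum trailer:

                #include <cstdio>
                #include <vector>

                // Concatenate the payloads of "blocks" consecutive 512-byte log blocks,
                // starting at file offset "start" (e.g. 0xa70800 above), skipping the
                // 12-byte header and the 4-byte checksum of each block.
                static std::vector<unsigned char> log_payload(FILE *f, long start, int blocks)
                {
                  std::vector<unsigned char> out;
                  for (int i = 0; i < blocks; i++)
                  {
                    unsigned char block[512];
                    if (fseek(f, start + 512L * i, SEEK_SET) || fread(block, 1, 512, f) != 512)
                      break;
                    out.insert(out.end(), block + 12, block + 512 - 4);
                  }
                  return out;
                }

                int main()
                {
                  FILE *f = fopen("ib_logfile0", "rb");
                  if (!f)
                    return 1;
                  // Two blocks are enough to see the record that straddles the boundary
                  // at the end of the block starting at 0xa70800.
                  std::vector<unsigned char> payload = log_payload(f, 0xa70800, 2);
                  for (size_t i = 0; i < payload.size(); i++)
                    std::printf("%02x%c", payload[i], (i & 15) == 15 ? '\n' : ' ');
                  std::printf("\n");
                  fclose(f);
                  return 0;
                }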

            marko Marko Mäkelä added a comment - Below, I am documenting my workflow at finding the root cause of this failure. Hopefully it will serve an educational purpose. I believe that with rr replay traces of the killed server and misbehaving recovered server, as well as a copy of the data directory before the recovery was started, any recovery bug can be diagnosed. When we fail to apply the log record, it is for page 1160 in the system tablespace. We can also determine the LSN: #2 0x000055de458007c8 in page_apply_insert_redundant (block=..., reuse=true, prev=2652, enc_hdr=44, hdr_c=11, data_c=10, data=0x25592084fe0d, data_len=88) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:2321 2321 ib::error() << (reuse (rr) p/x block.frame[16]@8 $1 = {0x0, 0x0, 0x0, 0x0, 0x0, 0xa2, 0xd7, 0x1} (rr) up #3 0x000055de457b21e1 in log_phys_t::apply (this=0x25592084fde8, block=..., last_offset=@0x606001178b6c: 0) at /data/Server/bb-10.5-marko/storage/innobase/log/log0recv.cc:397 397 if (page_apply_insert_redundant(block, subtype & 1, prev_rec, (rr) p/x *this $2 = {<log_rec_t> = {next = 0x25592084fe68, lsn = 0xa6a6b2}, start_lsn = 0xa6a64a} Based on this recovery failure, for the rr replay execution trace of the server that ended in SIGKILL we must check every modification to the page between LSN 0xa2d701 and 0xa6a6b2. The log record that we failed to apply had been written by the following (and I do not yet conclude that there was anything wrong with those writes): #7 0x000055c213bccd96 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:442 #8 0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64 #9 0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466 #10 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603 #11 0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0, file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650 #12 0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971 #13 0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294 #14 0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342 #15 0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509 #16 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668 #17 
0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818 #18 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419 #19 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669 … (rr) frame 21 #21 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b0000af218, table=table@entry=0x619000bad198, info=info@entry=0x208943903d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107 2107 else if (unlikely((error=table->file->ha_write_row(table->record[0])))) (rr) p thd.query_string $5 = {string = {str = 0x62b0000b6238 "INSERT IGNORE INTO `t1` ( `pk`, `col_char` ) VALUES ( NULL, 'f' ) /* E_R Thread1 QNO 1506 CON_ID 16 */", length = 102}, cs = 0x55c21527dc60 <my_charset_latin1>} (rr) display 2: /x log_sys.lsn._M_i = 0xa6a6b2 (rr) when Current event: 623570 (rr) reverse-continue Continuing.   Thread 18 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0   Value = 9683160 0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419 419 return __atomic_load_n(&_M_i, int(__m)); 2: /x log_sys.lsn._M_i = 0xa6a64a (rr) bt #0 0x000055c213b3a125 in std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419 #1 Atomic_counter<unsigned long>::operator unsigned long (this=0x64000007d1c0) at /data/Server/bb-10.5-marko/include/my_counter.h:45 #2 0x000055c213b39ac2 in buf_page_t::oldest_modification (this=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/buf0buf.h:936 #3 0x000055c213b537da in mtr_t::is_block_dirtied (block=0x64000007d178) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:35 #4 0x000055c213b5385c in mtr_t::memo_push (this=0x2089438ffd00, object=0x64000007d178, type=MTR_MEMO_PAGE_X_FIX) at /data/Server/bb-10.5-marko/storage/innobase/include/mtr0mtr.ic:57 #5 0x000055c213e45123 in buf_page_mtr_lock (block=0x64000007d178, rw_latch=2, mtr=0x2089438ffd00, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:2866 #6 0x000055c213e461de in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3413 #7 0x000055c213e4671a in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=10, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00, err=0x2089438fe430, allow_ibuf_merge=false) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:3482 #8 0x000055c213ad6899 in btr_block_get_func (index=..., page=1160, mode=2, merge=true, file=0x55c21482fc20 "/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc", line=370, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/btr0btr.h:237 #9 0x000055c213dfd9ff in btr_cur_latch_leaves (block=0x64000007d178, latch_mode=36, cursor=0x2089438ffb50, mtr=0x2089438ffd00) at 
/data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:370 #10 0x000055c213e0333a in btr_cur_search_to_nth_level_func (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, ahi_latch=0x0, file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, mtr=0x2089438ffd00, autoinc=0) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1848 #11 0x000055c213b407c8 in btr_pcur_open_low (index=0x616000006ff0, level=0, tuple=0x619000cb93f0, mode=PAGE_CUR_LE, latch_mode=36, cursor=0x2089438ffb50, file=0x55c21472bfa0 "/data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc", line=3276, autoinc=0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/btr0pcur.ic:441 #12 0x000055c213b4a525 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3276 … (rr) when Current event: 623569 It seems that the same mini-transaction is modifying the change buffer page two times. The recovery failure occurred for one of those 2 log records. Let us follow the trace in the forward direction from an earlier point of time: (rr) display 2: /x log_sys.lsn._M_i = 0x890396 (rr) cond 1 log_sys.lsn._M_i>=0xa2d701 (rr) continue Thread 45 hit Hardware access (read/write) watchpoint 1: *(ulong*)0x64000007d1c0   Value = 9683160 std::__atomic_base<unsigned long>::load (__m=std::memory_order_relaxed, this=0x64000007d1c0) at /usr/include/c++/9/bits/atomic_base.h:419 419 return __atomic_load_n(&_M_i, int(__m)); 2: /x log_sys.lsn._M_i = 0xa2d701 (rr) when Current event: 575927 (rr) frame 26 #26 0x000055c2129b21fd in write_record (thd=thd@entry=0x62b000150218, table=table@entry=0x6190012de598, info=info@entry=0x44a50e105d40, sink=sink@entry=0x0) at /data/Server/bb-10.5-marko/sql/sql_insert.cc:2107 2107 else if (unlikely((error=table->file->ha_write_row(table->record[0])))) (rr) p thd.query_string $6 = {string = {str = 0x62b000157238 "INSERT IGNORE INTO `t1` ( `pk` ) VALUES ( 10090 ) /* E_R Thread6 QNO 1528 CON_ID 21 */", length = 86}, cs = 0x55c21527dc60 <my_charset_latin1>} This was the last change for which the change buffer page had been written out. From this point onwards, the actions of the recovery must have diverged from the modifications to the page. Because the LSN range is very wide here, we’d better ignore accesses of block->page.oldest_modification and instead check each change of PAGE_HEAP_TOP of the page. (rr) break buf_page_read_complete thread 1 Breakpoint 3 at 0x55de45a541b3: file /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc, line 4100. (rr) disable 2 (rr) reverse-continue Continuing.   Thread 1 hit Breakpoint 3, buf_page_read_complete (bpage=0x55de4575e8c8 <mtr_t::memo_push(void*, mtr_memo_type_t)+230>, node=...) at /data/Server/bb-10.5-marko/storage/innobase/buf/buf0buf.cc:4100 4100 { (rr) n 4101 const page_id_t id(bpage->id()); (rr) 4117 const byte *frame= bpage->zip.data (rr) 4118 ? bpage->zip.data (rr) 4117 const byte *frame= bpage->zip.data (rr) 4123 if (!buf_page_decrypt_after_read(bpage, node)) (rr) p/x frame[40]@2 $6 = {0x8, 0xf3} (rr) set $f=frame (rr) display/x $f[40]@2 1: /x $f[40]@2 = {0x8, 0xf3} (rr) dump binary memory recovered-page.bin $f $f+srv_page_size (rr) enable (rr) continue At the time of the failed recovery, PAGE_HEAP_TOP was 0xf4a (3914). 
At the time of the page flush (as well as at the start of the recovery), the field was 0x8f3 (2291). The contents of the page only slightly differs between the start of the recovery and what ought to be the corresponding point of logical time before the SIGKILL, in the last 8 bytes of the page. The least 32 bits of the LSN and the page checksum are not updated until the page is about to be written from the buffer pool. I saved the srv_page_size=4096 bytes of dump binary memory and compared the files: diff -u <(od -Ax -t x1 recover_start.bin) <(od -Ax -t x1 before_kill.bin) --- /dev/fd/63 2021-05-31 06:10:33.779551414 +0000 +++ /dev/fd/62 2021-05-31 06:10:33.779551414 +0000 @@ -109,5 +109,5 @@ 0008f0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74 -000ff0 05 b3 03 ff 02 4b 00 65 00 a2 d7 01 8e 48 0a 5d +000ff0 05 b3 03 ff 02 4b 00 65 00 00 00 00 00 00 00 00 001000 So, we should have a sound starting point for examining the divergence. Interestingly, near the start of the recovery, we have a page_create_low() that will reset the PAGE_HEAP_TOP to 0x7d (125). The PAGE_HEAP_TOP will grow to 0x8ba (2059) until it is reset in another page_create_low() . It will monotonically grow until we seemingly get a glitch: 3878, 3932, 4055, 3932, 4043, 2305 , 2414, …, 3803, 3914, 4025 , 3914, (failure). In the rr replay trace that was terminated by SIGKILL, we can observe the changes of PAGE_HEAP_TOP as follows: (rr) set $f=block->frame (rr) display/x $f[40]@2 3: /x $f[40]@2 = {0x8, 0xf3} (rr) watch -l $f[40]@2 Hardware watchpoint 3: -location $f[40]@2 (rr) c Continuing. [Switching to Thread 314150.314741]   Thread 18 hit Hardware watchpoint 3: -location $f[40]@2   Old value = <incomplete sequence \363> New value = "\000" __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261 261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. 2: /x log_sys.lsn._M_i = 0xa33520 3: /x $f[40]@2 = {0x0, 0x0} (rr) command 3 Type commands for breakpoint(s) 3, one per line. End with a line saying just "end". >when >c >end (rr) c Continuing. (rr) c Everything seems to be in sync, and also the first monotonicity glitch turns out to be benign. First, the PAGE_HEAP_TOP increases to 4025 because the change buffer tree shrinks due to a change buffer merge during purge (and the records from an adjacent change buffer page are merged to our page). The decerase to 3914 occurs because the very last record of the page is being merged to an index page that is being accessed by dict_stats_analyze_index() (updating persistent statistics). The real divergence occurs later. I saved some rr replay output of monitoring changes to PAGE_HEAP_TOP and compared them: diff -u <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-recovered|uniq) <(sed -ne 's/.*n=\([1-9][0-9]*\)) at.*/\1/p' heap-top-write|uniq) @@ -57,6 +57,25 @@ 4055 3932 4043 +234 +… +2196 2305 2414 2523 @@ -75,3 +94,4 @@ 3914 4025 3914 +4023 For some reason, recovery did not log updates of PAGE_HEAP_TOP from 234 to 2196. The last record must be what recovery was supposed to apply, but did not due to the flagged corruption. 
Let us check whether redo logging was properly enabled during this page_create_low() : Thread 45 hit Hardware watchpoint 3: -location $f[40]@2   Old value = "\017\\" New value = <incomplete sequence \313> mach_write_to_2 (b=0x64000047b028 <incomplete sequence \313>, n=4043) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62 62 } 2: /x log_sys.lsn._M_i = 0xa4f621 3: /x $f[40]@2 = {0xf, 0xcb} Current event: 598722 [Switching to Thread 314150.314325]   Thread 8 hit Hardware watchpoint 3: -location $f[40]@2   Old value = <incomplete sequence \313> New value = "\000" __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:261 261 ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S: No such file or directory. 2: /x log_sys.lsn._M_i = 0xa6230e 3: /x $f[40]@2 = {0x0, 0x0} Current event: 614654   Thread 8 hit Hardware watchpoint 3: -location $f[40]@2   Old value = "\000" New value = "\000}" page_create_low (block=0x64000007d178, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:311 311 memcpy(page + PAGE_DATA, infimum_supremum_redundant, 2: /x log_sys.lsn._M_i = 0xa6230e 3: /x $f[40]@2 = {0x0, 0x7d} Current event: 614654 That call is within a change buffer page reorganize: #3 0x000055c213bff4a4 in page_create (block=0x64000007d178, mtr=0x1ff90f650b70, comp=false) at /data/Server/bb-10.5-marko/storage/innobase/page/page0page.cc:331 #4 0x000055c213dd847e in btr_page_reorganize_low (cursor=0x1ff90f64f740, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1371 #5 0x000055c213dda7c2 in btr_page_reorganize_block (z_level=6, block=0x64000007d178, index=0x616000006ff0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:1606 #6 0x000055c213dea411 in btr_can_merge_with_page (cursor=0x1ff90f650800, page_no=1160, merge_block=0x1ff90f64f940, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:5167 #7 0x000055c213de1e48 in btr_compress (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0btr.cc:3483 #8 0x000055c213e100d7 in btr_cur_compress_if_useful (cursor=0x1ff90f650800, adjust=0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5468 #9 0x000055c213e11b57 in btr_cur_pessimistic_delete (err=0x1ff90f650690, has_reserved_extents=1, cursor=0x1ff90f650800, flags=0, rollback=false, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:5908 #10 0x000055c213b4e8e9 in ibuf_delete_rec (page_id=..., pcur=0x1ff90f650800, search_tuple=0x616000874bf0, mtr=0x1ff90f650b70) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4125 #11 0x000055c213b50024 in ibuf_merge_or_delete_for_page (block=0x640000043928, page_id=..., zip_size=0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:4471 Starting with MDEV-21725 , the page reorganize is being logged in an optimized way, instead of being logged as individual inserts. Before MDEV-12353 , the operation was covered by a logical log record. 
Immediately after the page reorganize operation, the page contents between recovery and the original write differ as follows: diff -u <(od -Ax -t x1 /dev/shm/r-after-reorg.bin) <(od -Ax -t x1 /dev/shm/w-after-reorg.bin) --- /dev/fd/63 2021-05-31 08:20:19.561244871 +0000 +++ /dev/fd/62 2021-05-31 08:20:19.561244871 +0000 @@ -1,5 +1,5 @@ 000000 00 00 00 00 00 00 04 88 00 00 04 89 00 00 04 87 -000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00 +000010 00 00 00 00 00 a5 07 15 45 bf 00 00 00 00 00 00 000020 00 00 00 00 00 00 00 05 08 94 00 15 00 00 00 00 000030 08 33 00 02 00 12 00 13 00 00 00 00 00 00 21 5b 000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00 @@ -112,5 +112,5 @@ 0008a0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 000fe0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 74 -000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d +000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00 001000 On recovery, the FIL_PAGE_LSN is 0xa2d701, which was the initial FIL_PAGE_LSN of the page. The field might only be updated by recovery after all log has been applied to the page. We have the LSN and checksum difference at the end of the page as well. It is nothing to worry about. Similarly, the page contents immediately before the failed log apply are equivalent: @@ -1,5 +1,5 @@ 000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff -000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00 +000010 00 00 00 00 00 a6 49 aa 45 bf 00 00 00 00 00 00 000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f 000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5 000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00 @@ -195,5 +195,5 @@ 000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 * 000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec -000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d +000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00 001000 So, the problem is only with the last record. 
We must look closer at this: Current event: 630234 (rr) bt #0 mach_write_to_2 (b=0x64000047b028 "\017\267", n=4023) at /data/Server/bb-10.5-marko/storage/innobase/include/mach0data.ic:62 #1 0x000055c213bf9f52 in page_mem_alloc_heap<>(buf_block_t *, ulint, ulint *) (block=0x64000007d178, need=109, heap_no=0x1f9a2c2ca1d0) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1071 #2 0x000055c213bf1062 in page_cur_insert_rec_low (cur=0x1f9a2c2cab58, index=0x616000006ff0, rec=0x6110003ae9fc "", offsets=0x61600066e1f0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1388 #3 0x000055c213dfb53d in page_cur_tuple_insert (cursor=0x1f9a2c2cab58, tuple=0x61900228cbf0, index=0x616000006ff0, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, n_ext=0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/include/page0cur.ic:285 #4 0x000055c213e09bfd in btr_cur_optimistic_insert (flags=3, cursor=0x1f9a2c2cab50, offsets=0x1f9a2c2caab0, heap=0x1f9a2c2caa90, entry=0x61900228cbf0, rec=0x1f9a2c2caaf0, big_rec=0x1f9a2c2caa70, n_ext=0, thr=0x621004e35de0, mtr=0x1f9a2c2cad00) at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:3562 #5 0x000055c213b4aad5 in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3403 #6 0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603 A glaring difference is that reuse=false becomes reuse=true on recovery: … (rr) step 1595 mtr->page_insert(*block, reuse, (rr) step mtr_t::page_insert (this=0x64000047bf52, block=..., reuse=false, prev_rec=1, info_bits=0 '\000', n_fields_s=94292043288653, hdr_c=5, data_c=8, hdr=0x64000047bf4a "aY\031\t\005\004", hdr_l=1, data=0x6110003aea04 "\200", data_l=89) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1103 The actual mystery is: why the log that was written differs from what was buffered by recovery: {0x20, 0x55, 0x0, 0x84, 0x8, 0x4, 0x8c, 0xd7, 0x2c, 0x5, 0x8, 0x61, 0x80, 0x0, 0x0, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x62, 0x0 <repeats 69 times>, 0xf8, 0xfd} {0x20, 0x53, 0x0, 0x84, 0x8, 0x5, 0x89, 0xdc, 0x2c, 0x5, 0xa, 0x61, 0x2, 0x0, 0x1, 0x83, 0xfe, 0x0, 0x40, 0x80, 0x3f, 0x86, 0x8, 0x0, 0x8, 0x80, 0x0, 0x66, 0x0 <repeats 69 times>, 0xf8, 0xfa} The least significant bit of the first differing byte encodes the reuse flag. Let us check which string was present in the ib_logfile0 before recovery was started. (Note: this will miss records that would span the block boundary.) od -Ax -t x1 -w512 ib_logfile0 |grep '20 55 00 84 08 04 8c d7 2c'|cut -d\ -f1 od -Ax -t x1 -w512 ib_logfile0 |grep '20 53 00 84 08 05 89 dc 2c'|cut -d\ -f1 Surprisingly, we find exactly one match for both. The first match occurs at the end of the 512-byte block that starts at file offset 0xa70800. 
The last 4 bytes are a log block checksum:

20 55 00 84 08 04 8c d7 2c 05 08 61 80 00 00 15 84 96 bc

The record continues after the 4-byte checksum and the 12-byte header of the following block:

00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 62 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fd b2 27 e2 00

This looks like an exact match of the record that we wrote. Before the NUL byte that marks the end of the mini-transaction, there is another record: a WRITE to the same page.

Now, let us look at the other match, which looks like what recovery was reading. It is located in the 512-byte block starting at byte offset 0xa68c00 of ib_logfile0:

od -Ax -t x1 -w512 -N 99 -j 0xa68c4a ib_logfile0
a68c4a 20 53 00 84 08 05 89 dc 2c 05 0a 61 02 00 01 83 fe 00 40 80 3f 86 08 00 08 80 00 66 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 f8 fa

It is an exact match of the record that was read by recovery! The problem appears to be on the write side after all. I must debug further to find out when those bytes were written, and for which mini-transaction.
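This framing also explains the caveat about records spanning a block boundary: of each 512-byte log block, only 512 - 12 - 4 bytes are record payload. A small sketch of reading record bytes across that framing, using only the header/trailer sizes mentioned above and the offsets from this trace (a debugging aid, not server code):

// Sketch: read `len` redo record payload bytes from ib_logfile0 starting at
// file offset `pos`, skipping the 4-byte checksum at the end of each 512-byte
// block and the 12-byte header of the next block.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <fstream>
#include <vector>

static constexpr uint64_t BLOCK = 512; // OS_FILE_LOG_BLOCK_SIZE
static constexpr uint64_t HDR   = 12;  // log block header
static constexpr uint64_t TRL   = 4;   // log block checksum trailer

static std::vector<uint8_t> read_payload(std::ifstream &f, uint64_t pos, uint64_t len)
{
  std::vector<uint8_t> out;
  while (out.size() < len)
  {
    uint64_t in_block = pos % BLOCK;
    if (in_block < HDR)               // inside a block header: skip it
      pos += HDR - in_block;
    else if (in_block >= BLOCK - TRL) // inside a checksum trailer: next block
      pos += (BLOCK - in_block) + HDR;
    in_block = pos % BLOCK;
    const uint64_t chunk = std::min(len - out.size(), BLOCK - TRL - in_block);
    std::vector<char> buf(chunk);
    f.seekg(std::streamoff(pos));
    f.read(buf.data(), std::streamsize(chunk));
    out.insert(out.end(), buf.begin(), buf.end());
    pos += chunk;
  }
  return out;
}

int main()
{
  std::ifstream f("ib_logfile0", std::ios::binary);
  // The 101-byte record that was written starts 19 bytes (15 payload bytes
  // plus the 4-byte checksum) before the end of the block at 0xa70800,
  // i.e. at file offset 0xa709ed.
  std::vector<uint8_t> rec = read_payload(f, 0xa70800 + BLOCK - 19, 101);
  for (uint8_t b : rec)
    std::printf("%02x ", b);
  std::putchar('\n');
}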

            Because no wrap-around of the redo log has taken place (the payload of the ib_logfile0 file ends at 0xa99000), it should be rather simple to map the LSN to byte offsets in the file. Right before the SIGKILL, we got the following writes to the log file:

            Thread 23 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11115520, new_data_offset=376) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
            562		ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
            Current event: 670464
            [Switching to Thread 314150.340446]
             
            Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=2048, start_lsn=11115520, new_data_offset=430) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
            562		ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
            Current event: 670870
            [Switching to Thread 314150.340452]
             
            Thread 31 hit Breakpoint 2, log_write_buf (buf=0x6d6302ac9000 "\200", len=512, start_lsn=11117056, new_data_offset=116) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
            562		ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
            Current event: 671278
            [Switching to Thread 314150.340446]
             
            Thread 1 hit Breakpoint 2, log_write_buf (buf=0x4b191636c000 "\200", len=1024, start_lsn=11117056, new_data_offset=312) at /data/Server/bb-10.5-marko/storage/innobase/log/log0log.cc:562
            562		ut_a(len % OS_FILE_LOG_BLOCK_SIZE == 0);
            Current event: 671593
             
            Thread 18 received signal SIGKILL, Killed.
            

            At the time of the SIGKILL, the log_sys.lsn was 0xa9a526. The last start_lsn above is 0xa9a200, and the log_sys.write_lsn is 0xa9a526. It looks like the LSN is the file byte offset plus 0x1a00 (6656). Hence, the record for LSN 0xa723ed should be in the file at offset 0xa709ed. The unexpected record at 0xa68c4a must correspond to the LSN 0xa6a64a. Here we have it:

            1: /x log_sys.lsn._M_i = 0xa6a64a
            Current event: 623570
            (rr) bt
            #0  mtr_t::finish_write (this=0x55c213ae4877 <ilist<mtr_buf_t::block_t, void>::back() const+217>, len=35773916118888) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:863
            #1  0x000055c213bcc798 in mtr_t::commit (this=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/mtr/mtr0mtr.cc:408
            #2  0x000055c213b3f878 in ibuf_mtr_commit (mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/include/ibuf0ibuf.ic:64
            #3  0x000055c213b4ae1b in ibuf_insert_low (mode=36, op=IBUF_OP_INSERT, no_counter=0, entry=0x61a000844508, entry_size=77, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3466
            #4  0x000055c213b4b2f7 in ibuf_insert (op=IBUF_OP_INSERT, entry=0x61a000844508, index=0x6160000f03f0, page_id=..., zip_size=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3603
            #5  0x000055c213e02647 in btr_cur_search_to_nth_level_func (index=0x6160000f03f0, level=0, tuple=0x61a000844508, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x208943901ab0, ahi_latch=0x0, file=0x55c214795e40 "/data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc", line=2974, mtr=0x208943901e70, autoinc=0)
                at /data/Server/bb-10.5-marko/storage/innobase/btr/btr0cur.cc:1650
            #6  0x000055c213c8b3e2 in row_ins_sec_index_entry_low (flags=0, mode=2, index=0x6160000f03f0, offsets_heap=0x619000cb9d80, heap=0x619000cb8e80, entry=0x61a000844508, trx_id=0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:2971
            #7  0x000055c213c8c59f in row_ins_sec_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0, check_foreign=true) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3294
            #8  0x000055c213c8c784 in row_ins_index_entry (index=0x6160000f03f0, entry=0x61a000844508, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3342
            #9  0x000055c213c8d14d in row_ins_index_entry_step (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3509
            #10 0x000055c213c8d6b6 in row_ins (node=0x6220002b30f0, thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3668
            #11 0x000055c213c8de86 in row_ins_step (thr=0x621004e35de0) at /data/Server/bb-10.5-marko/storage/innobase/row/row0ins.cc:3818
            #12 0x000055c213cb83ea in row_insert_for_mysql (mysql_rec=0x61f0001f80b8 "\377f", prebuilt=0x6220002b2170, ins_mode=ROW_INS_NORMAL) at /data/Server/bb-10.5-marko/storage/innobase/row/row0mysql.cc:1419
            #13 0x000055c213a3b359 in ha_innobase::write_row (this=0x61d0008638b8, record=0x61f0001f80b8 "\377f") at /data/Server/bb-10.5-marko/storage/innobase/handler/ha_innodb.cc:7669
            

            This was not found during my initial search, because I only watched the changes of the header field PAGE_HEAP_TOP. A little before the mtr_t::commit(), we actually did write the record that recovery failed to apply (note: reuse=true):

            #0  mtr_t::page_insert (this=0x2089438ffd00, block=..., reuse=true, prev_rec=2652, info_bits=0 '\000', n_fields_s=13, hdr_c=5, data_c=10, hdr=0x64000047bedb "aY\031\t\005\004", hdr_l=1, data=0x611000383e46 "\002", data_l=87) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1118
            #1  0x000055c213bf2303 in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
            

            Now, let us compare the ‘before’ images of the page, before this mini-transaction was started, and before we failed to apply the log record:

            diff -u <(od -Ax -t x1 recovery-before-last.bin) <(od -Ax -t x1 write-before.bin)
            

            --- /dev/fd/63	2021-05-31 11:43:47.758752427 +0000
            +++ /dev/fd/62	2021-05-31 11:43:47.758752427 +0000
            @@ -1,5 +1,5 @@
             000000 00 00 00 00 00 00 04 88 00 00 04 89 ff ff ff ff
            -000010 00 00 00 00 00 a2 d7 01 45 bf 00 00 00 00 00 00
            +000010 00 00 00 00 00 a6 4e aa 45 bf 00 00 00 00 00 00
             000020 00 00 00 00 00 00 00 0a 0f 4a 00 25 0e ea 00 6f
             000030 00 00 00 02 00 23 00 22 00 00 00 00 00 00 22 d5
             000040 00 00 ff ff ff ff 00 00 00 00 ff ff ff ff 00 00
            @@ -195,5 +195,5 @@
             000ef0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
             *
             000fe0 00 00 00 00 00 74 0d bc 0c 08 0a 54 08 a0 06 ec
            -000ff0 05 38 03 84 01 d0 00 65 00 a2 d7 01 8e 48 0a 5d
            +000ff0 05 38 03 84 01 d0 00 65 00 00 00 00 00 00 00 00
             001000
            

            Again, only the LSN and checksum fields differ. Recovery failed because it concluded that reusing the record at the top of the PAGE_FREE stack would exceed the PAGE_HEAP_TOP 0xf4a. Let us check whether that actually was the case. The PAGE_FREE points to 0xeea, which is the only record in the PAGE_FREE stack. PAGE_GARBAGE is 0x6f (111) bytes. At the time the log record for this write was generated, we notably have a negative free_offset:

            0x000055c213bf22fe in page_cur_insert_rec_low (cur=0x2089438ffb58, index=0x616000006ff0, rec=0x611000383e3c "", offsets=0x61600006a2f0, mtr=0x2089438ffd00) at /data/Server/bb-10.5-marko/storage/innobase/page/page0cur.cc:1595
            1595	      mtr->page_insert(*block, reuse,
            1: /x log_sys.lsn._M_i = 0xa6a64a
            (rr) i lo
            r = 0x611000383e46 "\002"
            c = 0x64000047bacb "\001"
            c_end = 0x64000047bb22 "aY\031\t\005\004"
            data_common = 10
            block = 0x64000007d178
            rec_size = 109
            reuse = true
            free_offset = -3
            heap_no = 36
            insert_buf = 0x64000047bedb "aY\031\t\005\004"
            comp = false
            extra_size = 12
            next_rec = 0x64000047bc08 ""
            page_last_insert = 0x64000047b030 <incomplete sequence \347>
            last_insert = 0
            page_n_recs = 0x64000047b036 ""
            insert_rec = 0x64000047bee7 ""
            data_size = 97
            hdr_common = 5
            n_owned = 4
            info_status = 0 '\000'
            

            We are attempting to insert a record of 12+97 bytes, while the record in the PAGE_FREE stack was 15+96 bytes. The origin of the record will move by 3 bytes from 0xeea to 0xee7 accordingly (see insert_rec above). All seems well and good on the write side. It is the recovery side that appears to be broken. Note: this LSN 0xa6a64a matches the start_lsn in the recovery.
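
            To spell out that arithmetic, here is a small sketch using only the values visible in the page dump and the gdb locals above (an illustration, not the server code):

            // Sketch of the arithmetic above: reusing the freed 15+96-byte record at
            // PAGE_FREE=0xeea for a new 12+97-byte record moves the record origin by
            // free_offset = 12 - 15 = -3, from 0xeea to 0xee7, and stays within the page.
            #include <cassert>
            #include <cstdio>

            int main()
            {
              const unsigned heap_top    = 0xf4a; // PAGE_HEAP_TOP from the page header
              const unsigned free_rec    = 0xeea; // PAGE_FREE: origin of the freed record
              const unsigned fextra_size = 15;    // header bytes of the freed record
              const unsigned fdata_size  = 96;    // data bytes of the freed record
              const unsigned extra_size  = 12;    // header bytes of the new record
              const unsigned data_size   = 97;    // data bytes of the new record

              // PAGE_GARBAGE = 0x6f = 111 = 15 + 96: the freed record accounts for all of it.
              assert(fextra_size + fdata_size == 0x6f);
              // The new record fits into the freed slot (109 <= 111 bytes).
              assert(extra_size + data_size <= fextra_size + fdata_size);

              // The insert buffer starts where the freed record's header started,
              // so the origin shifts by the difference of the header sizes.
              const int free_offset = int(extra_size) - int(fextra_size);      // -3
              const unsigned insert_rec = free_rec + free_offset;              // 0xee7
              std::printf("free_offset=%d, insert_rec=0x%x\n", free_offset, insert_rec);

              // The freed record's data ends exactly at PAGE_HEAP_TOP: 0xeea + 96 = 0xf4a.
              std::printf("freed record end: 0x%x, heap_top: 0x%x\n",
                          free_rec + fdata_size, heap_top);
            }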


            After long debugging, I think that I found the bug. It is obvious when you look at the preceding and following lines. This affects the 10.5 and 10.6 series only.

            diff --git a/storage/innobase/page/page0cur.cc b/storage/innobase/page/page0cur.cc
            index 025d4b9b967..f75d1a8be77 100644
            --- a/storage/innobase/page/page0cur.cc
            +++ b/storage/innobase/page/page0cur.cc
            @@ -2443,7 +2443,7 @@ bool page_apply_insert_redundant(const buf_block_t &block, bool reuse,
                 if (UNIV_UNLIKELY(free_rec - fextra_size < heap_bot))
                   goto corrupted;
                 const ulint fdata_size= rec_get_data_size_old(free_rec);
            -    if (UNIV_UNLIKELY(free_rec + data_size > heap_top))
            +    if (UNIV_UNLIKELY(free_rec + fdata_size > heap_top))
                   goto corrupted;
                 if (UNIV_UNLIKELY(extra_size + data_size > fextra_size + fdata_size))
                   goto corrupted;
            
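
            With the concrete values from this page, the one-byte difference between those two comparisons is exactly what produced the bogus corruption report. A minimal illustration (again, just the two comparisons, not the actual server code):

            // Sketch: the recovery-side bounds check with the values from this page.
            // The freed record's own data size must be compared against PAGE_HEAP_TOP;
            // comparing the *new* record's data size overshoots by one byte here.
            #include <cstdio>

            int main()
            {
              const unsigned heap_top   = 0xf4a; // PAGE_HEAP_TOP
              const unsigned free_rec   = 0xeea; // record at the top of the PAGE_FREE stack
              const unsigned fdata_size = 96;    // data size of that freed record
              const unsigned data_size  = 97;    // data size of the record being inserted

              // Buggy check (pre-fix): 0xeea + 97 = 0xf4b > 0xf4a -> "corruption"
              std::printf("buggy: free_rec + data_size  = 0x%x > 0x%x ? %s\n",
                          free_rec + data_size, heap_top,
                          free_rec + data_size > heap_top ? "yes" : "no");
              // Fixed check: 0xeea + 96 = 0xf4a, not greater than PAGE_HEAP_TOP -> OK
              std::printf("fixed: free_rec + fdata_size = 0x%x > 0x%x ? %s\n",
                          free_rec + fdata_size, heap_top,
                          free_rec + fdata_size > heap_top ? "yes" : "no");
            }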


            People

              Assignee: marko Marko Mäkelä
              Reporter: tomitukiainen Tomi Tukiainen
              Votes: 1
              Watchers: 6

