[MDEV-33205] [ERROR] InnoDB: We detected index corruption in an InnoDB type table.

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Incomplete
Affects Version/s: 10.6.14
Fix Version/s: N/A
Component/s: Galera, Server
Labels: None
Environment: RHEL7 on VMware

Description

Our environment is a 3-node Galera cluster: 2 DB nodes + 1 arbitrator. HAProxy directs all traffic to one DB node.

We encountered index corruption, followed by a node inconsistency, on the standby node.

We had to restart the DB node manually and let it implicitly trigger an SST to resolve the problem.

Kindly advise what we can check to troubleshoot this case.

2024-01-03 14:52:10 7 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.

2024-01-03 14:52:10 12 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it

2024-01-03 14:52:10 7 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it

2024-01-03 14:52:10 7 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9325, Internal MariaDB error code: 1034

2024-01-03 14:52:10 7 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074073

2024-01-03 14:52:10 12 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted., Error_code: 126; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9335, Internal MariaDB error code: 1034

2024-01-03 14:52:10 12 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074074

2024-01-03 14:52:10 6 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it

2024-01-03 14:52:10 6 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted., Error_code: 126; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9325, Internal MariaDB error code: 1034

2024-01-03 14:52:10 6 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074076

2024-01-03 14:52:10 0 [Note] WSREP: Member 1(t2vdbs-hpftes-tesdbppd-2-01) initiates vote on 9b0bd445-e8dc-11ed-aff1-6ec3700a3571:27074073,9359a244aa1e2038:  Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034;

xxx

2024-01-03 14:52:10 7 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 9b0bd445-e8dc-11ed-aff1-6ec3700a3571:27074073

         at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:process_apply_error():1357
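
When triaging a log like the one above, it can help to list which tables were reported as corrupt and which write-set sequence numbers failed to apply. The following is only a minimal sketch: the regular expressions are assumptions derived from the exact message wording in this report, and the function name `summarize` is hypothetical, not part of MariaDB or Galera.

```python
import re
from collections import Counter

# Assumed message formats, copied from the log lines quoted in this report.
CORRUPT_RE = re.compile(r"Index for table '([^']+)' is corrupt")
APPLY_FAIL_RE = re.compile(r"WSREP: Event \d+ \S+ apply failed: (\d+), seqno (\d+)")


def summarize(lines):
    """Return (tables, failed) for an iterable of error-log lines.

    tables: Counter mapping table name -> number of lines reporting it corrupt
    failed: list of (error_code, seqno) tuples for failed applier events
    """
    tables = Counter()
    failed = []
    for line in lines:
        corrupt = CORRUPT_RE.search(line)
        if corrupt:
            tables[corrupt.group(1)] += 1
        apply_fail = APPLY_FAIL_RE.search(line)
        if apply_fail:
            failed.append((int(apply_fail.group(1)), int(apply_fail.group(2))))
    return tables, failed


if __name__ == "__main__":
    sample = [
        "2024-01-03 14:52:10 7 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it",
        "2024-01-03 14:52:10 7 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074073",
    ]
    tables, failed = summarize(sample)
    print(dict(tables), failed)
```

To run it against a real log, pass an open file object instead of the sample list; the log path depends on the installation and is not given in this report.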

Issue Links

relates to

MDEV-30531 Corrupt index(es) on busy table when using FOREIGN KEY with CASCADE or SET NULL (Closed)

MDEV-31767 InnoDB tables are being flagged as corrupted on an I/O bound server (Closed)

Activity

Marko Mäkelä added a comment - 2024-01-10 07:48

I think that this could be a duplicate of MDEV-31767 or MDEV-30531.

Marko Mäkelä added a comment - 2024-01-10 07:48

Have you tried to upgrade to MariaDB Server 10.6.16?

William Wong added a comment - 2024-01-19 11:32

Thanks, Marko.
We plan to upgrade to MariaDB Server 10.6.16.

Elena Stepanova added a comment - 2024-02-19 20:22

frelist,
Did you have a chance to upgrade, and if so, did it help?

Andrey added a comment - 2024-03-10 20:00

The same problem on v11.2.2. It happens 2-3 times per week. Galera Cluster with 3 nodes is used.

After that, the Galera Cluster starts reporting the following error in a loop:

2024-03-10 19:41:29 3080 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)

After a restart, it recovers with SST.

Please let me know what additional debugging to switch on to obtain more details.

Marko Mäkelä added a comment - 2024-03-11 10:12

andr04, if you are using the default wsrep_sst_method=rsync, your symptoms could be explained by MDEV-32115. Currently there is no evidence that it would affect MariaDB Server 10.5 or later.

frelist, did an upgrade to 10.6.16 or 10.6.17 fix the problem for you?

Andrey added a comment - 2024-03-11 10:32 - edited

No, I use wsrep_sst_method = mariabackup.

Let's investigate the root cause of the index corruption. I suppose WSREP reports the connection abort because the table is corrupted and MariaDB is declining all requests.

Marko Mäkelä added a comment - 2024-03-11 15:11

While testing MDEV-33588 we just found something that may explain wrongly claimed corruption in 10.6 and later versions. It is something similar to MDEV-31767.

People

Assignee: Marko Mäkelä

Reporter: William Wong

Votes: 1

Watchers: 4

Dates

Created: 2024-01-09 19:02

Updated: 2024-03-26 12:41

Resolved: 2024-03-26 12:41
