MariaDB Server / MDEV-33205

[ERROR] InnoDB: We detected index corruption in an InnoDB type table.

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Incomplete
    • Affects Version/s: 10.6.14
    • Fix Version/s: N/A
    • Component/s: Galera, Server
    • Labels: None
    • Environment: RHEL7 on VMware

    Description

      Our environment is a 3-node Galera cluster: 2 DB nodes plus 1 arbitrator. HAProxy routes all traffic to a single DB node.
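For context on this topology: Galera keeps a component Primary only while it holds a weighted majority of the last Primary component (members that left gracefully are subtracted first). That is why a third voting member such as an arbitrator is added to a two-data-node cluster. The sketch below illustrates that rule in simplified form, assuming the default weight of 1 per member. The function name `is_primary` and the member names are hypothetical, and the current component is assumed to be a subset of the previous one.

```python
from typing import Mapping, Set


def is_primary(previous_primary: Mapping[str, int],
               left_gracefully: Set[str],
               current: Set[str]) -> bool:
    """Simplified Galera weighted-quorum check (illustrative only).

    previous_primary maps each member of the last Primary component to its
    weight. Weights of members that left gracefully are subtracted from the
    total, and the current component must hold strictly more than half of
    what remains. `current` is assumed to be a subset of previous_primary.
    """
    total = sum(previous_primary.values())
    departed = sum(previous_primary[m] for m in left_gracefully)
    held = sum(previous_primary[m] for m in current)
    return held > (total - departed) / 2


# Hypothetical names for a cluster of 2 DB nodes + 1 arbitrator.
cluster = {"db1": 1, "db2": 1, "arbitrator": 1}

# One DB node drops out unexpectedly: 2 of 3 votes remain, so still Primary.
print(is_primary(cluster, set(), {"db1", "arbitrator"}))  # True
# Without an arbitrator, a two-node split leaves 1 of 2 votes: not Primary.
print(is_primary({"db1": 1, "db2": 1}, set(), {"db1"}))  # False
```

Real Galera lets each node's weight be tuned, and the rule above only models the arithmetic, not how membership changes are detected.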

      We encountered index corruption, after which the standby node was declared inconsistent.

      We had to restart the DB node manually, which implicitly triggered an SST and resolved the problem.

      Please advise what we can check to troubleshoot this case.

      2024-01-03 14:52:10 7 [ERROR] InnoDB: We detected index corruption in an InnoDB type table. You have to dump + drop + reimport the table or, in a case of widespread corruption, dump all InnoDB tables and recreate the whole tablespace. If the mariadbd server crashes after the startup or when you dump the tables. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
      2024-01-03 14:52:10 12 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it
      2024-01-03 14:52:10 7 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it
      2024-01-03 14:52:10 7 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9325, Internal MariaDB error code: 1034
      2024-01-03 14:52:10 7 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074073
      2024-01-03 14:52:10 12 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted., Error_code: 126; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9335, Internal MariaDB error code: 1034
      2024-01-03 14:52:10 12 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074074
      2024-01-03 14:52:10 6 [ERROR] mariadbd: Index for table 'pa_pos_psdo' is corrupt; try to repair it
      2024-01-03 14:52:10 6 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table tes-public.pa_pos_psdo; Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted., Error_code: 126; Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034; handler error HA_ERR_CRASHED; the event's master log FIRST, end_log_pos 9325, Internal MariaDB error code: 1034
      2024-01-03 14:52:10 6 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074076
      2024-01-03 14:52:10 0 [Note] WSREP: Member 1(t2vdbs-hpftes-tesdbppd-2-01) initiates vote on 9b0bd445-e8dc-11ed-aff1-6ec3700a3571:27074073,9359a244aa1e2038:  Index for table 'pa_pos_psdo' is corrupt; try to repair it, Error_code: 1034;
       
      xxx
       
      2024-01-03 14:52:10 7 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 9b0bd445-e8dc-11ed-aff1-6ec3700a3571:27074073
               at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:process_apply_error():1357
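When triaging a log like the excerpt above, it can help to pull out the failed write-sets and decode the on-disk table name. MariaDB escapes special characters in file names, so `tes@002dpublic` is the database `tes-public` (0x2d is `-`). The Python sketch below assumes the line layout shown above (date, time, thread id, `[LEVEL]`, message). The regular expressions and the functions `decode_filename` and `summarize` are illustrative, not part of any MariaDB tool, and only the four-hex-digit `@XXXX` escape form is decoded. The error code 126 captured here is the one the log pairs with `HA_ERR_CRASHED`.

```python
import re
from typing import Dict, List, Set, Tuple

# Line layout assumed from the excerpt above:
# "YYYY-MM-DD HH:MM:SS <thread id> [LEVEL] message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<thread>\d+) "
    r"\[(?P<level>\w+)\] (?P<msg>.*)$"
)
# e.g. "WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074073"
APPLY_FAIL_RE = re.compile(r"apply failed: (?P<err>\d+), seqno (?P<seqno>\d+)")
# e.g. "Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted."
TABLE_RE = re.compile(r"Table (?P<name>\S+) in tablespace \d+ corrupted")


def decode_filename(name: str) -> str:
    """Decode @XXXX (four hex digit) escapes in a MariaDB file name.

    Only this escape form is handled; anything else is left untouched.
    """
    return re.sub(r"@([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)), name)


def summarize(lines: List[str]) -> Tuple[Dict[int, Tuple[str, int]], Set[str]]:
    """Return {failed seqno: (thread id, error code)} and the corrupted tables."""
    failures: Dict[int, Tuple[str, int]] = {}
    tables: Set[str] = set()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # e.g. the indented replicator_smm.cpp location line
        msg = m.group("msg")
        fail = APPLY_FAIL_RE.search(msg)
        if fail:
            failures[int(fail.group("seqno"))] = (m.group("thread"),
                                                  int(fail.group("err")))
        table = TABLE_RE.search(msg)
        if table:
            tables.add(decode_filename(table.group("name")))
    return failures, tables


sample = [
    "2024-01-03 14:52:10 7 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074073",
    "2024-01-03 14:52:10 12 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table "
    "tes-public.pa_pos_psdo; Table tes@002dpublic/pa_pos_psdo in tablespace 94889212653016 corrupted., "
    "Error_code: 126",
    "2024-01-03 14:52:10 12 [Warning] WSREP: Event 6 Write_rows_v1 apply failed: 126, seqno 27074074",
    "         at /home/buildbot/buildbot/build/galera/src/replicator_smm.cpp:process_apply_error():1357",
]
failures, tables = summarize(sample)
print(failures)  # {27074073: ('7', 126), 27074074: ('12', 126)}
print(tables)    # {'tes-public/pa_pos_psdo'}
```

Grouping failures by seqno makes it easy to see that several applier threads hit the same table within the same second, which is the pattern in this report.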
      

          Activity

            Marko Mäkelä (marko) added a comment: I think that this could be a duplicate of MDEV-31767 or MDEV-30531.

            Marko Mäkelä (marko) added a comment: Have you tried to upgrade to MariaDB Server 10.6.16?
            William Wong (frelist) added a comment:

            Thanks, Marko. We plan to upgrade to MariaDB Server 10.6.16.

            Elena Stepanova (elenst) added a comment: frelist, did you have a chance to upgrade, and if so, did it help?
            Andrey (andr04) added a comment:

            We see the same problem on 11.2.2. It happens 2-3 times per week on a 3-node Galera cluster.

            After that, the Galera cluster starts reporting the following error in a loop:

            2024-03-10 19:41:29 3080 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)

            After a restart, the node recovers via SST.

            Please let me know what additional debugging I should enable to get more details.
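The `-103` in the `gcs_caused()` warning appears to be a negated `errno` value: on Linux, errno 103 is `ECONNABORTED`, whose standard message is the "Software caused connection abort" text printed in the log. The small helper below, `describe_negative_errno` (a hypothetical name, using only Python's standard `errno` and `os` modules), translates such codes when reading logs. Errno numbers differ between operating systems, so run it on the same platform as the server.

```python
import errno
import os


def describe_negative_errno(code: int) -> str:
    """Translate a (possibly negated) errno value such as -103 into name and message."""
    num = abs(code)
    name = errno.errorcode.get(num, "UNKNOWN")
    return f"{name}: {os.strerror(num)}"


# Platform-independent: uses this system's own value of ECONNABORTED.
print(describe_negative_errno(-errno.ECONNABORTED))
# On Linux this is expected to print "ECONNABORTED: Software caused connection abort".
print(describe_negative_errno(-103))
```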

            Marko Mäkelä (marko) added a comment:

            andr04, if you are using the default wsrep_sst_method=rsync, your symptoms could be explained by MDEV-32115. Currently there is no evidence that it would affect MariaDB Server 10.5 or later.

            frelist, did an upgrade to 10.6.16 or 10.6.17 fix the problem for you?
            Andrey (andr04) added a comment (edited):

            No, I use wsrep_sst_method=mariabackup.

            Let's investigate the root cause of the index corruption. I suspect WSREP reports the connection abort because the table is corrupted and MariaDB is rejecting all requests.

            Marko Mäkelä (marko) added a comment: While testing MDEV-33588, we just found something that may explain falsely reported corruption in 10.6 and later versions. It is similar to MDEV-31767.

            People

              Assignee: Marko Mäkelä (marko)
              Reporter: William Wong (frelist)
              Votes: 1
              Watchers: 4
