MDEV-9663

InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed()

Details

    Description

      A Windows user upgraded from MariaDB 10.0.21 to MariaDB 10.1.11. After running mysql_upgrade, the user saw errors like this in the error log:

      InnoDB: tried to purge sec index entry not marked for deletion in
      InnoDB: index "column" of table "db1"."tab1"
      InnoDB: tuple DATA TUPLE: 3 fields;
       0: len 15; hex XXXX; asc field1;;
       1: len 3; hex XXXX; asc sum;;
       2: len 33; hex XXXX; asc field3;;
       
      InnoDB: record PHYSICAL RECORD: n_fields 3; compact format; info bits 0
       0: len 15; hex XXXX; asc field1;;
       1: len 3; hex XXXX; asc sum;;
       2: len 30; hex XXXX; asc field3; (total 33 bytes);
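To find which indexes a server has complained about, the error log can be scanned for this message. The following is a minimal sketch, assuming the message wording matches the excerpt above exactly (other versions may phrase it differently); the function name `find_suspect_indexes` and the `sample` text are illustrative only.

```python
import re

# Matches the two-line message from the excerpt above. The wording is taken
# from that excerpt; it is an assumption that other versions use the same text.
PURGE_MSG = re.compile(
    r'InnoDB: tried to purge sec index entry not marked for deletion in\s*\n'
    r'\s*InnoDB: index "(?P<index>[^"]+)" of table '
    r'"(?P<db>[^"]+)"\."(?P<table>[^"]+)"'
)


def find_suspect_indexes(log_text):
    """Return a sorted list of unique (db, table, index) tuples found in the log."""
    hits = {(m['db'], m['table'], m['index']) for m in PURGE_MSG.finditer(log_text)}
    return sorted(hits)


# Hypothetical sample, copied from the excerpt above.
sample = '''\
InnoDB: tried to purge sec index entry not marked for deletion in
InnoDB: index "column" of table "db1"."tab1"
InnoDB: tuple DATA TUPLE: 3 fields;
'''
print(find_suspect_indexes(sample))  # [('db1', 'tab1', 'column')]
```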
      

      Some time after that, the user tried inserting more records into the database, and this caused an assertion failure:

      2016-02-12 14:58:34 4428  InnoDB: Assertion failure in thread 17448 in file row0ins.cc line 283
      InnoDB: Failing assertion: *cursor->index->name == TEMP_INDEX_PREFIX
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to http://bugs.mysql.com.
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
      InnoDB: about forcing recovery.
      

          Activity

            Has anyone experienced this corruption on a fresh MariaDB 10.5 installation that was not upgraded from an earlier version, or on any MariaDB version where innodb_change_buffering=none was always set? I suspect that MDEV-24449 could explain this bug (as well as MDEV-20934). If my hypothesis is correct, then this corruption should not be possible in MariaDB 10.5. (But the database may have been corrupted before upgrade.)

            (Comment above added by marko, Marko Mäkelä)

            As noted in MDEV-24449, I was able to reproduce one type of corruption, but not exactly this type. I am rather convinced that the bug that was fixed by the one-line fix in MDEV-24449 can explain various types of corruption, including this MDEV-9663 type, which I have been aware of since the MySQL 5.1 or 5.5 days (possibly since about 2008, and for sure since 2010). That race condition is present in the very first InnoDB commit. The probability of encountering the race condition was significantly increased by the widespread use of hot backup tools (at least Percona XtraBackup or Mariabackup).

            I am afraid that we will have to wait for feedback for several months to confirm whether MDEV-24449 (or MDEV-19514 in MariaDB 10.5) actually was the last bug that can cause corruption of secondary index pages.

            (Comment above added by marko, Marko Mäkelä)

            Two months have passed. I think we can close this. If anything pops up, it can be reopened anytime.

            (Comment above added by serg, Sergei Golubchik)

            As much as I would like to believe that MDEV-24449 and MDEV-24709 fixed all remaining causes of this, we recently had a support customer who encountered corruption of a secondary index (of non-virtual columns) after starting from a logical dump, without restoring any backup or invoking crash recovery in between. So, I am afraid that some bug may still be out there. But, it might be best filed as a new ticket.

            (Comment above added by marko, Marko Mäkelä)

            The CHECK TABLE…EXTENDED that was implemented in MDEV-24402 will flag secondary indexes as corrupted if they contain entries that should not exist.
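As a small illustration, the statement for a suspect table could be generated like this. This is a sketch only: the helper names `quote_ident` and `check_table_extended` are hypothetical, and the table `db1`.`tab1` is taken from the log excerpt in the description. Whether EXTENDED actually performs the secondary-index check depends on running a server version that includes MDEV-24402.

```python
def quote_ident(name):
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return '`' + name.replace('`', '``') + '`'


def check_table_extended(db, table):
    """Build a CHECK TABLE ... EXTENDED statement for one table."""
    return f"CHECK TABLE {quote_ident(db)}.{quote_ident(table)} EXTENDED"


# Table name from the error-log excerpt in the description.
print(check_table_extended("db1", "tab1"))  # CHECK TABLE `db1`.`tab1` EXTENDED
```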

            Crashes due to this corruption should have been (mostly) fixed in MDEV-13542. Because we were not able to reproduce this corruption, I cannot fully guarantee it.

            While analyzing a failure from a stress test of MDEV-30009, I may have found a possible explanation of this. The scenario is as follows.

            1. Some changes were buffered to a secondary index leaf page that was not located in the buffer pool.
            2. The page was freed (possibly as part of DROP INDEX).
            3. During ibuf_read_merge_pages(), we will reset the change buffer bitmap bits but will not remove the change buffer records.
            4. The same page is allocated and reused for something else.
            5. The page is evicted from the buffer pool.
            6. Something is added to the change buffer for the page.
            7. On a change buffer merge, we will apply both old (bogus) and new entries to the page.

            The extra delete-unmarked records could simply originate from previously buffered inserts that were not discarded as they were supposed to, in the above scenario.
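The seven steps above can be illustrated with a toy model. This is not InnoDB code: the class `ToyChangeBuffer`, its methods, the page number, and the key names are all invented for illustration, and the model tracks only buffered inserts. It shows how resetting the bitmap bit without discarding the buffered records lets a stale entry end up in a reused page.

```python
class ToyChangeBuffer:
    """Toy stand-in for the change buffer; tracks buffered inserts only."""

    def __init__(self, discard_on_freed_page):
        self.discard_on_freed_page = discard_on_freed_page
        self.bitmap = set()   # page numbers flagged as having buffered changes
        self.records = []     # buffered inserts as (page_no, key)

    def buffer_insert(self, page_no, key):
        self.records.append((page_no, key))
        self.bitmap.add(page_no)

    def merge_freed_page(self, page_no):
        """Step 3: the page was freed, so its buffered changes are obsolete."""
        self.bitmap.discard(page_no)
        if self.discard_on_freed_page:
            self.records = [r for r in self.records if r[0] != page_no]

    def merge(self, page_no, page_keys):
        """Step 7: apply every buffered record for the page, then clear them."""
        for rec_page, key in self.records:
            if rec_page == page_no:
                page_keys.add(key)
        self.records = [r for r in self.records if r[0] != page_no]
        self.bitmap.discard(page_no)
        return page_keys


def run_scenario(discard_on_freed_page):
    ibuf = ToyChangeBuffer(discard_on_freed_page)
    page_no = 7                                   # arbitrary page number
    ibuf.buffer_insert(page_no, "old-index-key")  # 1. buffered, page not in pool
    # 2. the page is freed (for example by DROP INDEX); nothing to model here
    ibuf.merge_freed_page(page_no)                # 3. bitmap bit reset
    page_keys = set()                             # 4. page reused, starts empty
    # 5. the page is evicted; page_keys stands for its on-disk contents
    ibuf.buffer_insert(page_no, "new-index-key")  # 6. new buffered change
    return ibuf.merge(page_no, page_keys)         # 7. merge applies what is left


print(sorted(run_scenario(discard_on_freed_page=False)))
# ['new-index-key', 'old-index-key']  (the stale entry was applied)
print(sorted(run_scenario(discard_on_freed_page=True)))
# ['new-index-key']
```

With `discard_on_freed_page=False`, the reused page ends up with an entry that belongs to its previous life, which corresponds to the extra delete-unmarked records described above.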

            As far as I can tell, all MySQL and MariaDB versions are affected by this. The code changes that were applied in MDEV-20934 did not fix this, because that code would only be executed on shutdown with innodb_fast_shutdown=0.

            The InnoDB change buffer was disabled by default in MDEV-27734.

            Note: We still have many open bugs related to the corruption of indexes that include virtual columns. Unlike this corruption, those corruptions are much easier to reproduce. Implementing some file format changes such as MDEV-17598 could help a lot with those bugs.

            (Comment above added by marko, Marko Mäkelä)

            People

              thiru Thirunarayanan Balathandayuthapani
              GeoffMontee Geoff Montee (Inactive)