[MDEV-24402] CHECK TABLE may miss some cases of index inconsistencies - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Fixed
Affects Version/s: 10.2.31
Fix Version/s: 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1
Component/s: Storage Engine - InnoDB
Labels:
- check
- corruption

Description

Looks as if some cases of InnoDB index inconsistencies – see e.g. ~~MDEV-22373~~ – are missed by CHECK TABLE.

InnoDB currently ignores any other CHECK TABLE attributes than QUICK (which will suppress the time-consuming checks of the upper levels of the index trees).

Most importantly, InnoDB is ignoring the attribute EXTENDED, and is essentially only performing a MVCC COUNT(*) in each index. A verified case exists where the record counts in the indexes would match, yet the records do not match between the indexes. A true EXTENDED check should report any mismatch between the clustered index (primary key) and the secondary index records.

Note: By the design of InnoDB MVCC (see MDEV-17598 for some discussion), secondary indexes may contain multiple copies of a record, while in the clustered index the multiple versions are formed via a singly-linked list that the undo log pages are part of.

An extremely extended test would have to check that in any possible read view, the secondary indexes match the clustered index, instead of only checking the current read view.

It may turn out that only a special ‘locking’ variant of CHECK TABLE would be able to detect truly any type of mismatch.

Attachments

Issue Links

blocks

MDEV-28349 Provide "crash safe" options for CHECK TABLE and ALTER TABLE ... CHECK PARTITION ...

Open

causes

MDEV-29886 Assertion `!index->table->is_temporary()' failed in trx_undo_prev_version_build upon CHECK

Closed

MDEV-29978 Corruption errors upon CHECK on temporary InnoDB table

Closed

MDEV-35227 Executing CHECK TABLE...EXTENDED right after server startup may attempt to access too old history

Confirmed

relates to

MDEV-25459 MVCC read from index on CHAR or VARCHAR wrongly omits rows

Closed

MDEV-29593 Purge misses a chance to free not-yet-reused undo pages

Closed

MDEV-29666 InnoDB fails to purge secondary index records when indexed virtual columns exist

Closed

MDEV-29823 Secondary index records may never be purged after rollback

Confirmed

MDEV-9663 InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed()

Closed

MDEV-11802 innodb.innodb_bug14676111 fails in buildbot due to InnoDB purge failing to start when there is work to do

Closed

MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4

Closed

MDEV-29954 Unique hash key on column prefix is computed incorrectly

Closed

MDEV-30129 MySQL 5.7 --> MariaDB 10.8 switch: mysqlcheck --check --extended hangs on table

Closed

(8 relates to)

Activity

Ascending order - Click to sort in descending order

View 19 older comments

Marko Mäkelä added a comment - 2022-10-17 16:29

I got an RQG grammar that would occasionally report orphan delete-marked secondary index records, similar to ~~MDEV-29666~~. I analyzed an rr replay trace of such a failure. The cause is a race condition between rollback and purge. Rollback would not remove a secondary index record that was inserted on top of a committed delete-marked record, because purge had not advanced that far yet. After rollback, trx_t::commit() of the ‘empty’ transaction would invoke trx_purge_add_undo_to_history(), which unnecessarily adds the empty undo log to the purge queue. I tried disabling trx_undo_try_truncate() so that the already rolled-back undo log records would be added to the purge queue, but even with that work-around (not a proper fix), the grammar would fail. With that tweak present, several tests that involve BLOBs would fail, presumably because some BLOBs would be prematurely freed.

The error is not easily fixed. We might relax the check and ‘wave through’ cases where a matching clustered index record cannot be found at all. Other cases (where a record with a primary key is found for a newer version of the record, but not for one that matches the secondary index record) would still be flagged:

diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc

index e1e7ed0e12a..4baa1e66376 100644

--- a/storage/innobase/row/row0sel.cc

+++ b/storage/innobase/row/row0sel.cc

@@ -6399,16 +6399,12 @@ dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows)

         if (!rec_deleted)

         not_found:

-          ib::error() << "Clustered index record not found for index "

+          ib::fatal() << "Clustered index record not found for index "

                       << index->name << " of table " << index->table->name

                       << ": " << rec_offsets_print(rec, offsets);

           if (prebuilt->autoinc_error == DB_SUCCESS)

             prebuilt->autoinc_error= DB_CORRUPTION;

-        else if (&view == &check_table_extended_view)

-        extended_not_found:

-          if (view.changes_visible(page_trx_id))

-            goto not_found;

       did_not_find:

         mtr.rollback_to_savepoint(savepoint);

         goto next_rec;

@@ -6587,8 +6583,9 @@ dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows)

             clust_rec version or when trx_undo_prev_version_build()

             encounters a BLOB that may have been freed according to

             purge_sys.view (not purge_sys.end_view). */

-            if (&view == &check_table_extended_view && !got_extended_match)

-              goto extended_not_found;

+            if (&view == &check_table_extended_view && !got_extended_match &&

+                view.changes_visible(page_trx_id))

+              goto not_found;

             goto did_not_find;

Alternatively, we could issue warnings for this but not flag the secondary index as corrupted. After all, this type of corruption should only hurt performance, not cause any correctness issues.

Marko Mäkelä added a comment - 2022-10-17 16:29 I got an RQG grammar that would occasionally report orphan delete-marked secondary index records, similar to MDEV-29666 . I analyzed an rr replay trace of such a failure. The cause is a race condition between rollback and purge. Rollback would not remove a secondary index record that was inserted on top of a committed delete-marked record, because purge had not advanced that far yet. After rollback, trx_t::commit() of the ‘empty’ transaction would invoke trx_purge_add_undo_to_history() , which unnecessarily adds the empty undo log to the purge queue. I tried disabling trx_undo_try_truncate() so that the already rolled-back undo log records would be added to the purge queue, but even with that work-around (not a proper fix), the grammar would fail. With that tweak present, several tests that involve BLOBs would fail, presumably because some BLOBs would be prematurely freed. The error is not easily fixed. We might relax the check and ‘wave through’ cases where a matching clustered index record cannot be found at all. Other cases (where a record with a primary key is found for a newer version of the record, but not for one that matches the secondary index record) would still be flagged: diff --git a/storage/innobase/row/row0sel.cc b/storage/innobase/row/row0sel.cc index e1e7ed0e12a..4baa1e66376 100644 --- a/storage/innobase/row/row0sel.cc +++ b/storage/innobase/row/row0sel.cc @@ -6399,16 +6399,12 @@ dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows) if (!rec_deleted) { not_found: - ib::error() << "Clustered index record not found for index " + ib::fatal() << "Clustered index record not found for index " << index->name << " of table " << index->table->name << ": " << rec_offsets_print(rec, offsets); if (prebuilt->autoinc_error == DB_SUCCESS) prebuilt->autoinc_error= DB_CORRUPTION; } - else if (&view == &check_table_extended_view) - extended_not_found: - if (view.changes_visible(page_trx_id)) - goto not_found; did_not_find: mtr.rollback_to_savepoint(savepoint); goto next_rec; @@ -6587,8 +6583,9 @@ dberr_t row_check_index(row_prebuilt_t *prebuilt, ulint *n_rows) clust_rec version or when trx_undo_prev_version_build() encounters a BLOB that may have been freed according to purge_sys.view (not purge_sys.end_view). */ - if (&view == &check_table_extended_view && !got_extended_match) - goto extended_not_found; + if (&view == &check_table_extended_view && !got_extended_match && + view.changes_visible(page_trx_id)) + goto not_found; goto did_not_find; } Alternatively, we could issue warnings for this but not flag the secondary index as corrupted. After all, this type of corruption should only hurt performance, not cause any correctness issues.

Matthias Leich added a comment - 2022-10-17 20:02 - edited

RQG testing on some older version of origin/bb-10.6-MDEV-24402 and using the following simplified RQG grammars

TBR-757-micro.yy

---------------------------

create_table:

    CREATE TABLE IF NOT EXISTS t1 ( non_generated_cols ) ENGINE = InnoDB ROW_FORMAT = Dynamic ;

insert_part:

    INSERT INTO t1 (col1,col2, col_int, col_text ) VALUES ;

my_int:

    { $my_int= 3 } |

    { $my_int= 5 } |

    { $my_int= 7 } ;

non_generated_cols:

    col1 INT PRIMARY KEY, col2 INT, col_int INTEGER , col_text TEXT ;

query:

    ALTER TABLE t1 ADD UNIQUE ( col_text(9) , col_int ) |

    ALTER TABLE t1 ADD UNIQUE ( col_int , col_int ) |

    CHECK TABLE t1 EXTENDED |

    DELETE FROM t1 WHERE col2 = my_int; COMMIT |

    DELETE FROM t1 WHERE col2 = my_int; COMMIT |

    insert_part ( my_int , $my_int, $my_int, NULL ); COMMIT |

    insert_part ( my_int , $my_int, $my_int, NULL ); ROLLBACK ;

thread1:

    query ;

thread1_connect:

    SET AUTOCOMMIT = 0 ;

thread1_init:

    create_table ;

thread_init:

    { sleep 30; exit 0 };

TBR-36-min_EXTENDED.yy

-----------------------------------------

insert_part:

    INSERT INTO t1 (col1,col2, col_text) VALUES ( 13 , 13, NULL ) ;

query:

    ALTER TABLE t1 ADD INDEX ( col_text(9) ) , ADD INDEX ( col1 ) |

    CHECK TABLE t1 EXTENDED |

    DELETE FROM t1 WHERE col2 = 13 OR col2 IS NULL ;

thread1:

    query ;

thread1_connect:

thread1_init:

    CREATE TABLE t1 ( col1 INT PRIMARY KEY, col2 INT, col_text TEXT ) ENGINE = InnoDB ROW_FORMAT = Dynamic ;

thread2:

    insert_part ;

thread2_connect:

thread2_init:

thread3:

    { sleep 30 ; exit 0 } ;

thread3_connect:

thread3_init:

thread_init:

    { sleep 30 ; exit 0 } ;

showed non rare as result of CHECK TABLE

- [ 'TBR-757', 'ERROR: ->.{1,100}check Warning InnoDB: The B-tree of index .{1,30} is corrupted.<-.' ],

- [ 'TBR-36', 'InnoDB: Index .{1,150} contains .{1,10} entries, should be .{1,10}.' ],

  [ 'TBR-36', 'failed: 1712 , errstr: Index .{1,100} is corrupted' ],

  [ 'TBR-36', 'failed: 1712 Index .* is corrupted' ],

Attempts to replay these problems on

origin/bb-10.6-MDEV-24402 435729fba417b5afcf759f05f3d6a1c62d3de058 2022-10-13T16:50:59+03:00 + the patch relax_check_extended.patch.diff

were given up after a few thousand tests "passed".

Nevertheless the bad effect

     [ 'TBR-757', 'ERROR: ->.{1,100}check Warning InnoDB: The B-tree of index .{1,30} is corrupted.<-.'

occurs again if using the full (== non simplified) grammar conf/mariadb/table_stress_innodb_nocopy1.yy.

The automatic RQG test simplifier is in the moment running.

Matthias Leich added a comment - 2022-10-17 20:02 - edited RQG testing on some older version of origin/bb-10.6-MDEV-24402 and using the following simplified RQG grammars TBR-757-micro.yy --------------------------- create_table: CREATE TABLE IF NOT EXISTS t1 ( non_generated_cols ) ENGINE = InnoDB ROW_FORMAT = Dynamic ; insert_part: INSERT INTO t1 (col1,col2, col_int, col_text ) VALUES ; my_int: { $my_int= 3 } | { $my_int= 5 } | { $my_int= 7 } ; non_generated_cols: col1 INT PRIMARY KEY, col2 INT, col_int INTEGER , col_text TEXT ; query: ALTER TABLE t1 ADD UNIQUE ( col_text(9) , col_int ) | ALTER TABLE t1 ADD UNIQUE ( col_int , col_int ) | CHECK TABLE t1 EXTENDED | DELETE FROM t1 WHERE col2 = my_int; COMMIT | DELETE FROM t1 WHERE col2 = my_int; COMMIT | insert_part ( my_int , $my_int, $my_int, NULL ); COMMIT | insert_part ( my_int , $my_int, $my_int, NULL ); ROLLBACK ; thread1: query ; thread1_connect: SET AUTOCOMMIT = 0 ; thread1_init: create_table ; thread_init: { sleep 30; exit 0 }; TBR-36-min_EXTENDED.yy ----------------------------------------- insert_part: INSERT INTO t1 (col1,col2, col_text) VALUES ( 13 , 13, NULL ) ; query: ALTER TABLE t1 ADD INDEX ( col_text(9) ) , ADD INDEX ( col1 ) | CHECK TABLE t1 EXTENDED | DELETE FROM t1 WHERE col2 = 13 OR col2 IS NULL ; thread1: query ; thread1_connect: ; thread1_init: CREATE TABLE t1 ( col1 INT PRIMARY KEY, col2 INT, col_text TEXT ) ENGINE = InnoDB ROW_FORMAT = Dynamic ; thread2: insert_part ; thread2_connect: ; thread2_init: ; thread3: { sleep 30 ; exit 0 } ; thread3_connect: ; thread3_init: ; thread_init: { sleep 30 ; exit 0 } ; showed non rare as result of CHECK TABLE - [ 'TBR-757', 'ERROR: ->.{1,100}check Warning InnoDB: The B-tree of index .{1,30} is corrupted.<-.' ], - [ 'TBR-36', 'InnoDB: Index .{1,150} contains .{1,10} entries, should be .{1,10}.' ], [ 'TBR-36', 'failed: 1712 , errstr: Index .{1,100} is corrupted' ], [ 'TBR-36', 'failed: 1712 Index .* is corrupted' ], Attempts to replay these problems on origin/bb-10.6-MDEV-24402 435729fba417b5afcf759f05f3d6a1c62d3de058 2022-10-13T16:50:59+03:00 + the patch relax_check_extended.patch.diff were given up after a few thousand tests "passed". Nevertheless the bad effect [ 'TBR-757', 'ERROR: ->.{1,100}check Warning InnoDB: The B-tree of index .{1,30} is corrupted.<-.' occurs again if using the full (== non simplified) grammar conf/mariadb/table_stress_innodb_nocopy1.yy. The automatic RQG test simplifier is in the moment running.

Matthias Leich added a comment - 2022-10-18 13:29

The following simplified test applied to origin/bb-10.6-MDEV-24402 (see above)

TBR-36--TBR-757_2-micro.yy

------------------------------------------

__clone__1:

    { $my_int= 2 } |

    { $my_int= 6 } |

    { $my_int= 8 } ;

__clone__2:

    { $my_int= 2 } |

    { $my_int= 6 } |

    { $my_int= 8 } ;

insert_part:

    INSERT INTO t3 (col1,col2, col_int, col_string, col_text ) VALUES ;

non_generated_cols:

    col1 INT PRIMARY KEY, col2 INT, col_int INTEGER, col_string INTEGER, col_text TEXT ;

query:

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE |

    CHECK TABLE t3 EXTENDED |

    CHECK TABLE t3 EXTENDED |

    CHECK TABLE t3 EXTENDED |

    UPDATE t3 SET col_int = __clone__2 ORDER BY col1 DESC LIMIT 2 ; ROLLBACK |

    UPDATE t3 SET col_int = __clone__1 ORDER BY col1 DESC LIMIT 2 ; ROLLBACK |

    insert_part ( __clone__2 , $my_int, $my_int, '', '' ); ROLLBACK |

    insert_part ( __clone__1 , $my_int, $my_int, '', '' ); ROLLBACK ;

thread1:

    query ;

thread1_connect:

thread1_init:

    CREATE TABLE IF NOT EXISTS t3 ( non_generated_cols ) ENGINE = InnoDB ROW_FORMAT = Compact ;

thread_init:

    { sleep 30 ; exit 0 };

harvested during CHECK TABLE    check Warning InnoDB: The B-tree of index .{1,30} is corrupted.

In case of origin/bb-10.6-MDEV-24402 87e4f0dde4877090b6948c2734fbd37c860e216b 2022-10-18T10:20:02+03:00 + a patch which disables MDEV-371

the CHECK TABLE harvests now a warning like 'InnoDB: Clustered index record not found for index col_int of table test.t3 ...'.

Per Marko: This is no error.

Matthias Leich added a comment - 2022-10-18 13:29 The following simplified test applied to origin/bb-10.6-MDEV-24402 (see above) TBR-36--TBR-757_2-micro.yy ------------------------------------------ __clone__1: { $my_int= 2 } | { $my_int= 6 } | { $my_int= 8 } ; __clone__2: { $my_int= 2 } | { $my_int= 6 } | { $my_int= 8 } ; insert_part: INSERT INTO t3 (col1,col2, col_int, col_string, col_text ) VALUES ; non_generated_cols: col1 INT PRIMARY KEY, col2 INT, col_int INTEGER, col_string INTEGER, col_text TEXT ; query: ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | ALTER TABLE t3 ADD UNIQUE KEY ( col_int ) , ALGORITHM = NOCOPY , LOCK = EXCLUSIVE | CHECK TABLE t3 EXTENDED | CHECK TABLE t3 EXTENDED | CHECK TABLE t3 EXTENDED | UPDATE t3 SET col_int = __clone__2 ORDER BY col1 DESC LIMIT 2 ; ROLLBACK | UPDATE t3 SET col_int = __clone__1 ORDER BY col1 DESC LIMIT 2 ; ROLLBACK | insert_part ( __clone__2 , $my_int, $my_int, '', '' ); ROLLBACK | insert_part ( __clone__1 , $my_int, $my_int, '', '' ); ROLLBACK ; thread1: query ; thread1_connect: ; thread1_init: CREATE TABLE IF NOT EXISTS t3 ( non_generated_cols ) ENGINE = InnoDB ROW_FORMAT = Compact ; thread_init: { sleep 30 ; exit 0 }; harvested during CHECK TABLE check Warning InnoDB: The B-tree of index .{1,30} is corrupted. In case of origin/bb-10.6-MDEV-24402 87e4f0dde4877090b6948c2734fbd37c860e216b 2022-10-18T10:20:02+03:00 + a patch which disables MDEV-371 the CHECK TABLE harvests now a warning like 'InnoDB: Clustered index record not found for index col_int of table test.t3 ...'. Per Marko: This is no error.

Marko Mäkelä added a comment - 2022-10-18 14:02 - edited

Thank you, mleich. The test case corresponds to the scenario that I was analyzing yesterday, now filed as MDEV-29823. It is a ‘memory leak’ (failure to remove a secondary index that is no longer needed), but not one that can be fixed easily. Therefore, I would report it as a warning but not flag the secondary index as corrupted. MVCC reads as well as row_vers_impl_x_locked_low() will silently ignore delete-marked secondary index records for which no counterpart exists in the clustered index.

Marko Mäkelä added a comment - 2022-10-18 14:02 - edited Thank you, mleich . The test case corresponds to the scenario that I was analyzing yesterday, now filed as MDEV-29823 . It is a ‘memory leak’ (failure to remove a secondary index that is no longer needed), but not one that can be fixed easily. Therefore, I would report it as a warning but not flag the secondary index as corrupted. MVCC reads as well as row_vers_impl_x_locked_low() will silently ignore delete-marked secondary index records for which no counterpart exists in the clustered index.

Vladislav Lesin added a comment - 2022-10-21 06:43

The logic of row_check_index() and the code around purge_sys.end_view looks correct to me.

Vladislav Lesin added a comment - 2022-10-21 06:43 The logic of row_check_index() and the code around purge_sys.end_view looks correct to me.

People

Assignee:: Marko Mäkelä

Reporter:: Hartmut Holzgraefe

Votes:: 2 Vote for this issue

Watchers:: 9 Start watching this issue

Dates

Created:: 2020-12-14 18:13

Updated:: 2024-10-22 07:55

Resolved:: 2022-10-21 07:46

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server