MariaDB Server / MDEV-25459

MVCC read from index on CHAR or VARCHAR wrongly omits rows

Details

    Description

      origin/10.6 af418bb9ef7e422282dc976640409a6af8fcd58c 2021-04-19T14:04:02+10:00
       
      Scenario:
      1. Start the server, create one table and fill it with 100 rows.
      2. Session 1 repeatedly runs an UPDATE such as
           UPDATE table100_innodb_int_autoinc SET `col_varchar_255_ucs2_key` = CONVERT( 'degsotrsfannidwyvkuvlkeslrryhpkeevqmbksdrzadzpyisznignsytihyjixyalxfxpnafjwzgnkbbayklurufrsajtzohanbuvcfyykvtmesobixwipkoihhqykvoejckythjnjshxgohmecmklxryubdexjgxehdiqqui' USING ASCII )
          Session 2 concurrently runs in a loop
          CHECK TABLE table100_innodb_int_autoinc EXTENDED
      After a short time, CHECK TABLE reports a warning:
      test.table100_innodb_int_autoinc check Warning InnoDB: Index 'col_varchar_255_ucs2_key' contains 98 entries, should be 100.
       
      sdp:/data/Results/1618846463/TBR-36/dev/shm/vardir/1618846463/4/1/rr
           
      

      Attachments

        1. simp_many_indexes.cfg (43 kB)
        2. TBR-36_micro.yy (0.9 kB)
        3. TBR-36.zz (1 kB)

          Activity

            I will have to debug the rr trace further (around 'when' 61095) to check exactly why we are assigning clust_rec=NULL here:

            		if (clust_rec
            		    && (old_vers
            			|| trx->isolation_level <= TRX_ISO_READ_UNCOMMITTED
            			|| dict_index_is_spatial(sec_index)
            			|| rec_get_deleted_flag(rec, dict_table_is_comp(
            							sec_index->table)))) {
            			err = row_sel_sec_rec_is_for_clust_rec(rec, sec_index,
            						clust_rec, clust_index, thr);
            			switch (err) {
            			case DB_SUCCESS:
            				clust_rec = NULL;
            				break;
            

            Currently it looks like we are constructing a wrong old_vers with (pk=63,DB_TRX_ID=0x3c,(update)) and an apparently empty string. It is as if CHECK TABLE were using a newer read view for accessing the clustered index.

            (comment by marko, Marko Mäkelä)

            For the clustered index scan in the failing CHECK TABLE, row_sel_clust_sees() will not hold for the single record
            (pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
            (63,0x3e,(update),'degsotrsfann'…,…)
            so we constructed old_vers that contains a BLOB reference for that column:
            (pk,DB_TRX_ID,DB_ROLL_PTR,col_varchar_255_ucs2_key,…)=
            (63,0x3c,(update),(space=8, page=5, offset=0x26, length=0x1f0),…)

            The payload of the BLOB page is the following UTF-16BE encoded 248-char ASCII string:

            qemleuzymtgopfikwzloyibtgehtjhlablwaewqzuglpeoqnnoxsqeyzbocilomtnpaxihztdrtqahffxekdhrwdvtdvmlshvdhrgfkmuwonkyqzwelujpeyggixymdugzhiqlmnrlpxzrytoatxolflxdzknhkgyttnjqwqcutogtwviuoqkbzxlbzkzdupmxeeifroaweulyyetwdnnrvxrtoksrgcplpyubzpyntedwlgnvzehakg
            

            The prefix of this seems to match the delete-marked record in the secondary index page that I posted earlier.

            So, nothing seems to be really corrupted, but the secondary index MVCC code probably fails to read the BLOB, and instead treats the 20-byte BLOB pointer as the actual string. I will post more, once I have debugged that part of the code.

            (comment by marko, Marko Mäkelä)
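
The sizes reported in the comment above are mutually consistent: ucs2 uses 2 bytes per character, so a 248-character string occupies 496 = 0x1f0 bytes, which is the length recorded in the BLOB reference, while the reference itself is only 20 bytes. The following is a minimal C++ sketch of that arithmetic, not InnoDB code. The names encode_ucs2_be and kExternFieldRefSize are hypothetical and introduced only for illustration; the constant is assumed to mirror InnoDB's BTR_EXTERN_FIELD_REF_SIZE.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Assumption: mirrors the value of InnoDB's BTR_EXTERN_FIELD_REF_SIZE.
constexpr std::size_t kExternFieldRefSize = 20;

// Hypothetical helper: encode 7-bit ASCII text as big-endian UCS-2 bytes.
std::string encode_ucs2_be(const std::string& ascii) {
  std::string out;
  out.reserve(ascii.size() * 2);
  for (char c : ascii) {
    assert(static_cast<unsigned char>(c) < 0x80);  // ASCII input only
    out.push_back('\0');  // high byte is zero for ASCII
    out.push_back(c);     // low byte is the ASCII code
  }
  return out;
}

// 248 characters at 2 bytes each give the BLOB length seen in the trace.
static_assert(248 * 2 == 0x1f0, "248 ucs2 characters occupy 0x1f0 bytes");
```

Under these assumptions, encode_ucs2_be(std::string(248, 'q')).size() is 496, far more than the 20 bytes of the reference, so comparing the secondary index field against the reference bytes alone should not be expected to match.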

            We have this code in row_sel_sec_rec_is_for_clust_rec():

            		if (ifield->prefix_len > 0 && len != UNIV_SQL_NULL
            		    && sec_len != UNIV_SQL_NULL && !is_virtual) {
             
            			if (rec_offs_nth_extern(clust_offs, clust_pos)) {
            				len -= BTR_EXTERN_FIELD_REF_SIZE;
            			}
             
            			len = dtype_get_at_most_n_mbchars(
            				col->prtype, col->mbminlen, col->mbmaxlen,
            				ifield->prefix_len, len, (char*) clust_field);
             
            			if (rec_offs_nth_extern(clust_offs, clust_pos)
            			    && len < sec_len) {
            				if (!row_sel_sec_rec_is_for_blob(
            

            Because the secondary index is defined on the full column col_varchar_255_ucs2_key rather than on a column prefix, ifield->prefix_len is 0, and we fail to invoke the check for the BLOB (really an off-page column, in this case VARCHAR).

            This should mean that MVCC is broken for full-column secondary indexes in ROW_FORMAT=DYNAMIC and ROW_FORMAT=COMPRESSED InnoDB tables whenever the indexed column happens to be chosen for external storage in the clustered index. In ROW_FORMAT=REDUNDANT (the default and only format until MySQL 5.0.3) and ROW_FORMAT=COMPACT (the default between MySQL 5.0.3 and 5.7, or MariaDB Server before 10.2.2), we would always store the entire column in the clustered index.

            I will have to check if something similar is broken in the implicit locking of secondary indexes, in row_vers_impl_x_locked_low(). If the call to row_build() is not fetching the externally stored VARCHAR column, locking for secondary index records could be broken. I will try to verify that on a copy of the data directory and possibly a patched server.

            (comment by marko, Marko Mäkelä)
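
The effect of the prefix_len guard in row_sel_sec_rec_is_for_clust_rec() quoted above can be sketched with a toy model. This is not InnoDB code: ClustField, key_part, sec_matches_prefix_only and sec_matches_any_extern are hypothetical names, prefix lengths are counted in bytes rather than characters, and the second function is only one conceivable correction, not necessarily the fix that was applied.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy model, not InnoDB code; every name below is hypothetical.
constexpr std::size_t kExternFieldRefSize = 20;  // assumed BTR_EXTERN_FIELD_REF_SIZE

struct ClustField {
  std::string local;     // bytes stored in the clustered index record
  bool is_extern;        // true if `local` holds only an off-page reference
  std::string off_page;  // full column value stored off-page (if is_extern)
};

// Returns the first prefix_len bytes of s, or all of s if prefix_len == 0.
// (Real InnoDB counts characters, not bytes; this is a simplification.)
std::string key_part(const std::string& s, std::size_t prefix_len) {
  return prefix_len > 0 ? s.substr(0, prefix_len) : s;
}

// Shape of the quoted guard: off-page data is consulted only for
// column-prefix indexes (prefix_len > 0).
bool sec_matches_prefix_only(const std::string& sec, const ClustField& c,
                             std::size_t prefix_len) {
  if (prefix_len > 0 && c.is_extern) {
    return key_part(c.off_page, prefix_len) == sec;
  }
  // For a full-column index on an off-page column, this compares `sec`
  // against the 20-byte reference instead of the column value.
  return key_part(c.local, prefix_len) == sec;
}

// One conceivable correction (an assumption, not the actual fix):
// consult off-page data whenever the column is stored externally.
bool sec_matches_any_extern(const std::string& sec, const ClustField& c,
                            std::size_t prefix_len) {
  assert(!c.is_extern || c.local.size() == kExternFieldRefSize);
  const std::string& value = c.is_extern ? c.off_page : c.local;
  return key_part(value, prefix_len) == sec;
}
```

In this model, a 496-byte secondary index value with prefix_len 0 fails to match an off-page clustered field under the first function but matches under the second, while a column-prefix index behaves the same under both.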

            I can repeat the problem with the following:

            --source include/innodb_page_size_small.inc
             
            CREATE TABLE t1 (
              pk int PRIMARY KEY, c varchar(255) UNIQUE,
              d char(255), e varchar(255), f char(255), g char(255)
            ) ENGINE=InnoDB ROW_FORMAT=DYNAMIC DEFAULT CHARACTER SET ucs2;
             
            INSERT INTO t1 VALUES
            (1,REPEAT('c',248),REPEAT('a',106),REPEAT('b',220),REPEAT('x',14),'');
             
            BEGIN;
            UPDATE t1 SET c=REPEAT('d',170);
             
            connect (con1,localhost,root,,);
            SELECT pk FROM t1 FORCE INDEX (c);
            connection default;
            COMMIT;
            connection con1;
            SELECT pk FROM t1 FORCE INDEX (c);
            disconnect con1;
            connection default;
            DROP TABLE t1;
            

            10.2 922e676b43c7b5cb0f20ca67c6d2222e2fc5ec03

            innodb.mvcc_secondary '16k,innodb'       w1 [ pass ]      6
            innodb.mvcc_secondary '8k,innodb'        w3 [ pass ]      5
            innodb.mvcc_secondary '4k,innodb'        w2 [ fail ]
                    Test ended at 2021-04-23 14:06:40
             
            CURRENT_TEST: innodb.mvcc_secondary
            --- /mariadb/10.2o/mysql-test/suite/innodb/r/mvcc_secondary.result	2021-04-23 14:06:25.841488727 +0300
            +++ /mariadb/10.2o/mysql-test/suite/innodb/r/mvcc_secondary.reject	2021-04-23 14:06:40.565659501 +0300
            @@ -9,7 +9,6 @@
             connect  con1,localhost,root,,;
             SELECT pk FROM t1 FORCE INDEX (c);
             pk
            -1
             connection default;
             COMMIT;
             connection con1;
             
            mysqltest: Result length mismatch
            

            I believe that the bug also affects earlier versions, but I did not test them, because MariaDB 5.5, 10.0, and 10.1 have already reached their end of life.

            Also, although this test case only reproduces the problem with innodb_page_size=4k, it should be reproducible with any page size when using appropriately sized records.

            (comment by marko, Marko Mäkelä)

            If I append FOR UPDATE or LOCK IN SHARE MODE to the first SELECT, a locking conflict will occur:

            10.2 922e676b43c7b5cb0f20ca67c6d2222e2fc5ec03

            mysqltest: At line 15: query 'SELECT pk FROM t1 FORCE INDEX (c) LOCK IN SHARE MODE' failed: 1205: Lock wait timeout exceeded; try restarting transaction
            

            This suggests that no change to row_vers_impl_x_locked_low() is needed, and that only row_sel_sec_rec_is_for_clust_rec() is wrong.

            (comment by marko, Marko Mäkelä)

            People

              marko Marko Mäkelä
              mleich Matthias Leich