Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27746

Wrong comparision of BLOB's empty preffix with non-preffixed BLOB causes rows count mismatch for clustered and secondary indexes during non-locking read

Details

    Description

      origin/10.7 commit 3351dfaab0599268eaf25f9d6995ef128910a8b9
       
      SET GLOBAL default_storage_engine = 'InnoDB' ;
      SET SESSION sql_mode = 'NO_ENGINE_SUBSTITUTION' ;
      CREATE TABLE IF NOT EXISTS t1 ( col_text TEXT NOT NULL, KEY ( col_text(9) ) ) ENGINE = InnoDB ROW_FORMAT = Dynamic ;
      SET @fill_amount = (@@innodb_page_size / 2 ) + 1 ;
      INSERT INTO t1 (col_text) VALUES ( REPEAT(SUBSTR(CAST( 24 - 1 AS CHAR),1,1), @fill_amount) ) ;
      UPDATE t1 SET col_text = REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ;
      Warnings:
      Warning 1048    Column 'col_text' cannot be null
      UPDATE t1 SET col_text = REPEAT(SUBSTR(CAST( NULL AS CHAR),1,1), @fill_amount) ;
      Warnings:
      Warning 1048    Column 'col_text' cannot be null
      UPDATE t1 SET col_text = REPEAT(SUBSTR(CAST( 18 AS CHAR),1,1), @fill_amount) ;
      CHECK TABLE t1 ;
      Table   Op      Msg_type        Msg_text
      test.t1 check   Warning InnoDB: Index 'col_text' contains 2 entries, should be 1.
      test.t1 check   error   Corrupt
       
      I cannot exclude that the uploaded MTR test contains more SQL than required for showing the problem.
      Warning:
      The layout of the table with one column only comes from the attempt to get the smallest test possible.
      In case the table contains additional columns of type INT and/or CHAR/VARCHAR I get the same bad effect.
      

      Attachments

        Issue Links

          Activity

            vlad.lesin Vladislav Lesin added a comment - - edited

            I repeated and recorded the issue locally on 10.6 with Marko's test. I debugged row_search_mvcc() for CHECK TABLE's index scanning. Clustered index scanning looks good, the index contains one record with correct link to BLOB and trx_id.

            (DB_ROW_ID=0x000000000205, DB_TRX_ID=0x000000000066, DB_ROLL_PTR=0x2f0000015d0110, col_text={(space_id=0x00000a, page=0x00000006, offset=0x00000026,...), points to: {len= 0x00002001, 0xff, 0xff, 0xff, 0xff, 0x79('y') <repeats 192 times>}})
            

            The first found record during secondary index scanning is:

            (col_text='', DB_ROW_ID=0x000000000205) - delete-marked
            p/x rec[-6]@12
            $29 = {0x0, 0x20, 0x0, 0x18, 0xff, 0xeb, 0x0, 0x0, 0x0, 0x0, 0x2, 0x5}
            

            As CHECK TABLE's readview can see the changes of transaction with id equal to the secondary index page transaction id, clustered index record is read. As the readview can see clustered index record, and the secondary index record is delete-marked, the BLOB fields are compared.

            As we compare the empty field of secondary index(with length equal to 0) with non-empty BLOB field of clustered index, row_sel_sec_rec_is_for_clust_rec() must return DB_SUCCESS with the following calls trace:

            â–¾ row_sel_sec_rec_is_for_clust_rec                                                                                                
              â–¾ Row_sel_get_clust_rec_for_mysql::operator()                                 
                â–¸ row_search_mvcc
            

            but it returns DB_SUCCESS_LOCKED_REC because of the following code:

            static                                                                          
            dberr_t                                                                         
            row_sel_sec_rec_is_for_clust_rec(...)                                           
            {                                                                               
            ...                                                                             
                                                                                            
                                    if (ulint prefix_len = ifield->prefix_len) {            
                                            len = dtype_get_at_most_n_mbchars(              
                                                    col->prtype, col->mbminlen,             
                                                    col->mbmaxlen, prefix_len, len,         
                                                    reinterpret_cast<const char*>(          
                                                            clust_field));                  
                                            if (len < sec_len) {                                                                              
                                                    goto check_for_blob;                    
                                            }                                               
                                    } else {                                                
            check_for_blob:                                                                 
                                            if (rec_offs_nth_extern(clust_offs,             
                                                                    clust_pos)) {           
                                                    if (!row_sel_sec_rec_is_for_blob(       
                                                                col->mtype, col->prtype,    
                                                                col->mbminlen,              
                                                                col->mbmaxlen,              
                                                                clust_field, clust_len,     
                                                                sec_field, sec_len,         
                                                                prefix_len,                                                                   
                                                                clust_index->table)) {      
                                                            return DB_SUCCESS;              
                                                    }                                                                                         
                                                                                                                                              
                                                    continue;                                                                                 
                                            }                                               
                                    }                                                       
                            }                                                               
                                                                                            
                            if (0 != cmp_data_data(col->mtype, col->prtype,                 
                                                   clust_field, len,                        
                                                   sec_field, sec_len)) {
                                    return DB_SUCCESS;                                      
                            }                                                               
            ...                                                                             
            }
            

            I suppose that for BLOB comparison row_sel_sec_rec_is_for_blob() must be invoked. But in the above code "if (ulint prefix_len = ifield->prefix_len)" branch is executed. Then dtype_get_at_most_n_mbchars(...) returns 0 and then this 0 is compared with sec_len, which is 0 too. So instead of row_sel_sec_rec_is_for_blob() cmp_data_data() is executed, which, in turns, returns 0 as both len and sec_len equal to 0. The secondary index record length is counted from offsets array, which is initialized in rec_init_offsets_comp_ordinary(), and the length of the BLOB prefix field is read from rec[-6] byte, which is 0. According to the above, empty BLOB prefix field from secondary index equals to non-empty BLOB field in clustered record. Commit 25ed665a is under suspicion.

            Why dtype_get_at_most_n_mbchars(...) returns 0, i.e. why BLOB's field len is 0? That's because of the following code in row_sel_sec_rec_is_for_clust_rec():

                                    len = clust_len;                                        
                                    if (rec_offs_nth_extern(clust_offs, clust_pos)) {       
                                            len -= BTR_EXTERN_FIELD_REF_SIZE;               
            

            For our case clust_len equals to BTR_EXTERN_FIELD_REF_SIZE, so len = 0, and it's passed as data_len argument in dtype_get_at_most_n_mbchars(...), that is why it returns 0.

            As row_sel_sec_rec_is_for_clust_rec() treats empty BLOB prefix field in secondary index equal to any external BLOB field in clustered index and returns DB_SUCCESS_LOCKED_REC, Row_sel_get_clust_rec_for_mysql::operator() doesn't zerro out clustered record pointer in row_search_mvcc(), and row_search_mvcc() thinks that delete-marked secondary index record has visible for CHECK TABLES read view old-versioned clustered index record, and row_scan_index_for_mysql() counts it as a row.

            vlad.lesin Vladislav Lesin added a comment - - edited I repeated and recorded the issue locally on 10.6 with Marko's test. I debugged row_search_mvcc() for CHECK TABLE's index scanning. Clustered index scanning looks good, the index contains one record with correct link to BLOB and trx_id. (DB_ROW_ID=0x000000000205, DB_TRX_ID=0x000000000066, DB_ROLL_PTR=0x2f0000015d0110, col_text={(space_id=0x00000a, page=0x00000006, offset=0x00000026,...), points to: {len= 0x00002001, 0xff, 0xff, 0xff, 0xff, 0x79('y') <repeats 192 times>}}) The first found record during secondary index scanning is: (col_text='', DB_ROW_ID=0x000000000205) - delete-marked p/x rec[-6]@12 $29 = {0x0, 0x20, 0x0, 0x18, 0xff, 0xeb, 0x0, 0x0, 0x0, 0x0, 0x2, 0x5} As CHECK TABLE's readview can see the changes of transaction with id equal to the secondary index page transaction id, clustered index record is read. As the readview can see clustered index record, and the secondary index record is delete-marked, the BLOB fields are compared. As we compare the empty field of secondary index(with length equal to 0) with non-empty BLOB field of clustered index, row_sel_sec_rec_is_for_clust_rec() must return DB_SUCCESS with the following calls trace: â–¾ row_sel_sec_rec_is_for_clust_rec â–¾ Row_sel_get_clust_rec_for_mysql::operator() â–¸ row_search_mvcc but it returns DB_SUCCESS_LOCKED_REC because of the following code: static dberr_t row_sel_sec_rec_is_for_clust_rec(...) { ... if (ulint prefix_len = ifield->prefix_len) { len = dtype_get_at_most_n_mbchars( col->prtype, col->mbminlen, col->mbmaxlen, prefix_len, len, reinterpret_cast< const char *>( clust_field)); if (len < sec_len) { goto check_for_blob; } } else { check_for_blob: if (rec_offs_nth_extern(clust_offs, clust_pos)) { if (!row_sel_sec_rec_is_for_blob( col->mtype, col->prtype, col->mbminlen, col->mbmaxlen, clust_field, clust_len, sec_field, sec_len, prefix_len, clust_index->table)) { return DB_SUCCESS; } continue ; } } } if ( 0 != cmp_data_data(col->mtype, col->prtype, clust_field, len, sec_field, sec_len)) { return DB_SUCCESS; } ... } I suppose that for BLOB comparison row_sel_sec_rec_is_for_blob() must be invoked. But in the above code "if (ulint prefix_len = ifield->prefix_len)" branch is executed. Then dtype_get_at_most_n_mbchars(...) returns 0 and then this 0 is compared with sec_len, which is 0 too. So instead of row_sel_sec_rec_is_for_blob() cmp_data_data() is executed, which, in turns, returns 0 as both len and sec_len equal to 0. The secondary index record length is counted from offsets array, which is initialized in rec_init_offsets_comp_ordinary(), and the length of the BLOB prefix field is read from rec [-6] byte, which is 0. According to the above, empty BLOB prefix field from secondary index equals to non-empty BLOB field in clustered record. Commit 25ed665a is under suspicion. Why dtype_get_at_most_n_mbchars(...) returns 0, i.e. why BLOB's field len is 0? That's because of the following code in row_sel_sec_rec_is_for_clust_rec(): len = clust_len; if (rec_offs_nth_extern(clust_offs, clust_pos)) { len -= BTR_EXTERN_FIELD_REF_SIZE; For our case clust_len equals to BTR_EXTERN_FIELD_REF_SIZE, so len = 0, and it's passed as data_len argument in dtype_get_at_most_n_mbchars(...), that is why it returns 0. As row_sel_sec_rec_is_for_clust_rec() treats empty BLOB prefix field in secondary index equal to any external BLOB field in clustered index and returns DB_SUCCESS_LOCKED_REC, Row_sel_get_clust_rec_for_mysql::operator() doesn't zerro out clustered record pointer in row_search_mvcc(), and row_search_mvcc() thinks that delete-marked secondary index record has visible for CHECK TABLES read view old-versioned clustered index record, and row_scan_index_for_mysql() counts it as a row.

            If a BLOB pointer is present, then the record should not match a zero-length column prefix. This is because dtuple_convert_big_rec() should never create a BLOB for a column that is shorter than BTR_EXTERN_FIELD_REF_SIZE (20) bytes.

            Furthermore, if the BTR_EXTERN_LEN is not at least as long as the column prefix in the secondary index record, a match should be impossible, and we can avoid fetching the BLOB.

            marko Marko Mäkelä added a comment - If a BLOB pointer is present, then the record should not match a zero-length column prefix. This is because dtuple_convert_big_rec() should never create a BLOB for a column that is shorter than BTR_EXTERN_FIELD_REF_SIZE (20) bytes. Furthermore, if the BTR_EXTERN_LEN is not at least as long as the column prefix in the secondary index record, a match should be impossible, and we can avoid fetching the BLOB.

            I understood why the mtr test case is unstable. The difference between failed and passed tests is that CHECK TABLE's readview sees secondary index record with empty BLOB for passed test and doesn't see it for failed test(that it why it requests clustered index and compares BLOB fields). For the failed test there was active rw-transaction when CHECK TABLE's readview was created - dict statistics was being updated. That is why STATS_PERSISTENT=1 influenced test result. To make it stable is enough to have active rw-transaction when CHECK TABLE is executed. Here is stable test case: MDEV-27746.test.

            vlad.lesin Vladislav Lesin added a comment - I understood why the mtr test case is unstable. The difference between failed and passed tests is that CHECK TABLE's readview sees secondary index record with empty BLOB for passed test and doesn't see it for failed test(that it why it requests clustered index and compares BLOB fields). For the failed test there was active rw-transaction when CHECK TABLE's readview was created - dict statistics was being updated. That is why STATS_PERSISTENT=1 influenced test result. To make it stable is enough to have active rw-transaction when CHECK TABLE is executed. Here is stable test case: MDEV-27746.test .

            Pushed bb-10.6-MDEV-27746-emty-blob.

            vlad.lesin Vladislav Lesin added a comment - Pushed bb-10.6- MDEV-27746 -emty-blob.

            origin/bb-10.6-MDEV-27746-emty-blob 04589516851124d33746ef89c9ed0c47f074bb62 2022-02-10T11:49:06+03:00
            behaved well in RQG testing.

            mleich Matthias Leich added a comment - origin/bb-10.6- MDEV-27746 -emty-blob 04589516851124d33746ef89c9ed0c47f074bb62 2022-02-10T11:49:06+03:00 behaved well in RQG testing.

            People

              vlad.lesin Vladislav Lesin
              mleich Matthias Leich
              Votes:
              1 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.