MariaDB Server / MDEV-19344

innodb.innodb-change-buffer-recovery-innodb fails with incorrect results or crash

Details

    Description

      In debug mode, running the test with --repeat results in every second run failing like this:

      innodb.innodb-change-buffer-recovery 'innodb' [ fail ]
              Test ended at 2019-04-26 12:23:01
       
      CURRENT_TEST: innodb.innodb-change-buffer-recovery
      --- /home/kevg/work/mariadb/mysql-test/suite/innodb/r/innodb-change-buffer-recovery.result	2019-04-26 12:14:05.804197310 +0300
      +++ /home/kevg/work/mariadb/mysql-test/suite/innodb/r/innodb-change-buffer-recovery.reject	2019-04-26 12:23:01.364749083 +0300
      @@ -38,7 +38,7 @@
       ERROR HY000: Lost connection to MySQL server during query
       disconnect con1;
       connection default;
      -FOUND 1 /Wrote log record for ibuf update in place operation/ in my_restart.err
      +FOUND 2 /Wrote log record for ibuf update in place operation/ in my_restart.err
       CHECK TABLE t1;
       Table	Op	Msg_type	Msg_text
       test.t1	check	status	OK
       
      mysqltest: Result content mismatch
       
       - saving '/home/kevg/work/debug/mysql-test/var/log/innodb.innodb-change-buffer-recovery-innodb/' to '/home/kevg/work/debug/mysql-test/var/log/innodb.innodb-change-buffer-recovery-innodb/'
      

      In debug mode with -O2 it sometimes crashes like this:

      #6  0x00007f4016d55012 in __GI___assert_fail (assertion=0xfadc93 "field != 16 || val", file=0xf9048b "/work/mariadb/storage/innobase/include/page0page.ic", line=176, function=0xfadc3f "void page_header_set_field(ib_page_t *, page_zip_des_t *, ulint, ulint)") at assert.c:101
      #7  0x0000000000a1bbcb in page_header_set_field (page=<optimized out>, page_zip=0x0, field=16, val=<optimized out>) at page0page.ic:176
      #8  page_mem_free (page=<optimized out>, page_zip=0x0, rec=<optimized out>, index=<optimized out>, offsets=<optimized out>) at page0page.ic:1047
      #9  page_cur_delete_rec (cursor=0x7f40106e8d01, index=<optimized out>, offsets=0x7ffe97abb990, mtr=0x7ffe97abc230) at page0cur.cc:2417
      #10 0x0000000000b79fab in btr_cur_optimistic_delete_func (cursor=0x7ffe97abc760, flags=<optimized out>, mtr=0x7ffe97abc230) at btr0cur.cc:5163
      #11 0x00000000009c4387 in ibuf_delete_rec (space=4, page_no=27, pcur=0x7ffe97abc760, search_tuple=0x26faf98, mtr=0x7ffe97abc230) at ibuf0ibuf.cc:4261
      #12 0x00000000009c11de in ibuf_merge_or_delete_for_page (block=0x7f40102526a8, page_id=..., page_size=0x7f40102526b8, update_ibuf_bitmap=1) at ibuf0ibuf.cc:4645
      #13 0x00000000009fcd6f in mlog_init_t::ibuf_merge (this=<optimized out>, mtr=...) at log0recv.cc:330
      #14 0x00000000009f44c5 in recv_apply_hashed_log_recs (last_batch=<optimized out>) at log0recv.cc:2516
      #15 0x0000000000ae989b in innobase_start_or_create_for_mysql () at srv0start.cc:2224
      #16 0x000000000097d77f in innobase_init (p=<optimized out>) at ha_innodb.cc:4377
      #17 0x00000000007af955 in ha_initialize_handlerton (plugin=0x2367ca8) at handler.cc:521
      #18 0x000000000060cc59 in plugin_initialize (tmp_root=0x7ffe97ac26b8, plugin=0x2367ca8, argc=0x15d1f68 <remaining_argc>, argv=0x2225428, options_only=<optimized out>) at sql_plugin.cc:1416
      #19 0x000000000060c77e in plugin_init (argc=0x15d1f68 <remaining_argc>, argv=0x2225428, flags=2) at sql_plugin.cc:1697
      #20 0x00000000005525b5 in init_server_components () at mysqld.cc:5311
      #21 mysqld_main (argc=<optimized out>, argv=0x2225428) at mysqld.cc:5907
      

      Adding --mysqld=--debug=d,ib_log_checkpoint_avoid makes the debug version fail with the same backtrace.

      Attachments

        Issue Links

          Activity

            kevg Eugene Kosov (Inactive) created issue -

            kevg, in MDEV-19346 I had to disable the assertion, because it would now fail by default, due to fewer redo log checkpoints being executed. I hope that you can now more easily reduce the test case and narrow down the cause. Please remember --mysqld=--debug=d,ib_log_checkpoint_avoid just in case.

            marko Marko Mäkelä added a comment -
            serg Sergei Golubchik made changes -
            Field Original Value New Value
            Fix Version/s 10.5 [ 23123 ]

            We could trivially fix this bug by removing the ability of the purge of history to submit work to the change buffer, but first I would like to see some results of performance tests.

            marko Marko Mäkelä added a comment -

            I ported to 10.2 my patch to cease the purge buffering. It also re-enables the assertion.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -

            I believe that the test failure shares the root cause with MySQL Bug #61104, which was never fixed (the assertion was merely changed to a warning).

            marko Marko Mäkelä added a comment -

            Based on the benchmarks that were run by axel, the change buffering of purge operations is significantly improving performance when the buffer pool size is slightly smaller than the working set.

            Hence, we should try to spend some more effort to correct the error rather than to disable the feature altogether.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            axel Axel Schwenke made changes -
            Attachment MDEV-19344.ods [ 49142 ]
            axel Axel Schwenke added a comment - - edited

            I attach the spread sheet MDEV-19344.ods with the results of my benchmark. The workload is OLTP-ish with SELECT using the secondary index (and thus realizing buffered changes). Three different mixes of writes are used additionally:

            1. DELETE + INSERT
            2. UPDATE on the indexed column
            3. a mix of the former two

            I did 3 complete runs with different buffer pool sizes. The working set is ~40GB and the buffer pool was set to 32G, 20G and 40G respectively. Relevant for this task are the runs with 32G and 20G.

            It turns out that workloads with UPDATE suffer quite a lot from removing the change buffering. Performance drops by ~5% on average, more than 10% for certain thread counts.


            mleich provided an rr record trace of a related failure for a 10.5-based tree that I spent all day analyzing today. I think that I may have found the cause of this corruption. I do not know this for sure yet; once I have a fix, I will have to go back to an older version to reproduce the failure stated in the Description and check it against the fix.

            In short, the function ibuf_get_volume_buffered_hash() is calculating the hash value for garbage, and thus it will consider two equal keys to be distinct. The key is (NULL,520) in our case.

            	len = ibuf_rec_get_size(
            		rec, types,
            		rec_get_n_fields_old(rec) - IBUF_REC_FIELD_USER,
            		FALSE, comp);
            	fold = ut_fold_binary(data, len);
            

            For the two occurrences of (NULL,520), we got len=510 even though the payload (0x80, 0x0, 0x2, 0x8) seems to be in the first 4 bytes. The more complete scenario was like this:

            SET @r=REPEAT(_ucs2 0x0008, 152);
             
            Secondary index page contents at eviction:
            (@r,34),(@r,35),(@r,36)
             
            Buffered changes:
            866171 INSERT (NULL, 518)
            page contents after applying: (NULL,518),(@r,34),(@r,35),(@r,36)
            885859 DELETE_MARK (@r, 34)
            page contents after applying: (NULL,518),d(@r,34),(@r,35),(@r,36)
            886807 DELETE (@r, 34)
            page contents after applying: (NULL,518),(@r,35),(@r,36)
            890308 DELETE_MARK (NULL, 518)
            page contents after applying: d(NULL,518),(@r,35),(@r,36)
            890961 DELETE (NULL, 518)
            page contents after applying: (@r,35),(@r,36)
            903869 DELETE_MARK (@r, 35)
            page contents after applying: d(@r,35),(@r,36)
            904861 DELETE (@r, 35)
            page contents after applying: (@r,36)
            912808 DELETE_MARK (@r, 36)
            page contents after applying: d(@r,36)
            922143 INSERT (NULL, 519)
            page contents after applying: (NULL, 519), d(@r,36)
            923229 DELETE (@r, 36)
            page contents after applying: (NULL, 519)
            923233 DELETE (@r, 36) # duplicating the previous op
            page contents after applying: (NULL, 519)
            929226 DELETE_MARK (NULL, 519)
            page contents after applying: d(NULL, 519)
            931121 INSERT (NULL, 520)
            page contents after applying: d(NULL, 519), (NULL, 520)
            931299 DELETE (NULL, 519)
            page contents after applying: (NULL, 520)
            934641 DELETE_MARK (NULL, 520)
            page contents after applying: (NULL, 520)
            935534 DELETE (NULL, 520) # ERROR! page would become empty!
            page contents after applying: empty (not allowed!)
             
            945413 INSERT (NULL, 521)
            946546 DELETE_MARK (NULL, 521)
            946550 DELETE_MARK (NULL, 521) # duplicating the previous operation
            950322 INSERT (REPEAT (_ucs2 0x0003, 4), 521) # (NULL, 521) was never deleted
            

            For the wrongly buffered DELETE operation, we miscalculated that there would be at least 4 records in the page based on the change buffer contents, in addition to the 3 records that we cannot see because the page is not in the buffer pool. Finally, after all these records were buffered, we would merge the changes to the page and issue a message like this:

            2020-05-05 16:29:00 15 [ERROR] InnoDB: Unable to purge a record
            InnoDB: tuple DATA TUPLE: 2 fields;
             0: SQL NULL;
             1: len 4; hex 80000208; asc     ;;InnoDB: record PHYSICAL RECORD: n_fields 2; compact format; info bits 32
             0: SQL NULL;
             1: len 4; hex 80000208; asc     ;;space 5 offset 1109 (1 records, index id 31)
            InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
            

            More anomalies that I did not check yet:

            • The above log is from ibuf_insert_low() calls, the number being the when counter of rr replay. There are 15 calls before the wrongly buffered DELETE operation. But, there only exist 14 buffered operations in the change buffer at that time.
            • The function ibuf_get_volume_buffered() processes the records backwards. Could this be a problem if some first buffered operations are not considered?
            • There is no underflow protection when min_n_recs is decremented. If we only saw previously buffered DELETE operations, we might wrongly consider the page to be nonempty.
            • The keys for DELETE operations will not be added to ibuf_get_volume_buffered_hash(). This might actually be OK.

            The primary fix will be to compute the hash only on the index record that would be applied to the secondary index leaf page in the table. While doing that, we could replace the slow ut_fold_binary() with ut_crc32().

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Labels corruption upstream-5.5
            marko Marko Mäkelä made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            marko Marko Mäkelä added a comment - - edited

            The ‘extra’ call to ibuf_insert_low() is explained by the following:

            	err = ibuf_insert_low(BTR_MODIFY_PREV, op, no_counter,
            			      entry, entry_size,
            			      index, page_id, zip_size, thr);
            	if (err == DB_FAIL) {
            		err = ibuf_insert_low(BTR_MODIFY_TREE | BTR_LATCH_FOR_INSERT,
            				      op, no_counter, entry, entry_size,
            				      index, page_id, zip_size, thr);
            	}
            

            That is, no duplicate record was inserted into the change buffer for either the DELETE (@r, 36) or the DELETE_MARK (NULL, 521). Only 14 records were present when the DELETE (NULL, 520) was wrongly buffered.

            In ibuf_get_volume_buffered() we check also the preceding change buffer page. This worries me a little, because normally, a backward B-tree scan in InnoDB could lead to a deadlock of threads, and we normally save the cursor position and search for the preceding record to move to the preceding page. Only if the caller ibuf_insert_low() was invoked with mode = BTR_MODIFY_TREE | BTR_LATCH_FOR_INSERT would there likely be no concern, because we would be holding ibuf_mutex and ibuf_pessimistic_insert_mutex. However, that call has been there since the inclusion of InnoDB in MySQL. If I remember correctly the discussions between 2005 and 2007, going back one more page could lead to deadlocks. So, I guess that we are fine with regard to deadlocks here. (Also, I am not aware of any actual deadlocks in this code.)

            The following hash function ought to work. False positives are not a correctness problem. For example, ('fu','bar'), (NULL,'fubar'), ('fub','ar') would be treated as equal, and we would increment min_n_recs by at most 1 for those 3 distinct keys.

            /** Determine if a change buffer record has been encountered already.
            @param rec   change buffer record in the MySQL 5.5 format
            @param hash  hash table of encountered records
            @param size  number of elements in hash
            @retval true if a distinct record
            @retval false if this may be duplicating an earlier record */
            static bool ibuf_get_volume_buffered_hash(const rec_t *rec, ulint *hash,
                                                      ulint size)
            {
              ut_ad(rec_get_n_fields_old(rec) > IBUF_REC_FIELD_USER);
              const ulint start= rec_get_field_start_offs(rec, IBUF_REC_FIELD_USER);
              const ulint len= rec_get_data_size_old(rec) - start;
              const uint32_t fold= ut_crc32(rec + start, len);
              hash+= (fold / (CHAR_BIT * sizeof *hash)) % size;
              ulint bitmask= static_cast<ulint>(1) << (fold % (CHAR_BIT * sizeof(*hash)));
             
              if (*hash & bitmask)
                return false;
             
              /* We have not seen this record yet. Remember it. */
              *hash|= bitmask;
              return true;
            }
            

            The comparison min_n_recs < 2 in ibuf_insert_low() actually does check for underflow, because the variable is a signed type (lint, not ulint).

            With the corrected hash function, the estimate of min_n_recs looks conservative to me, even if there were multiple change buffer pages of buffered changes and the estimate did not consider them all. In the worst case, we missed many preceding pages filled with IBUF_DELETE_OP records that had been buffered for the user page earlier. But, it would have been an error to buffer those operations if the records did not exist in the first place. So, it is OK to ignore any preceding pages of buffered operations and start with a conservative estimate that there exist at least min_n_recs=0 records in the page.

            Update: I tried 10.2 with the following patch:

            diff --git a/storage/innobase/include/page0page.ic b/storage/innobase/include/page0page.ic
            index 4efb92f7182..276f85df69a 100644
            --- a/storage/innobase/include/page0page.ic
            +++ b/storage/innobase/include/page0page.ic
            @@ -173,9 +173,7 @@ page_header_set_field(
             {
             	ut_ad(page);
             	ut_ad(field <= PAGE_N_RECS);
            -#if 0 /* FIXME: MDEV-19344 hits this */
             	ut_ad(field != PAGE_N_RECS || val);
            -#endif
             	ut_ad(field == PAGE_N_HEAP || val < srv_page_size);
             	ut_ad(field != PAGE_N_HEAP || (val & 0x7fff) < srv_page_size);
             
            

            but the test innodb.innodb-change-buffer-recovery did not fail in 70 runs even without my fix. I will try harder, possibly with an older version, so that I can test my fix.


            So far, I have been unable to repeat the assertion failure on 10.2. It looks like the result difference was caused by the test not removing the file my_restart.err between runs, causing a failure on ./mtr --repeat. That test problem was fixed in 10.2.28, 10.3.19, 10.4.10, 10.5.0.

            marko Marko Mäkelä added a comment -

            The rr trace mentioned above belongs to some test run on a 10.5 development tree where the server aborts with (snip from server error log, slightly edited)
            [ERROR] InnoDB: Unable to purge a record
            mysqld: /home/mleich/bb-10.5-marko/storage/innobase/ibuf/ibuf0ibuf.cc:3994: void ibuf_delete(const dtuple_t*, buf_block_t*, dict_index_t*, mtr_t*): Assertion `0' failed.
            [ERROR] mysqld got signal 6 ; 
            

            mleich Matthias Leich added a comment -

            mleich reported the assertion on crash recovery. It turns out that the assertion was missing || recv_recovery_is_on() at the end. The reason for this is the following condition, which has been removed in 10.5 (MDEV-12353):

            	if (page_get_n_recs(page) == 1 && !recv_recovery_is_on()) {
            		/* Empty the page, unless we are applying the redo log
            		during crash recovery. During normal operation, the
            		page_create_empty() gets logged as one of MLOG_PAGE_CREATE,
            		MLOG_COMP_PAGE_CREATE, MLOG_ZIP_PAGE_COMPRESS. */
            		ut_ad(page_is_leaf(page));
            		/* Usually, this should be the root page,
            		and the whole index tree should become empty.
            		However, this could also be a call in
            		btr_cur_pessimistic_update() to delete the only
            		record in the page and to insert another one. */
            		page_cur_move_to_next(cursor);
            		ut_ad(page_cur_is_after_last(cursor));
            		page_create_empty(page_cur_get_block(cursor),
            				  const_cast<dict_index_t*>(index), mtr);
            		return;
            	}
            

            Only during crash recovery can we decrement PAGE_N_RECS from 1 to 0; in all other cases, we would invoke page_create_empty(). The assertion failed on the change buffer tree because the change buffer was emptied during crash recovery.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -

            Summary: The test was not tolerant of ./mtr --repeat=2, and the assertion, added in the hope of catching the bug that I have now filed and fixed as MDEV-22497, was bogus. I removed the assertion, which was already bracketed inside #if 0.

            marko Marko Mäkelä added a comment -
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2.33 [ 24307 ]
            Fix Version/s 10.3.24 [ 24306 ]
            Fix Version/s 10.4.14 [ 24305 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            marko Marko Mäkelä made changes -
            Affects Version/s 10.5 [ 23123 ]
            Labels corruption upstream-5.5 corruption not-10.5 upstream-5.5
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 96379 ] MariaDB v4 [ 156122 ]

            People

              marko Marko Mäkelä
              kevg Eugene Kosov (Inactive)
              Votes: 0
              Watchers: 4

              Dates

                Created:
                Updated:
                Resolved:
