Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-20775

InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index)

Details

    Description

      The assert was observed on the trees
      -    10.2 commit 27664ef29d97be2973982aedcc0c8c90d18b20a4 2019-10-08
          and
      -     bb-10.2-merge commit d9f224233497ce5e3706fe6539a2b711b491af1d 2019-10-08
      Hitting that assert is very rare (one per ~ 700 RQG test runs).
       
      The snippets of logs belong to the test run on bb-10.2-merge which contains
      - a fix for https://jira.mariadb.org/browse/MDEV-19335 (fix is currently in review)
      - some merge from 10.1
       
      10.2.28-MariaDB-debug-log
      2019-10-08 13:29:21 0x7f3bbc51b700  InnoDB: Assertion failure in file storage/innobase/btr/btr0cur.cc line 1434
      InnoDB: Failing assertion: !page_zip || page_zip_validate(page_zip, page, index)
      ...
      Query (0x7f3b140104e0): UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */
      Connection ID (thread ID): 20
      Status: NOT_KILLED
       
      #3  <signal handler called>
      #4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:58
      #5  0x00007f3bc40ad37a in __GI_abort () at abort.c:89
      #6  0x000055c433c9e014 in ut_dbg_assertion_failed (expr=0x55c4342cfe10 "!page_zip || page_zip_validate(page_zip, page, index)", file=0x55c4342cf0a0 "storage/innobase/btr/btr0cur.cc", line=1434) at storage/innobase/ut/ut0dbg.cc:60
      #7  0x000055c433cc8e39 in btr_cur_search_to_nth_level (index=0x7f3b0c042d18, level=0, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0, autoinc=0) at storage/innobase/btr/btr0cur.cc:1434
      #8  0x000055c433bfb479 in btr_pcur_open_with_no_init_func (index=0x7f3b0c042d18, tuple=0x7f3b14060058, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7f3b1405fe68, has_search_latch=0, file=0x55c434282818 "storage/innobase/row/row0sel.cc", line=4724, mtr=0x7f3bbc5188d0) at storage/innobase/include/btr0pcur.ic:527
      #9  0x000055c433c05f52 in row_search_mvcc (buf=0x7f3b14022fe8 "\377", mode=PAGE_CUR_GE, prebuilt=0x7f3b1405fc98, match_mode=1, direction=0) at storage/innobase/row/row0sel.cc:4722
      #10 0x000055c433a9488f in ha_innobase::index_read (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key_ptr=0x7f3b140556d0 "~\v", key_len=4, find_flag=HA_READ_KEY_EXACT) at storage/innobase/handler/ha_innodb.cc:9471
      #11 0x000055c433890b4c in handler::index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.h:3136
      #12 0x000055c433886d8a in handler::ha_index_read_map (this=0x7f3b14029130, buf=0x7f3b14022fe8 "\377", key=0x7f3b140556d0 "~\v", keypart_map=1, find_flag=HA_READ_KEY_EXACT) at sql/handler.cc:2711
      #13 0x000055c43388d962 in handler::read_range_first (this=0x7f3b14029130, start_key=0x7f3b14029218, end_key=0x7f3b14029238, eq_range_arg=true, sorted=true) at sql/handler.cc:5458
      #14 0x000055c43379cbc6 in handler::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:298
      #15 0x000055c43379ccff in Mrr_simple_index_reader::get_next (this=0x7f3b140296a0, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:335
      #16 0x000055c43379f840 in DsMrr_impl::dsmrr_next (this=0x7f3b14029560, range_info=0x7f3bbc519110) at sql/multi_range_read.cc:1411
      #17 0x000055c433aa984e in ha_innobase::multi_range_read_next (this=0x7f3b14029130, range_info=0x7f3bbc519110) at storage/innobase/handler/ha_innodb.cc:21332
      #18 0x000055c4339df8d2 in QUICK_RANGE_SELECT::get_next (this=0x7f3b1401ada0) at sql/opt_range.cc:11473
      #19 0x000055c4339f0882 in rr_quick (info=0x7f3bbc519350) at sql/records.cc:365
      #20 0x000055c4336f219b in mysql_update (thd=0x7f3b14000a98, table_list=0x7f3b14010640, fields=..., values=..., conds=0x7f3b14010fe8, order_num=0, order=0x0, limit=18446744073709551615, handle_duplicates=DUP_ERROR, ignore=false, found_return=0x7f3bbc5198c0, updated_return=0x7f3bbc519970) at sql/sql_update.cc:749
      #21 0x000055c433602a1f in mysql_execute_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:4014
      #22 0x000055c43360e8a0 in mysql_parse (thd=0x7f3b14000a98, rawbuf=0x7f3b140104e0 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */", length=82, parser_state=0x7f3bbc51a250, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:7759
      #23 0x000055c4335fcba9 in dispatch_command (command=COM_QUERY, thd=0x7f3b14000a98, packet=0x7f3b140081a9 "UPDATE `t7` SET `pad` = 'o' WHERE `id` = 2942 /* E_R Thread2 QNO 1599 CON_ID 20 */ ", packet_length=83, is_com_multi=false, is_next_command=false) at sql/sql_parse.cc:1831
      #24 0x000055c4335fb4fd in do_command (thd=0x7f3b14000a98) at sql/sql_parse.cc:1384
      #25 0x000055c433750957 in do_handle_one_connection (connect=0x55c4372aed98) at sql/sql_connect.cc:1336
      #26 0x000055c4337506c2 in handle_one_connection (arg=0x55c4372aed98) at sql/sql_connect.cc:1241
      #27 0x00007f3bc4cec6da in start_thread (arg=0x7f3bbc51b700) at pthread_create.c:456
      

      Attachments

        1. simp_page_zip_validate_rqg.log
          312 kB
          Matthias Leich
        2. simp_page_zip_validate.cfg
          52 kB
          Matthias Leich
        3. simp_page_zip_validate.yy
          0.6 kB
          Matthias Leich
        4. simp_page_zip_validate.zz
          1 kB
          Matthias Leich

        Issue Links

          Activity

            I have hit that assert on 10.5 too.

            mleich Matthias Leich added a comment - I have hit that assert on 10.5 too.
            mleich Matthias Leich added a comment - - edited

            simp_page_zip_validate.* RQG files for replaying the problem
            simp_page_zip_validate_rqg.log    Protocol (includes backtrace) of my RQG run against
                                                                                   10.5 commit 8576a7bacfaa4f7ac3ec86cff436e0d6e7bb3a7d
             
            Only two concurrent sessions are required.
            One runs random
            - (frequent) DELETE FROM oltp3 LIMIT 2 |
            - (not often) FLUSH TABLE oltp3 FOR EXPORT
            - (often) UNLOCK TABLES
            The second runs random just
               SELECT SUM(k) FROM oltp3 WHERE id BETWEEN _smallint_unsigned AND _smallint_unsigned ;
             
            After short time (~ 50s) InnoDB reports into the server error that there is a deadlock
            and than DB server aborts with a core.
             
            The uploaded test does not replay the assert on
            - 10.2 commit 7d313214641241e8ce839d9de01529c2335c620f 2020-01-10
            - 10.3 commit d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3 2020-01-12
            - 10.4 commit 2d4b6571ecf15a4382702d26df9439bc5a1005de 2020-01-14
            In case I memorize correct than the assert in the bug title occured at
            several places in InnoDB code.
            During the automatic+manual RQG grammar the variety of SQL shrinks
            and by that also the places in InnoDB code where the assert was hit.
             
            Please do not work on this bug.
            Caused by some mistake in the setup of the simplifier the uploaded test
            replays some other bug.
            
            

            mleich Matthias Leich added a comment - - edited simp_page_zip_validate.* RQG files for replaying the problem simp_page_zip_validate_rqg.log Protocol (includes backtrace) of my RQG run against 10.5 commit 8576a7bacfaa4f7ac3ec86cff436e0d6e7bb3a7d   Only two concurrent sessions are required. One runs random - (frequent) DELETE FROM oltp3 LIMIT 2 | - (not often) FLUSH TABLE oltp3 FOR EXPORT - (often) UNLOCK TABLES The second runs random just SELECT SUM(k) FROM oltp3 WHERE id BETWEEN _smallint_unsigned AND _smallint_unsigned ;   After short time (~ 50s) InnoDB reports into the server error that there is a deadlock and than DB server aborts with a core.   The uploaded test does not replay the assert on - 10.2 commit 7d313214641241e8ce839d9de01529c2335c620f 2020-01-10 - 10.3 commit d531b4ee3a9bcd89a2fa6b49a2207eaf966f53e3 2020-01-12 - 10.4 commit 2d4b6571ecf15a4382702d26df9439bc5a1005de 2020-01-14 In case I memorize correct than the assert in the bug title occured at several places in InnoDB code. During the automatic+manual RQG grammar the variety of SQL shrinks and by that also the places in InnoDB code where the assert was hit.   Please do not work on this bug. Caused by some mistake in the setup of the simplifier the uploaded test replays some other bug.

            When testing MDEV-12353, I accidentally ran into a failure like this. I believe that there is a race condition between page reads that call page_zip_validate() (with page S-latch) and writes that update PAGE_ROOT_AUTO_INC (with page SX-latch). That writer code was introduced in MDEV-6076.

            The following patch will ignore the PAGE_ROOT_AUTO_INC when comparing the two clustered index root page images (page number 3):

            diff --git a/storage/innobase/page/page0zip.cc b/storage/innobase/page/page0zip.cc
            index 3211c8fad83..40e3909e2f2 100644
            --- a/storage/innobase/page/page0zip.cc
            +++ b/storage/innobase/page/page0zip.cc
            @@ -3310,7 +3310,13 @@ page_zip_validate_low(
             		   FIL_PAGE_LSN - FIL_PAGE_PREV)
             	    || memcmp(page_zip->data + FIL_PAGE_TYPE, page + FIL_PAGE_TYPE, 2)
             	    || memcmp(page_zip->data + FIL_PAGE_DATA, page + FIL_PAGE_DATA,
            -		      PAGE_DATA - FIL_PAGE_DATA)) {
            +		      PAGE_ROOT_AUTO_INC)
            +	    || (page_get_page_no(page) != 3
            +		&& memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC],
            +			  &page[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], 8))
            +	    || memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],
            +		      &page[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END],
            +		      PAGE_DATA - FIL_PAGE_DATA - PAGE_HEADER_PRIV_END)) {
             		page_zip_fail(("page_zip_validate: page header\n"));
             		page_zip_hexdump(page_zip, sizeof *page_zip);
             		page_zip_hexdump(page_zip->data, page_zip_get_size(page_zip));
            

            I successfully tested it on a 10.5-based MDEV-12353 branch with the following test, adapted from innodb_zip.wl5522_zip:

            -- source include/have_innodb.inc
             
            set @save_limit= @@innodb_limit_optimistic_insert_debug;
             
            set global innodb_limit_optimistic_insert_debug=2;
             
            CREATE TABLE t1
            (a INT AUTO_INCREMENT PRIMARY KEY,
             b char(22),
             c varchar(255),
             KEY (b))
            ENGINE = InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;
             
            insert into t1 (b, c) values ('Apa', 'Filler........'),
            ('Banan', 'Filler........'), ('Cavalry', '..asdasdfaeraf'),
            ('Devotion', 'asdfuihknaskdf'), ('Evolution', 'lsjndofiabsoibeg');
             
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
            INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;
             
            DROP TABLE t1;
             
            set global innodb_limit_optimistic_insert_debug=@save_limit;
            

            ./mtr --mysqld=--innodb-page-size=4k --parallel=auto --repeat=1000 innodb_zip.wz{,,,,,,,,,,,}
            

            On my system, the test would typically crash on the first round already. With the fix, it already completed 20 or 21 parallel rounds (of 14 seconds each) without failures.

            mleich, do you still see the failures with this fix? And was it always related to a table that contains an AUTO_INCREMENT column?

            marko Marko Mäkelä added a comment - When testing MDEV-12353 , I accidentally ran into a failure like this. I believe that there is a race condition between page reads that call page_zip_validate() (with page S-latch) and writes that update PAGE_ROOT_AUTO_INC (with page SX-latch). That writer code was introduced in MDEV-6076 . The following patch will ignore the PAGE_ROOT_AUTO_INC when comparing the two clustered index root page images (page number 3): diff --git a/storage/innobase/page/page0zip.cc b/storage/innobase/page/page0zip.cc index 3211c8fad83..40e3909e2f2 100644 --- a/storage/innobase/page/page0zip.cc +++ b/storage/innobase/page/page0zip.cc @@ -3310,7 +3310,13 @@ page_zip_validate_low( FIL_PAGE_LSN - FIL_PAGE_PREV) || memcmp(page_zip->data + FIL_PAGE_TYPE, page + FIL_PAGE_TYPE, 2) || memcmp(page_zip->data + FIL_PAGE_DATA, page + FIL_PAGE_DATA, - PAGE_DATA - FIL_PAGE_DATA)) { + PAGE_ROOT_AUTO_INC) + || (page_get_page_no(page) != 3 + && memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], + &page[FIL_PAGE_DATA + PAGE_ROOT_AUTO_INC], 8)) + || memcmp(&page_zip->data[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END], + &page[FIL_PAGE_DATA + PAGE_HEADER_PRIV_END], + PAGE_DATA - FIL_PAGE_DATA - PAGE_HEADER_PRIV_END)) { page_zip_fail(("page_zip_validate: page header\n")); page_zip_hexdump(page_zip, sizeof *page_zip); page_zip_hexdump(page_zip->data, page_zip_get_size(page_zip)); I successfully tested it on a 10.5-based MDEV-12353 branch with the following test, adapted from innodb_zip.wl5522_zip : -- source include/have_innodb.inc   set @save_limit= @@innodb_limit_optimistic_insert_debug;   set global innodb_limit_optimistic_insert_debug=2;   CREATE TABLE t1 (a INT AUTO_INCREMENT PRIMARY KEY , b char (22), c varchar (255), KEY (b)) ENGINE = InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=1;   insert into t1 (b, c) values ( 'Apa' , 'Filler........' ), ( 'Banan' , 'Filler........' ), ( 'Cavalry' , '..asdasdfaeraf' ), ( 'Devotion' , 'asdfuihknaskdf' ), ( 'Evolution' , 'lsjndofiabsoibeg' );   INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a; INSERT INTO t1 (b, c) SELECT b,c FROM t1 ORDER BY a;   DROP TABLE t1;   set global innodb_limit_optimistic_insert_debug=@save_limit; ./mtr --mysqld=--innodb-page-size=4k --parallel=auto --repeat=1000 innodb_zip.wz{,,,,,,,,,,,} On my system, the test would typically crash on the first round already. With the fix, it already completed 20 or 21 parallel rounds (of 14 seconds each) without failures. mleich , do you still see the failures with this fix? And was it always related to a table that contains an AUTO_INCREMENT column?

            People

              thiru Thirunarayanan Balathandayuthapani
              mleich Matthias Leich
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.