MariaDB Server
MDEV-22611

Assertion btr_search_enabled failed during buffer pool resizing

Details

    Description

      After I merged the MDEV-22456 changes to 10.4, I observed a crash during InnoDB buffer pool resizing.

      It is unclear whether this is a true regression of MDEV-22456. Here are two examples:

      10.4 66f1e288a12c96b1306e204cca37ffee09e97a64

      CURRENT_TEST: innodb.innodb_buffer_pool_resize_with_chunks
      mysqltest: In included file "./include/wait_condition.inc": 
      included from /mariadb/10.4/mysql-test/suite/innodb/t/innodb_buffer_pool_resize_with_chunks.test at line 45:
      At line 54: query 'let $success= `$wait_condition`' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...
      2020-05-18 12:07:31 9 [Note] InnoDB: Resizing buffer pool from 838860 (new size: 16777216 bytes)
      2020-05-18 12:07:31 0 [Note] InnoDB: Disabling adaptive hash index.
      2020-05-18 12:07:31 0 [Note] InnoDB: disabled adaptive hash index.
      2020-05-18 12:07:31 0 [Note] InnoDB: Withdrawing blocks to be shrunken.
      2020-05-18 12:07:31 0 [Note] InnoDB: Latching whole of buffer pool.
      2020-05-18 12:07:31 0 [Note] InnoDB: buffer pool 0 : resizing with chunks 4 to 8.
      mysqld: /mariadb/10.4/storage/innobase/ha/ha0ha.cc:233: ulint ha_insert_for_fold_func(hash_table_t *, ulint, buf_block_t *, const rec_t *): Assertion `btr_search_enabled' failed.
      200518 12:07:31 [ERROR] mysqld got signal 6 ;
      #7  0x00005636497fd0a9 in ha_insert_for_fold_func (table=0x56364cb75200, fold=3381765660571125693, block=0x7fda948f1d98, data=0x7fda94a3f529 "\200\004x\250") at /mariadb/10.4/storage/innobase/ha/ha0ha.cc:233
      #8  0x0000563649710c9a in btr_search_update_hash_ref (info=<optimized out>, block=<optimized out>, cursor=<optimized out>) at /mariadb/10.4/storage/innobase/btr/btr0sea.cc:722
      #9  btr_search_info_update_slow (info=<optimized out>, cursor=<optimized out>) at /mariadb/10.4/storage/innobase/btr/btr0sea.cc:1580
      #10 0x00005636496ddbfd in btr_search_info_update (index=0x7fda4011b740, cursor=0x56364ccc80c8) at /mariadb/10.4/storage/innobase/include/btr0sea.ic:84
      #11 btr_cur_search_to_nth_level_func (index=<optimized out>, level=0, tuple=<optimized out>, mode=<optimized out>, latch_mode=2, cursor=<optimized out>, ahi_latch=0x0, file=0x563649c74ef5 "/mariadb/10.4/storage/innobase/row/row0row.cc", line=1212, mtr=0x7fda80ff8298, autoinc=0) at /mariadb/10.4/storage/innobase/btr/btr0cur.cc:2477
      #12 0x000056364960bf76 in btr_pcur_open_low (index=<optimized out>, level=0, tuple=0x7fda687d0270, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x56364ccc80c8, file=0x0, line=1212, autoinc=0, mtr=0x8) at /mariadb/10.4/storage/innobase/include/btr0pcur.ic:441
      #13 0x000056364960aa13 in row_search_on_row_ref (pcur=0x56364ccc80c8, mode=2, table=0x7fda40118660, ref=0x7fda687d0270, mtr=0x7fda80ff8298) at /mariadb/10.4/storage/innobase/row/row0row.cc:1212
      #14 0x00005636495f8e7b in row_purge_reposition_pcur (mode=2, node=0x56364ccc8020, mtr=0x7fda80ff8298) at /mariadb/10.4/storage/innobase/row/row0purge.cc:78
      #15 0x00005636495fc541 in row_purge_reset_trx_id (node=0x56364ccc8020, mtr=0x7fda80ff8298) at /mariadb/10.4/storage/innobase/row/row0purge.cc:798
      

      and another, from the same test run:

      10.4 66f1e288a12c96b1306e204cca37ffee09e97a64

      CURRENT_TEST: innodb.innodb_buffer_pool_resize
      mysqltest: In included file "./include/wait_condition.inc": 
      included from /mariadb/10.4/mysql-test/suite/innodb/t/innodb_buffer_pool_resize.test at line 58:
      At line 54: query 'let $success= `$wait_condition`' failed with wrong errno 2013: 'Lost connection to MySQL server during query', instead of 0...
      2020-05-18 12:07:47 0 [Note] InnoDB: buffer pool 0 : 1 chunks (1786 blocks) were freed.
      2020-05-18 12:07:47 0 [Note] InnoDB: Completed to resize buffer pool from 16777216 to 8388608.
      2020-05-18 12:07:47 0 [Note] InnoDB: Re-enabled adaptive hash index.
      2020-05-18 12:07:47 0 [Note] InnoDB: Completed resizing buffer pool at 200518 12:07:47.
      mysqld: /mariadb/10.4/storage/innobase/ha/ha0ha.cc:233: ulint ha_insert_for_fold_func(hash_table_t *, ulint, buf_block_t *, const rec_t *): Assertion `btr_search_enabled' failed.
      #7  0x0000560a71d540a9 in ha_insert_for_fold_func (table=0x560a743a7200, fold=3381765663485158439, block=0x7fd0496da8b0, data=0x7fd049adf5ab "\200\004~\b") at /mariadb/10.4/storage/innobase/ha/ha0ha.cc:233
      #8  0x0000560a71c67c9a in btr_search_update_hash_ref (info=<optimized out>, block=<optimized out>, cursor=<optimized out>) at /mariadb/10.4/storage/innobase/btr/btr0sea.cc:722
      #9  btr_search_info_update_slow (info=<optimized out>, cursor=<optimized out>) at /mariadb/10.4/storage/innobase/btr/btr0sea.cc:1580
      #10 0x0000560a71c34bfd in btr_search_info_update (index=0x7fcff01555b0, cursor=0x560a744797f8) at /mariadb/10.4/storage/innobase/include/btr0sea.ic:84
      #11 btr_cur_search_to_nth_level_func (index=<optimized out>, level=0, tuple=<optimized out>, mode=<optimized out>, latch_mode=2, cursor=<optimized out>, ahi_latch=0x0, file=0x560a721cbef5 "/mariadb/10.4/storage/innobase/row/row0row.cc", line=1212, mtr=0x7fd02dffa298, autoinc=0) at /mariadb/10.4/storage/innobase/btr/btr0cur.cc:2477
      #12 0x0000560a71b62f76 in btr_pcur_open_low (index=<optimized out>, level=0, tuple=0x7fd0247ceec0, mode=PAGE_CUR_LE, latch_mode=2, cursor=0x560a744797f8, file=0x0, line=1212, autoinc=0, mtr=0x8) at /mariadb/10.4/storage/innobase/include/btr0pcur.ic:441
      #13 0x0000560a71b61a13 in row_search_on_row_ref (pcur=0x560a744797f8, mode=2, table=0x7fcff01185d0, ref=0x7fd0247ceec0, mtr=0x7fd02dffa298) at /mariadb/10.4/storage/innobase/row/row0row.cc:1212
      #14 0x0000560a71b4fe7b in row_purge_reposition_pcur (mode=2, node=0x560a74479750, mtr=0x7fd02dffa298) at /mariadb/10.4/storage/innobase/row/row0purge.cc:78
      #15 0x0000560a71b53541 in row_purge_reset_trx_id (node=0x560a74479750, mtr=0x7fd02dffa298) at /mariadb/10.4/storage/innobase/row/row0purge.cc:798
      

      It looks like we should re-evaluate btr_search_enabled after acquiring the adaptive hash index latch.
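
      A minimal sketch of that re-check pattern in standalone C++, with
      std::atomic and std::shared_mutex standing in for the InnoDB flag and
      rw-lock; all identifiers here are hypothetical, not actual InnoDB code:

      #include <atomic>
      #include <mutex>
      #include <shared_mutex>

      // Stand-in for btr_search_enabled, which buffer pool resizing toggles
      // while holding the adaptive hash index latch exclusively.
      static std::atomic<bool> search_enabled{true};
      // Stand-in for the adaptive hash index rw-lock (btr_get_search_latch()).
      static std::shared_mutex search_latch;

      void update_hash_ref_sketch()
      {
          if (!search_enabled.load(std::memory_order_relaxed)) {
              return;  // cheap unlatched check; may already be stale
          }

          std::unique_lock<std::shared_mutex> guard(search_latch);

          if (!search_enabled.load(std::memory_order_relaxed)) {
              return;  // re-check under the latch: resizing may have disabled
                       // the adaptive hash index between the two checks
          }

          // ...only here is it safe to insert into the adaptive hash index...
      }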

          Activity

            marko Marko Mäkelä added a comment:

            This is reasonably repeatable with

            ./mtr --parallel=auto --big-test --repeat=100 innodb.innodb_buffer_pool_resize_with_chunks innodb.innodb_buffer_pool_resize
            

            I checked my previous test log, and it looks like the server was restarted immediately before running either of these tests.


            marko Marko Mäkelä added a comment:

            I confirm that this is a regression of MDEV-22456. Before that change, btr_search_update_hash_ref() was invoked while the adaptive hash index latch was already held, so its checks could not be invalidated by a concurrent disabling of the adaptive hash index.
            mleich, can you please test the adaptive hash index and buffer pool resizing under RQG with the following patch on 10.4 or 10.2?

            diff --git a/storage/innobase/btr/btr0sea.cc b/storage/innobase/btr/btr0sea.cc
            index fbd3d5a4b1f..d09cbd8943d 100644
            --- a/storage/innobase/btr/btr0sea.cc
            +++ b/storage/innobase/btr/btr0sea.cc
            @@ -685,7 +685,7 @@ btr_search_update_hash_ref(
             
             	dict_index_t* index = block->index;
             
            -	if (!index) {
            +	if (!index || !info->n_hash_potential) {
             		return;
             	}
             
            @@ -695,7 +695,7 @@ btr_search_update_hash_ref(
             	rw_lock_t* const latch = btr_get_search_latch(index);
             	rw_lock_x_lock(latch);
             
            -	if ((info->n_hash_potential > 0)
            +	if (block->index
             	    && (block->curr_n_fields == info->n_fields)
             	    && (block->curr_n_bytes == info->n_bytes)
             	    && (block->curr_left_side == info->left_side)) {
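
            The patch moves the info->n_hash_potential heuristic to the
            unlatched early return, where a stale read is harmless, and instead
            re-checks block->index under the latch: disabling the adaptive hash
            index clears block->index while that latch is held, so the re-check
            catches a concurrent disable.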
            


            mleich Matthias Leich added a comment:

            I was able to replay this bug with RQG on 10.4 commit ea912d1605b463a16a0c186bf10e61c4af519439, compiled with debugging:

            perl rqg.pl \
            --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
            --grammar=conf/mariadb/table_stress_innodb.yy \
            --gendata=conf/mariadb/table_stress.zz \
            --gendata_sql=conf/mariadb/table_stress.sql \
            --engine=Innodb \
            --reporters=Deadlock1,ErrorLog,Backtrace \
            --mysqld=--connect_timeout=60 \
            --mysqld=--net_read_timeout=30 \
            --mysqld=--net_write_timeout=60 \
            --mysqld=--loose-idle_readonly_transaction_timeout=0 \
            --mysqld=--loose-idle_transaction_timeout=0 \
            --mysqld=--loose-idle_write_transaction_timeout=0 \
            --mysqld=--interactive_timeout=28800 \
            --mysqld=--lock_wait_timeout=86400 \
            --mysqld=--innodb-lock-wait-timeout=50 \
            --mysqld=--loose-table_lock_wait_timeout=50 \
            --mysqld=--wait_timeout=28800 \
            --mysqld=--slave_net_timeout=60 \
            --mysqld=--loose-max-statement-time=30 \
            --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
            --mysqld=--log-output=none \
            --duration=300 \
            --seed=random \
            --sqltrace=MarkErrors \
            --threads=1 \
            --mysqld=--innodb_adaptive_hash_index=ON \
            --mysqld=--innodb_page_size=4K \
            --mysqld=--innodb-buffer-pool-size=8M \
            --no_mask \
            --workdir=<local settings> \
            --vardir=<local settings> \
            --mtr-build-thread=<local settings>3 \
            --basedir1=<local settings> \
            --basedir2=<local settings> \
            --script_debug=_nix_ \
            --rr=Server

            However, only one of 282 test runs hit the problem.
            


            mleich Matthias Leich added a comment:

            The bug no longer showed up after applying the patch.


            People

              Assignee: marko Marko Mäkelä
              Reporter: marko Marko Mäkelä