Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-13918

Race condition between INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS and ALTER/DROP/TRUNCATE TABLE

Details

    Description

      elenst reported this in MDEV-13900:

      mysqld: /home/elenst/git/bb-10.3-marko-instant-column/storage/innobase/handler/handler0alter.cc:8295: bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*): Assertion `user_table->get_ref_count() == 1' failed.
      

      From the stack traces of all threads, I concluded that the culprit must be a race condition between a table-rebuilding ALTER TABLE (or OPTIMIZE TABLE) and INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS.

      This was broken in MySQL 5.7.10 and merged to MariaDB 10.2.2.

      The bug is that i_s_sys_tables_fill_table_stats() is incrementing the reference count of a table while not holding something that conflicts with a concurrent DDL operation (either shared dict_operation_lock or something that conflicts with MDL_EXCLUSIVE).

      In MySQL 5.6 and MariaDB 10.0/10.1, there is a different race condition: the table can be dropped while the function is accessing it.

      I believe that the correct fix would be to acquire shared dict_operation_lock before looking up the table, and releasing it after the table is no longer being used. There is no need to increment or decrement the reference count.

      Attachments

        Issue Links

          Activity

            This MTR test seems to reproduce the problem:

            --source include/have_innodb.inc
             
            CREATE TABLE IF NOT EXISTS t1 (f INT) ENGINE=InnoDB;
            INSERT INTO t1 () VALUES ();
             
            --connect (con1,localhost,root,,test)
            --let $conid= `SELECT CONNECTION_ID()`
            --delimiter $$
            CREATE PROCEDURE pr()
            BEGIN
              LOOP
                ALTER TABLE t1 ADD COLUMN IF NOT EXISTS c INT PRIMARY KEY;
                ALTER TABLE t1 DROP COLUMN IF EXISTS c;
              END LOOP;
            END $$
            --delimiter ;
            --send CALL pr()
             
            --connection default
            --let $run= 100000
             
            --disable_query_log
            --disable_result_log
            --echo # Running SELECTs from INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS
            while ($run)
            {
                SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS;
              --dec $run
            }
            --enable_query_log
            --enable_result_log
             
            # Cleanup
            --eval KILL $conid
            DROP TABLE t1;
            DROP PROCEDURE pr;
            

            If it doesn't, here is also the RQG way:

            git clone https://github.com/elenst/rqg --branch=experimental rqg-mdev13918
            cd rqg-mdev13918
            # To start the server and run the flow
            $ perl ./runall-new.pl --grammar=conf/mariadb/bugs/mdev13918.yy --threads=2 --skip-gendata --duration=300 --basedir=<your basedir> --vardir=<your vardir>
            # or, on a server already running on port 3306
            perl ./gentest.pl --grammar=conf/mariadb/bugs/mdev13918.yy --threads=2 --duration=600 --queries=100M --dsn="dbi:mysql:user=root:host=127.0.0.1:port=3306:database=test"
            

            10.2 a659291e85

            mysqld: /data/src/10.2/storage/innobase/handler/handler0alter.cc:7833: bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*): Assertion `user_table->get_ref_count() == 1' failed.
            171009 19:22:31 [ERROR] mysqld got signal 6 ;
             
            #7  0x00007fdbf3135ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
            #8  0x0000557d5e67ff33 in commit_try_rebuild (ha_alter_info=0x7fdbe4510650, ctx=0x7fdb900273f0, altered_table=0x7fdb900aade0, old_table=0x7fdb900a51f0, trx=0x7fdbed0a70d8, table_name=0x7fdb90034d8d "t1") at /data/src/10.2/storage/innobase/handler/handler0alter.cc:7833
            #9  0x0000557d5e67c8a9 in ha_innobase::commit_inplace_alter_table (this=0x7fdb900a5df8, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, commit=true) at /data/src/10.2/storage/innobase/handler/handler0alter.cc:8587
            #10 0x0000557d5e3343af in handler::ha_commit_inplace_alter_table (this=0x7fdb900a5df8, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, commit=true) at /data/src/10.2/sql/handler.cc:4228
            #11 0x0000557d5e189be4 in mysql_inplace_alter_table (thd=0x7fdb90000b00, table_list=0x7fdb900195b8, table=0x7fdb900a51f0, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fdbe45106c0, alter_ctx=0x7fdbe4511280) at /data/src/10.2/sql/sql_table.cc:7357
            #12 0x0000557d5e18edfe in mysql_alter_table (thd=0x7fdb90000b00, new_db=0x7fdb90018fc8 "test", new_name=0x0, create_info=0x7fdbe4511e90, table_list=0x7fdb900195b8, alter_info=0x7fdbe4511de0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9323
            #13 0x0000557d5e207c26 in Sql_cmd_alter_table::execute (this=0x7fdb90019d70, thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_alter.cc:324
            #14 0x0000557d5e0c486c in mysql_execute_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:6203
            #15 0x0000557d5e4aa24e in sp_instr_stmt::exec_core (this=0x7fdb90019d78, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4) at /data/src/10.2/sql/sp_head.cc:3253
            #16 0x0000557d5e4a98bf in sp_lex_keeper::reset_lex_and_exec_core (this=0x7fdb90019db8, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4, open_tables=false, instr=0x7fdb90019d78) at /data/src/10.2/sql/sp_head.cc:3016
            #17 0x0000557d5e4a9ef9 in sp_instr_stmt::execute (this=0x7fdb90019d78, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4) at /data/src/10.2/sql/sp_head.cc:3169
            #18 0x0000557d5e4a510e in sp_head::execute (this=0x7fdb90018608, thd=0x7fdb90000b00, merge_da_on_success=true) at /data/src/10.2/sql/sp_head.cc:1324
            #19 0x0000557d5e4a71a5 in sp_head::execute_procedure (this=0x7fdb90018608, thd=0x7fdb90000b00, args=0x7fdb900053b0) at /data/src/10.2/sql/sp_head.cc:2122
            #20 0x0000557d5e0b9ae5 in do_execute_sp (thd=0x7fdb90000b00, sp=0x7fdb90018608) at /data/src/10.2/sql/sql_parse.cc:2891
            #21 0x0000557d5e0c2f34 in mysql_execute_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:5798
            #22 0x0000557d5e0c911c in mysql_parse (thd=0x7fdb90000b00, rawbuf=0x7fdb90011078 "CALL pr()", length=9, parser_state=0x7fdbe4514200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7875
            #23 0x0000557d5e0b6dc0 in dispatch_command (command=COM_QUERY, thd=0x7fdb90000b00, packet=0x7fdb90008891 "", packet_length=9, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1812
            #24 0x0000557d5e0b5736 in do_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:1360
            #25 0x0000557d5e202917 in do_handle_one_connection (connect=0x557d608398c0) at /data/src/10.2/sql/sql_connect.cc:1354
            #26 0x0000557d5e2026a4 in handle_one_connection (arg=0x557d608398c0) at /data/src/10.2/sql/sql_connect.cc:1260
            #27 0x0000557d5e61f11e in pfs_spawn_thread (arg=0x557d608584c0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
            #28 0x00007fdbf507a494 in start_thread (arg=0x7fdbe4515700) at pthread_create.c:333
            #29 0x00007fdbf31f293f in clone () from /lib/x86_64-linux-gnu/libc.so.6
            

            elenst Elena Stepanova added a comment - This MTR test seems to reproduce the problem: --source include/have_innodb.inc   CREATE TABLE IF NOT EXISTS t1 (f INT) ENGINE=InnoDB; INSERT INTO t1 () VALUES ();   --connect (con1,localhost,root,,test) --let $conid= `SELECT CONNECTION_ID()` --delimiter $$ CREATE PROCEDURE pr() BEGIN LOOP ALTER TABLE t1 ADD COLUMN IF NOT EXISTS c INT PRIMARY KEY; ALTER TABLE t1 DROP COLUMN IF EXISTS c; END LOOP; END $$ --delimiter ; --send CALL pr()   --connection default --let $run= 100000   --disable_query_log --disable_result_log --echo # Running SELECTs from INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS while ($run) { SELECT COUNT(*) FROM INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS; --dec $run } --enable_query_log --enable_result_log   # Cleanup --eval KILL $conid DROP TABLE t1; DROP PROCEDURE pr; If it doesn't, here is also the RQG way: git clone https://github.com/elenst/rqg --branch=experimental rqg-mdev13918 cd rqg-mdev13918 # To start the server and run the flow $ perl ./runall-new.pl --grammar=conf/mariadb/bugs/mdev13918.yy --threads=2 --skip-gendata --duration=300 --basedir=<your basedir> --vardir=<your vardir> # or, on a server already running on port 3306 perl ./gentest.pl --grammar=conf/mariadb/bugs/mdev13918.yy --threads=2 --duration=600 --queries=100M --dsn="dbi:mysql:user=root:host=127.0.0.1:port=3306:database=test" 10.2 a659291e85 mysqld: /data/src/10.2/storage/innobase/handler/handler0alter.cc:7833: bool commit_try_rebuild(Alter_inplace_info*, ha_innobase_inplace_ctx*, TABLE*, const TABLE*, trx_t*, const char*): Assertion `user_table->get_ref_count() == 1' failed. 171009 19:22:31 [ERROR] mysqld got signal 6 ;   #7 0x00007fdbf3135ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6 #8 0x0000557d5e67ff33 in commit_try_rebuild (ha_alter_info=0x7fdbe4510650, ctx=0x7fdb900273f0, altered_table=0x7fdb900aade0, old_table=0x7fdb900a51f0, trx=0x7fdbed0a70d8, table_name=0x7fdb90034d8d "t1") at /data/src/10.2/storage/innobase/handler/handler0alter.cc:7833 #9 0x0000557d5e67c8a9 in ha_innobase::commit_inplace_alter_table (this=0x7fdb900a5df8, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, commit=true) at /data/src/10.2/storage/innobase/handler/handler0alter.cc:8587 #10 0x0000557d5e3343af in handler::ha_commit_inplace_alter_table (this=0x7fdb900a5df8, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, commit=true) at /data/src/10.2/sql/handler.cc:4228 #11 0x0000557d5e189be4 in mysql_inplace_alter_table (thd=0x7fdb90000b00, table_list=0x7fdb900195b8, table=0x7fdb900a51f0, altered_table=0x7fdb900aade0, ha_alter_info=0x7fdbe4510650, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fdbe45106c0, alter_ctx=0x7fdbe4511280) at /data/src/10.2/sql/sql_table.cc:7357 #12 0x0000557d5e18edfe in mysql_alter_table (thd=0x7fdb90000b00, new_db=0x7fdb90018fc8 "test", new_name=0x0, create_info=0x7fdbe4511e90, table_list=0x7fdb900195b8, alter_info=0x7fdbe4511de0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9323 #13 0x0000557d5e207c26 in Sql_cmd_alter_table::execute (this=0x7fdb90019d70, thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_alter.cc:324 #14 0x0000557d5e0c486c in mysql_execute_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:6203 #15 0x0000557d5e4aa24e in sp_instr_stmt::exec_core (this=0x7fdb90019d78, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4) at /data/src/10.2/sql/sp_head.cc:3253 #16 0x0000557d5e4a98bf in sp_lex_keeper::reset_lex_and_exec_core (this=0x7fdb90019db8, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4, open_tables=false, instr=0x7fdb90019d78) at /data/src/10.2/sql/sp_head.cc:3016 #17 0x0000557d5e4a9ef9 in sp_instr_stmt::execute (this=0x7fdb90019d78, thd=0x7fdb90000b00, nextp=0x7fdbe45131a4) at /data/src/10.2/sql/sp_head.cc:3169 #18 0x0000557d5e4a510e in sp_head::execute (this=0x7fdb90018608, thd=0x7fdb90000b00, merge_da_on_success=true) at /data/src/10.2/sql/sp_head.cc:1324 #19 0x0000557d5e4a71a5 in sp_head::execute_procedure (this=0x7fdb90018608, thd=0x7fdb90000b00, args=0x7fdb900053b0) at /data/src/10.2/sql/sp_head.cc:2122 #20 0x0000557d5e0b9ae5 in do_execute_sp (thd=0x7fdb90000b00, sp=0x7fdb90018608) at /data/src/10.2/sql/sql_parse.cc:2891 #21 0x0000557d5e0c2f34 in mysql_execute_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:5798 #22 0x0000557d5e0c911c in mysql_parse (thd=0x7fdb90000b00, rawbuf=0x7fdb90011078 "CALL pr()", length=9, parser_state=0x7fdbe4514200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7875 #23 0x0000557d5e0b6dc0 in dispatch_command (command=COM_QUERY, thd=0x7fdb90000b00, packet=0x7fdb90008891 "", packet_length=9, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1812 #24 0x0000557d5e0b5736 in do_command (thd=0x7fdb90000b00) at /data/src/10.2/sql/sql_parse.cc:1360 #25 0x0000557d5e202917 in do_handle_one_connection (connect=0x557d608398c0) at /data/src/10.2/sql/sql_connect.cc:1354 #26 0x0000557d5e2026a4 in handle_one_connection (arg=0x557d608398c0) at /data/src/10.2/sql/sql_connect.cc:1260 #27 0x0000557d5e61f11e in pfs_spawn_thread (arg=0x557d608584c0) at /data/src/10.2/storage/perfschema/pfs.cc:1862 #28 0x00007fdbf507a494 in start_thread (arg=0x7fdbe4515700) at pthread_create.c:333 #29 0x00007fdbf31f293f in clone () from /lib/x86_64-linux-gnu/libc.so.6

            elenst, while you did not repeat this on 10.0, I think a similar bug is possible there. Instead of leading into an assertion failure on the table reference count, we could be accessing a dict_table_t object after it has been removed the data dictionary cache and freed.

            I pushed a fix to bb-10.0-marko for validation. It survives the mysql-test-run stress test case (which I did not include). I plan to merge the same code to 10.2, replacing the current improper use of the reference counter.

            marko Marko Mäkelä added a comment - elenst , while you did not repeat this on 10.0, I think a similar bug is possible there. Instead of leading into an assertion failure on the table reference count, we could be accessing a dict_table_t object after it has been removed the data dictionary cache and freed. I pushed a fix to bb-10.0-marko for validation. It survives the mysql-test-run stress test case (which I did not include). I plan to merge the same code to 10.2, replacing the current improper use of the reference counter.

            I pushed a fix to 10.0. There are no merge conflicts when merging it up to 10.2, but an additional code patch is needed in 10.2 to remove the 10.2 specific assertion failure. I believe that this should be amended to the merge commit:

            diff --git a/storage/innobase/handler/i_s.cc b/storage/innobase/handler/i_s.cc
            index 08379e494a0..9773e922316 100644
            --- a/storage/innobase/handler/i_s.cc
            +++ b/storage/innobase/handler/i_s.cc
            @@ -6433,7 +6433,6 @@ i_s_sys_tables_fill_table_stats(
             	while (rec) {
             		const char*	err_msg;
             		dict_table_t*	table_rec;
            -		ulint		ref_count;
             
             		/* Fetch the dict_table_t structure corresponding to
             		this SYS_TABLES record */
            @@ -6441,15 +6440,7 @@ i_s_sys_tables_fill_table_stats(
             			heap, rec, &table_rec,
             			DICT_TABLE_LOAD_FROM_CACHE, &mtr);
             
            -		if (table_rec != NULL) {
            -			ut_ad(err_msg == NULL);
            -
            -			ref_count = table_rec->get_ref_count();
            -
            -			/* Protect the dict_table_t object by incrementing
            -			the reference count. */
            -			table_rec->acquire();
            -		}
            +		ulint ref_count = table_rec ? table_rec->get_ref_count() : 0;
             
             		mutex_exit(&dict_sys->mutex);
             
            @@ -6459,9 +6450,11 @@ i_s_sys_tables_fill_table_stats(
             			}});
             
             		if (table_rec != NULL) {
            +			ut_ad(err_msg == NULL);
             			i_s_dict_fill_sys_tablestats(thd, table_rec, ref_count,
             						     tables->table);
             		} else {
            +			ut_ad(err_msg != NULL);
             			push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN,
             					    ER_CANT_FIND_SYSTEM_REC, "%s",
             					    err_msg);
            @@ -6474,10 +6467,6 @@ i_s_sys_tables_fill_table_stats(
             		rw_lock_s_lock(&dict_operation_lock);
             		mutex_enter(&dict_sys->mutex);
             
            -		if (table_rec != NULL) {
            -			table_rec->release();
            -		}
            -
             		mtr_start(&mtr);
             		rec = dict_getnext_system(&pcur, &mtr);
             	}
            

            marko Marko Mäkelä added a comment - I pushed a fix to 10.0. There are no merge conflicts when merging it up to 10.2, but an additional code patch is needed in 10.2 to remove the 10.2 specific assertion failure. I believe that this should be amended to the merge commit: diff --git a/storage/innobase/handler/i_s.cc b/storage/innobase/handler/i_s.cc index 08379e494a0..9773e922316 100644 --- a/storage/innobase/handler/i_s.cc +++ b/storage/innobase/handler/i_s.cc @@ -6433,7 +6433,6 @@ i_s_sys_tables_fill_table_stats( while (rec) { const char* err_msg; dict_table_t* table_rec; - ulint ref_count; /* Fetch the dict_table_t structure corresponding to this SYS_TABLES record */ @@ -6441,15 +6440,7 @@ i_s_sys_tables_fill_table_stats( heap, rec, &table_rec, DICT_TABLE_LOAD_FROM_CACHE, &mtr); - if (table_rec != NULL) { - ut_ad(err_msg == NULL); - - ref_count = table_rec->get_ref_count(); - - /* Protect the dict_table_t object by incrementing - the reference count. */ - table_rec->acquire(); - } + ulint ref_count = table_rec ? table_rec->get_ref_count() : 0; mutex_exit(&dict_sys->mutex); @@ -6459,9 +6450,11 @@ i_s_sys_tables_fill_table_stats( }}); if (table_rec != NULL) { + ut_ad(err_msg == NULL); i_s_dict_fill_sys_tablestats(thd, table_rec, ref_count, tables->table); } else { + ut_ad(err_msg != NULL); push_warning_printf(thd, Sql_condition::WARN_LEVEL_WARN, ER_CANT_FIND_SYSTEM_REC, "%s", err_msg); @@ -6474,10 +6467,6 @@ i_s_sys_tables_fill_table_stats( rw_lock_s_lock(&dict_operation_lock); mutex_enter(&dict_sys->mutex); - if (table_rec != NULL) { - table_rec->release(); - } - mtr_start(&mtr); rec = dict_getnext_system(&pcur, &mtr); }

            I pushed the post-merge fix to 10.2 separately, because it was not amended as part of the merge of the fix. This also closes MDEV-14021.

            marko Marko Mäkelä added a comment - I pushed the post-merge fix to 10.2 separately, because it was not amended as part of the merge of the fix. This also closes MDEV-14021 .

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.