Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-14614

InnoDB: Failing assertion: trx->error_state == DB_SUCCESS or lock wait timeout upon saving statistics

Details

    Description

      Even though the test case is non-concurrent on the surface, it's a subject to race condition under the surface. Run with --repeat=N.

      --source include/have_innodb.inc
      --source include/have_partition.inc
       
      SET GLOBAL innodb_stats_persistent= ON;
      SET GLOBAL innodb_lock_wait_timeout= 3, lock_wait_timeout= 5;
       
      CREATE TABLE t1 (pk BIGINT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
      INSERT INTO t1 VALUES (NULL);
      SELECT DATABASE();
       
      --error ER_ACCESS_DENIED_ERROR
      --connect (con1,localhost,non_existing_user,,test)
       
      --error ER_NO_SUCH_TABLE
      ALTER TABLE non_existing_table PARTITION BY HASH(pk);
       
      INSERT INTO t1 VALUES (NULL);
      ALTER TABLE t1 PARTITION BY HASH(pk);
      

      10.2 0af52734a790b61

      2017-12-11 04:09:30 0x7f253c338700  InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/que/que0que.cc line 1213
      InnoDB: Failing assertion: trx->error_state == DB_SUCCESS
       
      #5  0x00007f2543ed43fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
      #6  0x0000556979e15f12 in ut_dbg_assertion_failed (expr=0x55697a2ac598 "trx->error_state == DB_SUCCESS", file=0x55697a2ac258 "/data/src/10.2/storage/innobase/que/que0que.cc", line=1213) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:61
      #7  0x0000556979cf4d98 in que_eval_sql (info=0x7f24e4079c58, sql=0x55697a36ea98 "PROCEDURE RENAME_TABLE_IN_TABLE_STATS () IS\nBEGIN\nUPDATE \"mysql/innodb_table_stats\" SET\ndatabase_name = :new_dbname_utf8,\ntable_name = :new_tablename_utf8\nWHERE\ndatabase_name = :old_dbname_utf8 AND\nta"..., reserve_dict_mutex=0, trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/que/que0que.cc:1213
      #8  0x0000556979ee6628 in dict_stats_exec_sql (pinfo=0x7f24e4079c58, sql=0x55697a36ea98 "PROCEDURE RENAME_TABLE_IN_TABLE_STATS () IS\nBEGIN\nUPDATE \"mysql/innodb_table_stats\" SET\ndatabase_name = :new_dbname_utf8,\ntable_name = :new_tablename_utf8\nWHERE\ndatabase_name = :old_dbname_utf8 AND\nta"..., trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:290
      #9  0x0000556979eed2ab in dict_stats_rename_in_table_stats (old_dbname_utf8=0x7f253c332de0 "test", old_tablename_utf8=0x7f253c332f80 "t1", new_dbname_utf8=0x7f253c332eb0 "test", new_tablename_utf8=0x7f253c333050 "#mysql50##sql2-2a34-9", trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3562
      #10 0x0000556979eed4ec in dict_stats_rename_table (old_name=0x7f253c333510 "test/t1", new_name=0x7f253c3337a0 "test/#sql2-2a34-9", errstr=0x7f253c333310 "0\n", errstr_sz=512, trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3650
      #11 0x0000556979c117c3 in ha_innobase::rename_table (this=0x7f24e4014df0, from=0x7f253c333cb0 "./test/t1", to=0x7f253c333ec0 "./test/#sql2-2a34-9") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:14050
      #12 0x00005569798f9e5f in handler::ha_rename_table (this=0x7f24e4014df0, from=0x7f253c333cb0 "./test/t1", to=0x7f253c333ec0 "./test/#sql2-2a34-9") at /data/src/10.2/sql/handler.cc:4321
      #13 0x000055697974879f in mysql_rename_table (base=0x55697bd3bd60, old_db=0x7f24e4012bf0 "test", old_name=0x7f24e40125a8 "t1", new_db=0x7f24e4012bf0 "test", new_name=0x7f253c334c30 "#sql2-2a34-9", flags=2) at /data/src/10.2/sql/sql_table.cc:5306
      #14 0x00005569797536c4 in mysql_alter_table (thd=0x7f24e4000b00, new_db=0x7f24e4012bf0 "test", new_name=0x0, create_info=0x7f253c335e50, table_list=0x7f24e40125e0, alter_info=0x7f253c335da0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9555
      #15 0x00005569797cc478 in Sql_cmd_alter_table::execute (this=0x7f24e4012f30, thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_alter.cc:324
      #16 0x0000556979688538 in mysql_execute_command (thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_parse.cc:6205
      #17 0x000055697968ce62 in mysql_parse (thd=0x7f24e4000b00, rawbuf=0x7f24e40124e8 "ALTER TABLE t1 PARTITION BY HASH(pk)", length=36, parser_state=0x7f253c337200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7896
      #18 0x000055697967ad94 in dispatch_command (command=COM_QUERY, thd=0x7f24e4000b00, packet=0x7f24e408cbb1 "", packet_length=36, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1805
      #19 0x00005569796796f2 in do_command (thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_parse.cc:1360
      #20 0x00005569797c7169 in do_handle_one_connection (connect=0x55697c261590) at /data/src/10.2/sql/sql_connect.cc:1354
      #21 0x00005569797c6ef6 in handle_one_connection (arg=0x55697c261590) at /data/src/10.2/sql/sql_connect.cc:1260
      #22 0x0000556979be588e in pfs_spawn_thread (arg=0x55697c26bb00) at /data/src/10.2/storage/perfschema/pfs.cc:1863
      #23 0x00007f2545ba2494 in start_thread (arg=0x7f253c338700) at pthread_create.c:333
      #24 0x00007f2543f8893f in clone () from /lib/x86_64-linux-gnu/libc.so.6
      

      Sometimes instead of the assertion failure I get

      2017-12-11  3:58:40 139980515936000 [ERROR] InnoDB: Cannot save table statistics for table `test`.`#sql-238e_9#P#p0` /* Partition `p0` */: Lock wait timeout
      

      which also appears to be strange.

      It appears to be a recent regression.

      bb-10.2-ext and 10.3 are also affected.

      Attachments

        Issue Links

          Activity

            I am unable to repeat the problem (the test fails differently for me), but I do have a proposed fix:

            diff --git a/storage/innobase/btr/btr0defragment.cc b/storage/innobase/btr/btr0defragment.cc
            index cdbe47d48f8..86df5a077a8 100644
            --- a/storage/innobase/btr/btr0defragment.cc
            +++ b/storage/innobase/btr/btr0defragment.cc
            @@ -831,6 +831,7 @@ DECLARE_THREAD(btr_defragment_thread)(void*)
             			mtr_commit(&mtr);
             			/* Reaching the end of the index. */
             			dict_stats_empty_defrag_stats(index);
            +			trx->error_state = DB_SUCCESS;
             			ut_d(trx->persistent_stats = true);
             			++trx->will_lock;
             			dberr_t err = dict_stats_save_defrag_stats(index, trx);
            diff --git a/storage/innobase/dict/dict0defrag_bg.cc b/storage/innobase/dict/dict0defrag_bg.cc
            index e43a75b2cf7..5f5edafc9fb 100644
            --- a/storage/innobase/dict/dict0defrag_bg.cc
            +++ b/storage/innobase/dict/dict0defrag_bg.cc
            @@ -213,6 +213,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx)
             	index_id_t	index_id;
             
             	ut_ad(!srv_read_only_mode);
            +	ut_ad(trx->persistent_stats);
             
             	/* pop the first index from the auto defrag pool */
             	if (!dict_stats_defrag_pool_get(&table_id, &index_id)) {
            @@ -242,6 +243,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx)
             	}
             
             	mutex_exit(&dict_sys->mutex);
            +	trx->error_state = DB_SUCCESS;
             	++trx->will_lock;
             	dberr_t err = dict_stats_save_defrag_stats(index, trx);
             
            @@ -294,6 +296,9 @@ dict_stats_save_defrag_summary(dict_index_t* index, trx_t* trx)
             dberr_t
             dict_stats_save_defrag_stats(dict_index_t* index, trx_t* trx)
             {
            +	ut_ad(trx->error_state == DB_SUCCESS);
            +	ut_ad(trx->persistent_stats);
            +
             	if (dict_index_is_ibuf(index)) {
             		return DB_SUCCESS;
             	}
            diff --git a/storage/innobase/dict/dict0stats_bg.cc b/storage/innobase/dict/dict0stats_bg.cc
            index 0f76c73bc96..3f5419a0751 100644
            --- a/storage/innobase/dict/dict0stats_bg.cc
            +++ b/storage/innobase/dict/dict0stats_bg.cc
            @@ -378,6 +378,7 @@ dict_stats_process_entry_from_recalc_pool(trx_t* trx)
             		dict_stats_recalc_pool_add(table);
             
             	} else {
            +		trx->error_state = DB_SUCCESS;
             		++trx->will_lock;
             		dict_stats_update(table, DICT_STATS_RECALC_PERSISTENT, trx);
             		if (trx->state != TRX_STATE_NOT_STARTED) {
            diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
            index a87ab677d04..e7a4532a604 100644
            --- a/storage/innobase/handler/ha_innodb.cc
            +++ b/storage/innobase/handler/ha_innodb.cc
            @@ -6480,6 +6480,7 @@ ha_innobase::open(const char* name, int, uint)
             		}
             		ut_ad(!trx->persistent_stats);
             		ut_d(trx->persistent_stats = true);
            +		trx->error_state = DB_SUCCESS;
             		++trx->will_lock;
             		dict_stats_init(ib_table, trx);
             		innobase_commit_low(trx);
            @@ -13281,6 +13282,7 @@ create_table_info_t::create_table_update_dict()
             	innobase_copy_frm_flags_from_create_info(innobase_table, m_create_info);
             
             	++m_trx->will_lock;
            +	m_trx->error_state = DB_SUCCESS;
             	dict_stats_update(innobase_table, DICT_STATS_EMPTY_TABLE, m_trx);
             	innobase_commit_low(m_trx);
             
            @@ -14045,6 +14047,7 @@ ha_innobase::rename_table(
             		normalize_table_name(norm_from, from);
             		normalize_table_name(norm_to, to);
             
            +		trx->error_state = DB_SUCCESS;
             		++trx->will_lock;
             		ret = dict_stats_rename_table(norm_from, norm_to,
             					      errstr, sizeof errstr, trx);
            diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
            index d2401bd4e2d..44c40d05b4c 100644
            --- a/storage/innobase/handler/handler0alter.cc
            +++ b/storage/innobase/handler/handler0alter.cc
            @@ -9073,6 +9073,7 @@ ha_innobase::commit_inplace_alter_table(
             	}
             
             	row_mysql_unlock_data_dictionary(trx);
            +	trx->error_state = DB_SUCCESS;
             	++trx->will_lock;
             
             	/* TODO: The following code could be executed
            

            In these places, we are reusing a transaction object that was committed. Transaction commit will not reset the error_state, and sometimes we have error==DB_SUCCESS despite error_state being something else. Apparently this was the case in ha_innobase::rename_table(), most likely because saving the statistics failed for some partition of the table, due to concurrent activity from a background thread that updates persistent statistics.

            marko Marko Mäkelä added a comment - I am unable to repeat the problem (the test fails differently for me), but I do have a proposed fix: diff --git a/storage/innobase/btr/btr0defragment.cc b/storage/innobase/btr/btr0defragment.cc index cdbe47d48f8..86df5a077a8 100644 --- a/storage/innobase/btr/btr0defragment.cc +++ b/storage/innobase/btr/btr0defragment.cc @@ -831,6 +831,7 @@ DECLARE_THREAD(btr_defragment_thread)(void*) mtr_commit(&mtr); /* Reaching the end of the index. */ dict_stats_empty_defrag_stats(index); + trx->error_state = DB_SUCCESS; ut_d(trx->persistent_stats = true); ++trx->will_lock; dberr_t err = dict_stats_save_defrag_stats(index, trx); diff --git a/storage/innobase/dict/dict0defrag_bg.cc b/storage/innobase/dict/dict0defrag_bg.cc index e43a75b2cf7..5f5edafc9fb 100644 --- a/storage/innobase/dict/dict0defrag_bg.cc +++ b/storage/innobase/dict/dict0defrag_bg.cc @@ -213,6 +213,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx) index_id_t index_id; ut_ad(!srv_read_only_mode); + ut_ad(trx->persistent_stats); /* pop the first index from the auto defrag pool */ if (!dict_stats_defrag_pool_get(&table_id, &index_id)) { @@ -242,6 +243,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx) } mutex_exit(&dict_sys->mutex); + trx->error_state = DB_SUCCESS; ++trx->will_lock; dberr_t err = dict_stats_save_defrag_stats(index, trx); @@ -294,6 +296,9 @@ dict_stats_save_defrag_summary(dict_index_t* index, trx_t* trx) dberr_t dict_stats_save_defrag_stats(dict_index_t* index, trx_t* trx) { + ut_ad(trx->error_state == DB_SUCCESS); + ut_ad(trx->persistent_stats); + if (dict_index_is_ibuf(index)) { return DB_SUCCESS; } diff --git a/storage/innobase/dict/dict0stats_bg.cc b/storage/innobase/dict/dict0stats_bg.cc index 0f76c73bc96..3f5419a0751 100644 --- a/storage/innobase/dict/dict0stats_bg.cc +++ b/storage/innobase/dict/dict0stats_bg.cc @@ -378,6 +378,7 @@ dict_stats_process_entry_from_recalc_pool(trx_t* trx) dict_stats_recalc_pool_add(table); } else { + trx->error_state = DB_SUCCESS; ++trx->will_lock; dict_stats_update(table, DICT_STATS_RECALC_PERSISTENT, trx); if (trx->state != TRX_STATE_NOT_STARTED) { diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc index a87ab677d04..e7a4532a604 100644 --- a/storage/innobase/handler/ha_innodb.cc +++ b/storage/innobase/handler/ha_innodb.cc @@ -6480,6 +6480,7 @@ ha_innobase::open(const char* name, int, uint) } ut_ad(!trx->persistent_stats); ut_d(trx->persistent_stats = true); + trx->error_state = DB_SUCCESS; ++trx->will_lock; dict_stats_init(ib_table, trx); innobase_commit_low(trx); @@ -13281,6 +13282,7 @@ create_table_info_t::create_table_update_dict() innobase_copy_frm_flags_from_create_info(innobase_table, m_create_info); ++m_trx->will_lock; + m_trx->error_state = DB_SUCCESS; dict_stats_update(innobase_table, DICT_STATS_EMPTY_TABLE, m_trx); innobase_commit_low(m_trx); @@ -14045,6 +14047,7 @@ ha_innobase::rename_table( normalize_table_name(norm_from, from); normalize_table_name(norm_to, to); + trx->error_state = DB_SUCCESS; ++trx->will_lock; ret = dict_stats_rename_table(norm_from, norm_to, errstr, sizeof errstr, trx); diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc index d2401bd4e2d..44c40d05b4c 100644 --- a/storage/innobase/handler/handler0alter.cc +++ b/storage/innobase/handler/handler0alter.cc @@ -9073,6 +9073,7 @@ ha_innobase::commit_inplace_alter_table( } row_mysql_unlock_data_dictionary(trx); + trx->error_state = DB_SUCCESS; ++trx->will_lock; /* TODO: The following code could be executed In these places, we are reusing a transaction object that was committed. Transaction commit will not reset the error_state, and sometimes we have error==DB_SUCCESS despite error_state being something else. Apparently this was the case in ha_innobase::rename_table(), most likely because saving the statistics failed for some partition of the table, due to concurrent activity from a background thread that updates persistent statistics.

            Since we upgraded from 10.2.11 to 10.2.12, we also have Lock wait timeouts (but no assertion failure) and regressions in the test suite of our software:
            > show full processlist;

            Id User Host db Command Time State Info
            2 system user   NULL Daemon NULL InnoDB purge worker NULL
            1 system user   NULL Daemon NULL InnoDB purge coordinator NULL
            3 system user   NULL Daemon NULL InnoDB purge worker NULL
            4 system user   NULL Daemon NULL InnoDB purge worker NULL
            5 system user   NULL Daemon NULL InnoDB shutdown handler NULL
            164 root localhost test_neo1 Query 47 Opening tables REPLACE INTO config VALUES ('version', '1')
            165 root localhost test_neo2 Query 47 creating table CREATE TABLE IF NOT EXISTS pt (
            rid INT UNSIGNED NOT NULL,
            nid INT NOT NULL,
            state TINYINT UNSIGNED NOT NULL,
            PRIMARY KEY (rid, nid)
            ) ENGINE=InnoDB
            166 root localhost test_neo3 Query 47 creating table CREATE TABLE config (
            name VARBINARY(255) NOT NULL PRIMARY KEY,
            value VARBINARY(255) NULL
            ) ENGINE=InnoDB
            167 root localhost test_neo4 Query 47 creating table CREATE TABLE config (
            name VARBINARY(255) NOT NULL PRIMARY KEY,
            value VARBINARY(255) NULL
            ) ENGINE=InnoDB
            168 root localhost NULL Query 0 init show full processlist

            10 rows in set (0.00 sec)

            jmuchemb Julien Muchembled added a comment - Since we upgraded from 10.2.11 to 10.2.12, we also have Lock wait timeouts (but no assertion failure) and regressions in the test suite of our software : > show full processlist; Id User Host db Command Time State Info 2 system user   NULL Daemon NULL InnoDB purge worker NULL 1 system user   NULL Daemon NULL InnoDB purge coordinator NULL 3 system user   NULL Daemon NULL InnoDB purge worker NULL 4 system user   NULL Daemon NULL InnoDB purge worker NULL 5 system user   NULL Daemon NULL InnoDB shutdown handler NULL 164 root localhost test_neo1 Query 47 Opening tables REPLACE INTO config VALUES ('version', '1') 165 root localhost test_neo2 Query 47 creating table CREATE TABLE IF NOT EXISTS pt ( rid INT UNSIGNED NOT NULL, nid INT NOT NULL, state TINYINT UNSIGNED NOT NULL, PRIMARY KEY (rid, nid) ) ENGINE=InnoDB 166 root localhost test_neo3 Query 47 creating table CREATE TABLE config ( name VARBINARY(255) NOT NULL PRIMARY KEY, value VARBINARY(255) NULL ) ENGINE=InnoDB 167 root localhost test_neo4 Query 47 creating table CREATE TABLE config ( name VARBINARY(255) NOT NULL PRIMARY KEY, value VARBINARY(255) NULL ) ENGINE=InnoDB 168 root localhost NULL Query 0 init show full processlist 10 rows in set (0.00 sec)

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.