[MDEV-14614] InnoDB: Failing assertion: trx->error_state == DB_SUCCESS or lock wait timeout upon saving statistics Created: 2017-12-11  Updated: 2018-01-12  Resolved: 2017-12-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.12, 10.3.3
Fix Version/s: 10.2.12, 10.3.3

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Problem/Incident
is caused by MDEV-14511 Use fewer transactions for updating I... Closed

 Description   

Even though the test case is non-concurrent on the surface, it's a subject to race condition under the surface. Run with --repeat=N.

--source include/have_innodb.inc
--source include/have_partition.inc
 
SET GLOBAL innodb_stats_persistent= ON;
SET GLOBAL innodb_lock_wait_timeout= 3, lock_wait_timeout= 5;
 
CREATE TABLE t1 (pk BIGINT AUTO_INCREMENT PRIMARY KEY) ENGINE=InnoDB ROW_FORMAT=COMPRESSED;
INSERT INTO t1 VALUES (NULL);
SELECT DATABASE();
 
--error ER_ACCESS_DENIED_ERROR
--connect (con1,localhost,non_existing_user,,test)
 
--error ER_NO_SUCH_TABLE
ALTER TABLE non_existing_table PARTITION BY HASH(pk);
 
INSERT INTO t1 VALUES (NULL);
ALTER TABLE t1 PARTITION BY HASH(pk);

10.2 0af52734a790b61

2017-12-11 04:09:30 0x7f253c338700  InnoDB: Assertion failure in file /data/src/10.2/storage/innobase/que/que0que.cc line 1213
InnoDB: Failing assertion: trx->error_state == DB_SUCCESS
 
#5  0x00007f2543ed43fa in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000556979e15f12 in ut_dbg_assertion_failed (expr=0x55697a2ac598 "trx->error_state == DB_SUCCESS", file=0x55697a2ac258 "/data/src/10.2/storage/innobase/que/que0que.cc", line=1213) at /data/src/10.2/storage/innobase/ut/ut0dbg.cc:61
#7  0x0000556979cf4d98 in que_eval_sql (info=0x7f24e4079c58, sql=0x55697a36ea98 "PROCEDURE RENAME_TABLE_IN_TABLE_STATS () IS\nBEGIN\nUPDATE \"mysql/innodb_table_stats\" SET\ndatabase_name = :new_dbname_utf8,\ntable_name = :new_tablename_utf8\nWHERE\ndatabase_name = :old_dbname_utf8 AND\nta"..., reserve_dict_mutex=0, trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/que/que0que.cc:1213
#8  0x0000556979ee6628 in dict_stats_exec_sql (pinfo=0x7f24e4079c58, sql=0x55697a36ea98 "PROCEDURE RENAME_TABLE_IN_TABLE_STATS () IS\nBEGIN\nUPDATE \"mysql/innodb_table_stats\" SET\ndatabase_name = :new_dbname_utf8,\ntable_name = :new_tablename_utf8\nWHERE\ndatabase_name = :old_dbname_utf8 AND\nta"..., trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:290
#9  0x0000556979eed2ab in dict_stats_rename_in_table_stats (old_dbname_utf8=0x7f253c332de0 "test", old_tablename_utf8=0x7f253c332f80 "t1", new_dbname_utf8=0x7f253c332eb0 "test", new_tablename_utf8=0x7f253c333050 "#mysql50##sql2-2a34-9", trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3562
#10 0x0000556979eed4ec in dict_stats_rename_table (old_name=0x7f253c333510 "test/t1", new_name=0x7f253c3337a0 "test/#sql2-2a34-9", errstr=0x7f253c333310 "0\n", errstr_sz=512, trx=0x7f253de3d508) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3650
#11 0x0000556979c117c3 in ha_innobase::rename_table (this=0x7f24e4014df0, from=0x7f253c333cb0 "./test/t1", to=0x7f253c333ec0 "./test/#sql2-2a34-9") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:14050
#12 0x00005569798f9e5f in handler::ha_rename_table (this=0x7f24e4014df0, from=0x7f253c333cb0 "./test/t1", to=0x7f253c333ec0 "./test/#sql2-2a34-9") at /data/src/10.2/sql/handler.cc:4321
#13 0x000055697974879f in mysql_rename_table (base=0x55697bd3bd60, old_db=0x7f24e4012bf0 "test", old_name=0x7f24e40125a8 "t1", new_db=0x7f24e4012bf0 "test", new_name=0x7f253c334c30 "#sql2-2a34-9", flags=2) at /data/src/10.2/sql/sql_table.cc:5306
#14 0x00005569797536c4 in mysql_alter_table (thd=0x7f24e4000b00, new_db=0x7f24e4012bf0 "test", new_name=0x0, create_info=0x7f253c335e50, table_list=0x7f24e40125e0, alter_info=0x7f253c335da0, order_num=0, order=0x0, ignore=false) at /data/src/10.2/sql/sql_table.cc:9555
#15 0x00005569797cc478 in Sql_cmd_alter_table::execute (this=0x7f24e4012f30, thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_alter.cc:324
#16 0x0000556979688538 in mysql_execute_command (thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_parse.cc:6205
#17 0x000055697968ce62 in mysql_parse (thd=0x7f24e4000b00, rawbuf=0x7f24e40124e8 "ALTER TABLE t1 PARTITION BY HASH(pk)", length=36, parser_state=0x7f253c337200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7896
#18 0x000055697967ad94 in dispatch_command (command=COM_QUERY, thd=0x7f24e4000b00, packet=0x7f24e408cbb1 "", packet_length=36, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1805
#19 0x00005569796796f2 in do_command (thd=0x7f24e4000b00) at /data/src/10.2/sql/sql_parse.cc:1360
#20 0x00005569797c7169 in do_handle_one_connection (connect=0x55697c261590) at /data/src/10.2/sql/sql_connect.cc:1354
#21 0x00005569797c6ef6 in handle_one_connection (arg=0x55697c261590) at /data/src/10.2/sql/sql_connect.cc:1260
#22 0x0000556979be588e in pfs_spawn_thread (arg=0x55697c26bb00) at /data/src/10.2/storage/perfschema/pfs.cc:1863
#23 0x00007f2545ba2494 in start_thread (arg=0x7f253c338700) at pthread_create.c:333
#24 0x00007f2543f8893f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Sometimes instead of the assertion failure I get

2017-12-11  3:58:40 139980515936000 [ERROR] InnoDB: Cannot save table statistics for table `test`.`#sql-238e_9#P#p0` /* Partition `p0` */: Lock wait timeout

which also appears to be strange.

It appears to be a recent regression.

bb-10.2-ext and 10.3 are also affected.



 Comments   
Comment by Marko Mäkelä [ 2017-12-11 ]

I am unable to repeat the problem (the test fails differently for me), but I do have a proposed fix:

diff --git a/storage/innobase/btr/btr0defragment.cc b/storage/innobase/btr/btr0defragment.cc
index cdbe47d48f8..86df5a077a8 100644
--- a/storage/innobase/btr/btr0defragment.cc
+++ b/storage/innobase/btr/btr0defragment.cc
@@ -831,6 +831,7 @@ DECLARE_THREAD(btr_defragment_thread)(void*)
 			mtr_commit(&mtr);
 			/* Reaching the end of the index. */
 			dict_stats_empty_defrag_stats(index);
+			trx->error_state = DB_SUCCESS;
 			ut_d(trx->persistent_stats = true);
 			++trx->will_lock;
 			dberr_t err = dict_stats_save_defrag_stats(index, trx);
diff --git a/storage/innobase/dict/dict0defrag_bg.cc b/storage/innobase/dict/dict0defrag_bg.cc
index e43a75b2cf7..5f5edafc9fb 100644
--- a/storage/innobase/dict/dict0defrag_bg.cc
+++ b/storage/innobase/dict/dict0defrag_bg.cc
@@ -213,6 +213,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx)
 	index_id_t	index_id;
 
 	ut_ad(!srv_read_only_mode);
+	ut_ad(trx->persistent_stats);
 
 	/* pop the first index from the auto defrag pool */
 	if (!dict_stats_defrag_pool_get(&table_id, &index_id)) {
@@ -242,6 +243,7 @@ dict_stats_process_entry_from_defrag_pool(trx_t* trx)
 	}
 
 	mutex_exit(&dict_sys->mutex);
+	trx->error_state = DB_SUCCESS;
 	++trx->will_lock;
 	dberr_t err = dict_stats_save_defrag_stats(index, trx);
 
@@ -294,6 +296,9 @@ dict_stats_save_defrag_summary(dict_index_t* index, trx_t* trx)
 dberr_t
 dict_stats_save_defrag_stats(dict_index_t* index, trx_t* trx)
 {
+	ut_ad(trx->error_state == DB_SUCCESS);
+	ut_ad(trx->persistent_stats);
+
 	if (dict_index_is_ibuf(index)) {
 		return DB_SUCCESS;
 	}
diff --git a/storage/innobase/dict/dict0stats_bg.cc b/storage/innobase/dict/dict0stats_bg.cc
index 0f76c73bc96..3f5419a0751 100644
--- a/storage/innobase/dict/dict0stats_bg.cc
+++ b/storage/innobase/dict/dict0stats_bg.cc
@@ -378,6 +378,7 @@ dict_stats_process_entry_from_recalc_pool(trx_t* trx)
 		dict_stats_recalc_pool_add(table);
 
 	} else {
+		trx->error_state = DB_SUCCESS;
 		++trx->will_lock;
 		dict_stats_update(table, DICT_STATS_RECALC_PERSISTENT, trx);
 		if (trx->state != TRX_STATE_NOT_STARTED) {
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index a87ab677d04..e7a4532a604 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -6480,6 +6480,7 @@ ha_innobase::open(const char* name, int, uint)
 		}
 		ut_ad(!trx->persistent_stats);
 		ut_d(trx->persistent_stats = true);
+		trx->error_state = DB_SUCCESS;
 		++trx->will_lock;
 		dict_stats_init(ib_table, trx);
 		innobase_commit_low(trx);
@@ -13281,6 +13282,7 @@ create_table_info_t::create_table_update_dict()
 	innobase_copy_frm_flags_from_create_info(innobase_table, m_create_info);
 
 	++m_trx->will_lock;
+	m_trx->error_state = DB_SUCCESS;
 	dict_stats_update(innobase_table, DICT_STATS_EMPTY_TABLE, m_trx);
 	innobase_commit_low(m_trx);
 
@@ -14045,6 +14047,7 @@ ha_innobase::rename_table(
 		normalize_table_name(norm_from, from);
 		normalize_table_name(norm_to, to);
 
+		trx->error_state = DB_SUCCESS;
 		++trx->will_lock;
 		ret = dict_stats_rename_table(norm_from, norm_to,
 					      errstr, sizeof errstr, trx);
diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index d2401bd4e2d..44c40d05b4c 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -9073,6 +9073,7 @@ ha_innobase::commit_inplace_alter_table(
 	}
 
 	row_mysql_unlock_data_dictionary(trx);
+	trx->error_state = DB_SUCCESS;
 	++trx->will_lock;
 
 	/* TODO: The following code could be executed

In these places, we are reusing a transaction object that was committed. Transaction commit will not reset the error_state, and sometimes we have error==DB_SUCCESS despite error_state being something else. Apparently this was the case in ha_innobase::rename_table(), most likely because saving the statistics failed for some partition of the table, due to concurrent activity from a background thread that updates persistent statistics.

Comment by Julien Muchembled [ 2018-01-12 ]

Since we upgraded from 10.2.11 to 10.2.12, we also have Lock wait timeouts (but no assertion failure) and regressions in the test suite of our software:
> show full processlist;

Id User Host db Command Time State Info
2 system user   NULL Daemon NULL InnoDB purge worker NULL
1 system user   NULL Daemon NULL InnoDB purge coordinator NULL
3 system user   NULL Daemon NULL InnoDB purge worker NULL
4 system user   NULL Daemon NULL InnoDB purge worker NULL
5 system user   NULL Daemon NULL InnoDB shutdown handler NULL
164 root localhost test_neo1 Query 47 Opening tables REPLACE INTO config VALUES ('version', '1')
165 root localhost test_neo2 Query 47 creating table CREATE TABLE IF NOT EXISTS pt (
rid INT UNSIGNED NOT NULL,
nid INT NOT NULL,
state TINYINT UNSIGNED NOT NULL,
PRIMARY KEY (rid, nid)
) ENGINE=InnoDB
166 root localhost test_neo3 Query 47 creating table CREATE TABLE config (
name VARBINARY(255) NOT NULL PRIMARY KEY,
value VARBINARY(255) NULL
) ENGINE=InnoDB
167 root localhost test_neo4 Query 47 creating table CREATE TABLE config (
name VARBINARY(255) NOT NULL PRIMARY KEY,
value VARBINARY(255) NULL
) ENGINE=InnoDB
168 root localhost NULL Query 0 init show full processlist

10 rows in set (0.00 sec)

Generated at Thu Feb 08 08:14:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.