[MDEV-26516] WSREP: Record locking is disabled in this thread, but the table being modified is not `mysql/wsrep_streaming_log`: `mysql/innodb_table_stats`
Created: 2021-09-01
Updated: 2023-10-12

Status: Stalled
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4, 10.5, 10.6, 10.7
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug
Priority: Major
Reporter: Marko Mäkelä
Assignee: Julius Goryavsky
Resolution: Unresolved
Votes: 0
Labels: regression, regression-10.4

Issue Links:
Relates
relates to MDEV-4750 join_outer_innodb.test fails in 10.0-... Closed

Description

After I pushed a follow-up to MDEV-4750 to 10.6 so that innodb_stats_persistent=OFF is no longer set globally in all tests, one test failed:

10.6 241e2ba642590e191359466911e7d24427f1993c

galera.galera_var_cluster_address 'innodb' w2 [ fail ]
        Test ended at 2021-08-31 18:25:33
 
CURRENT_TEST: galera.galera_var_cluster_address
2021-08-31 18:25:15 148 [Note] WSREP: Server status change joiner -> joined
2021-08-31 18:25:15 148 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2021-08-31 18:25:15 150 [Note] WSREP: Recovered cluster id d8111f9f-0a87-11ec-b553-db00c63c8236
2021-08-31 18:25:15 150 [ERROR] WSREP: Record locking is disabled in this thread, but the table being modified is not `mysql/wsrep_streaming_log`: `mysql/innodb_table_stats`.
2021-08-31 18:25:15 0x7fe093c1d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.6.5/storage/innobase/row/row0ins.cc line 3199

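The cause appears to be that innodb_stats_auto_recalc may kick in when a table is opened, at a point where Galera does not expect it. The recalculation of persistent statistics would then write to mysql/innodb_table_stats in a thread that has record locking disabled, which is what the error message reports.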

I was unable to reproduce this crash locally by running the following:

./mtr --no-reorder galera.galera_set_position_after_dummy_writeset galera.galera_sp_bf_abort galera.galera_sp_insert_parallel galera.galera_split_brain galera.galera_sql_log_bin_zero galera.galera_status_cluster galera.galera_status_local_index galera.galera_status_local_state galera.galera_strict_require_innodb galera.galera_strict_require_primary_key galera.galera_suspend_slave galera.galera_sync_wait_show galera.galera_toi_alter_auto_increment galera.galera_toi_ddl_locking galera.galera_toi_ddl_sequential galera.galera_toi_drop_database galera.galera_toi_ftwrl galera.galera_toi_lock_exclusive galera.galera_toi_lock_shared galera.galera_transaction_read_only galera.galera_transaction_replay galera.galera_truncate galera.galera_truncate_temporary galera.galera_unicode_identifiers galera.galera_unicode_pk galera.galera_update_limit galera.galera_var_OSU_method galera.galera_var_OSU_method2 galera.galera_var_auto_inc_control_off galera.galera_var_certify_nonPK_off galera.galera_var_cluster_address

After the test galera.galera_truncate, the test run appeared to hang.

As far as I can tell, the table wsrep_streaming_log was introduced together with streaming replication in Galera 4.

I would suggest the following change. I think that it needs to be applied starting with 10.4.

diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc
index d7466ae5f8a..f15a25f2010 100644
--- a/storage/innobase/dict/dict0stats.cc
+++ b/storage/innobase/dict/dict0stats.cc
@@ -3585,6 +3585,11 @@ dict_stats_update(
 			}
 
 			if (dict_stats_auto_recalc_is_enabled(table)) {
+#ifdef WITH_WSREP
+				if (wsrep_thd_skip_locking(current_thd)) {
+					goto transient;
+				}
+#endif
 				return(dict_stats_update(
 						table,
 						DICT_STATS_RECALC_PERSISTENT));
diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
index 6f228142cba..761b2adf9ba 100644
--- a/storage/innobase/row/row0ins.cc
+++ b/storage/innobase/row/row0ins.cc
@@ -3185,7 +3185,8 @@ row_ins_clust_index_entry(
 
 #ifdef WITH_WSREP
 	const bool skip_locking
-		= wsrep_thd_skip_locking(thr_get_trx(thr)->mysql_thd);
+		= thr_get_trx(thr)->is_wsrep()
+		&& wsrep_thd_skip_locking(thr_get_trx(thr)->mysql_thd);
 	ulint	flags = index->table->no_rollback() ? BTR_NO_ROLLBACK
 		: (index->table->is_temporary() || skip_locking)
 		? BTR_NO_LOCKING_FLAG : 0;

The second hunk is only there to avoid function call overhead in the rather common scenario of running MariaDB Server without Galera replication.
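
To illustrate the short-circuit in the second hunk, here is a minimal standalone sketch. It is not MariaDB code: the types Thd and Trx, the function thd_skip_locking() and the call counter are hypothetical stand-ins for trx_t, wsrep_thd_skip_locking() and friends, and the sketch assumes that the is_wsrep() check is a cheap inline test while the other check is a real function call.

```cpp
#include <cassert>

// Hypothetical stand-ins for the server types; not MariaDB's real definitions.
struct Thd {
    bool skip_locking;
};

struct Trx {
    bool wsrep;      // true when the transaction is replicated by Galera
    Thd* mysql_thd;  // connection that owns the transaction

    // Assumed to be a cheap inline check, like trx_t::is_wsrep() in the diff.
    bool is_wsrep() const { return wsrep; }
};

// Counts calls so that the effect of the short-circuit can be observed.
static int skip_locking_calls = 0;

// Stand-in for wsrep_thd_skip_locking(), treated here as a non-inlined call.
bool thd_skip_locking(const Thd* thd) {
    ++skip_locking_calls;
    return thd != nullptr && thd->skip_locking;
}

// Same shape as the second hunk: the cheap check comes first, so the
// function call is skipped entirely for transactions that are not
// replicated by Galera.
bool compute_skip_locking(const Trx& trx) {
    return trx.is_wsrep() && thd_skip_locking(trx.mysql_thd);
}
```

In this sketch, calling compute_skip_locking() on a transaction whose is_wsrep() is false leaves skip_locking_calls unchanged, because the && operator does not evaluate its right-hand operand in that case.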

