MariaDB Server / MDEV-26516

WSREP: Record locking is disabled in this thread, but the table being modified is not `mysql/wsrep_streaming_log`: `mysql/innodb_table_stats`

    Details

      Description

      After I pushed a follow-up to MDEV-4750 to 10.6 that stops setting innodb_stats_persistent=OFF globally in all tests, a test failed:

      10.6 241e2ba642590e191359466911e7d24427f1993c

      galera.galera_var_cluster_address 'innodb' w2 [ fail ]
              Test ended at 2021-08-31 18:25:33
       
      CURRENT_TEST: galera.galera_var_cluster_address
      2021-08-31 18:25:15 148 [Note] WSREP: Server status change joiner -> joined
      2021-08-31 18:25:15 148 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      2021-08-31 18:25:15 150 [Note] WSREP: Recovered cluster id d8111f9f-0a87-11ec-b553-db00c63c8236
      2021-08-31 18:25:15 150 [ERROR] WSREP: Record locking is disabled in this thread, but the table being modified is not `mysql/wsrep_streaming_log`: `mysql/innodb_table_stats`.
      2021-08-31 18:25:15 0x7fe093c1d700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.6.5/storage/innobase/row/row0ins.cc line 3199
      

      The cause appears to be that innodb_stats_auto_recalc may trigger a statistics recalculation when a table is opened, which Galera does not expect at that point.
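
      The invariant that the error message describes can be modeled roughly as follows. This is a simplified sketch, not the code at row0ins.cc line 3199: ThreadState and insert_respects_invariant are hypothetical names, and only the two table names are taken from the log above.

```cpp
#include <string>

// Hypothetical stand-in for per-thread server state; not a MariaDB type.
struct ThreadState {
    bool skip_locking;  // true when record locking is disabled in this thread
};

// Returns true if modifying `table_name` is consistent with the invariant
// stated in the error message: a thread that skips record locking may only
// modify mysql/wsrep_streaming_log.
bool insert_respects_invariant(const ThreadState& thd,
                               const std::string& table_name) {
    if (!thd.skip_locking) {
        return true;  // threads that lock normally may modify any table
    }
    return table_name == "mysql/wsrep_streaming_log";
}
```

      In this model, a thread with locking disabled that writes to mysql/innodb_table_stats, as a statistics recalculation would, violates the invariant.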

      I was unable to reproduce this crash locally by running the following:

      ./mtr --no-reorder galera.galera_set_position_after_dummy_writeset galera.galera_sp_bf_abort galera.galera_sp_insert_parallel galera.galera_split_brain galera.galera_sql_log_bin_zero galera.galera_status_cluster galera.galera_status_local_index galera.galera_status_local_state galera.galera_strict_require_innodb galera.galera_strict_require_primary_key galera.galera_suspend_slave galera.galera_sync_wait_show galera.galera_toi_alter_auto_increment galera.galera_toi_ddl_locking galera.galera_toi_ddl_sequential galera.galera_toi_drop_database galera.galera_toi_ftwrl galera.galera_toi_lock_exclusive galera.galera_toi_lock_shared galera.galera_transaction_read_only galera.galera_transaction_replay galera.galera_truncate galera.galera_truncate_temporary galera.galera_unicode_identifiers galera.galera_unicode_pk galera.galera_update_limit galera.galera_var_OSU_method galera.galera_var_OSU_method2 galera.galera_var_auto_inc_control_off galera.galera_var_certify_nonPK_off galera.galera_var_cluster_address
      

      After the test galera.galera_truncate, test execution appeared to hang.

      As far as I can tell, the table wsrep_streaming_log along with streaming replication was introduced in Galera 4.

      I would suggest the following change. I think that it needs to be applied starting with 10.4.

      diff --git a/storage/innobase/dict/dict0stats.cc b/storage/innobase/dict/dict0stats.cc
      index d7466ae5f8a..f15a25f2010 100644
      --- a/storage/innobase/dict/dict0stats.cc
      +++ b/storage/innobase/dict/dict0stats.cc
      @@ -3585,6 +3585,11 @@ dict_stats_update(
       			}
       
       			if (dict_stats_auto_recalc_is_enabled(table)) {
      +#ifdef WITH_WSREP
      +				if (wsrep_thd_skip_locking(current_thd)) {
      +					goto transient;
      +				}
      +#endif
       				return(dict_stats_update(
       						table,
       						DICT_STATS_RECALC_PERSISTENT));
      diff --git a/storage/innobase/row/row0ins.cc b/storage/innobase/row/row0ins.cc
      index 6f228142cba..761b2adf9ba 100644
      --- a/storage/innobase/row/row0ins.cc
      +++ b/storage/innobase/row/row0ins.cc
      @@ -3185,7 +3185,8 @@ row_ins_clust_index_entry(
       
       #ifdef WITH_WSREP
       	const bool skip_locking
      -		= wsrep_thd_skip_locking(thr_get_trx(thr)->mysql_thd);
      +		= thr_get_trx(thr)->is_wsrep()
      +		&& wsrep_thd_skip_locking(thr_get_trx(thr)->mysql_thd);
       	ulint	flags = index->table->no_rollback() ? BTR_NO_ROLLBACK
       		: (index->table->is_temporary() || skip_locking)
       		? BTR_NO_LOCKING_FLAG : 0;
      

      The second hunk is there only to avoid function call overhead in the rather common scenario of running MariaDB Server without Galera replication.
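
      The intent of the two hunks can be sketched in isolation. All names below (Session, thd_skip_locking, on_auto_recalc, insert_skips_locking) are hypothetical stand-ins rather than MariaDB types or functions, and the sketch assumes that the transient label in dict_stats_update() leads to the non-persistent statistics path.

```cpp
// Hypothetical model of the proposed change; not MariaDB code.
enum class StatsAction { RecalcPersistent, UseTransient };

struct Session {
    bool wsrep_enabled;  // models thr_get_trx(thr)->is_wsrep()
    bool skip_locking;   // models the result of wsrep_thd_skip_locking()
};

// Counts calls to the modeled out-of-line check, so that the
// short-circuit in insert_skips_locking() can be observed.
static int skip_locking_calls = 0;

bool thd_skip_locking(const Session& s) {
    ++skip_locking_calls;
    return s.skip_locking;
}

// First hunk: when auto-recalc would kick in, a thread that skips
// record locking uses transient statistics instead of writing to the
// persistent statistics tables.
StatsAction on_auto_recalc(const Session& s) {
    return thd_skip_locking(s) ? StatsAction::UseTransient
                               : StatsAction::RecalcPersistent;
}

// Second hunk: test the cheap flag first. Because && short-circuits,
// thd_skip_locking() is never called when wsrep is not in use.
bool insert_skips_locking(const Session& s) {
    return s.wsrep_enabled && thd_skip_locking(s);
}
```

      With a session that has wsrep_enabled set to false, insert_skips_locking() returns false without ever calling thd_skip_locking(), which is the overhead that the second hunk avoids.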

              People

              Assignee:
              sysprg Julius Goryavsky
              Reporter:
              marko Marko Mäkelä