MariaDB Server · MDEV-35000

dict_table_close() breaks innodb_stats_auto_recalc

Details

    Description

      In an email exchange with maxk, the following performance bottleneck was mentioned:

      /** Decrement the count of open handles */
      void dict_table_close(dict_table_t *table)
      {
        if (table->get_ref_count() == 1 &&
            dict_stats_is_persistent_enabled(table) &&
            strchr(table->name.m_name, '/'))
        {
          /* It looks like we are closing the last handle. The user could
          have executed FLUSH TABLES in order to have the statistics reloaded
          from the InnoDB persistent statistics tables. We must acquire
          exclusive dict_sys.latch to prevent a race condition with another
          thread concurrently acquiring a handle on the table. */
          dict_sys.lock(SRW_LOCK_CALL);
          if (table->release())
          {
            table->stats_mutex_lock();
            if (table->get_ref_count() == 0)
              dict_stats_deinit(table);
            table->stats_mutex_unlock();
          }
          dict_sys.unlock();
        }
        else
          table->release();
      }
      

      The intention of the exclusive dict_sys.latch is to prevent concurrent ha_innobase::open() from accessing a table whose statistics have been deinitialized when the last handle is being closed.

      I am thinking that we should replace this obscure logic, which is intended to react to FLUSH TABLES, with an additional function that would be executed only as part of FLUSH TABLES, right after all table handles have been closed. That function could then acquire exclusive dict_sys.latch and invalidate the cached InnoDB persistent statistics for all tables whose reference count is 0. The intention is to allow users to force a re-read of mysql.innodb_index_stats and mysql.innodb_table_stats after they have been modified by the DBA.
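
      A minimal sketch of what such a hook could look like (the function name dict_stats_flush_invalidate() is hypothetical; it reuses the primitives from the snippet above and assumes that walking dict_sys.table_LRU and dict_sys.table_non_LRU covers the cached tables):

      /** Hypothetical FLUSH TABLES hook: discard the cached persistent
      statistics of every table that has no open handles, so that the
      next open will reload them from mysql.innodb_table_stats and
      mysql.innodb_index_stats. */
      static void dict_stats_flush_invalidate()
      {
        dict_sys.lock(SRW_LOCK_CALL);
        for (dict_table_t *table= UT_LIST_GET_FIRST(dict_sys.table_LRU);
             table; table= UT_LIST_GET_NEXT(table_LRU, table))
          if (table->get_ref_count() == 0 &&
              dict_stats_is_persistent_enabled(table))
          {
            table->stats_mutex_lock();
            dict_stats_deinit(table);
            table->stats_mutex_unlock();
          }
        /* A complete implementation would walk dict_sys.table_non_LRU
        in the same way. */
        dict_sys.unlock();
      }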

          Activity

            As part of this, I also plan to clean up some data types related to statistics. Because the page number is stored in 32 bits in data files, the size of an index or table in pages must fit in uint32_t.
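
            A sketch of the kind of narrowing this implies (stat_clustered_index_size and stat_sum_of_other_index_sizes are the existing dict_table_t statistics members; the change shown is only illustrative):

            /* in dict_table_t: page counts must fit in 32 bits,
            because page numbers are stored in 32 bits in data files */
            uint32_t stat_clustered_index_size;      /* was: ulint */
            uint32_t stat_sum_of_other_index_sizes;  /* was: ulint */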

            marko Marko Mäkelä added a comment

            svoj’s idea of overriding Handler_share::~Handler_share() works, but we will need a way to invalidate the table object in the derived class InnoDB_share when it is being evicted from the cache. How to do that, I do not currently know. Perhaps have a pointer in dict_table_t, and a dict_table_t* inside InnoDB_share that we would either set to nullptr or update to another table object that will replace the old one on DDL?
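
            A minimal sketch of that idea, assuming the derived class and pointer member described above (the destructor body is illustrative; the invalidation protocol is exactly the open question):

            class InnoDB_share : public Handler_share
            {
            public:
              /* Cached dictionary object; DDL would either update this
              to the replacement table object or set it to nullptr. */
              dict_table_t *table= nullptr;

              ~InnoDB_share() override
              {
                /* Invoked when the share is evicted, e.g. on FLUSH
                TABLES; releasing the reference makes the statistics
                eligible for invalidation. */
                if (table)
                  dict_table_close(table);
              }
            };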

            In the current state of the code, we can observe a massive amount of heap-use-after-free failures in certain tests, such as innodb.alter_not_null.

            marko Marko Mäkelä added a comment

            The interface handler::get_ha_share_ptr() seems inadequate for partitioned tables, because Handler_share objects are attached to handler::table_share. There is ha_partition::set_ha_share_ref(), and Partition_share::partitions_share_refs seems to keep track of any underlying Handler_share that could be assigned by invoking handler::set_ha_share_ptr(). We would seem to need a refined interface so that InnoDB can access its derived Handler_share object inside operations that rebuild a partition, such as ha_innobase::truncate() in ALTER TABLE…TRUNCATE PARTITION. In that case, handler::get_ha_share_ptr() would return a Partition_share.
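
            For reference, this is how an engine would normally attach its Handler_share through the generic interface (lock_shared_ha_data(), get_ha_share_ptr() and set_ha_share_ptr() are the existing handler members; the method name get_innodb_share() is hypothetical). For a partitioned table, get_ha_share_ptr() would hand back the Partition_share instead, which is exactly the problem described above:

            InnoDB_share *ha_innobase::get_innodb_share()
            {
              lock_shared_ha_data();
              auto *share= static_cast<InnoDB_share*>(get_ha_share_ptr());
              if (!share)
              {
                /* First open of the table: create and publish the share. */
                share= new InnoDB_share;
                set_ha_share_ptr(share);
              }
              unlock_shared_ha_data();
              return share;
            }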

            marko Marko Mäkelä added a comment

            Another problem that I am hitting seems to share a root cause with MDEV-17805: the error handling of some DDL operations would invoke ha_innobase::delete_table() on a different object than the one that ha_innobase::create() was invoked with. That different object would not have any associated table_share, so we would miss a call to Handler_share::~Handler_share() via handler::ha_share. In my Handler_share derived object, I would acquire a dict_table_t reference in order to prevent a premature eviction of the table from the InnoDB dict_sys cache. The dangling table reference would cause some DDL operations to fail, because tables would not be properly dropped. I tried to work around this with the following kind of condition:

            @@ -13523,7 +13511,8 @@ ha_innobase::create(const char *name, TABLE *form, HA_CREATE_INFO *create_info,
                     if (!info.table()->is_temporary())
                     {
                       log_write_up_to(trx->commit_lsn, true);
            -          if (info.table()->stats_is_persistent())
            +          if (info.table()->stats_is_persistent() &&
            +              table_share->tmp_table == NO_TMP_TABLE)
                         innodb_share_register(*info.table());
                     }
                     info.table()->release();
            

            Unfortunately, during a plain CREATE TABLE this function turns out to be invoked with an initial table_share object that was created by init_tmp_table_share(), with table_share->tmp_table == INTERNAL_TMP_TABLE. Because of that, we’d fail to invalidate the InnoDB table statistics on a subsequent FLUSH TABLES, and the test innodb.innodb_stats_auto_recalc_on_nonexistent would return a wrong result.

            marko Marko Mäkelä added a comment

            It seems that the above code in ha_innobase::create() only affects the test innodb.innodb_stats_auto_recalc_on_nonexistent and nothing else. The main open question is how to fix this for partitioned tables. We currently do not have any test that would cover reloading the statistics for partitioned tables.

            marko Marko Mäkelä added a comment

            A support customer reported that for an InnoDB table, statistics had not been recalculated in more than 2 years. A plausible explanation is the logic in dict_table_close() that tries to ensure that InnoDB statistics will be reloaded after the statistics tables have been modified using statements like INSERT, UPDATE, or DELETE. The idea was that when the table is reopened after FLUSH TABLES, the statistics would be reloaded.

            It is very unlikely that anyone would actually need such logic. A more likely course of action would be to execute ANALYZE TABLE.

            The logic in dict_table_close() is harmful in that it will effectively cause frequent resets of the counter dict_table_t::stat_modified_counter and therefore prevent the automatic recalculation of InnoDB persistent statistics, which is enabled by the configuration parameter innodb_stats_auto_recalc or the table attribute STATS_AUTO_RECALC=1.
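
            For reference, a simplified sketch of how that counter drives the recalculation (modelled on dict_stats_update_if_needed(); the 10% threshold reflects the documented innodb_stats_auto_recalc behaviour):

            void dict_stats_update_if_needed(dict_table_t *table)
            {
              const uint64_t counter= table->stat_modified_counter++;
              const uint64_t n_rows= table->stat_n_rows;

              /* Recalculate once roughly 10% of the rows have been
              modified. If closing the last handle keeps resetting
              stat_modified_counter, this threshold is never reached. */
              if (counter > n_rows / 10 &&
                  dict_stats_auto_recalc_is_enabled(table))
                dict_stats_recalc_pool_add(table);
            }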

            marko Marko Mäkelä added a comment

            serg pointed out that before this change, executing the following would cause InnoDB persistent statistics to be reloaded:

            RENAME TABLE t TO tmp, tmp TO t;
            

            I revised the change so that this continues to work. Only after FLUSH TABLES will InnoDB cease to reload the statistics.

            I removed 10.6 from the target versions and rebased the fix to 10.11 in order to conform to a recent policy change.

            marko Marko Mäkelä added a comment

            People

              Assignee: marko Marko Mäkelä
              Reporter: marko Marko Mäkelä
