[MDEV-15020] Server hangs due to InnoDB persistent statistics or innodb_stats_auto_recalc Created: 2018-01-22  Updated: 2023-08-22  Resolved: 2021-08-31

Status: Closed
Project: MariaDB Server
Component/s: Optimizer, Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: 10.6.5

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 6
Labels: hang, performance, replication, statistics, upstream

Issue Links:
Blocks
blocks MDEV-11633 Make the InnoDB system tablespace opt... Open
blocks MDEV-11655 Transactional data dictionary Open
blocks MDEV-24579 Error table->get_ref_count() after up... Closed
blocks MDEV-29883 Deadlock between InnoDB statistics up... Closed
Duplicate
is duplicated by MDEV-24632 MariaDB 10.3.27 crash on windows Closed
Relates
relates to MDEV-19556 Support native storage engine samplin... Open
relates to MDEV-23670 Crash during OPTIMIZE TABLE mysql.inn... Closed
relates to MDEV-29883 Deadlock between InnoDB statistics up... Closed
relates to MDEV-31979 Assertion `!internal' failed in void ... Open
relates to MDEV-11658 Simpler, faster IMPORT of InnoDB tables Open
relates to MDEV-13360 too long values in mysql.innodb_table... Closed
relates to MDEV-14583 innodb.innodb-index-online failed in ... Closed
relates to MDEV-14941 Timeouts on persistent statistics tab... Closed
relates to MDEV-18381 Server crashes in ha_innobase::store_... Closed
relates to MDEV-30707 MariaDB can'T be stopped Closed

 Description   

InnoDB in MySQL 5.6 introduced persistent statistics, to help the optimizer choose query plans. The statistics are stored in two tables: mysql.innodb_index_stats and mysql.innodb_table_stats, using the InnoDB-internal SQL parser while holding an exclusive lock on the InnoDB data dictionary cache. This architecture involves a number of workarounds to locking problems. It could also break replication when users modify the statistics tables directly from SQL, because the internal transactions to modify the statistics tables are not replicated.

The natural place to store the statistics is within the data file. If there is a real need to tweak the statistics other than by running ANALYZE TABLE, then an external tool could be written for that.



 Comments   
Comment by Marko Mäkelä [ 2018-01-22 ]

Should we decide to keep using transactions for updating the statistics, there are some work-in-progress patches attached to MDEV-14941.
However, I think that it could make more sense to use only mini-transaction logging.

The .ibd file format change should also involve making the .ibd files self-contained with respect to metadata that is not stored in the SQL layer (a prerequisite for MDEV-11655). That is, we must store inside the .ibd file the root page number of each secondary index (and the location of the fulltext index files).

Comment by Marko Mäkelä [ 2018-10-19 ]

MDEV-15562 will introduce a ‘metadata BLOB’ that is anchored to a hidden record in the clustered index.
We could extend this BLOB so that it would store secondary index root page numbers, the column numbers that are being indexed, and the index statistics.

Comment by Marko Mäkelä [ 2020-01-28 ]

MySQL Bug #83079, which was reported by the former InnoDB QA engineer, indicates that there could be an actual locking problem with the InnoDB persistent statistics.

It seems that the report was mistakenly closed as a duplicate of the bug that was introduced by SX-locks in MySQL 5.7 (which we fixed in MDEV-14637, as noted in MDEV-16850).

The original report states:

This can go on for a few hours and then may end up with an abort on

InnoDB: Error: semaphore wait has lasted > 600 seconds

or suddenly dissipate.
Either way performance is degraded.

Happens for 5.6 and 5.7 alike.

If it were a genuine duplicate of the SX-lock related bugs, then the server should always hang, and it should not affect MySQL version 5.6.

Note: Already MySQL 5.6 enabled both innodb_stats_persistent and innodb_stats_auto_recalc by default.

Comment by Marko Mäkelä [ 2020-04-22 ]

A reasonable course of action, which would address the problems related to the use of the InnoDB internal SQL parser, would be to rewrite the interface to the persistent tables that store the InnoDB statistics. We should try to update the code so that the handler API (or even the MariaDB SQL parser) will be used for accessing the tables. This would also allow the statistics tables to be created in any storage engine.

Comment by Marko Mäkelä [ 2020-05-29 ]

psergey, can you please implement an interface that InnoDB can invoke to read or write mysql.innodb_table_stats and mysql.innodb_index_stats? We would need that so that the problematic interface to the InnoDB internal SQL parser can be removed. Such a change should be technically possible in GA releases, because we would not be changing any file format.

Comment by Sergei Petrunia [ 2020-07-17 ]

implement an interface that InnoDB can invoke to read or write mysql.innodb_table_stats and mysql.innodb_index_stats?

What exactly is wrong with class handler? It probably has a lot of extra stuff. Do I factor index navigation functions out of class handler? I think I don't understand the request. We should schedule a meeting to discuss it.

Comment by Marko Mäkelä [ 2020-09-17 ]

psergey, I need some sample code that would perform the equivalent of SELECT, INSERT, UPDATE, DELETE on those two tables, say, in the beginning of the ha_innobase::write_row() function. (For obvious reasons, that code should be skipped when writing to either of the tables.)

In other words, the above requested code patch to ha_innobase::write_row() would implement a trigger that would be executed on any INSERT into any InnoDB table.

Once I have such proof of concept code, I can adapt and move it to the appropriate place in the InnoDB code base.

Comment by Jan Lindström (Inactive) [ 2020-09-18 ]

marko If transactions (InnoDB) are used this should be designed so that locking of rows/columns do not increase lock waits especially on Galera multi-master case. We could have multiple concurrent Galera appliers executing changes to same table possible causing changes to persistent statistics for same table or same index. Consider for a example two transactions deleting huge amount of rows from a table so that these deletes do not conflict each other. However, their change on table/index statistics could access same row causing a lock wait. Similarly, it is not desired feature that local table/index statistic change is rolled-back because it is holding locks that conflict with applier thread changes (brute force). For Galera it would be better if only one master will update table/index statistics and update on other nodes is handled by Galera replication, this naturally requires these tables to use InnoDB storage engine and some code changes so that applier nodes do not update persistent statistics during normal table operations.

Comment by Marko Mäkelä [ 2020-09-23 ]

jplindst, with this change, the tables could be created in any storage engine (such as Aria or MyISAM). I suppose that moving to other storage engines (which might use table-level locking) might increase the possibility of lock waits. If I remember correctly, in the existing implementation there are some tweaks for disabling Galera replication. The final code can surely be reviewed by the Galera team.

Comment by Sergei Petrunia [ 2021-02-16 ]

Use cases to handle

Let's list the usage scenarios of stat tables:

  • Use1: a select or DML statement starts. It needs to read statistics for the tables it is using.
  • Use2: manual ANALYZE TABLE is performed. It needs to update the statistics data.
  • Use3: a background statistics update is triggered. It collects the new statistics and so needs to update it.
  • Use4: SQL-level usage of stat tables. A user runs a statement that does a read, DML, or even a DDL operation on the statistic table.
  • Use5: a DDL operation on a table causes the statistics to be either updated or removed.

How EITS does it

  • There is code to handle Use1 and Use2.
  • There is no Use3.
  • Use4 has some limitations, e.g.
    • "ERROR 1428 (HY000): You can't combine write-locking of system tables with other tables or lock types", MDEV-7363

Use1: How EITS reads the statistics

TODO.
(As far as I understand, opening stat tables is "hooked" into the process of opening regular table. There is code at SQL layer that injects EITS tables into the list of tables that the statement is opening... This way, opening and locking is done in one shot. It might be hard to repeat this inside innodb?)
TODO^ verify this.

Use2: How EITS updates statistics

Use2 is done as follows:

1. Statistics is collected.
2. sql_statistics.cc: update_statistics_for_table() saves the data.

  start_new_trans new_trans(thd);
  open_stat_tables()
  ...
  stat_file->ha_update_row(record[1], record[0]);
  ...
  thd->commit_whole_transaction_and_close_tables();
  new_trans.restore_old_transaction();

note that the above is run while holding a lock on the table being analyzed.
(Does this cause a conflict for MySQL table locking mechanism?)

Updating statistics inside InnoDB.

Currently InnoDB is using its own internal SQL interpreter to perform statistics updates.
The interpreter calls InnoDB internal APIs to perform operations on statistics data.

The new implementation is expected to use SQL-level objects for doing this. Like get an MDL lock, call open_table (or similar) to get TABLE object and a ha_innobase object. Is this right?

Comment by Marko Mäkelä [ 2021-08-31 ]

Related to MDEV-25919, I refactored the InnoDB code that accesses the persistent statistics tables. We still invoke the InnoDB internal parser, but I believe that the hangs have now been fixed. We will acquire MDL and InnoDB table locks on the statistics tables, as well as MDL on the user table, before locking the dict_sys cache.

Comment by Marko Mäkelä [ 2022-04-08 ]

Side note: I see that MySQL Bug #83079 was incorrectly closed as fixed in MySQL 5.7.23, most likely mistaking it to be a duplicate of a regression that was introduced in MySQL 5.7 (the fix of Oracle Bug #29390736 was the basis of the MariaDB fix of MDEV-14637).

The root cause of these hangs is older (present since the introduction of InnoDB persistent statistics in MySQL 5.6 and MariaDB 10.0). They were fixed in MariaDB 10.6 by MDEV-25919.

Comment by Marko Mäkelä [ 2022-10-26 ]

Some hangs related to all types of InnoDB statistics (including the ‘transient’ ones that STATS_PERSISTENT replaced) were possible due to MDEV-29883.

Generated at Thu Feb 08 08:18:05 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.