[MDEV-14941] Timeouts on persistent statistics tables caused by MDEV-14511 Created: 2018-01-13 Updated: 2020-08-25 Resolved: 2018-01-22 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.2.12, 10.3.3 |
| Fix Version/s: | 10.2.13, 10.3.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Hartmut Holzgraefe | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 2 |
| Labels: | None | ||
| Environment: |
linux, ramfs |
||
| Attachments: |
|
||||||||||||||||
| Issue Links: |
|
||||||||||||||||
| Description |
|
When running tests with plenty of DDL with datadir on RamFS after upgrade from 10.2.11 to 10.2.12 the following errors occur repeatedly: [ERROR] InnoDB: Cannot save table statistics for table `some_dir`.`some_table`: Lock wait timeout then followed by long semaphore wait warnings on the innodb data dictionary mutex:
and finally lots of [ERROR] InnoDB: Cannot save table statistics for table `tmp_1e865f785ceff316`.`access_log`: Operation interrupted after mysqld shutdown was requested |
| Comments |
| Comment by Hartmut Holzgraefe [ 2018-01-13 ] | ||||||||||||||||||||||||||||||||||||
|
Could maybe have been caused by the fix for | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-17 ] | ||||||||||||||||||||||||||||||||||||
|
This is most likely caused by
However, because it is unacceptable to hold the InnoDB data dictionary cache locked while suspending the execution for waiting for a transactional lock (in the mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be released, any lock conflict will immediately be reported as "lock wait timeout". As far as I understand, such failures to update statistics did occur earlier as well, but less frequently, because we would really suspend the thread while holding the dictionary lock. It is not the end of the world if the statistics are missing or a bit off. Recently, I noticed a change in MySQL 5.7.11 (and MariaDB 10.2) that will purposely drop the statistics in a certain case of ALTER TABLE, and rely on a subsequent ha_innobase::open() to update them. It could actually make sense to do this on all ALTER TABLE operations. If we did this, then we would also rename tables in the statistics tables less often (only on RENAME TABLE), and would be less prone to get these errors. I do not think that it makes sense to ever update or rename statistics for the #sql table names. I think that the likelihood of the problem is increased by using partitioning or by executing multiple ALTER TABLE statements in parallel. Apparently, the errors can also occur due to innodb_stats_auto_recalc or the table attribute STATS_AUTO_RECALC=ON:
| ||||||||||||||||||||||||||||||||||||
| Comment by Michael Widenius [ 2018-01-18 ] | ||||||||||||||||||||||||||||||||||||
|
This is expected to be fixed in next 10.2 release. As a temporary workaround one can set innodb_stats_persistent=0 in the my.cnf configuration file to go around this issue. | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-19 ] | ||||||||||||||||||||||||||||||||||||
|
I pushed a patch to bb-10.2-marko for testing. The fix makes RENAME TABLE (also at the end of ALTER TABLE…ALGORITHM=COPY) start by acquiring an exclusive InnoDB table lock. This table lock will not only prevent a conflict with statistics updates happening on the background, but it will also make these operations properly deal with any FOREIGN KEY checks from concurrently active transactions. On the statistics side, internally both analyzing the statistics and writing the statistics will acquire an shared-intention InnoDB table lock. This will conflict with the exclusive table lock from RENAME TABLE. Because we are locking the objects upfront, a locking conflict on the records in mysql.innodb_index_stats and mysql.innodb_table_stats is much smaller. It can still happen, especially on the #sql names used by ALTER TABLE. I saw it occasionally happen when converting a table to partitioned: ALTER TABLE t1 PARTITION BY HASH(pk). It is also possible to get these conflicts when users modify the tables directly. This fix also removes some sleep-and-retry logic from dict_stats_rename_table() that I felt was useless. The impact of these errors should be that some statistics entries may fail to be deleted or renamed on DROP TABLE or RENAME TABLE. The statistics should be automatically recalculated on the background, and they can also be recalculated by executing the ANALYZE TABLE statement. | ||||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2018-01-20 ] | ||||||||||||||||||||||||||||||||||||
|
| ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-21 ] | ||||||||||||||||||||||||||||||||||||
|
While testing a fix for the test case, I noticed a bigger problem: Sometimes, we would enter a lock wait for inserting a record into mysql.innodb_index_stats or mysql.innodb_table_stats while holding dict_sys->mutex. This would block much other activity on the server until the lock wait is resolved, often by innodb_lock_wait_timeout, which is 50 seconds by default. I am afraid that fixing all the remaining problems would result in a too big change in a GA release. It seems that I will have to revert most of | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-22 ] | ||||||||||||||||||||||||||||||||||||
|
I attempted to reduce the amount lock conflicts by acquiring transactional locks upfront on the user tables in both the statistics and DDL operations, but it would still not entirely prevent lock conflicts on the mysql.innodb_index_stats and mysql.innodb_table_stats tables. Fixing the remaining problems would require a change that is too intrusive for a GA release series, such as MariaDB 10.2. Therefore, I chose to revert the change | ||||||||||||||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-01-22 ] | ||||||||||||||||||||||||||||||||||||
|
For future reference, I attached my patches that are based on MariaDB 10.2. |