This is most likely caused by MDEV-14511.
Actually the messages are stretching the truth a bit, because any lock wait would immediately be handled as lock wait timeout.
MDEV-14511 tries to avoid some consistency problems related to InnoDB persistent statistics. The persistent statistics are being written by an InnoDB internal SQL interpreter that requires the InnoDB data dictionary cache to be locked.
Before the change, the statistics were written during DDL in separate transactions, which could unnecessarily reduce performance (each commit would require a redo log flush) and break atomicity, because the statistics would be updated separately from the dictionary transaction.
However, because it is unacceptable to hold the InnoDB data dictionary cache locked while suspending the execution for waiting for a transactional lock (in the mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be released, any lock conflict will immediately be reported as "lock wait timeout".
As far as I understand, such failures to update statistics did occur earlier as well, but less frequently, because we would really suspend the thread while holding the dictionary lock. It is not the end of the world if the statistics are missing or a bit off.
Recently, I noticed a change in MySQL 5.7.11 (and MariaDB 10.2) that will purposely drop the statistics in a certain case of ALTER TABLE, and rely on a subsequent ha_innobase::open() to update them. It could actually make sense to do this on all ALTER TABLE operations. If we did this, then we would also rename tables in the statistics tables less often (only on RENAME TABLE), and would be less prone to get these errors. I do not think that it makes sense to ever update or rename statistics for the #sql table names.
I think that the likelihood of the problem is increased by using partitioning or by executing multiple ALTER TABLE statements in parallel.
Apparently, the errors can also occur due to innodb_stats_auto_recalc or the table attribute STATS_AUTO_RECALC=ON:
2018-01-17 7:40:52 8 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest2`: Lock wait timeout
Inserting 100000 records into 'sbtest2'
2018-01-17 7:41:43 13 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest11`: Lock wait timeout
2018-01-17 7:42:34 14 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest3`: Lock wait timeout
Inserting 100000 records into 'sbtest3'
Marko Mäkelä
added a comment - This is most likely caused by MDEV-14511 .
Actually the messages are stretching the truth a bit, because any lock wait would immediately be handled as lock wait timeout.
MDEV-14511 tries to avoid some consistency problems related to InnoDB persistent statistics. The persistent statistics are being written by an InnoDB internal SQL interpreter that requires the InnoDB data dictionary cache to be locked.
Before the change, the statistics were written during DDL in separate transactions, which could unnecessarily reduce performance (each commit would require a redo log flush) and break atomicity, because the statistics would be updated separately from the dictionary transaction.
However, because it is unacceptable to hold the InnoDB data dictionary cache locked while suspending the execution for waiting for a transactional lock (in the mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be released, any lock conflict will immediately be reported as "lock wait timeout".
As far as I understand, such failures to update statistics did occur earlier as well, but less frequently, because we would really suspend the thread while holding the dictionary lock. It is not the end of the world if the statistics are missing or a bit off.
Recently, I noticed a change in MySQL 5.7.11 (and MariaDB 10.2) that will purposely drop the statistics in a certain case of ALTER TABLE , and rely on a subsequent ha_innobase::open() to update them. It could actually make sense to do this on all ALTER TABLE operations. If we did this, then we would also rename tables in the statistics tables less often (only on RENAME TABLE ), and would be less prone to get these errors. I do not think that it makes sense to ever update or rename statistics for the #sql table names.
I think that the likelihood of the problem is increased by using partitioning or by executing multiple ALTER TABLE statements in parallel.
Apparently, the errors can also occur due to innodb_stats_auto_recalc or the table attribute STATS_AUTO_RECALC=ON :
2018-01-17 7:40:52 8 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest2`: Lock wait timeout
Inserting 100000 records into 'sbtest2'
2018-01-17 7:41:43 13 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest11`: Lock wait timeout
2018-01-17 7:42:34 14 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest3`: Lock wait timeout
Inserting 100000 records into 'sbtest3'
This is expected to be fixed in next 10.2 release.
As a temporary workaround one can set innodb_stats_persistent=0 in the my.cnf configuration file to go around this issue.
Michael Widenius
added a comment - This is expected to be fixed in next 10.2 release.
As a temporary workaround one can set innodb_stats_persistent=0 in the my.cnf configuration file to go around this issue.
I pushed a patch to bb-10.2-marko for testing.
I hope that elenst can torture it especially with multiple inserts in parallel to different tables, and also partitioned tables, FOREIGN KEY operations and FULLTEXT INDEX.
The fix makes RENAME TABLE (also at the end of ALTER TABLE…ALGORITHM=COPY) start by acquiring an exclusive InnoDB table lock. This table lock will not only prevent a conflict with statistics updates happening on the background, but it will also make these operations properly deal with any FOREIGN KEY checks from concurrently active transactions.
On the statistics side, internally both analyzing the statistics and writing the statistics will acquire an shared-intention InnoDB table lock. This will conflict with the exclusive table lock from RENAME TABLE. Because we are locking the objects upfront, a locking conflict on the records in mysql.innodb_index_stats and mysql.innodb_table_stats is much smaller. It can still happen, especially on the #sql names used by ALTER TABLE. I saw it occasionally happen when converting a table to partitioned: ALTER TABLE t1 PARTITION BY HASH(pk). It is also possible to get these conflicts when users modify the tables directly.
This fix also removes some sleep-and-retry logic from dict_stats_rename_table() that I felt was useless.
The impact of these errors should be that some statistics entries may fail to be deleted or renamed on DROP TABLE or RENAME TABLE. The statistics should be automatically recalculated on the background, and they can also be recalculated by executing the ANALYZE TABLE statement.
Marko Mäkelä
added a comment - I pushed a patch to bb-10.2-marko for testing.
I hope that elenst can torture it especially with multiple inserts in parallel to different tables, and also partitioned tables, FOREIGN KEY operations and FULLTEXT INDEX .
The fix makes RENAME TABLE (also at the end of ALTER TABLE…ALGORITHM=COPY ) start by acquiring an exclusive InnoDB table lock. This table lock will not only prevent a conflict with statistics updates happening on the background, but it will also make these operations properly deal with any FOREIGN KEY checks from concurrently active transactions.
On the statistics side, internally both analyzing the statistics and writing the statistics will acquire an shared-intention InnoDB table lock. This will conflict with the exclusive table lock from RENAME TABLE . Because we are locking the objects upfront, a locking conflict on the records in mysql.innodb_index_stats and mysql.innodb_table_stats is much smaller. It can still happen, especially on the #sql names used by ALTER TABLE . I saw it occasionally happen when converting a table to partitioned: ALTER TABLE t1 PARTITION BY HASH(pk) . It is also possible to get these conflicts when users modify the tables directly.
This fix also removes some sleep-and-retry logic from dict_stats_rename_table() that I felt was useless.
The impact of these errors should be that some statistics entries may fail to be deleted or renamed on DROP TABLE or RENAME TABLE . The statistics should be automatically recalculated on the background, and they can also be recalculated by executing the ANALYZE TABLE statement.
While testing a fix for the test case, I noticed a bigger problem: Sometimes, we would enter a lock wait for inserting a record into mysql.innodb_index_stats or mysql.innodb_table_stats while holding dict_sys->mutex. This would block much other activity on the server until the lock wait is resolved, often by innodb_lock_wait_timeout, which is 50 seconds by default.
I am afraid that fixing all the remaining problems would result in a too big change in a GA release. It seems that I will have to revert most of MDEV-14511, and try to come up with a revised solution for MDEV-13201.
Marko Mäkelä
added a comment - While testing a fix for the test case, I noticed a bigger problem: Sometimes, we would enter a lock wait for inserting a record into mysql.innodb_index_stats or mysql.innodb_table_stats while holding dict_sys->mutex . This would block much other activity on the server until the lock wait is resolved, often by innodb_lock_wait_timeout , which is 50 seconds by default.
I am afraid that fixing all the remaining problems would result in a too big change in a GA release. It seems that I will have to revert most of MDEV-14511 , and try to come up with a revised solution for MDEV-13201 .
I attempted to reduce the amount lock conflicts by acquiring transactional locks upfront on the user tables in both the statistics and DDL operations, but it would still not entirely prevent lock conflicts on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.
Fixing the remaining problems would require a change that is too intrusive for a GA release series, such as MariaDB 10.2.
Therefore, I chose to revert the change MDEV-14511.
Marko Mäkelä
added a comment - I attempted to reduce the amount lock conflicts by acquiring transactional locks upfront on the user tables in both the statistics and DDL operations, but it would still not entirely prevent lock conflicts on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.
Fixing the remaining problems would require a change that is too intrusive for a GA release series, such as MariaDB 10.2.
Therefore, I chose to revert the change MDEV-14511 .
For future reference, I attached my patches that are based on MariaDB 10.2. 0004-Follow-up-to-MDEV-14941.patch has a known problem (see the commit comment). This set of patches has the problem that transactional lock waits can occur while the InnoDB data dictionary cache is locked.
I believe that to properly fix this, we would have to redesign the storage of InnoDB statistics (MDEV-15020).
Marko Mäkelä
added a comment - For future reference, I attached my patches that are based on MariaDB 10.2 .
0004-Follow-up-to-MDEV-14941.patch has a known problem (see the commit comment). This set of patches has the problem that transactional lock waits can occur while the InnoDB data dictionary cache is locked.
I believe that to properly fix this, we would have to redesign the storage of InnoDB statistics ( MDEV-15020 ).
Could maybe have been caused by the fix for
MDEV-14511in 10.2.12?