[MDEV-14941] Timeouts on persistent statistics tables caused by MDEV-14511 Created: 2018-01-13  Updated: 2020-08-25  Resolved: 2018-01-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.12, 10.3.3
Fix Version/s: 10.2.13, 10.3.5

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Marko Mäkelä
Resolution: Fixed Votes: 2
Labels: None
Environment:

linux, ramfs


Attachments: File 0001-Disable-the-unused-function-dict_stats_rename_index.patch     File 0002-Remove-code-related-to-InnoDB-native-partitioning-in.patch     File 0003-MDEV-14941-Timeouts-on-persistent-statistics-tables-.patch     File 0004-Follow-up-to-MDEV-14941.patch    
Issue Links:
Problem/Incident
is caused by MDEV-14511 Use fewer transactions for updating I... Closed
Relates
relates to MDEV-15020 Server hangs due to InnoDB persistent... Closed

 Description   

When running tests with plenty of DDL with datadir on RamFS after upgrade from 10.2.11 to 10.2.12 the following errors occur repeatedly:

[ERROR] InnoDB: Cannot save table statistics for table `some_dir`.`some_table`: Lock wait timeout

then followed by long semaphore wait warnings on the innodb data dictionary mutex:

Jan  5 11:54:11 bamber-02 mysqld: 2018-01-05 11:54:11 140564011710208 [Warning] InnoDB: A long semaphore wait:
Jan  5 11:54:11 bamber-02 mysqld: --Thread 140563682322176 has waited at ha_innodb.cc line 13396 for 241.00 seconds the semaphore:
Jan  5 11:54:11 bamber-02 mysqld: X-lock on RW-latch at 0x555673f891c0 created in file dict0dict.cc line 1107
Jan  5 11:54:11 bamber-02 mysqld: a writer (thread id 140563590076160) has reserved it in mode  exclusive
Jan  5 11:54:11 bamber-02 mysqld: number of readers 0, waiters flag 1, lock_word: 0
Jan  5 11:54:11 bamber-02 mysqld: Last time read locked in file not yet reserved line 0
Jan  5 11:54:11 bamber-02 mysqld: Last time write locked in file ha_innodb.cc line 13396
Jan  5 11:54:11 bamber-02 mysqld: 2018-01-05 11:54:11 140564011710208 [Warning] InnoDB: A long semaphore wait:

and finally lots of

[ERROR] InnoDB: Cannot save table statistics for table `tmp_1e865f785ceff316`.`access_log`: Operation interrupted

after mysqld shutdown was requested



 Comments   
Comment by Hartmut Holzgraefe [ 2018-01-13 ]

Could maybe have been caused by the fix for MDEV-14511 in 10.2.12?

Comment by Marko Mäkelä [ 2018-01-17 ]

This is most likely caused by MDEV-14511.
Actually the messages are stretching the truth a bit, because any lock wait would immediately be handled as lock wait timeout.

MDEV-14511 tries to avoid some consistency problems related to InnoDB persistent statistics. The persistent statistics are being written by an InnoDB internal SQL interpreter that requires the InnoDB data dictionary cache to be locked.
Before the change, the statistics were written during DDL in separate transactions, which could unnecessarily reduce performance (each commit would require a redo log flush) and break atomicity, because the statistics would be updated separately from the dictionary transaction.

However, because it is unacceptable to hold the InnoDB data dictionary cache locked while suspending the execution for waiting for a transactional lock (in the mysql.innodb_index_stats or mysql.innodb_table_stats tables) to be released, any lock conflict will immediately be reported as "lock wait timeout".

As far as I understand, such failures to update statistics did occur earlier as well, but less frequently, because we would really suspend the thread while holding the dictionary lock. It is not the end of the world if the statistics are missing or a bit off.

Recently, I noticed a change in MySQL 5.7.11 (and MariaDB 10.2) that will purposely drop the statistics in a certain case of ALTER TABLE, and rely on a subsequent ha_innobase::open() to update them. It could actually make sense to do this on all ALTER TABLE operations. If we did this, then we would also rename tables in the statistics tables less often (only on RENAME TABLE), and would be less prone to get these errors. I do not think that it makes sense to ever update or rename statistics for the #sql table names.

I think that the likelihood of the problem is increased by using partitioning or by executing multiple ALTER TABLE statements in parallel.

Apparently, the errors can also occur due to innodb_stats_auto_recalc or the table attribute STATS_AUTO_RECALC=ON:

2018-01-17  7:40:52 8 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest2`: Lock wait timeout
Inserting 100000 records into 'sbtest2'
2018-01-17  7:41:43 13 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest11`: Lock wait timeout
2018-01-17  7:42:34 14 [ERROR] InnoDB: Cannot save table statistics for table `test`.`sbtest3`: Lock wait timeout
Inserting 100000 records into 'sbtest3'

Comment by Michael Widenius [ 2018-01-18 ]

This is expected to be fixed in next 10.2 release.

As a temporary workaround one can set innodb_stats_persistent=0 in the my.cnf configuration file to go around this issue.

Comment by Marko Mäkelä [ 2018-01-19 ]

I pushed a patch to bb-10.2-marko for testing.
I hope that elenst can torture it especially with multiple inserts in parallel to different tables, and also partitioned tables, FOREIGN KEY operations and FULLTEXT INDEX.

The fix makes RENAME TABLE (also at the end of ALTER TABLE…ALGORITHM=COPY) start by acquiring an exclusive InnoDB table lock. This table lock will not only prevent a conflict with statistics updates happening on the background, but it will also make these operations properly deal with any FOREIGN KEY checks from concurrently active transactions.

On the statistics side, internally both analyzing the statistics and writing the statistics will acquire an shared-intention InnoDB table lock. This will conflict with the exclusive table lock from RENAME TABLE. Because we are locking the objects upfront, a locking conflict on the records in mysql.innodb_index_stats and mysql.innodb_table_stats is much smaller. It can still happen, especially on the #sql names used by ALTER TABLE. I saw it occasionally happen when converting a table to partitioned: ALTER TABLE t1 PARTITION BY HASH(pk). It is also possible to get these conflicts when users modify the tables directly.

This fix also removes some sleep-and-retry logic from dict_stats_rename_table() that I felt was useless.

The impact of these errors should be that some statistics entries may fail to be deleted or renamed on DROP TABLE or RENAME TABLE. The statistics should be automatically recalculated on the background, and they can also be recalculated by executing the ANALYZE TABLE statement.

Comment by Elena Stepanova [ 2018-01-20 ]

bb-10.2-marko df55eabe7316

mysqld: /data/src/bb-10.2-marko/storage/innobase/row/row0mysql.cc:4448: dberr_t row_rename_table_for_mysql(const char*, const char*, trx_t*, bool): Assertion `!(table->stats_bg_flag & (1 << 0))' failed.
180121  0:09:07 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fbe1795dee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055f4721d43f6 in row_rename_table_for_mysql (old_name=0x7fbe085a45a0 "test/t2", new_name=0x7fbe085a43a0 "test/#sql2-5e1e-9", trx=0x7fbe118cf508, commit=true) at /data/src/bb-10.2-marko/storage/innobase/row/row0mysql.cc:4448
#9  0x000055f4720b020d in innobase_rename_table (trx=0x7fbe118cf508, from=0x7fbe085a4cb0 "./test/t2", to=0x7fbe085a4ec0 "./test/#sql2-5e1e-9") at /data/src/bb-10.2-marko/storage/innobase/handler/ha_innodb.cc:13905
#10 0x000055f47209bf81 in ha_innobase::rename_table (this=0x7fbdb4015460, from=0x7fbe085a4cb0 "./test/t2", to=0x7fbe085a4ec0 "./test/#sql2-5e1e-9") at /data/src/bb-10.2-marko/storage/innobase/handler/ha_innodb.cc:13995
#11 0x000055f471d84445 in handler::ha_rename_table (this=0x7fbdb4015460, from=0x7fbe085a4cb0 "./test/t2", to=0x7fbe085a4ec0 "./test/#sql2-5e1e-9") at /data/src/bb-10.2-marko/sql/handler.cc:4317
#12 0x000055f471bd2695 in mysql_rename_table (base=0x55f473fd3fa0, old_db=0x7fbdb4012c00 "test", old_name=0x7fbdb40125b8 "t2", new_db=0x7fbdb4012c00 "test", new_name=0x7fbe085a5c30 "#sql2-5e1e-9", flags=2) at /data/src/bb-10.2-marko/sql/sql_table.cc:5304
#13 0x000055f471bdd5ba in mysql_alter_table (thd=0x7fbdb4000b00, new_db=0x7fbdb4012c00 "test", new_name=0x0, create_info=0x7fbe085a6e50, table_list=0x7fbdb40125f0, alter_info=0x7fbe085a6da0, order_num=0, order=0x0, ignore=false) at /data/src/bb-10.2-marko/sql/sql_table.cc:9553
#14 0x000055f471c565b8 in Sql_cmd_alter_table::execute (this=0x7fbdb4012e40, thd=0x7fbdb4000b00) at /data/src/bb-10.2-marko/sql/sql_alter.cc:324
#15 0x000055f471b121eb in mysql_execute_command (thd=0x7fbdb4000b00) at /data/src/bb-10.2-marko/sql/sql_parse.cc:6201
#16 0x000055f471b16aed in mysql_parse (thd=0x7fbdb4000b00, rawbuf=0x7fbdb40124e8 "ALTER TABLE t2 PARTITION BY KEY() PARTITIONS 2", length=46, parser_state=0x7fbe085a8200, is_com_multi=false, is_next_command=false) at /data/src/bb-10.2-marko/sql/sql_parse.cc:7891
#17 0x000055f471b04aad in dispatch_command (command=COM_QUERY, thd=0x7fbdb4000b00, packet=0x7fbdb4008941 "", packet_length=46, is_com_multi=false, is_next_command=false) at /data/src/bb-10.2-marko/sql/sql_parse.cc:1806
#18 0x000055f471b03410 in do_command (thd=0x7fbdb4000b00) at /data/src/bb-10.2-marko/sql/sql_parse.cc:1360
#19 0x000055f471c512aa in do_handle_one_connection (connect=0x55f474462110) at /data/src/bb-10.2-marko/sql/sql_connect.cc:1335
#20 0x000055f471c51037 in handle_one_connection (arg=0x55f474462110) at /data/src/bb-10.2-marko/sql/sql_connect.cc:1241
#21 0x000055f47207029e in pfs_spawn_thread (arg=0x55f474444b30) at /data/src/bb-10.2-marko/storage/perfschema/pfs.cc:1862
#22 0x00007fbe19634494 in start_thread (arg=0x7fbe085a9700) at pthread_create.c:333
#23 0x00007fbe17a1a93f in clone () from /lib/x86_64-linux-gnu/libc.so.6

--source include/have_innodb.inc
--source include/have_partition.inc
 
SET GLOBAL innodb_stats_persistent= ON;
 
CREATE TABLE t1 (i INT, KEY(i)) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1),(2),(3),(4);
 
CREATE TABLE t2 (pk INT PRIMARY KEY) ENGINE=InnoDB;
INSERT INTO t2 VALUES (1),(2),(3);
 
ALTER TABLE t2 PARTITION BY KEY() PARTITIONS 2;
 
# Cleanup
DROP TABLE t1, t2;

Comment by Marko Mäkelä [ 2018-01-21 ]

While testing a fix for the test case, I noticed a bigger problem: Sometimes, we would enter a lock wait for inserting a record into mysql.innodb_index_stats or mysql.innodb_table_stats while holding dict_sys->mutex. This would block much other activity on the server until the lock wait is resolved, often by innodb_lock_wait_timeout, which is 50 seconds by default.

I am afraid that fixing all the remaining problems would result in a too big change in a GA release. It seems that I will have to revert most of MDEV-14511, and try to come up with a revised solution for MDEV-13201.

Comment by Marko Mäkelä [ 2018-01-22 ]

I attempted to reduce the amount lock conflicts by acquiring transactional locks upfront on the user tables in both the statistics and DDL operations, but it would still not entirely prevent lock conflicts on the mysql.innodb_index_stats and mysql.innodb_table_stats tables.

Fixing the remaining problems would require a change that is too intrusive for a GA release series, such as MariaDB 10.2.

Therefore, I chose to revert the change MDEV-14511.

Comment by Marko Mäkelä [ 2018-01-22 ]

For future reference, I attached my patches that are based on MariaDB 10.2.
0004-Follow-up-to-MDEV-14941.patch has a known problem (see the commit comment). This set of patches has the problem that transactional lock waits can occur while the InnoDB data dictionary cache is locked.
I believe that to properly fix this, we would have to redesign the storage of InnoDB statistics (MDEV-15020).

Generated at Thu Feb 08 08:17:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.