[MDEV-25503] InnoDB hangs on startup during recovery Created: 2021-04-23  Updated: 2021-04-27  Resolved: 2021-04-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.1

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: hang, statistics, upstream


 Description   

InnoDB startup hangs if a DDL transaction needs to be rolled back and a recovered transaction on statistics tables exists:

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
 
CREATE TABLE t1 (a INT PRIMARY KEY) ENGINE=InnoDB STATS_PERSISTENT=1;
 
connect (con1,localhost,root,,);
BEGIN;
DELETE FROM mysql.innodb_table_stats;
 
connect (con2,localhost,root,,);
SET DEBUG_SYNC='inplace_after_index_build SIGNAL blocked WAIT_FOR ever';
send ALTER TABLE t1 FORCE;
 
connection default;
SET DEBUG_SYNC='now WAIT_FOR blocked';
 
--let $shutdown_timeout=0
--source include/restart_mysqld.inc
 
SELECT * FROM t1;
DROP TABLE t1;

10.4 ee455e6f2e6c9cda921c0801210786123d9f8b95

2021-04-23 19:35:00 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=68804
2021-04-23 19:35:00 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
2021-04-23 19:35:00 0 [Note] InnoDB: Trx id counter is 44
2021-04-23 19:35:00 0 [Note] InnoDB: Starting final batch to recover 11 pages from redo log.

The server would just hang there. The relevant stack trace is as follows:

10.4 ee455e6f2e6c9cda921c0801210786123d9f8b95

#6  0x000055e9179033e9 in lock_wait_suspend_thread (thr=<optimized out>, thr@entry=0x55e9196c0e28) at /mariadb/10.4/storage/innobase/lock/lock0wait.cc:346
#7  0x000055e91797a028 in que_run_threads (thr=<optimized out>) at /mariadb/10.4/storage/innobase/que/que0que.cc:1151
#8  0x000055e91797a7e1 in que_eval_sql (info=info@entry=0x55e9196cda80, sql=<optimized out>, sql@entry=0x55e91808a3ae "PROCEDURE DELETE_FROM_TABLE_STATS () IS\nBEGIN\nDELETE FROM \"mysql/innodb_table_stats\" WHERE\ndatabase_name = :database_name AND\ntable_name = :table_name;\nEND;\n", reserve_dict_mutex=false, trx=trx@entry=0x7f07f0db3330) at /mariadb/10.4/storage/innobase/que/que0que.cc:1217
#9  0x000055e917bb2f27 in dict_stats_exec_sql (pinfo=pinfo@entry=0x55e9196cda80, sql=0x55e91808a3ae "PROCEDURE DELETE_FROM_TABLE_STATS () IS\nBEGIN\nDELETE FROM \"mysql/innodb_table_stats\" WHERE\ndatabase_name = :database_name AND\ntable_name = :table_name;\nEND;\n", trx=0x7f07f0db3330, trx@entry=0x0) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:305
#10 0x000055e917bbb21c in dict_stats_delete_from_table_stats (database_name=<optimized out>, table_name=<optimized out>) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:3506
#11 dict_stats_drop_table (db_and_table=<optimized out>, db_and_table@entry=0x55e91968aad0 "test/#sql-30e5b8_b", errstr=errstr@entry=0x7ffc3a3df060 "\300\327\020\367\a\177", errstr_sz=errstr_sz@entry=1024) at /mariadb/10.4/storage/innobase/dict/dict0stats.cc:3589
#12 0x000055e9179dd6a0 in row_drop_table_for_mysql (name=<optimized out>, name@entry=0x55e91968aad0 "test/#sql-30e5b8_b", trx=<optimized out>, trx@entry=0x7f07f0db2228, sqlcom=sqlcom@entry=SQLCOM_DROP_TABLE, create_failed=false, nonatomic=<optimized out>) at /mariadb/10.4/storage/innobase/row/row0mysql.cc:3470
#13 0x000055e9179dc3a6 in row_mysql_drop_garbage_tables () at /mariadb/10.4/storage/innobase/row/row0mysql.cc:2793
#14 0x000055e9179174e2 in recv_recovery_rollback_active () at /mariadb/10.4/storage/innobase/log/log0recv.cc:3946
#15 0x000055e917a56a8b in srv_start (create_new_db=<optimized out>) at /mariadb/10.4/storage/innobase/srv/srv0start.cc:2182
#16 0x000055e9178649b7 in innodb_init (p=<optimized out>) at /mariadb/10.4/storage/innobase/handler/ha_innodb.cc:4313
#17 0x000055e917663452 in ha_initialize_handlerton (plugin=0x55e919391b90) at /mariadb/10.4/sql/handler.cc:574

A possible fix could be to make trx_rollback_recovered(false) roll back also any transactions that hold locks on mysql.innodb_table_stats or mysql.innodb_index_stats. In that way, no locking conflict would exist and the server should be able to start up.



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2021-04-27 ]

Patch is in bb-10.2-thiru

Comment by Marko Mäkelä [ 2021-04-27 ]

Looks OK. Only a minor suggestion:

bool trx_has_stats_table_lock(const trx_t *trx);

should better be:

bool trx_t::has_stats_table_lock() const

Generated at Thu Feb 08 09:38:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.