[MDEV-29846] deadlock on dict_sys.mutex on database drop Created: 2022-10-21  Updated: 2022-11-23

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: 10.5

Type: Bug Priority: Major
Reporter: Vladislav Lesin Assignee: Vladislav Lesin
Resolution: Unresolved Votes: 1
Labels: None

Issue Links:
Relates
relates to MDEV-24258 Merge dict_sys.mutex into dict_sys.latch Closed
relates to MDEV-25506 Atomic DDL: .frm file is removed and ... Closed
relates to MDEV-25691 Simplify handlerton::drop_database fo... Closed

 Description   

A semaphore timeout was found by Matthias during MDEV-28709 testing (10.5).

The timeout is on "mutex_enter(&dict_sys.mutex)" in "dict_table_open_on_name()".
This mutex was acquired by the thread that executes the "while ((table_name = dict_get_first_table_name_in_db(name)))" loop in "row_drop_database_for_mysql()" while holding that mutex. The loop runs for so long that the semaphore timeout is triggered.

The loop does not make any progress, because "dict_get_first_table_name_in_db(name)" keeps returning the same table_name even after "row_drop_table_for_mysql(table_name, ...)" has returned DB_SUCCESS for it. The table name is "cool_down/#sql-ib22", so it is a temporary table. DICT_TF2_TEMPORARY is not set for the table, which is why "row_drop_table_for_mysql(table_name, ...)" adds the table to the background table drop list and returns DB_SUCCESS.

row_drop_tables_for_mysql_in_background() tries to open the table with dict_table_open_on_id(), which in turn requests dict_sys.mutex and cannot acquire it, because row_drop_database_for_mysql() holds it.

So we have a deadlock that cannot be detected by sync_array_detect_deadlock(), because row_drop_database_for_mysql() waits for a state change made by row_drop_tables_for_mysql_in_background(), which in turn is waiting for the mutex held by row_drop_database_for_mysql().
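
The cycle described above can be sketched as a small standalone C++ model. This is not the InnoDB code: DictModel, SimulationResult, drop_table_deferred() and simulate_drop_database() are hypothetical names introduced for illustration, std::timed_mutex stands in for dict_sys.mutex, and the loop is bounded by max_iterations so that the model terminates, which the real loop does not.

```cpp
#include <cassert>
#include <chrono>
#include <deque>
#include <mutex>
#include <set>
#include <string>
#include <thread>

// Hypothetical stand-ins for the InnoDB structures; not the real MariaDB code.
struct DictModel {
    std::timed_mutex mutex;                        // models dict_sys.mutex
    std::set<std::string> tables;                  // models the data dictionary
    std::deque<std::string> background_drop_list;  // models the background drop queue
};

struct SimulationResult {
    int loop_iterations;        // iterations executed by the DROP DATABASE loop
    bool table_still_present;   // true if the table was never actually removed
    bool background_got_mutex;  // true if the background thread acquired the mutex
};

// Models row_drop_table_for_mysql() for a table without DICT_TF2_TEMPORARY:
// the drop is only queued, yet success is reported to the caller.
bool drop_table_deferred(DictModel& dict, const std::string& name) {
    dict.background_drop_list.push_back(name);
    return true;
}

SimulationResult simulate_drop_database(int max_iterations) {
    DictModel dict;
    dict.tables.insert("cool_down/#sql-ib22");
    SimulationResult result{0, false, false};

    // The DROP DATABASE thread holds the mutex for the whole loop.
    std::unique_lock<std::timed_mutex> holder(dict.mutex);

    // Models the background drop thread, which needs the same mutex
    // before it can really remove the table.
    std::thread background([&dict, &result] {
        if (dict.mutex.try_lock_for(std::chrono::milliseconds(50))) {
            result.background_got_mutex = true;
            dict.tables.clear();
            dict.mutex.unlock();
        }
    });

    // Models the while ((table_name = ...)) loop: the same name comes back
    // every time, because only the background thread could remove it.
    while (!dict.tables.empty() && result.loop_iterations < max_iterations) {
        const std::string name = *dict.tables.begin();
        const bool ok = drop_table_deferred(dict, name);
        assert(ok);
        (void)ok;
        ++result.loop_iterations;
    }

    background.join();  // the mutex is still held here, so the worker timed out
    result.table_still_present = !dict.tables.empty();
    return result;
}
```

Calling simulate_drop_database(1000) in this model runs all 1000 iterations, leaves the table in place, and the background thread never obtains the mutex. Neither thread is blocked in a way that forms a classic lock cycle, which is consistent with sync_array_detect_deadlock() not reporting it: one side spins while holding the mutex, the other waits for that mutex.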



 Comments   
Comment by Vladislav Lesin [ 2022-10-21 ]

Only 10.5 has been analysed; I do not yet know whether other versions are affected.

Comment by Marko Mäkelä [ 2022-10-21 ]

DROP DATABASE was rewritten and the "background DROP queue" removed in MariaDB 10.6 as part of MDEV-25506 and MDEV-25691. Also, the combination of what used to be dict_sys.mutex and dict_operation_lock was ultimately replaced with dict_sys.latch in MDEV-24258.

I do not think that 10.6 or later versions should be affected by this. It may not be feasible to spend time fixing this bug in earlier versions.

Generated at Thu Feb 08 10:11:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.