[MDEV-26600] Waiting for table metadata lock Created: 2021-09-13  Updated: 2022-01-29

Status: Open
Project: MariaDB Server
Component/s: Locking, Storage Engine - MyISAM
Affects Version/s: 10.2.22
Fix Version/s: None

Type: Bug Priority: Critical
Reporter: Abhay Singh Assignee: Unassigned
Resolution: Unresolved Votes: 2
Labels: performance
Environment:

NAME="openSUSE Leap"
VERSION="15.1 "
ID="opensuse-leap"
ID_LIKE="suse opensuse"
VERSION_ID="15.1"
PRETTY_NAME="openSUSE Leap 15.1"
ANSI_COLOR="0;32"
CPE_NAME="cpe:/o:opensuse:leap:15.1"
Intel(R) Xeon(R) Gold 6126 CPU @ 2.60GHz
Memory = 13145556 MB


Attachments: Zip Archive mysqlBacktraces.zip     Zip Archive mysqld_full_bt_all_threads_2021-09-18.txt.zip    

 Description   

DML query is stuking with status "Waiting for table metadata lock", don't see any thing in mysql log for this, then engine of the table is MyISAM

14569 cubot localhost mutual Query 7938 Waiting for table metadata lock Truncate Table MST2_S_INVESTOR 0.000
14588 cubot localhost mutual Query 5300 Waiting for table metadata lock Truncate Table MST2_S_INVESTOR 0.000

no other query running on this table which might be causing this lock.
the similar problem happens with drop table also.
All these happens only with certain tables and not all tables



 Comments   
Comment by Daniel Black [ 2021-09-13 ]

Is this repeatable?

Can you install the debug symbols (https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#installing-debug-info-packages-with-zypper) and obtain a backtrace when this occurs again https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/#getting-backtraces-from-a-running-mysqld-process-with-gdb-on-linux ?

Comment by Abhay Singh [ 2021-09-16 ]

mysqlBacktraces.zip
Yes, its repeatable, just now it happened, as you suggested i have attached the backtrace

Comment by Daniel Black [ 2021-09-17 ]

Thanks abhaysingh, I narrowed it down to the below two threads potentially in deadlock.

Thread 100 (Thread 0x7f54973a0700 (LWP 39712)):
#0  0x00007f5b02989c66 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x00005623977efb30 in MDL_wait::timed_wait(MDL_context_owner*, timespec*, bool, PSI_stage_info_v1 const*) ()
No symbol table info available.
#2  0x00005623977f1429 in MDL_context::acquire_lock(MDL_request*, double) ()
No symbol table info available.
#3  0x00005623977f1d1e in MDL_context::acquire_locks(I_P_List<MDL_request, I_P_List_adapter<MDL_request, &MDL_request::next_in_list, &MDL_request::prev_in_list>, I_P_List_counter, I_P_List_no_push_back<MDL_request> >*, double) ()
No symbol table info available.
#4  0x00005623976db4b7 in lock_table_names(THD*, DDL_options_st const&, TABLE_LIST*, TABLE_LIST*, unsigned long, unsigned int) ()
No symbol table info available.
#5  0x00005623979e9a42 in Sql_cmd_truncate_table::lock_table(THD*, TABLE_LIST*, bool*) ()
No symbol table info available.
#6  0x00005623979e9b5d in Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*) ()
No symbol table info available.
#7  0x00005623979e9d8e in Sql_cmd_truncate_table::execute(THD*) ()
No symbol table info available.
#8  0x000056239771da5f in mysql_execute_command(THD*) ()
No symbol table info available.
#9  0x00005623977256f9 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) ()
No symbol table info available.
#10 0x00005623977284a7 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) ()
No symbol table info available.
#11 0x0000562397728d66 in do_command(THD*) ()
No symbol table info available.
#12 0x00005623977e7cb4 in do_handle_one_connection(CONNECT*) ()
No symbol table info available.
#13 0x00005623977e7e94 in handle_one_connection ()
No symbol table info available.
#14 0x00007f5b02983569 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#15 0x00007f5b01de09ef in clone () from /lib64/libc.so.6
No symbol table info available.

Thread 102 (Thread 0x7f549620c700 (LWP 41269)):
#0  0x00007f5b02989c66 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
No symbol table info available.
#1  0x0000562397693b66 in one_thread_per_connection_end(THD*, bool) ()
No symbol table info available.
#2  0x00005623977e7c38 in do_handle_one_connection(CONNECT*) ()
No symbol table info available.
#3  0x00005623977e7e94 in handle_one_connection ()
No symbol table info available.
#4  0x00007f5b02983569 in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#5  0x00007f5b01de09ef in clone () from /lib64/libc.so.6
No symbol table info available.

If its not that second thread then its one of the other background threads.

This would be easier to read if the debug symbols here installed:

install debuginfo

sudo zypper install MariaDB-server-debuginfo

If you get a chance to install and recapture the trace please do so.

Comment by Abhay Singh [ 2021-09-17 ]

Hi Daniel, the debug info is already installed and I have captured trace after installing
$> rpm -qa | grep -i debuginfo
MariaDB-server-debuginfo-10.2.38-1.x86_64

Comment by Abhay Singh [ 2021-09-17 ]

the two threads you have mentioned are two different cases of the same problem that I have mentioned, one thread is with one table, and in another, it's with another table, both tables are not related and no other queries were running or any other processes running.
Let me know what else I can produce to dig deeper into this issue.
thanks for your time.

Comment by Abhay Singh [ 2021-09-18 ]

here is same case but with different table, see if you can see something common here mysqld_full_bt_all_threads_2021-09-18.txt.zip

Comment by Abhay Singh [ 2021-09-23 ]

Hi Daniel, do you have any updates/suggestions to try.
One thing I have noticed is that after restarting of database server this problem doesn't appear for few runs, so for now as work around am restarting my database server every day before starting jobs.

Comment by Larry Adams [ 2022-01-29 ]

I've been having the same issue on 10.5.8, 10.5.9, and now 10.5.13. What I can confirm is the following scenario.

  • Many concurrent queries are running against an InnoDB table
  • Then, a transaction is queued up that will force an Metadata Lock, for example: OPTIMIZE, TRUNCATE, ALTER, etc., but most frequently OPTIMIZE
  • Existing queries that are Queued are immediately placed into a "Waiting for table metadata lock" State however...
  • Normally just one, but possibly "some" queries that were queued, are marked as Started according to the processlist but that will NEVER FINISH, they have been blocked somehow
  • Running the following: "KILL QUERY ID XXXX" against the OPTIMIZE, TRUNCATE, ALTER will clear any process in the "Waiting for table metadata lock" State and allow them to run. However, the one or more transactions (I've never seen more than one) that started right around the same time as the Metadata lock was started will as stated never finish.

My suspicion is that there is a timing issue around the flagging of processes to the MDL status, and that the query that had just been started, was marked as running, but in fact, under the covers it's blocked (by some internal flag in a private structure). So, it's likely either a lack of a MUTEX or SEMAPHONE LOCK around the MDL/LOCK/UNLOCK process.

It's quite consistent, and I've been having to implement mitigations to keep the system running well when the MDL takes place as there are just too many queries coming from too many directions to be able to "quies" the system before running the OPTIMIZE for example, which is the most frequent MDL transaction I run on the system.

Comment by Larry Adams [ 2022-01-29 ]

Secondly, one potential workaround would be to decrease the WAIT_TIMEOUT or INTERACTIVE_TIMEOUT to something less than what it is, but we have some very long running queries that are commonplace on the system at around the same time as the MDL transactions roll in. So, setting those values low in a global fashion is problematic.

Generated at Thu Feb 08 09:46:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.