I've been having the same issue on 10.5.8, 10.5.9, and now 10.5.13. What I can confirm is the following scenario.
- Many concurrent queries are running against an InnoDB table
- Then, a transaction is queued up that will force an Metadata Lock, for example: OPTIMIZE, TRUNCATE, ALTER, etc., but most frequently OPTIMIZE
- Existing queries that are Queued are immediately placed into a "Waiting for table metadata lock" State however...
- Normally just one, but possibly "some" queries that were queued, are marked as Started according to the processlist but that will NEVER FINISH, they have been blocked somehow
- Running the following: "KILL QUERY ID XXXX" against the OPTIMIZE, TRUNCATE, ALTER will clear any process in the "Waiting for table metadata lock" State and allow them to run. However, the one or more transactions (I've never seen more than one) that started right around the same time as the Metadata lock was started will as stated never finish.
My suspicion is that there is a timing issue around the flagging of processes to the MDL status, and that the query that had just been started, was marked as running, but in fact, under the covers it's blocked (by some internal flag in a private structure). So, it's likely either a lack of a MUTEX or SEMAPHONE LOCK around the MDL/LOCK/UNLOCK process.
It's quite consistent, and I've been having to implement mitigations to keep the system running well when the MDL takes place as there are just too many queries coming from too many directions to be able to "quies" the system before running the OPTIMIZE for example, which is the most frequent MDL transaction I run on the system.
here is same case but with different table, see if you can see something common here mysqld_full_bt_all_threads_2021-09-18.txt.zip