[MDEV-32529] Draft: Serious longer "Waiting for table metadata lock" than lock-wait-timeout value Created: 2023-10-20  Updated: 2023-10-31

Status: Open
Project: MariaDB Server
Component/s: Locking, Views
Affects Version/s: 10.6.16
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Matthias Leich
Resolution: Unresolved Votes: 0
Labels: locking, mdl, timeout


 Description   

origin/bb-10.6-MDEV-32050-rerebase bf0f43d51340d2b7e0cb1681c63605536556f385 2023-10-19T15:48:57+03:00
It is in the moment unknown if the problems are

  • caused by the MDEV-32050 modifications
  • in 10.6 and other main versions too.

Scenario:
1. Start the DB server with lock-wait-timeout=15 and generate some initial data
2. 33 sessions run concurrent some DDL/DML mix
3. During 2. is ongoing PROCESSLISTS with content like

# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] ID -- COMMAND -- TIME -- STATE -- INFO -- RQG_guess
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->20 -- Query -- 459 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = MERGE VIEW view_2 AS SELECT /* QUERY_ID: 6818048 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 9 AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978]  /* E_R Thread10 QNO 759 CON_ID 20 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->21 -- Query -- 368 -- Waiting for table metadata lock -- ALTER ALGORITHM = TEMPTABLE VIEW view_1 AS SELECT /* QUERY_ID: 5899008 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 2 AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_2
# 2023-10-19T19:11:06 [407978] WHERE func_2 ( `col_int_key` ) < 7 /* E_R Thread6 QNO 840 CON_ID 21 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->23 -- Query -- 461 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT /* QUERY_ID: 10715136 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_1 ( `col_int_key` ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978]  /* E_R Thread5 QNO 714 CON_ID 23 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->26 -- Query -- 399 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT /* QUERY_ID: 2509824 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 7 AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978]  /* E_R Thread24 QNO 684 CON_ID 26 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->27 -- Query -- 459 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT /* QUERY_ID: 15198720 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ 6 AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978]  /* E_R Thread22 QNO 686 CON_ID 27 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->28 -- Query -- 413 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = MERGE VIEW view_2 AS SELECT /* QUERY_ID: 2850048 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_1 ( 3 ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978] WHERE 5 < 6 /* E_R Thread20 QNO 625 CON_ID 28 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->30 -- Query -- 460 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_1 AS SELECT /* QUERY_ID: 8853504 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_2 ( `col_int_key` ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_2
# 2023-10-19T19:11:06 [407978]  /* E_R Thread27 QNO 672 CON_ID 30 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->31 -- Query -- 459 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_1 AS SELECT /* QUERY_ID: 4352256 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_2 ( 3 ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_2
# 2023-10-19T19:11:06 [407978]  /* E_R Thread12 QNO 703 CON_ID 31 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->34 -- Query -- 429 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT /* QUERY_ID: 13853184 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_2 ( `col_int_key` ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_1
# 2023-10-19T19:11:06 [407978] WHERE 4 <> func_2 ( 2 ) /* E_R Thread29 QNO 750 CON_ID 34 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] -->38 -- Query -- 430 -- Waiting for table metadata lock -- CREATE OR REPLACE ALGORITHM = TEMPTABLE VIEW view_1 AS SELECT /* QUERY_ID: 7958784 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_1 ( 2 ) AS `col_int_key`
# 2023-10-19T19:11:06 [407978] FROM view_2
# 2023-10-19T19:11:06 [407978]  /* E_R Thread32 QNO 658 CON_ID 38 */ <--suspicious
# 2023-10-19T19:11:06 [407978] DBServer_e::MySQL::MySQLd::server_is_operable: server[1] Content of processlist ---------- end

were observed. IMHO the DB server should have killed all these queries because of exceeding lock-wait-timeout=15 and release their locks.
I had also the hope that some automatic deadlock detection would kicked in and reduced the number of threads waiting for MDL lock.
4. RQG reacts with running 'SHOW ENGINE INNODB STATUS' and sending SIGABRT
to the server process.

pluto:/data/results/1697741072/LWT_VIOLATION$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio

RQG
===
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 4f27e561407069c72ce0eaee4cfc27c1cdd5d26c 2023-10-17T20:57:24+02:00
# rqg.pl  : Version 4.4.0 (2023-08)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/runtime/metadata_stability.yy \
# --gendata=conf/runtime/metadata_stability.zz \
# --filter=conf/mariadb/no_flush_no_lock.ff \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --no_mask \
# --queries=10000000 \
# --seed=random \
# --reporters=None \
# --reporters=ErrorLog \
# --reporters=Deadlock \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --mysqld=--loose-innodb_compression_level=1 \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--lock-wait-timeout=15 \
# --mysqld=--innodb-lock-wait-timeout=10 \
# --mysqld=--sql_mode=traditional \
# --mysqld=--innodb_file_per_table=1 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=on \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--innodb_random_read_ahead=OFF \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=on \
# --threads=33 \
# --mysqld=--innodb-use-native-aio=0 \
# --mysqld=--loose-gdb \
# --mysqld=--loose-debug-gdb \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_undo_log_truncate=OFF \
# --mysqld=--loose_innodb_change_buffering=deletes \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --mysqld=--innodb_page_size=8K \
# --mysqld=--innodb-buffer-pool-size=8M \
# <local settings>



 Comments   
Comment by Marko Mäkelä [ 2023-10-20 ]

In the trace that I just checked, at the time the server process receives SIGABRT from the environment, the only InnoDB thread is the idle page cleaner:

ssh pluto
rr replay /data/results/1697741072/LWT_VIOLATION/1/rr/latest-trace

Thread 5 (Thread 410046.410832):
#11 __pthread_cond_wait (cond=cond@entry=0x56239fe38cc8 <buf_pool+17416>, mutex=mutex@entry=0x56239fe38be8 <buf_pool+17192>) at pthread_cond_wait.c:638
#12 0x000056239f3e5301 in safe_cond_wait (cond=0x56239fe38cc8 <buf_pool+17416>, mp=0x56239fe38bc0 <buf_pool+17152>, file=file@entry=0x56239f7ea5d8 "/data/Server/bb-10.6-MDEV-32050-rerebase/storage/innobase/buf/buf0flu.cc", line=line@entry=2280) at /data/Server/bb-10.6-MDEV-32050-rerebase/mysys/thr_mutex.c:492
#13 0x000056239f276735 in buf_flush_page_cleaner () at /data/Server/bb-10.6-MDEV-32050-rerebase/storage/innobase/buf/buf0flu.cc:2280

There are several threads waiting in MDL_wait::timed_wait. The last one is right before the server was killed by the environment:

Thread 3 hit Breakpoint 1, MDL_wait::timed_wait (this=this@entry=0x7f65fc023840, owner=0x7f65fc0237d0, abs_timeout=abs_timeout@entry=0x7f65e40572a0, set_status_on_timeout=set_status_on_timeout@entry=false, wait_state_name=0x56239fe12870 <MDL_key::m_namespace_to_wait_state_name+48>) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/mdl.cc:1174
1174	{
(rr) bt
#0  MDL_wait::timed_wait (this=this@entry=0x7f65fc023840, owner=0x7f65fc0237d0, abs_timeout=abs_timeout@entry=0x7f65e40572a0, set_status_on_timeout=set_status_on_timeout@entry=false, wait_state_name=0x56239fe12870 <MDL_key::m_namespace_to_wait_state_name+48>) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/mdl.cc:1174
#1  0x000056239eae3752 in MDL_context::acquire_lock (this=this@entry=0x7f65fc023840, mdl_request=mdl_request@entry=0x7f65fc035760, lock_wait_timeout=<optimized out>) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/mdl.h:482
#2  0x000056239e8ff5cd in open_table_get_mdl_lock (thd=thd@entry=0x7f65fc0236e8, ot_ctx=ot_ctx@entry=0x7f65e4057860, mdl_request=mdl_request@entry=0x7f65fc035760, flags=flags@entry=0, mdl_ticket=mdl_ticket@entry=0x7f65e4057558) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.h:562
#3  0x000056239e905a83 in open_table (thd=thd@entry=0x7f65fc0236e8, table_list=table_list@entry=0x7f65fc0352e0, ot_ctx=ot_ctx@entry=0x7f65e4057860) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.cc:1864
#4  0x000056239e907616 in open_and_process_table (thd=thd@entry=0x7f65fc0236e8, tables=tables@entry=0x7f65fc0352e0, counter=counter@entry=0x7f65e40578fc, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f65e4057a30, has_prelocking_list=has_prelocking_list@entry=false, ot_ctx=0x7f65e4057860) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.cc:3862
#5  0x000056239e909870 in open_tables (thd=thd@entry=0x7f65fc0236e8, options=..., start=start@entry=0x7f65e40578e8, counter=counter@entry=0x7f65e40578fc, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f65e4057a30) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.cc:4346
#6  0x000056239e90a067 in open_and_lock_tables (thd=thd@entry=0x7f65fc0236e8, options=..., tables=<optimized out>, derived=derived@entry=true, flags=flags@entry=0, prelocking_strategy=prelocking_strategy@entry=0x7f65e4057a30) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.cc:5319
#7  0x000056239ea837b7 in open_and_lock_tables (flags=0, derived=true, tables=<optimized out>, thd=0x7f65fc0236e8) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_base.h:512
#8  mysql_create_view (thd=thd@entry=0x7f65fc0236e8, views=views@entry=0x7f65fc033758, mode=VIEW_CREATE_OR_REPLACE) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_view.cc:466
#9  0x000056239e98ce80 in mysql_execute_command (thd=thd@entry=0x7f65fc0236e8, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_parse.cc:5828
#10 0x000056239e974a13 in mysql_parse (thd=thd@entry=0x7f65fc0236e8, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x7f65e40583a0) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_parse.cc:8050
#11 0x000056239e982f3c in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f65fc0236e8, packet=packet@entry=0x7f65fc01e079 " CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT /* QUERY_ID: 10715136 */ /* RESULTSET_SAME_DATA_IN_EVERY_ROW */ func_1 ( `col_int_key` ) AS `col_int_key`\nFROM view_1\n /* E_R Thread5 QNO"..., 
    packet_length=packet_length@entry=218, blocking=blocking@entry=true) at /data/Server/bb-10.6-MDEV-32050-rerebase/sql/sql_class.h:1403

All MDL waiters are:

Thread statement
3 CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 … FROM view_1 …
20 CREATE OR REPLACE ALGORITHM = MERGE VIEW view_2 … FROM view_1 …
21 ALTER ALGORITHM = TEMPTABLE VIEW view_1 AS SELECT 2 AS col_int_key FROM view_2 WHERE func_2 ( col_int_key ) < 7
26 CREATE OR REPLACE ALGORITHM = UNDEFINED VIEW view_2 AS SELECT 6 AS col_int_key FROM view_1
27 CREATE OR REPLACE VIEW view_2 AS SELECT … FROM view_1
29 CREATE OR REPLACE VIEW view_1 AS SELECT … FROM view_2
30 CREATE OR REPLACE VIEW view_1 AS SELECT … FROM view_2
33 CREATE OR REPLACE VIEW view_2 AS SELECT … FROM view_1
37 CREATE OR REPLACE VIEW view_1 AS SELECT … FROM view_2

Without knowing this code, I would guess that the minimum test case involves two threads. Apparently, there is no deadlock detector for MDL requests or it is not working. It could also be that the timeout logic is not working or timeout errors are being ignored in this code. I would guess that one of the threads is holding MDL on view_1 and waiting for MDL on view_2, while another is doing the opposite.

Comment by Matthias Leich [ 2023-10-31 ]

In all cases where I hit that problem the server error log contained one or more lines like
[ERROR] Got error 128 when reading table './test/table10_innodb'

Generated at Thu Feb 08 10:32:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.