[MDEV-28930] ALTER TABLE Deadlocks with parallel TL_WRITE
Created: 2022-06-22  Updated: 2023-08-16  Resolved: 2023-03-29

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table
Affects Version/s: None
Fix Version/s: 11.2.1

Type: Bug
Priority: Critical
Reporter: Brandon Nesterenko
Assignee: Sergei Golubchik
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate: is duplicated by MDEV-28949 "Deadlock between online alter and DML" (Closed)
Issue split: split from MDEV-28776 "rpl.rpl_mark_optimize_tbl_ddl fails w..." (Closed)
Problem/Incident: is caused by MDEV-16329 "Engine-independent online ALTER TABLE" (Closed)

Description

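When the binary log is enabled, mariadbd can deadlock if an ALTER TABLE and a concurrent statement that takes a TL_WRITE lock (CREATE USER in the test below) target the same table. The cause is the newly added call to upgrade_shared_lock, which appears under copy_data_between_tables in the Thread 14 stack trace below.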

This was first detected by rpl.rpl_mark_optimize_tbl_ddl on the preview-10.10-*ddl branches. The following MTR test reproduces the deadlock without replication on the preview-10.10-ddl Git branch.

--source include/have_innodb.inc
--source include/have_debug_sync.inc
--source include/have_binlog_format_mixed.inc
 
--connect(alter_con, localhost, root,,test)
SET DEBUG_SYNC="alter_table_online_before_lock WAIT_FOR go_for_locking";
--send alter table mysql.global_priv engine=innodb
 
--connect(insert_con, localhost, root,,test)
sleep 1;
--send create user user1@localhost;
 
--connection default
sleep 1;
SET DEBUG_SYNC="now SIGNAL go_for_locking";
 
--connection alter_con
--reap
--connection insert_con
--reap
 
--echo #
--echo # Cleanup
--connection default
SET DEBUG_SYNC="RESET";
DROP USER user1@localhost;
 
--echo # End of test
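
The interleaving that the test forces can be hard to read from the script alone. The following is a minimal Python sketch of that ordering only, not MariaDB code and not a model of the locks themselves. The function name run_model and the three threading.Event objects are hypothetical stand-ins: go_for_locking models the DEBUG_SYNC signal, and the other two events take the place of the two `sleep 1` calls, which the real test uses to give each step time to happen.

```python
import threading

def run_model(timeout=5.0):
    """Replay the ordering the MTR test enforces; return the ordered steps."""
    steps = []
    alter_parked = threading.Event()    # ALTER reached its sync point
    create_started = threading.Event()  # CREATE USER has been sent
    go_for_locking = threading.Event()  # models SIGNAL go_for_locking

    def alter_con():
        steps.append("alter_con: parked at alter_table_online_before_lock")
        alter_parked.set()
        go_for_locking.wait(timeout)    # models WAIT_FOR go_for_locking
        steps.append("alter_con: resumes past the sync point")

    def insert_con():
        alter_parked.wait(timeout)      # the real test uses `sleep 1` here
        steps.append("insert_con: CREATE USER sent")
        create_started.set()

    threads = [threading.Thread(target=alter_con),
               threading.Thread(target=insert_con)]
    for t in threads:
        t.start()

    # This is the role of the default connection in the test.
    create_started.wait(timeout)        # the real test uses `sleep 1` here
    steps.append("default: SIGNAL go_for_locking")
    go_for_locking.set()

    for t in threads:
        t.join(timeout)
    return steps

steps = run_model()
for step in steps:
    print(step)
```

In the sketch the ALTER thread always resumes last, which matches the intent of the test: CREATE USER is already in flight when ALTER TABLE moves on from the sync point.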

Stack traces of the deadlocked threads:

Thread 15 (LWP 91653 "mariadbd"):
#0  __futex_abstimed_wait_common64 (cancel=true, private=-1193146584, abstime=0x7f90ac1bdf20, clockid=32656, expected=0, futex_word=0x7f9078000b90) at ../sysdeps/nptl/futex-internal.c:74
#1  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f9078000b90, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7f90ac1bdf20, private=private@entry=0) at ../sysdeps/nptl/futex-internal.c:123
#2  0x00007f90b9cec99e in __pthread_cond_wait_common (abstime=0x7f90ac1bdf20, clockid=0, mutex=0x5585ab93e8a0, cond=0x7f9078000b68) at pthread_cond_wait.c:504
#3  __pthread_cond_timedwait (cond=0x7f9078000b68, mutex=0x5585ab93e8a0, abstime=0x7f90ac1bdf20) at pthread_cond_wait.c:646
#4  0x00005585a8d7e851 in safe_cond_timedwait (cond=0x7f9078000b68, mp=0x5585ab93e878, abstime=0x7f90ac1bdf20, file=0x5585a9310600 "/home/brandon/workspace/server/mysys/my_thr_init.c", line=609) at /home/brandon/workspace/server/mysys/thr_mutex.c:548
#5  0x00005585a8d777c5 in psi_cond_timedwait (that=0x7f9078000b68, mutex=0x5585ab93e878, abstime=0x7f90ac1bdf20, file=0x5585a9310c58 "/home/brandon/workspace/server/mysys/thr_lock.c", line=558) at /home/brandon/workspace/server/mysys/my_thr_init.c:609
#6  0x00005585a8d7ae55 in inline_mysql_cond_timedwait (that=0x7f9078000b68, mutex=0x5585ab93e878, abstime=0x7f90ac1bdf20, src_file=0x5585a9310c58 "/home/brandon/workspace/server/mysys/thr_lock.c", src_line=558) at /home/brandon/workspace/server/include/mysql/psi/mysql_thread.h:1086
#7  0x00005585a8d7b4ea in wait_for_lock (wait=0x5585ab93e948, data=0x7f907801dd88, in_wait_list=0 '\000', lock_wait_timeout=31536000) at /home/brandon/workspace/server/mysys/thr_lock.c:558
#8  0x00005585a8d7bd95 in thr_lock (data=0x7f907801dd88, owner=0x7f9078002b38, lock_wait_timeout=31536000) at /home/brandon/workspace/server/mysys/thr_lock.c:890
#9  0x00005585a8d7c698 in thr_multi_lock (data=0x7f9078014320, count=7, owner=0x7f9078002b38, lock_wait_timeout=31536000) at /home/brandon/workspace/server/mysys/thr_lock.c:1171
#10 0x00005585a864bf84 in mysql_lock_tables (thd=0x7f9078000db8, sql_lock=0x7f90780142c8, flags=133120) at /home/brandon/workspace/server/sql/lock.cc:349
#11 0x00005585a864bd82 in mysql_lock_tables (thd=0x7f9078000db8, tables=0x7f9078014290, count=7, flags=133120) at /home/brandon/workspace/server/sql/lock.cc:301
#12 0x00005585a804d07f in lock_tables (thd=0x7f9078000db8, tables=0x7f90ac1be250, count=7, flags=133120) at /home/brandon/workspace/server/sql/sql_base.cc:5799
#13 0x00005585a803e2fe in Grant_tables::open_and_lock (this=0x7f90ac1c1a60, thd=0x7f9078000db8, which_tables=247, lock_type=TL_WRITE) at /home/brandon/workspace/server/sql/sql_acl.cc:2004
#14 0x00005585a802a3b6 in mysql_create_user (thd=0x7f9078000db8, list=..., handle_as_role=false) at /home/brandon/workspace/server/sql/sql_acl.cc:10836
#15 0x00005585a81021cd in mysql_execute_command (thd=0x7f9078000db8, is_called_from_prepared_stmt=false) at /home/brandon/workspace/server/sql/sql_parse.cc:5328
#16 0x00005585a810abaf in mysql_parse (thd=0x7f9078000db8, rawbuf=0x7f9078013dd0 "create user user1@localhost", length=27, parser_state=0x7f90ac1c23e0) at /home/brandon/workspace/server/sql/sql_parse.cc:8038
#17 0x00005585a80f6f59 in dispatch_command (command=COM_QUERY, thd=0x7f9078000db8, packet=0x7f907800bc29 "create user user1@localhost;", packet_length=28, blocking=true) at /home/brandon/workspace/server/sql/sql_parse.cc:1894
#18 0x00005585a80f593f in do_command (thd=0x7f9078000db8, blocking=true) at /home/brandon/workspace/server/sql/sql_parse.cc:1407
#19 0x00005585a82d5baa in do_handle_one_connection (connect=0x5585ab980858, put_in_cache=true) at /home/brandon/workspace/server/sql/sql_connect.cc:1418
#20 0x00005585a82d583d in handle_one_connection (arg=0x5585ab980858) at /home/brandon/workspace/server/sql/sql_connect.cc:1312
#21 0x00005585a87eebfc in pfs_spawn_thread (arg=0x5585ab980938) at /home/brandon/workspace/server/storage/perfschema/pfs.cc:2201
#22 0x00007f90b9ce6450 in start_thread (arg=0x7f90ac1c3640) at pthread_create.c:473
#23 0x00007f90b987dd53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 14 (LWP 91652 "mariadbd"):
#0  __futex_abstimed_wait_common64 (cancel=true, private=-1407172400, abstime=0x7f90ac204590, clockid=32656, expected=0, futex_word=0x7f9074000fe0) at ../sysdeps/nptl/futex-internal.c:74
#1  __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7f9074000fe0, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x7f90ac204590, private=private@entry=0) at ../sysdeps/nptl/futex-internal.c:123
#2  0x00007f90b9cec99e in __pthread_cond_wait_common (abstime=0x7f90ac204590, clockid=0, mutex=0x7f9074000f30, cond=0x7f9074000fb8) at pthread_cond_wait.c:504
#3  __pthread_cond_timedwait (cond=0x7f9074000fb8, mutex=0x7f9074000f30, abstime=0x7f90ac204590) at pthread_cond_wait.c:646
#4  0x00005585a8d7e851 in safe_cond_timedwait (cond=0x7f9074000fb8, mp=0x7f9074000f08, abstime=0x7f90ac204590, file=0x5585a9310600 "/home/brandon/workspace/server/mysys/my_thr_init.c", line=609) at /home/brandon/workspace/server/mysys/thr_mutex.c:548
#5  0x00005585a8d777c5 in psi_cond_timedwait (that=0x7f9074000fb8, mutex=0x7f9074000f08, abstime=0x7f90ac204590, file=0x5585a8f0eed8 "/home/brandon/workspace/server/sql/mdl.cc", line=1195) at /home/brandon/workspace/server/mysys/my_thr_init.c:609
#6  0x00005585a82e3c93 in inline_mysql_cond_timedwait (that=0x7f9074000fb8, mutex=0x7f9074000f08, abstime=0x7f90ac204590, src_file=0x5585a8f0eed8 "/home/brandon/workspace/server/sql/mdl.cc", src_line=1195) at /home/brandon/workspace/server/include/mysql/psi/mysql_thread.h:1086
#7  0x00005585a82e55a0 in MDL_wait::timed_wait (this=0x7f9074000f08, owner=0x7f9074000e98, abs_timeout=0x7f90ac204590, set_status_on_timeout=false, wait_state_name=0x5585a98b0a50 <MDL_key::m_namespace_to_wait_state_name+48>) at /home/brandon/workspace/server/sql/mdl.cc:1195
#8  0x00005585a82e758f in MDL_context::acquire_lock (this=0x7f9074000f08, mdl_request=0x7f90ac204670, lock_wait_timeout=86400) at /home/brandon/workspace/server/sql/mdl.cc:2379
#9  0x00005585a82e7e01 in MDL_context::upgrade_shared_lock (this=0x7f9074000f08, mdl_ticket=0x7f9074008590, new_type=MDL_EXCLUSIVE, lock_wait_timeout=86400) at /home/brandon/workspace/server/sql/mdl.cc:2586
#10 0x00005585a822aa5c in copy_data_between_tables (thd=0x7f9074000db8, from=0x5585ab93ace8, to=0x7f90740214f8, create=..., ignore=false, order_num=0, order=0x0, copied=0x7f90ac209d90, deleted=0x7f90ac209d98, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x7f90ac20b2d0, online=true) at /home/brandon/workspace/server/sql/sql_table.cc:12300
#11 0x00005585a822729c in mysql_alter_table (thd=0x7f9074000db8, new_db=0x7f9074005a78, new_name=0x7f9074005e88, create_info=0x7f90ac20c0e0, table_list=0x7f9074014338, alter_info=0x7f90ac20bff0, order_num=0, order=0x0, ignore=false, if_exists=false) at /home/brandon/workspace/server/sql/sql_table.cc:11337
#12 0x00005585a82e111a in Sql_cmd_alter_table::execute (this=0x7f9074014a28, thd=0x7f9074000db8) at /home/brandon/workspace/server/sql/sql_alter.cc:553
#13 0x00005585a8104977 in mysql_execute_command (thd=0x7f9074000db8, is_called_from_prepared_stmt=false) at /home/brandon/workspace/server/sql/sql_parse.cc:5996
#14 0x00005585a810abaf in mysql_parse (thd=0x7f9074000db8, rawbuf=0x7f9074014220 "alter table mysql.global_priv engine=innodb", length=43, parser_state=0x7f90ac20d3e0) at /home/brandon/workspace/server/sql/sql_parse.cc:8038
#15 0x00005585a80f6f59 in dispatch_command (command=COM_QUERY, thd=0x7f9074000db8, packet=0x7f907400bc29 "", packet_length=43, blocking=true) at /home/brandon/workspace/server/sql/sql_parse.cc:1894
#16 0x00005585a80f593f in do_command (thd=0x7f9074000db8, blocking=true) at /home/brandon/workspace/server/sql/sql_parse.cc:1407
#17 0x00005585a82d5baa in do_handle_one_connection (connect=0x5585ab9802c8, put_in_cache=true) at /home/brandon/workspace/server/sql/sql_connect.cc:1418
#18 0x00005585a82d583d in handle_one_connection (arg=0x5585ab9802c8) at /home/brandon/workspace/server/sql/sql_connect.cc:1312
#19 0x00005585a87eebfc in pfs_spawn_thread (arg=0x5585ab9803a8) at /home/brandon/workspace/server/storage/perfschema/pfs.cc:2201
#20 0x00007f90b9ce6450 in start_thread (arg=0x7f90ac20e640) at pthread_create.c:473
#21 0x00007f90b987dd53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
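
The two traces can be read as a wait-for cycle across two lock layers. The Python sketch below illustrates that reading and is not MariaDB code. The traces show only what each thread is waiting for: Thread 14 waits in MDL_context::upgrade_shared_lock for MDL_EXCLUSIVE, and Thread 15 waits in thr_lock after requesting TL_WRITE. What each thread already holds is an assumption made here for illustration, namely that the CREATE USER thread holds a metadata lock that conflicts with MDL_EXCLUSIVE and that the ALTER TABLE thread holds a table-level lock that conflicts with TL_WRITE. The helper find_cycle and the resource labels are hypothetical.

```python
def find_cycle(waits_for):
    """Return the threads forming a cycle in a wait-for graph, or None.

    waits_for maps each blocked thread to the single thread it waits on.
    """
    for start in waits_for:
        path = [start]
        seen = {start}
        node = waits_for.get(start)
        while node is not None:
            if node == start:
                return path          # walked back to where we began
            if node in seen:
                break                # a cycle exists, but not through start
            seen.add(node)
            path.append(node)
            node = waits_for.get(node)
    return None

ALTER = "Thread 14 (ALTER TABLE)"
CREATE_USER = "Thread 15 (CREATE USER)"

# What each thread waits for, as seen in the stack traces.
wants = {
    ALTER: "MDL on mysql.global_priv",       # upgrade to MDL_EXCLUSIVE
    CREATE_USER: "thr_lock on mysql.global_priv",  # TL_WRITE request
}

# ASSUMPTION (not shown in the traces): who holds the conflicting lock.
held_by = {
    "MDL on mysql.global_priv": CREATE_USER,
    "thr_lock on mysql.global_priv": ALTER,
}

waits_for = {thread: held_by[resource] for thread, resource in wants.items()}
cycle = find_cycle(waits_for)
print(cycle)
```

Under these assumptions each thread waits on a lock held by the other, and because the two waits happen in different subsystems (mdl.cc and thr_lock.c), a deadlock detector that looks at only one layer would not necessarily see the cycle.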



Comments
Comment by Nikita Malyavin [ 2022-06-24 ]

Hello bnestere! Yes, I am likely responsible for that. Our claim was that no algorithm other than ONLINE ALTER TABLE upgrades the lock to MDL_EXCLUSIVE.
I see that the deadlock actually happens though. Will investigate.

Comment by Nikita Malyavin [ 2022-06-28 ]

This is strange. All the tables from CREATE USER request have been granted for acquisition, but it still hangs in thr_lock for some reason...

Comment by Nikita Malyavin [ 2022-06-29 ]

Please review bb-10.10-MDEV-28930 (commit)
