[MDEV-25334] FTWRL/Backup blocks DDL on temporary tables with binlog enabled, assertion fails in Diagnostics_area::set_error_status Created: 2021-04-03  Updated: 2021-04-07  Resolved: 2021-04-07

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Temporary, Locking, Replication
Affects Version/s: 10.4, 10.5, 10.6
Fix Version/s: 10.4.19, 10.5.10

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
is caused by MDEV-23586 Mariabackup: GTID saved for replicati... Closed

 Description   

The two test cases below differ only in FTWRL vs BACKUP STAGE x 2 and cause the same effect, both on release and debug builds.

FTWRL

--source include/have_binlog_format_mixed_or_statement.inc
 
CREATE TEMPORARY TABLE tmp (a INT);
 
--connect (con1,localhost,root,,)
FLUSH TABLES WITH READ LOCK;
 
--connection default
SET lock_wait_timeout= 1;
ALTER TABLE tmp;

BACKUP STAGE

--source include/have_binlog_format_mixed_or_statement.inc
 
CREATE TEMPORARY TABLE tmp (a INT);
 
--connect (con1,localhost,root,,)
BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;
 
--connection default
SET lock_wait_timeout= 1;
ALTER TABLE tmp;

The backup lock taken by con1 blocks ALTER on a temporary table in the connection default and makes it eventually fail with a timeout. It affects debug and non-debug builds alike.

10.4 880baedc

connection default;
SET lock_wait_timeout= 1;
ALTER TABLE tmp;
bug.t                                    [ fail ]
        Test ended at 2021-04-03 20:00:05
 
CURRENT_TEST: bug.t
mysqltest: At line 12: query 'ALTER TABLE tmp' failed: 1205: Lock wait timeout exceeded; try restarting transaction

Debug builds further abort with an assertion failure in diagnostics area:

10.4 880baedc debug

mysqld: /data/src/10.4/sql/sql_error.cc:445: void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*): Assertion `! is_set() || m_can_overwrite_status' failed.
210403 20:02:19 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f081775af36 in __GI___assert_fail (assertion=0x5564e69c1468 "! is_set() || m_can_overwrite_status", file=0x5564e69c1200 "/data/src/10.4/sql/sql_error.cc", line=445, function=0x5564e69c1490 "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
#8  0x00005564e5c71870 in Diagnostics_area::set_error_status (this=0x7f07f40067a8, sql_errno=1205, message=0x7f08116dc3a0 "Lock wait timeout exceeded; try restarting transaction", sqlstate=0x5564e69e6e6d "HY000", ucid=..., error_condition=0x0) at /data/src/10.4/sql/sql_error.cc:445
#9  0x00005564e5c4b83f in THD::raise_condition (this=0x7f07f4000d90, sql_errno=1205, sqlstate=0x5564e69e6e6d "HY000", level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7f08116dc3a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.cc:1103
#10 0x00005564e5b8cb50 in THD::raise_condition (this=0x7f07f4000d90, sql_errno=1205, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR, msg=0x7f08116dc3a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.h:4400
#11 0x00005564e5b7f530 in my_message_sql (error=1205, str=0x7f08116dc3a0 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /data/src/10.4/sql/mysqld.cc:3346
#12 0x00005564e68dd15f in my_error (nr=1205, MyFlags=0) at /data/src/10.4/mysys/my_error.c:125
#13 0x00005564e5e5f3eb in MDL_context::acquire_lock (this=0x7f07f4000eb0, mdl_request=0x7f08116dc870, lock_wait_timeout=1) at /data/src/10.4/sql/mdl.cc:2365
#14 0x00005564e6173c16 in MYSQL_BIN_LOG::write (this=0x5564e7479300 <mysql_bin_log>, event_info=0x7f08116dcab0, with_annotate=0x0) at /data/src/10.4/sql/log.cc:6393
#15 0x00005564e5c5cd1e in THD::binlog_query (this=0x7f07f4000d90, qtype=THD::STMT_QUERY_TYPE, query_arg=0x7f07f4014888 "ALTER TABLE tmp", query_len=15, is_trans=false, direct=false, suppress_use=false, errcode=0) at /data/src/10.4/sql/sql_class.cc:7209
#16 0x00005564e5da7114 in write_bin_log (thd=0x7f07f4000d90, clear_error=true, query=0x7f07f4014888 "ALTER TABLE tmp", query_length=15, is_trans=false) at /data/src/10.4/sql/sql_table.cc:1987
#17 0x00005564e5dbe074 in mysql_alter_table (thd=0x7f07f4000d90, new_db=0x7f07f4005560, new_name=0x7f07f4005988, create_info=0x7f08116df330, table_list=0x7f07f4014948, alter_info=0x7f08116df270, order_num=0, order=0x0, ignore=false) at /data/src/10.4/sql/sql_table.cc:9815
#18 0x00005564e5e59264 in Sql_cmd_alter_table::execute (this=0x7f07f4015010, thd=0x7f07f4000d90) at /data/src/10.4/sql/sql_alter.cc:520
#19 0x00005564e5ccfd9f in mysql_execute_command (thd=0x7f07f4000d90) at /data/src/10.4/sql/sql_parse.cc:6186
#20 0x00005564e5cd519d in mysql_parse (thd=0x7f07f4000d90, rawbuf=0x7f07f4014888 "ALTER TABLE tmp", length=15, parser_state=0x7f08116e0550, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7989
#21 0x00005564e5cc13f9 in dispatch_command (command=COM_QUERY, thd=0x7f07f4000d90, packet=0x7f07f400ace1 "ALTER TABLE tmp", packet_length=15, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1855
#22 0x00005564e5cbfc61 in do_command (thd=0x7f07f4000d90) at /data/src/10.4/sql/sql_parse.cc:1373
#23 0x00005564e5e4f9c1 in do_handle_one_connection (connect=0x5564e92ebfd0) at /data/src/10.4/sql/sql_connect.cc:1412
#24 0x00005564e5e4f70a in handle_one_connection (arg=0x5564e92ebfd0) at /data/src/10.4/sql/sql_connect.cc:1316
#25 0x00005564e68784e8 in pfs_spawn_thread (arg=0x5564e92381d0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#26 0x00007f0817fdb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#27 0x00007f0817846293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
Query (0x7f07f4014888): ALTER TABLE tmp
 
Connection ID (thread ID): 5
Status: NOT_KILLED

The problem was introduced in 10.4.16/10.5.7 by this commit:

commit 16ea692ed490a1a2c6dc0b855247d5bd02bd2345 (HEAD)
Author: Monty
Date:   Mon Sep 7 10:38:12 2020 +0300
 
    MDEV-23586 Mariabackup: GTID saved for replication in 10.4.14 is wrong
    
    MDEV-21953 deadlock between BACKUP STAGE BLOCK_COMMIT and parallel
    replication
    
    Fixed by partly reverting MDEV-21953 to put back MDL_BACKUP_COMMIT locking
    before log_and_order.
    
    The original problem for MDEV-21953 was that while a thread was waiting in
    for another threads to commit in 'log_and_order', it had the
    MDL_BACKUP_COMMIT lock. The backup thread was waiting to get the
    MDL_BACKUP_WAIT_COMMIT lock, which blocks all new MDL_BACKUP_COMMIT locks.
    This causes a deadlock as the waited-for thread can never get past the
    MDL_BACKUP_COMMIT lock in ha_commit_trans.
    
    The main part of the bug fix is to release the MDL_BACKUP_COMMIT lock while
    a thread is waiting for other 'previous' threads to commit. This ensures
    that no transactional thread keeps MDL_BACKUP_COMMIT while waiting, which
    ensures that there are no deadlocks anymore.

I can't figure from the commit comment whether the effect with blocking temporary table actions was intentional. The assertion failure obviously wasn't.



 Comments   
Comment by Michael Widenius [ 2021-04-06 ]

Fixed by adding a MDL_BACKUP_COMMIT lock before altering temporary tables whose creation was logged to binary log (in which case the ALTER TABLE must also be logged)

Comment by Michael Widenius [ 2021-04-07 ]

Pushed to 10.4

Generated at Thu Feb 08 09:36:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.