[MDEV-23305] Assertions failed in Diagnostics_area upon actions with transactional Aria table under parallel BACKUP STAGE BLOCK_COMMIT or FTWRL Created: 2020-07-27  Updated: 2021-04-17  Resolved: 2021-04-06

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - Aria
Affects Version/s: 10.4
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Won't Fix Votes: 0
Labels: affects-tests, not-10.5


 Description   

CREATE TABLE t1 (a INT) ENGINE=Aria TRANSACTIONAL=1;
BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;
 
--connect (con1,localhost,root,,test)
SET lock_wait_timeout= 1;
--error 0,ER_LOCK_WAIT_TIMEOUT
UPDATE t1 SET a = 1;
 
# Cleanup
--disconnect con1
--connection default
BACKUP STAGE END;
DROP TABLE t1;

10.4 05d62518

mysqld: /data/src/10.4/sql/sql_error.cc:445: void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*): Assertion `! is_set() || m_can_overwrite_status' failed.
200728  2:35:56 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f6bfc12ef12 in __GI___assert_fail (assertion=0x555ab4660d98 "! is_set() || m_can_overwrite_status", file=0x555ab4660c68 "/data/src/10.4/sql/sql_error.cc", line=445, function=0x555ab4661100 <Diagnostics_area::set_error_status(unsigned int, char const*, char const*, Sql_user_condition_identity const&, Sql_condition const*)::__PRETTY_FUNCTION__> "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
#8  0x0000555ab391cd9c in Diagnostics_area::set_error_status (this=0x7f6bd80064d8, sql_errno=1205, message=0x7f6bf628c660 "Lock wait timeout exceeded; try restarting transaction", sqlstate=0x555ab468897d "HY000", ucid=..., error_condition=0x0) at /data/src/10.4/sql/sql_error.cc:445
#9  0x0000555ab38f7f27 in THD::raise_condition (this=0x7f6bd8000af0, sql_errno=1205, sqlstate=0x555ab468897d "HY000", level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7f6bf628c660 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.cc:1100
#10 0x0000555ab383e184 in THD::raise_condition (this=0x7f6bd8000af0, sql_errno=1205, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR, msg=0x7f6bf628c660 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.h:4372
#11 0x0000555ab3830e46 in my_message_sql (error=1205, str=0x7f6bf628c660 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /data/src/10.4/sql/mysqld.cc:3377
#12 0x0000555ab4555d6e in my_error (nr=1205, MyFlags=0) at /data/src/10.4/mysys/my_error.c:125
#13 0x0000555ab3b01d04 in MDL_context::acquire_lock (this=0x7f6bd8000c10, mdl_request=0x7f6bf628ca00, lock_wait_timeout=1) at /data/src/10.4/sql/mdl.cc:2365
#14 0x0000555ab3cc846f in ha_maria_implicit_commit (thd=0x7f6bd8000af0, new_trn=false) at /data/src/10.4/sql/handler.cc:144
#15 0x0000555ab3979e0d in mysql_execute_command (thd=0x7f6bd8000af0) at /data/src/10.4/sql/sql_parse.cc:6159
#16 0x0000555ab397ec91 in mysql_parse (thd=0x7f6bd8000af0, rawbuf=0x7f6bd8011dd8 "UPDATE t1 SET a = 1", length=19, parser_state=0x7f6bf628d570, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7896
#17 0x0000555ab396b1c6 in dispatch_command (command=COM_QUERY, thd=0x7f6bd8000af0, packet=0x7f6bd80083a1 "UPDATE t1 SET a = 1", packet_length=19, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1835
#18 0x0000555ab3969968 in do_command (thd=0x7f6bd8000af0) at /data/src/10.4/sql/sql_parse.cc:1353
#19 0x0000555ab3af2b0c in do_handle_one_connection (connect=0x555ab74f4520) at /data/src/10.4/sql/sql_connect.cc:1412
#20 0x0000555ab3af285b in handle_one_connection (arg=0x555ab74f4520) at /data/src/10.4/sql/sql_connect.cc:1316
#21 0x0000555ab44f2879 in pfs_spawn_thread (arg=0x555ab7452190) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#22 0x00007f6bfe0b74a4 in start_thread (arg=0x7f6bf628e700) at pthread_create.c:456
#23 0x00007f6bfc1ebd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

No obvious problem on a non-debug build.
Not reproducible on 10.5.
The test case is not applicable to earlier versions due to backup stages.



 Comments   
Comment by Elena Stepanova [ 2020-08-15 ]

Same with SELECT * INTO OUTFILE instead of UPDATE.

Comment by Elena Stepanova [ 2021-04-04 ]

Here is another variation of what I hope to be the same problem, it demonstrates the non-debug side.
In the test case below, connection default which acquired WRITE CONCURRENT lock on t1 before a concurrent connection starts backup stages, now hangs (waits for a lock) upon any action at all. In the test case it's SELECT 1 and UNLOCK TABLES. Eventually it ends with a lock wait timeout, both on debug and non-debug build. On a debug build, it further causes an assertion failure in diagnostics area, either in Diagnostics_area::set_error_status or in Diagnostics_area::set_ok_status.

Test case with SELECT 1

CREATE TABLE t1 (a INT) ENGINE=Aria TRANSACTIONAL=1;
LOCK TABLE t1 WRITE CONCURRENT;
SET lock_wait_timeout= 1;
 
--connect (con1,localhost,root,,test)
BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;
 
--connection default
# Any action at all here fails
# either with lock wait timeout (on non-debug)
# or with a debug assertion failure
SELECT 1;
 
# Cleanup
--connection con1
BACKUP STAGE END;
--disconnect con1
--connection default
UNLOCK TABLES;
DROP TABLE t1;

10.4 880baedc non-debug

mysqltest: At line 13: query 'SELECT 1' failed: 1205: Lock wait timeout exceeded; try restarting transaction

10.4 880baedc debug

mysqld: /data/src/10.4/sql/sql_error.cc:445: void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*): Assertion `! is_set() || m_can_overwrite_status' failed.
210404 16:26:11 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f5f06c70f36 in __GI___assert_fail (assertion=0x55737f470468 "! is_set() || m_can_overwrite_status", file=0x55737f470200 "/data/src/10.4/sql/sql_error.cc", line=445, function=0x55737f470490 "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
#8  0x000055737e720870 in Diagnostics_area::set_error_status (this=0x7f5ef00067a8, sql_errno=1205, message=0x7f5f00c406a0 "Lock wait timeout exceeded; try restarting transaction", sqlstate=0x55737f495e6d "HY000", ucid=..., error_condition=0x0) at /data/src/10.4/sql/sql_error.cc:445
#9  0x000055737e6fa83f in THD::raise_condition (this=0x7f5ef0000d90, sql_errno=1205, sqlstate=0x55737f495e6d "HY000", level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7f5f00c406a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.cc:1103
#10 0x000055737e63bb50 in THD::raise_condition (this=0x7f5ef0000d90, sql_errno=1205, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR, msg=0x7f5f00c406a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.h:4400
#11 0x000055737e62e530 in my_message_sql (error=1205, str=0x7f5f00c406a0 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /data/src/10.4/sql/mysqld.cc:3346
#12 0x000055737f38c15f in my_error (nr=1205, MyFlags=0) at /data/src/10.4/mysys/my_error.c:125
#13 0x000055737e90e3eb in MDL_context::acquire_lock (this=0x7f5ef0000eb0, mdl_request=0x7f5f00c40a40, lock_wait_timeout=1) at /data/src/10.4/sql/mdl.cc:2365
#14 0x000055737eae1cd6 in ha_maria_implicit_commit (thd=0x7f5ef0000d90, new_trn=false) at /data/src/10.4/sql/handler.cc:144
#15 0x000055737e77f418 in mysql_execute_command (thd=0x7f5ef0000d90) at /data/src/10.4/sql/sql_parse.cc:6247
#16 0x000055737e78419d in mysql_parse (thd=0x7f5ef0000d90, rawbuf=0x7f5ef0014788 "SELECT 1", length=8, parser_state=0x7f5f00c41550, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7989
#17 0x000055737e7703f9 in dispatch_command (command=COM_QUERY, thd=0x7f5ef0000d90, packet=0x7f5ef000abe1 "", packet_length=8, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1855
#18 0x000055737e76ec61 in do_command (thd=0x7f5ef0000d90) at /data/src/10.4/sql/sql_parse.cc:1373
#19 0x000055737e8fe9c1 in do_handle_one_connection (connect=0x557382788c70) at /data/src/10.4/sql/sql_connect.cc:1412
#20 0x000055737e8fe70a in handle_one_connection (arg=0x557382788c70) at /data/src/10.4/sql/sql_connect.cc:1316
#21 0x000055737f3274e8 in pfs_spawn_thread (arg=0x557382761ce0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#22 0x00007f5f074f1609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x00007f5f06d5c293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
Query (0x7f5ef0014788): SELECT 1
 
Connection ID (thread ID): 4
Status: NOT_KILLED

Test case with UNLOCK TABLES

CREATE TABLE t1 (a INT) ENGINE=Aria TRANSACTIONAL=1;
LOCK TABLE t1 WRITE CONCURRENT;
SET lock_wait_timeout= 1;
 
--connect (con1,localhost,root,,test)
BACKUP STAGE START;
BACKUP STAGE BLOCK_COMMIT;
 
--connection default
# Any action at all here fails
# either with lock wait timeout (on non-debug)
# or with a debug assertion failure
UNLOCK TABLES;
 
# Cleanup
--connection con1
BACKUP STAGE END;
--disconnect con1
--connection default
UNLOCK TABLES;
DROP TABLE t1;

non-debug

connection default;
UNLOCK TABLES;
bug.t2                                   [ fail ]
        Test ended at 2021-04-04 16:27:32
 
CURRENT_TEST: bug.t2
mysqltest: At line 13: query 'UNLOCK TABLES' failed: 1205: Lock wait timeout exceeded; try restarting transaction

debug

mysqld: /data/src/10.4/sql/sql_error.cc:335: void Diagnostics_area::set_ok_status(ulonglong, ulonglong, const char*): Assertion `!is_set() || (m_status == DA_OK_BULK && is_bulk_op())' failed.
210404 16:28:41 [ERROR] mysqld got signal 6 ;
 
#7  0x00007ff368c25f36 in __GI___assert_fail (assertion=0x560e5e4fc378 "!is_set() || (m_status == DA_OK_BULK && is_bulk_op())", file=0x560e5e4fc200 "/data/src/10.4/sql/sql_error.cc", line=335, function=0x560e5e4fc3b0 "void Diagnostics_area::set_ok_status(ulonglong, ulonglong, const char*)") at assert.c:101
#8  0x0000560e5d7ac342 in Diagnostics_area::set_ok_status (this=0x7ff3500067a8, affected_rows=0, last_insert_id=0, message=0x0) at /data/src/10.4/sql/sql_error.cc:335
#9  0x0000560e5d74d76a in my_ok (thd=0x7ff350000d90, affected_rows_arg=0, id=0, message=0x0) at /data/src/10.4/sql/sql_class.h:5062
#10 0x0000560e5d8066f2 in mysql_execute_command (thd=0x7ff350000d90) at /data/src/10.4/sql/sql_parse.cc:5059
#11 0x0000560e5d81019d in mysql_parse (thd=0x7ff350000d90, rawbuf=0x7ff350014788 "UNLOCK TABLES", length=13, parser_state=0x7ff362bf6550, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7989
#12 0x0000560e5d7fc3f9 in dispatch_command (command=COM_QUERY, thd=0x7ff350000d90, packet=0x7ff35000abe1 "UNLOCK TABLES", packet_length=13, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1855
#13 0x0000560e5d7fac61 in do_command (thd=0x7ff350000d90) at /data/src/10.4/sql/sql_parse.cc:1373
#14 0x0000560e5d98a9c1 in do_handle_one_connection (connect=0x560e60472c70) at /data/src/10.4/sql/sql_connect.cc:1412
#15 0x0000560e5d98a70a in handle_one_connection (arg=0x560e60472c70) at /data/src/10.4/sql/sql_connect.cc:1316
#16 0x0000560e5e3b34e8 in pfs_spawn_thread (arg=0x560e6044bce0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#17 0x00007ff3694a6609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#18 0x00007ff368d11293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
Query (0x7f5e64014788): UNLOCK TABLES
 
Connection ID (thread ID): 4
Status: NOT_KILLED

Again, only 10.4 is affected, even though the locks which are held after BACKUP STAGE BLOCK_COMMIT look identical on 10.4 and 10.5:

SELECT * FROM information_schema.metadata_lock_info;
THREAD_ID	LOCK_MODE	LOCK_DURATION	LOCK_TYPE	TABLE_SCHEMA	TABLE_NAME
4	MDL_BACKUP_TRANS_DML	NULL	Backup lock		
5	MDL_BACKUP_WAIT_COMMIT	NULL	Backup lock		
4	MDL_SHARED_WRITE	NULL	Table metadata lock	test	t1

Comment by Michael Widenius [ 2021-04-06 ]

This cannot easily be removed in 10.4

The issue is that ha_maria_implicit_commit() can generate an error if there is long active BACKUP_COMMIT lock (should not happen in any normal MariaDB usage scenario) and the upper level is not catching it. Modifying the code to catch this is a bit tricky.

The effect on non debug builds is that the user get a "Lock timeout" error, but the Aria table changes are committed anyway.

The suggested solutions was to:
1) Ignore any errror any MDL error in ha_maria_implicit_commit()
2) Change the error to a warning (by calling clear_error())

In 10.5 it will not happen as Aria is now working as a fully transactional table and all this problematic code has gone away

In the end Elena choose to let things be as it is.

Comment by Michael Widenius [ 2021-04-06 ]

Agreed with Elena to not fix this issue

Comment by Elena Stepanova [ 2021-04-07 ]

For a record, the same failures happen (in some scenarios) with FLUSH TABLES WITH READ LOCK instead of BACKUP locks; also exclusively on 10.4.
I will modify "Summary" of this issue and will change the component from Backup to Locking to reflect it.
For example:

CREATE TABLE t1 (a INT) ENGINE=Aria;
INSERT INTO t1 VALUES (1),(2);
FLUSH TABLES WITH READ LOCK;
 
--connect (con1,localhost,root,,)
SET lock_wait_timeout= 1;
EXPLAIN SELECT a FROM t1 WHERE a > 0;
 
# Cleanup
--disconnect con1
--connection default
UNLOCK TABLES;
DROP TABLE t1;

10.4 5b71e042 debug

mysqld: /data/src/10.4/sql/sql_error.cc:445: void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*): Assertion `! is_set() || m_can_overwrite_status' failed.
210407 20:23:47 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fdc2e8d5f36 in __GI___assert_fail (assertion=0x5628dd614448 "! is_set() || m_can_overwrite_status", file=0x5628dd6141e0 "/data/src/10.4/sql/sql_error.cc", line=445, function=0x5628dd614470 "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_user_condition_identity&, const Sql_condition*)") at assert.c:101
#8  0x00005628dc8c4756 in Diagnostics_area::set_error_status (this=0x7fdc0c0067a8, sql_errno=1205, message=0x7fdc2885a6a0 "Lock wait timeout exceeded; try restarting transaction", sqlstate=0x5628dd639e4d "HY000", ucid=..., error_condition=0x0) at /data/src/10.4/sql/sql_error.cc:445
#9  0x00005628dc89e74d in THD::raise_condition (this=0x7fdc0c000d90, sql_errno=1205, sqlstate=0x5628dd639e4d "HY000", level=Sql_state_errno_level::WARN_LEVEL_ERROR, ucid=..., msg=0x7fdc2885a6a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.cc:1103
#10 0x00005628dc7dfad6 in THD::raise_condition (this=0x7fdc0c000d90, sql_errno=1205, sqlstate=0x0, level=Sql_state_errno_level::WARN_LEVEL_ERROR, msg=0x7fdc2885a6a0 "Lock wait timeout exceeded; try restarting transaction") at /data/src/10.4/sql/sql_class.h:4400
#11 0x00005628dc7d2530 in my_message_sql (error=1205, str=0x7fdc2885a6a0 "Lock wait timeout exceeded; try restarting transaction", MyFlags=0) at /data/src/10.4/sql/mysqld.cc:3338
#12 0x00005628dd52fe57 in my_error (nr=1205, MyFlags=0) at /data/src/10.4/mysys/my_error.c:125
#13 0x00005628dcab207d in MDL_context::acquire_lock (this=0x7fdc0c000eb0, mdl_request=0x7fdc2885aa40, lock_wait_timeout=1) at /data/src/10.4/sql/mdl.cc:2365
#14 0x00005628dcc85a8e in ha_maria_implicit_commit (thd=0x7fdc0c000d90, new_trn=false) at /data/src/10.4/sql/handler.cc:144
#15 0x00005628dc9231bc in mysql_execute_command (thd=0x7fdc0c000d90) at /data/src/10.4/sql/sql_parse.cc:6247
#16 0x00005628dc927f31 in mysql_parse (thd=0x7fdc0c000d90, rawbuf=0x7fdc0c015628 "EXPLAIN SELECT a FROM t1 WHERE a > 0", length=36, parser_state=0x7fdc2885b550, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7989
#17 0x00005628dc914281 in dispatch_command (command=COM_QUERY, thd=0x7fdc0c000d90, packet=0x7fdc0c00abe1 "", packet_length=36, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1855
#18 0x00005628dc912af9 in do_command (thd=0x7fdc0c000d90) at /data/src/10.4/sql/sql_parse.cc:1373
#19 0x00005628dcaa26b9 in do_handle_one_connection (connect=0x5628dff03670) at /data/src/10.4/sql/sql_connect.cc:1412
#20 0x00005628dcaa2402 in handle_one_connection (arg=0x5628dff03670) at /data/src/10.4/sql/sql_connect.cc:1316
#21 0x00005628dd4cb1e0 in pfs_spawn_thread (arg=0x5628dfdfe5f0) at /data/src/10.4/storage/perfschema/pfs.cc:1869
#22 0x00007fdc2f156609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#23 0x00007fdc2e9c1293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

non-debug

EXPLAIN SELECT a FROM t1 WHERE a > 0;
bug.da1                                  [ fail ]
        Test ended at 2021-04-07 20:24:59
 
CURRENT_TEST: bug.da1
mysqltest: At line 7: query 'EXPLAIN SELECT a FROM t1 WHERE a > 0' failed: 1205: Lock wait timeout exceeded; try restarting transaction

Curiously, SELECT itself, without EXPLAIN, doesn't cause the same effect – neither the timeout nor the resulting assertion failure.
10.3 and 10.5 are not affected, with or without EXPLAIN.

Generated at Thu Feb 08 09:21:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.