[MDEV-21398] Deadlock (server hang) or assertion failure in Diagnostics_area::set_error_status upon ALTER under lock Created: 2019-12-26  Updated: 2021-03-02  Resolved: 2020-05-24

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Locking
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: regression

Issue Links:
Relates
relates to MDEV-21502 Server hang or ASAN heap-use-after-fr... Closed
relates to MDEV-23836 Assertion `! is_set() || m_can_overwr... Closed
relates to MDEV-25038 Server crash or ASAN errors in _ma_re... Closed
relates to MDEV-22881 Unexpected errors, corrupt output, Va... Closed

 Description   

CREATE TABLE t1 (a INT) ENGINE=MyISAM;
 
LOCK TABLE t1 WRITE, t1 AS t1a READ;
ALTER TABLE t1 CHANGE COLUMN IF EXISTS x xx INT;
 
# Cleanup
UNLOCK TABLES;
DROP TABLE t1;

The ALTER hangs seemingly forever. lock_wait_timeout doesn't have any effect.

10.2 2ea37b21 all threads from the hanging server

Thread 6 (Thread 0x7f4bb22b5700 (LWP 2899)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055b8593fc25c in safe_cond_timedwait (cond=0x7f4ba00008c8, mp=0x7f4ba000b930, abstime=0x7f4bb22b3320, file=0x55b8597739b0 "/data/src/10.2/include/mysql/psi/mysql_thread.h", line=1175) at /data/src/10.2/mysys/thr_mutex.c:545
#2  0x000055b8593f8832 in inline_mysql_cond_timedwait (that=0x7f4ba00008c8, mutex=0x7f4ba000b930, abstime=0x7f4bb22b3320, src_file=0x55b8597739e0 "/data/src/10.2/mysys/thr_lock.c", src_line=575) at /data/src/10.2/include/mysql/psi/mysql_thread.h:1175
#3  0x000055b8593f8f07 in wait_for_lock (wait=0x7f4ba000ba00, data=0x7f4ba000c0e0, in_wait_list=0 '\000', lock_wait_timeout=31536000) at /data/src/10.2/mysys/thr_lock.c:575
#4  0x000055b8593f9a60 in thr_lock (data=0x7f4ba000c0e0, owner=0x7f4ba0002420, lock_wait_timeout=31536000) at /data/src/10.2/mysys/thr_lock.c:1013
#5  0x000055b8593fa34f in thr_multi_lock (data=0x7f4ba006ec70, count=1, owner=0x7f4ba0002420, lock_wait_timeout=31536000) at /data/src/10.2/mysys/thr_lock.c:1294
#6  0x000055b858db3bcc in mysql_lock_tables (thd=0x7f4ba0000af0, sql_lock=0x7f4ba006ec50, flags=2155) at /data/src/10.2/sql/lock.cc:351
#7  0x000055b858db39ce in mysql_lock_tables (thd=0x7f4ba0000af0, tables=0x7f4ba007cda8, count=1, flags=2155) at /data/src/10.2/sql/lock.cc:303
#8  0x000055b8589abd42 in Locked_tables_list::reopen_tables (this=0x7f4ba0004558, thd=0x7f4ba0000af0, need_reopen=true) at /data/src/10.2/sql/sql_base.cc:2476
#9  0x000055b858a25e90 in mysql_execute_command (thd=0x7f4ba0000af0) at /data/src/10.2/sql/sql_parse.cc:6002
#10 0x000055b858a2ab64 in mysql_parse (thd=0x7f4ba0000af0, rawbuf=0x7f4ba0012448 "ALTER TABLE t1 CHANGE COLUMN IF EXISTS x xx INT", length=47, parser_state=0x7f4bb22b4200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7740
#11 0x000055b858a18e7f in dispatch_command (command=COM_QUERY, thd=0x7f4ba0000af0, packet=0x7f4ba008d981 "ALTER TABLE t1 CHANGE COLUMN IF EXISTS x xx INT", packet_length=47, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1831
#12 0x000055b858a177d3 in do_command (thd=0x7f4ba0000af0) at /data/src/10.2/sql/sql_parse.cc:1384
#13 0x000055b858b6d2d9 in do_handle_one_connection (connect=0x55b85bce8cd0) at /data/src/10.2/sql/sql_connect.cc:1336
#14 0x000055b858b6d044 in handle_one_connection (arg=0x55b85bce8cd0) at /data/src/10.2/sql/sql_connect.cc:1241
#15 0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85bc345f0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#16 0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb22b5700) at pthread_create.c:456
#17 0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 5 (Thread 0x7f4bb2300700 (LWP 2896)):
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x000055b8593fbf52 in safe_cond_wait (cond=0x55b859fa9d80 <COND_slave_background>, mp=0x55b859facf80 <LOCK_slave_background>, file=0x55b859498340 "/data/src/10.2/include/mysql/psi/mysql_thread.h", line=1138) at /data/src/10.2/mysys/thr_mutex.c:491
#2  0x000055b85896444c in inline_mysql_cond_wait (that=0x55b859fa9d80 <COND_slave_background>, mutex=0x55b859facf80 <LOCK_slave_background>, src_file=0x55b859498bd5 "/data/src/10.2/sql/slave.cc", src_line=337) at /data/src/10.2/include/mysql/psi/mysql_thread.h:1138
#3  0x000055b858965091 in handle_slave_background (arg=0x0) at /data/src/10.2/sql/slave.cc:337
#4  0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85bc2d1c0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#5  0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb2300700) at pthread_create.c:456
#6  0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 4 (Thread 0x7f4bb234b700 (LWP 2895)):
#0  do_sigwait (set=<optimized out>, set@entry=0x7f4bb234ae30, sig=sig@entry=0x7f4bb234ae00) at ../sysdeps/unix/sysv/linux/sigwait.c:64
#1  0x00007f4bb9f12d0d in __sigwait (set=0x7f4bb234ae30, sig=0x7f4bb234ae00) at ../sysdeps/unix/sysv/linux/sigwait.c:96
#2  0x000055b85893b5f1 in signal_hand (arg=0x0) at /data/src/10.2/sql/mysqld.cc:3511
#3  0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85bc2cd80) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#4  0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb234b700) at pthread_create.c:456
#5  0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 3 (Thread 0x7f4bb2b4c700 (LWP 2894)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055b8593fc25c in safe_cond_timedwait (cond=0x55b85a82f8c0 <COND_checkpoint>, mp=0x55b85a82f800 <LOCK_checkpoint>, abstime=0x7f4bb2b4be00, file=0x55b859737300 "/data/src/10.2/include/mysql/psi/mysql_thread.h", line=1175) at /data/src/10.2/mysys/thr_mutex.c:545
#2  0x000055b8592792e2 in inline_mysql_cond_timedwait (that=0x55b85a82f8c0 <COND_checkpoint>, mutex=0x55b85a82f800 <LOCK_checkpoint>, abstime=0x7f4bb2b4be00, src_file=0x55b859737330 "/data/src/10.2/storage/maria/ma_servicethread.c", src_line=116) at /data/src/10.2/include/mysql/psi/mysql_thread.h:1175
#3  0x000055b8592797e4 in my_service_thread_sleep (control=0x55b859deb940 <checkpoint_control>, sleep_time=30000000000) at /data/src/10.2/storage/maria/ma_servicethread.c:115
#4  0x000055b85926e28d in ma_checkpoint_background (arg=0x1e) at /data/src/10.2/storage/maria/ma_checkpoint.c:709
#5  0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85bbfe630) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#6  0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb2b4c700) at pthread_create.c:456
#7  0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 2 (Thread 0x7f4bb3eb6700 (LWP 2893)):
#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:225
#1  0x000055b8593fc25c in safe_cond_timedwait (cond=0x55b85a840f20 <COND_timer>, mp=0x55b85a840e60 <LOCK_timer>, abstime=0x7f4bb3eb5eb0, file=0x55b8597744a0 "/data/src/10.2/include/mysql/psi/mysql_thread.h", line=1175) at /data/src/10.2/mysys/thr_mutex.c:545
#2  0x000055b8593fd1d1 in inline_mysql_cond_timedwait (that=0x55b85a840f20 <COND_timer>, mutex=0x55b85a840e60 <LOCK_timer>, abstime=0x7f4bb3eb5eb0, src_file=0x55b8597744d0 "/data/src/10.2/mysys/thr_timer.c", src_line=292) at /data/src/10.2/include/mysql/psi/mysql_thread.h:1175
#3  0x000055b8593fdcde in timer_handler (arg=0x0) at /data/src/10.2/mysys/thr_timer.c:292
#4  0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85baf5be0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#5  0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb3eb6700) at pthread_create.c:456
#6  0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
 
Thread 1 (Thread 0x7f4bba320ec0 (LWP 2892)):
#0  0x00007f4bb80348bd in poll () at ../sysdeps/unix/syscall-template.S:84
#1  0x000055b858941384 in handle_connections_sockets () at /data/src/10.2/sql/mysqld.cc:6659
#2  0x000055b858940666 in mysqld_main (argc=128, argv=0x55b85b9d62f0) at /data/src/10.2/sql/mysqld.cc:6125
#3  0x000055b858934e90 in main (argc=6, argv=0x7fff8349c638) at /data/src/10.2/sql/main.cc:25

If the query is killed manually, a debug server aborts with the assertion failure:

mysqld: /data/src/10.2/sql/sql_error.cc:487: void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_condition*): Assertion `! is_set() || m_can_overwrite_status' failed.
191226 16:58:05 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f4bb7f80f12 in __GI___assert_fail (assertion=0x55b8594ae6b0 "! is_set() || m_can_overwrite_status", file=0x55b8594ae570 "/data/src/10.2/sql/sql_error.cc", line=487, function=0x55b8594aea20 <Diagnostics_area::set_error_status(unsigned int, char const*, char const*, Sql_condition const*)::__PRETTY_FUNCTION__> "void Diagnostics_area::set_error_status(uint, const char*, const char*, const Sql_condition*)") at assert.c:101
#8  0x000055b8589f0307 in Diagnostics_area::set_error_status (this=0x7f4ba0005c98, sql_errno=1317, message=0x55b859574590 "Query execution was interrupted", sqlstate=0x55b8594c9d33 "70100", error_condition=0x0) at /data/src/10.2/sql/sql_error.cc:487
#9  0x000055b8589d17f1 in THD::raise_condition (this=0x7f4ba0000af0, sql_errno=1317, sqlstate=0x55b8594c9d33 "70100", level=Sql_condition::WARN_LEVEL_ERROR, msg=0x55b859574590 "Query execution was interrupted") at /data/src/10.2/sql/sql_class.cc:1052
#10 0x000055b85893ba60 in my_message_sql (error=1317, str=0x55b859574590 "Query execution was interrupted", MyFlags=0) at /data/src/10.2/sql/mysqld.cc:3640
#11 0x000055b8593e7495 in my_message (error=1317, str=0x55b859574590 "Query execution was interrupted", MyFlags=0) at /data/src/10.2/mysys/my_error.c:196
#12 0x000055b8589bdb0c in THD::send_kill_message (this=0x7f4ba0000af0) at /data/src/10.2/sql/sql_class.h:3766
#13 0x000055b858db3c58 in mysql_lock_tables (thd=0x7f4ba0000af0, sql_lock=0x7f4ba006ec50, flags=2155) at /data/src/10.2/sql/lock.cc:363
#14 0x000055b858db39ce in mysql_lock_tables (thd=0x7f4ba0000af0, tables=0x7f4ba007cda8, count=1, flags=2155) at /data/src/10.2/sql/lock.cc:303
#15 0x000055b8589abd42 in Locked_tables_list::reopen_tables (this=0x7f4ba0004558, thd=0x7f4ba0000af0, need_reopen=true) at /data/src/10.2/sql/sql_base.cc:2476
#16 0x000055b858a25e90 in mysql_execute_command (thd=0x7f4ba0000af0) at /data/src/10.2/sql/sql_parse.cc:6002
#17 0x000055b858a2ab64 in mysql_parse (thd=0x7f4ba0000af0, rawbuf=0x7f4ba0012448 "ALTER TABLE t1 CHANGE COLUMN IF EXISTS x xx INT", length=47, parser_state=0x7f4bb22b4200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7740
#18 0x000055b858a18e7f in dispatch_command (command=COM_QUERY, thd=0x7f4ba0000af0, packet=0x7f4ba008d981 "ALTER TABLE t1 CHANGE COLUMN IF EXISTS x xx INT", packet_length=47, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1831
#19 0x000055b858a177d3 in do_command (thd=0x7f4ba0000af0) at /data/src/10.2/sql/sql_parse.cc:1384
#20 0x000055b858b6d2d9 in do_handle_one_connection (connect=0x55b85bce8cd0) at /data/src/10.2/sql/sql_connect.cc:1336
#21 0x000055b858b6d044 in handle_one_connection (arg=0x55b85bce8cd0) at /data/src/10.2/sql/sql_connect.cc:1241
#22 0x000055b85939a162 in pfs_spawn_thread (arg=0x55b85bc345f0) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#23 0x00007f4bb9f094a4 in start_thread (arg=0x7f4bb22b5700) at pthread_create.c:456
#24 0x00007f4bb803dd0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Reproducible on 10.2-10.5 with MyISAM and Aria. Not reproducible with InnoDB.

The failure apparently started happening after this commit:

commit 7952f7720f967b43873e4fac13583984803fae3b
Author: Monty
Date:   Tue Oct 15 20:26:40 2019 +0300
 
    MDEV-10748 Server crashes in ha_maria::implicit_commit
    
    Problem was in a combination of LOCK TABLES on several Aria
    tables followed by an ALTER TABLE. After the ALTER TABLE there
    was a force close + reopen of the alter table. The close failed
    because the table was still part of a transaction.
    
    Fixed by calling extra(HA_EXTRA_PREPARE_FOR_FORCED_CLOSE) as
    part of closing the table, which ensures that the table is not
    anymore part of the current transaction.



 Comments   
Comment by Oleksandr Byelkin [ 2020-03-25 ]

It really goes through the code in the cset above (with lock counter 2, which looks incorrect, but I am not sure), then it stuck in:

  /* Lock on the copied half of the lock data array. */
  rc= thr_lock_errno_to_mysql[(int) thr_multi_lock(sql_lock->locks +
                                                   sql_lock->lock_count,
                                                   sql_lock->lock_count,
                                                   &thd->lock_info, timeout)];

in lock.cc mysql_lock_tables()

Comment by Oleksandr Byelkin [ 2020-03-25 ]

OK, it really locked twice, so it is correct, but what was fixed I little understand, so reassign it.

Comment by Michael Widenius [ 2020-05-24 ]

This failure could only happen if one locked the same table multiple times and then did an ALTER TABLE on the table.

The main fix for the problem was to ensure that we mark all instances of the table in the locked_table_list and when we reopen the tables, we first close the marked tables before reopening and locking them.

Generated at Thu Feb 08 09:06:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.