[MDEV-18067] Server crash in backup_end or assertion failure `ticket->m_duration == MDL_EXPLICIT' upon BACKUP STAGE END after FLUSH TABLE with locks Created: 2018-12-23  Updated: 2023-05-16  Resolved: 2019-10-20

Status: Closed
Project: MariaDB Server
Component/s: Backup, Locking
Affects Version/s: 10.4
Fix Version/s: 10.4.9

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Problem/Incident
causes MDEV-20867 Perform careful review of "Server cra... Closed
is caused by MDEV-5336 Implement BACKUP STAGE for safe exter... Closed
Relates
relates to MDEV-15481 [Draft] Server crashed in I_P_List<MD... Closed
relates to MDEV-18068 Assertion `this == ticket->get_ctx()'... Closed
relates to MDEV-18069 Server hang or crash in MDL_lock::inc... Closed

 Description   

CREATE TABLE t1 (pk INT PRIMARY KEY, f INT);
BACKUP STAGE START;
FLUSH TABLE t1 FOR EXPORT;
UNLOCK TABLES;
BACKUP STAGE END;
 
# Cleanup
DROP TABLE t1;

10.4 debug a8eb0c76b

mysqld: /data/src/10.4/sql/mdl.cc:2851: void MDL_context::release_lock(MDL_ticket*): Assertion `ticket->m_duration == MDL_EXPLICIT' failed.
181223 23:37:42 [ERROR] mysqld got signal 6 ;
 
#7  0x00007fb5fe103ee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x00005611310d7d4e in MDL_context::release_lock (this=0x7fb5e8000c20, ticket=0x7fb5e80b4f10) at /data/src/10.4/sql/mdl.cc:2851
#9  0x00005611311acedb in backup_end (thd=0x7fb5e8000b00) at /data/src/10.4/sql/backup.cc:308
#10 0x00005611311ac894 in run_backup_stage (thd=0x7fb5e8000b00, stage=BACKUP_END) at /data/src/10.4/sql/backup.cc:119
#11 0x0000561130f6a94b in mysql_execute_command (thd=0x7fb5e8000b00) at /data/src/10.4/sql/sql_parse.cc:5233
#12 0x0000561130f738e4 in mysql_parse (thd=0x7fb5e8000b00, rawbuf=0x7fb5e8014f98 "BACKUP STAGE END", length=16, parser_state=0x7fb5f8295600, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:8104
#13 0x0000561130f60b0e in dispatch_command (command=COM_QUERY, thd=0x7fb5e8000b00, packet=0x7fb5e800b421 "BACKUP STAGE END", packet_length=16, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1851
#14 0x0000561130f5f532 in do_command (thd=0x7fb5e8000b00) at /data/src/10.4/sql/sql_parse.cc:1396
#15 0x00005611310cb6de in do_handle_one_connection (connect=0x5611335866b0) at /data/src/10.4/sql/sql_connect.cc:1402
#16 0x00005611310cb462 in handle_one_connection (arg=0x5611335866b0) at /data/src/10.4/sql/sql_connect.cc:1308
#17 0x000056113158a338 in pfs_spawn_thread (arg=0x5611334ca970) at /data/src/10.4/storage/perfschema/pfs.cc:1862
#18 0x00007fb5ffbbf494 in start_thread (arg=0x7fb5f8296700) at pthread_create.c:333
#19 0x00007fb5fe1c093f in clone () from /lib/x86_64-linux-gnu/libc.so.6

10.4 non-debug a8eb0c76b

#5  <signal handler called>
#6  0x0000000000000055 in ?? ()
#7  0x000055bd2db3724f in backup_end (thd=0x7f71b00009a8) at /data/src/10.4/sql/backup.cc:308
#8  0x000055bd2db372d8 in run_backup_stage (thd=0x7f71b00009a8, stage=BACKUP_END) at /data/src/10.4/sql/backup.cc:119
#9  0x000055bd2d9d1399 in mysql_execute_command (thd=0x7f71b00009a8) at /data/src/10.4/sql/sql_parse.cc:5233
#10 0x000055bd2d9d79a1 in mysql_parse (thd=0x7f71b00009a8, rawbuf=<optimized out>, length=16, parser_state=0x7f71cc31f640, is_com_multi=<optimized out>, is_next_command=<optimized out>) at /data/src/10.4/sql/sql_parse.cc:8104
#11 0x000055bd2d9da315 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x7f71b00009a8, packet=packet@entry=0x7f71b0009559 "BACKUP STAGE END", packet_length=packet_length@entry=16, is_com_multi=is_com_multi@entry=false, is_next_command=is_next_command@entry=false) at /data/src/10.4/sql/sql_parse.cc:1851
#12 0x000055bd2d9daca0 in do_command (thd=0x7f71b00009a8) at /data/src/10.4/sql/sql_parse.cc:1396
#13 0x000055bd2daacd54 in do_handle_one_connection (connect=connect@entry=0x55bd311490c8) at /data/src/10.4/sql/sql_connect.cc:1402
#14 0x000055bd2daacef4 in handle_one_connection (arg=arg@entry=0x55bd311490c8) at /data/src/10.4/sql/sql_connect.cc:1308
#15 0x000055bd2dda35a4 in pfs_spawn_thread (arg=0x55bd31149328) at /data/src/10.4/storage/perfschema/pfs.cc:1862
#16 0x00007f71d337a494 in start_thread (arg=0x7f71cc320700) at pthread_create.c:333
#17 0x00007f71d197b93f in clone () from /lib/x86_64-linux-gnu/libc.so.6

10.4 7b33a6a1 ASAN

==30879==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000059ec at pc 0x557fdac7557d bp 0x7fcf01608730 sp 0x7fcf01608728
READ of size 4 at 0x6060000059ec thread T5
    #0 0x557fdac7557c in MDL_context::release_lock(MDL_ticket*) /data/src/10.4/sql/mdl.cc:2835
    #1 0x557fdae7837c in backup_end(THD*) /data/src/10.4/sql/backup.cc:308
    #2 0x557fdae775e0 in run_backup_stage(THD*, backup_stages) /data/src/10.4/sql/backup.cc:119
    #3 0x557fda8d6ee4 in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5336
    #4 0x557fda8e9dda in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8205
    #5 0x557fda8c1ce5 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1829
    #6 0x557fda8beb36 in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1358
    #7 0x557fdac528a5 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1399
    #8 0x557fdac5229e in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1302
    #9 0x557fdb819be2 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #10 0x7fcf0c9b7493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
    #11 0x7fcf0ad9d93e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe893e)
 
0x6060000059ec is located 44 bytes inside of 64-byte region [0x6060000059c0,0x606000005a00)
freed by thread T5 here:
    #0 0x7fcf0cc22477 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x55477)
    #1 0x557fdac7a5d9 in MDL_ticket::~MDL_ticket() /data/src/10.4/sql/mdl.h:667
    #2 0x557fdac6f5b4 in MDL_ticket::destroy(MDL_ticket*) /data/src/10.4/sql/mdl.cc:1014
    #3 0x557fdac754aa in MDL_context::release_lock(enum_mdl_duration, MDL_ticket*) /data/src/10.4/sql/mdl.cc:2820
    #4 0x557fdac7578d in MDL_context::release_locks_stored_before(enum_mdl_duration, MDL_ticket*) /data/src/10.4/sql/mdl.cc:2867
    #5 0x557fdac76542 in MDL_context::release_transactional_locks() /data/src/10.4/sql/mdl.cc:3040
    #6 0x557fda8d6adf in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5275
    #7 0x557fda8e9dda in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8205
    #8 0x557fda8c1ce5 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1829
    #9 0x557fda8beb36 in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1358
    #10 0x557fdac528a5 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1399
    #11 0x557fdac5229e in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1302
    #12 0x557fdb819be2 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #13 0x7fcf0c9b7493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
 
previously allocated by thread T5 here:
    #0 0x7fcf0cc2223f in operator new(unsigned long, std::nothrow_t const&) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5523f)
    #1 0x557fdac6f51c in MDL_ticket::create(MDL_context*, enum_mdl_type, enum_mdl_duration) /data/src/10.4/sql/mdl.cc:1008
    #2 0x557fdac7259a in MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**) /data/src/10.4/sql/mdl.cc:2084
    #3 0x557fdac731b1 in MDL_context::acquire_lock(MDL_request*, double) /data/src/10.4/sql/mdl.cc:2247
    #4 0x557fdae77a57 in backup_start /data/src/10.4/sql/backup.cc:165
    #5 0x557fdae77564 in run_backup_stage(THD*, backup_stages) /data/src/10.4/sql/backup.cc:104
    #6 0x557fda8d6ee4 in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5336
    #7 0x557fda8e9dda in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8205
    #8 0x557fda8c1ce5 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1829
    #9 0x557fda8beb36 in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1358
    #10 0x557fdac528a5 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1399
    #11 0x557fdac5229e in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1302
    #12 0x557fdb819be2 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #13 0x7fcf0c9b7493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
 
Thread T5 created by T0 here:
    #0 0x7fcf0cbf0bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0x557fdb81a1aa in spawn_thread_v1 /data/src/10.4/storage/perfschema/pfs.cc:1912
    #2 0x557fda609846 in inline_mysql_thread_create /data/src/10.4/include/mysql/psi/mysql_thread.h:1268
    #7 0x557fda61e973 in mysqld_main(int, char**) /data/src/10.4/sql/mysqld.cc:5876
    #8 0x557fda6076cf in main /data/src/10.4/sql/main.cc:25
    #9 0x7fcf0acd52b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
 
SUMMARY: AddressSanitizer: heap-use-after-free /data/src/10.4/sql/mdl.cc:2835 MDL_context::release_lock(MDL_ticket*)
Shadow bytes around the buggy address:
  0x0c0c7fff8ae0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8af0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b20: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8b30: fd fd fd fd fa fa fa fa fd fd fd fd fd[fd]fd fd
  0x0c0c7fff8b40: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8b50: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8b60: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8b70: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8b80: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==30879==ABORTING

Another test case, with FTWRL instead of FOR EXPORT:

CREATE TABLE t1 (a INT);
CREATE TABLE t2 (b INT);
BACKUP STAGE START;
FLUSH TABLES t1 WITH READ LOCK;
--error ER_BACKUP_LOCK_IS_ACTIVE
LOCK TABLE t2 WRITE;
BACKUP STAGE END;
 
# Cleanup
UNLOCK TABLES;
DROP TABLE t1, t2;



 Comments   
Comment by Elena Stepanova [ 2018-12-25 ]

Here is another stack trace that popped up a couple of times under similar circumstances. I'm not sure if it's the same problem, when I try to reproduce it, I'm getting one of MDEV-18067, MDEV-18068, MDEV-18069.

10.4 a8eb0c76bf72b

#3  <signal handler called>
#4  0x000056023ad1cad4 in I_P_List_iterator<MDL_ticket, I_P_List<MDL_ticket, I_P_List_adapter<MDL_ticket, &MDL_ticket::next_in_lock, &MDL_ticket::prev_in_lock>, I_P_List_null_counter, I_P_List_fast_push_back<MDL_ticket> > >::operator++ (this=0x7fdcecbd5c70) at /home/travis/src/sql/sql_plist.h:197
#5  0x000056023ad183d7 in MDL_lock::Ticket_list::clear_bit_if_not_in_list (this=0x56023deb4068, type=MDL_TYPE_END) at /home/travis/src/sql/mdl.cc:1196
#6  0x000056023ad18803 in MDL_lock::Ticket_list::remove_ticket (this=0x56023deb4068, ticket=0x7fdce82c8ce0) at /home/travis/src/sql/mdl.cc:1289
#7  0x000056023ad1a5ab in MDL_context::upgrade_shared_lock (this=0x7fdce8352ae0, mdl_ticket=0x7fdce85432b0, new_type=MDL_TYPE_END, lock_wait_timeout=10) at /home/travis/src/sql/mdl.cc:2547
#8  0x000056023adf0111 in backup_reset_alter_copy_lock (thd=0x7fdce83529c0) at /home/travis/src/sql/backup.cc:354
#9  0x000056023ac8ce9a in copy_data_between_tables (thd=0x7fdce83529c0, from=0x7fdce82ae620, to=0x7fdce820bf80, create=..., ignore=true, order_num=0, order=0x0, copied=0x7fdcecbd6b28, deleted=0x7fdcecbd6b30, keys_onoff=Alter_info::LEAVE_AS_IS, alter_ctx=0x7fdcecbd7930) at /home/travis/src/sql/sql_table.cc:10529
#10 0x000056023ac8a9c4 in mysql_alter_table (thd=0x7fdce83529c0, new_db=0x7fdce8357088, new_name=0x7fdce8357458, create_info=0x7fdcecbd8520, table_list=0x7fdce8360af8, alter_info=0x7fdcecbd8460, order_num=0, order=0x0, ignore=true) at /home/travis/src/sql/sql_table.cc:9871
#11 0x000056023ad14620 in Sql_cmd_alter_table::execute (this=0x7fdce8361278, thd=0x7fdce83529c0) at /home/travis/src/sql/sql_alter.cc:497
#12 0x000056023abb1c75 in mysql_execute_command (thd=0x7fdce83529c0) at /home/travis/src/sql/sql_parse.cc:6302
#13 0x000056023abb6e7a in mysql_parse (thd=0x7fdce83529c0, rawbuf=0x7fdce8360928 "ALTER IGNORE TABLE `oltp1` ADD `c` VARBINARY(51204) NULL /*!100303 INVISIBLE */, ALGORITHM=COPY, LOCK=EXCLUSIVE  /* QNO 5719 CON_ID 19 */", length=137, parser_state=0x7fdcecbd9650, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:8104
#14 0x000056023aba3d64 in dispatch_command (command=COM_QUERY, thd=0x7fdce83529c0, packet=0x7fdce82d3231 "ALTER IGNORE TABLE `oltp1` ADD `c` VARBINARY(51204) NULL /*!100303 INVISIBLE */, ALGORITHM=COPY, LOCK=EXCLUSIVE  /* QNO 5719 CON_ID 19 */ ", packet_length=138, is_com_multi=false, is_next_command=false) at /home/travis/src/sql/sql_parse.cc:1851
#15 0x000056023aba2781 in do_command (thd=0x7fdce83529c0) at /home/travis/src/sql/sql_parse.cc:1396
#16 0x000056023ad0e660 in do_handle_one_connection (connect=0x56023eccea20) at /home/travis/src/sql/sql_connect.cc:1402
#17 0x000056023ad0e3b1 in handle_one_connection (arg=0x56023eccea20) at /home/travis/src/sql/sql_connect.cc:1308
#18 0x00007fdcf0f236ba in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#19 0x00007fdcf03b841d in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Elena Stepanova [ 2019-08-21 ]

Variation without run_backup_stage in the stack trace:

--connect (con1,localhost,root,,)
CREATE TABLE t1 (a INT);
BACKUP STAGE START;
FLUSH TABLES t1 WITH READ LOCK;
--disconnect con1
 
# Cleanup
--connection default
DROP TABLE t1;

10.4 non-debug c5bc0ced

#5  <signal handler called>
#6  0x0000000000000055 in ?? ()
#7  0x000055cc687ea69f in backup_end (thd=thd@entry=0x7fce8c0009a8) at /data/src/10.4/sql/backup.cc:313
#8  0x000055cc68627ddb in THD::cleanup (this=this@entry=0x7fce8c0009a8) at /data/src/10.4/sql/sql_class.cc:1543
#9  0x000055cc685b77f2 in unlink_thd (thd=thd@entry=0x7fce8c0009a8) at /data/src/10.4/sql/mysqld.cc:2627
#10 0x000055cc685b79b8 in one_thread_per_connection_end (thd=0x7fce8c0009a8, put_in_cache=<optimized out>) at /data/src/10.4/sql/mysqld.cc:2776
#11 0x000055cc6874818f in do_handle_one_connection (connect=connect@entry=0x55cc6b980e98) at /data/src/10.4/sql/sql_connect.cc:1419
#12 0x000055cc68748354 in handle_one_connection (arg=arg@entry=0x55cc6b980e98) at /data/src/10.4/sql/sql_connect.cc:1306
#13 0x000055cc68cf13f4 in pfs_spawn_thread (arg=0x55cc6b9179e8) at /data/src/10.4/storage/perfschema/pfs.cc:1862
#14 0x00007fceaa80f4a4 in start_thread (arg=0x7fce9edd6700) at pthread_create.c:456
#15 0x00007fcea8d57d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

Comment by Michael Widenius [ 2019-10-20 ]

The problem was that "FLUSH TABLES table_name" combined with UNLOCK TABLES
calls MDL_context::set_transaction_duration_for_all_locks(), which
changed backup_locks from MDL_EXPLICT to MDL_TRANSACTION.

Fixed by ensuring that set_transaction_duration_for_all_locks() doesn't
touch BACKUP locks.

Generated at Thu Feb 08 08:41:15 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.