[MDEV-18069] Server hang or crash in MDL_lock::incompatible_granted_types_bitmap or ASAN heap-use-after-free in MDL_ticket::has_stronger_or_equal_type Created: 2018-12-23  Updated: 2023-05-16  Resolved: 2019-10-20

Status: Closed
Project: MariaDB Server
Component/s: Backup, Locking
Affects Version/s: 10.4
Fix Version/s: 10.4.9

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 1
Labels: affects-tests

Attachments: HTML File hang_threads_full    
Issue Links:
Problem/Incident
causes MDEV-20867 Perform careful review of "Server cra... Closed
is caused by MDEV-5336 Implement BACKUP STAGE for safe exter... Closed
Relates
relates to MDEV-18067 Server crash in backup_end or asserti... Closed
relates to MDEV-18068 Assertion `this == ticket->get_ctx()'... Closed
relates to MDEV-15481 [Draft] Server crashed in I_P_List<MD... Closed

 Description   

Note: It's most likely the same problem as MDEV-18067, but since the assertion is different, I'm filing it separately just in case.

The following test case fails on an ASAN build:

CREATE TABLE t1 (a INT);
BACKUP STAGE START;
FLUSH TABLES t1 WITH READ LOCK;
UNLOCK TABLES;
BACKUP STAGE BLOCK_COMMIT;
 
# Cleanup
BACKUP STAGE END;
DROP TABLE t1;

10.4 ASAN a8eb0c76bf

==13044==ERROR: AddressSanitizer: heap-use-after-free on address 0x6060000059f8 at pc 0x563748f3c800 bp 0x7f9c8933d6d0 sp 0x7f9c8933d6c8
READ of size 8 at 0x6060000059f8 thread T5
    #0 0x563748f3c7ff in MDL_ticket::has_stronger_or_equal_type(enum_mdl_type) const /data/src/10.4/sql/mdl.cc:1893
    #1 0x563748f3f60d in MDL_context::upgrade_shared_lock(MDL_ticket*, enum_mdl_type, double) /data/src/10.4/sql/mdl.cc:2532
    #2 0x563749108f17 in backup_flush /data/src/10.4/sql/backup.cc:198
    #3 0x5637491087cb in run_backup_stage(THD*, backup_stages) /data/src/10.4/sql/backup.cc:110
    #4 0x563748bc4437 in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5233
    #5 0x563748bd6580 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8104
    #6 0x563748bb0ae4 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1851
    #7 0x563748badb6c in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1396
    #8 0x563748f23f06 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1402
    #9 0x563748f23912 in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1308
    #10 0x563749a896e7 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #11 0x7f9c9577c493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
    #12 0x7f9c93d7d93e in __clone (/lib/x86_64-linux-gnu/libc.so.6+0xe893e)
 
0x6060000059f8 is located 56 bytes inside of 64-byte region [0x6060000059c0,0x606000005a00)
freed by thread T5 here:
    #0 0x7f9c959e7477 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x55477)
    #1 0x563748f4559f in MDL_ticket::~MDL_ticket() /data/src/10.4/sql/mdl.h:667
    #2 0x563748f3a7b8 in MDL_ticket::destroy(MDL_ticket*) /data/src/10.4/sql/mdl.cc:1017
    #3 0x563748f405cc in MDL_context::release_lock(enum_mdl_duration, MDL_ticket*) /data/src/10.4/sql/mdl.cc:2836
    #4 0x563748f408af in MDL_context::release_locks_stored_before(enum_mdl_duration, MDL_ticket*) /data/src/10.4/sql/mdl.cc:2883
    #5 0x563748f41664 in MDL_context::release_transactional_locks() /data/src/10.4/sql/mdl.cc:3056
    #6 0x563748bc4032 in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5172
    #7 0x563748bd6580 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8104
    #8 0x563748bb0ae4 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1851
    #9 0x563748badb6c in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1396
    #10 0x563748f23f06 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1402
    #11 0x563748f23912 in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1308
    #12 0x563749a896e7 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #13 0x7f9c9577c493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
 
previously allocated by thread T5 here:
    #0 0x7f9c959e723f in operator new(unsigned long, std::nothrow_t const&) (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x5523f)
    #1 0x563748f3a720 in MDL_ticket::create(MDL_context*, enum_mdl_type, enum_mdl_duration) /data/src/10.4/sql/mdl.cc:1011
    #2 0x563748f3d6bc in MDL_context::try_acquire_lock_impl(MDL_request*, MDL_ticket**) /data/src/10.4/sql/mdl.cc:2100
    #3 0x563748f3e2d3 in MDL_context::acquire_lock(MDL_request*, double) /data/src/10.4/sql/mdl.cc:2263
    #4 0x563749108c87 in backup_start /data/src/10.4/sql/backup.cc:165
    #5 0x563749108794 in run_backup_stage(THD*, backup_stages) /data/src/10.4/sql/backup.cc:104
    #6 0x563748bc4437 in mysql_execute_command(THD*) /data/src/10.4/sql/sql_parse.cc:5233
    #7 0x563748bd6580 in mysql_parse(THD*, char*, unsigned int, Parser_state*, bool, bool) /data/src/10.4/sql/sql_parse.cc:8104
    #8 0x563748bb0ae4 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool, bool) /data/src/10.4/sql/sql_parse.cc:1851
    #9 0x563748badb6c in do_command(THD*) /data/src/10.4/sql/sql_parse.cc:1396
    #10 0x563748f23f06 in do_handle_one_connection(CONNECT*) /data/src/10.4/sql/sql_connect.cc:1402
    #11 0x563748f23912 in handle_one_connection /data/src/10.4/sql/sql_connect.cc:1308
    #12 0x563749a896e7 in pfs_spawn_thread /data/src/10.4/storage/perfschema/pfs.cc:1862
    #13 0x7f9c9577c493 in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x7493)
 
Thread T5 created by T0 here:
    #0 0x7f9c959b5bba in pthread_create (/usr/lib/x86_64-linux-gnu/libasan.so.1+0x23bba)
    #1 0x563749a89caf in spawn_thread_v1 /data/src/10.4/storage/perfschema/pfs.cc:1912
    #2 0x5637489087b6 in inline_mysql_thread_create /data/src/10.4/include/mysql/psi/mysql_thread.h:1268
    #3 0x56374891e9d4 in create_thread_to_handle_connection(CONNECT*) /data/src/10.4/sql/mysqld.cc:6436
    #4 0x56374891f0d9 in create_new_thread(CONNECT*) /data/src/10.4/sql/mysqld.cc:6506
    #5 0x56374891f469 in handle_accepted_socket(st_mysql_socket, st_mysql_socket) /data/src/10.4/sql/mysqld.cc:6623
    #6 0x5637489200b5 in handle_connections_sockets() /data/src/10.4/sql/mysqld.cc:6788
    #7 0x56374891de91 in mysqld_main(int, char**) /data/src/10.4/sql/mysqld.cc:6058
    #8 0x56374890663f in main /data/src/10.4/sql/main.cc:25
    #9 0x7f9c93cb52b0 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x202b0)
 
SUMMARY: AddressSanitizer: heap-use-after-free /data/src/10.4/sql/mdl.cc:1893 MDL_ticket::has_stronger_or_equal_type(enum_mdl_type) const
Shadow bytes around the buggy address:
  0x0c0c7fff8ae0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8af0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c0c7fff8b20: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd fd
=>0x0c0c7fff8b30: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd[fd]
  0x0c0c7fff8b40: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8b50: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
  0x0c0c7fff8b60: fd fd fd fd fa fa fa fa fd fd fd fd fd fd fd fd
  0x0c0c7fff8b70: fa fa fa fa fd fd fd fd fd fd fd fd fa fa fa fa
  0x0c0c7fff8b80: fd fd fd fd fd fd fd fd fa fa fa fa fd fd fd fd
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Contiguous container OOB:fc
  ASan internal:           fe
==13044==ABORTING

On a non-ASAN build it fails at a later stage with MDEV-18067 or hangs with the stack trace hang_threads_full.

More complicated but similar test cases can crash on non-ASAN builds with the stack trace below, while producing the same ASAN errors on ASAN builds.

10.4 a8eb0c76bf

#3  <signal handler called>
#4  0x0000555c589fe4f9 in MDL_lock::incompatible_granted_types_bitmap (this=0x7efc581018a0) at /data/src/10.4/sql/mdl.cc:543
#5  0x0000555c589faf63 in MDL_ticket::has_stronger_or_equal_type (this=0x7efc680915c0, type=MDL_SHARED) at /data/src/10.4/sql/mdl.cc:1893
#6  0x0000555c589fc476 in MDL_context::upgrade_shared_lock (this=0x7efc68000c20, mdl_ticket=0x7efc680915c0, new_type=MDL_SHARED, lock_wait_timeout=2) at /data/src/10.4/sql/mdl.cc:2532
#7  0x0000555c58ad1bb9 in backup_flush (thd=0x7efc68000b00) at /data/src/10.4/sql/backup.cc:198
#8  0x0000555c58ad1861 in run_backup_stage (thd=0x7efc68000b00, stage=BACKUP_FLUSH) at /data/src/10.4/sql/backup.cc:110
#9  0x0000555c5888f94b in mysql_execute_command (thd=0x7efc68000b00) at /data/src/10.4/sql/sql_parse.cc:5233
#10 0x0000555c588988e4 in mysql_parse (thd=0x7efc68000b00, rawbuf=0x7efc68013b88 "BACKUP STAGE FLUSH /* QNO 10058 CON_ID 15 */", length=44, parser_state=0x7efccc85d650, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:8104
#11 0x0000555c58885b0e in dispatch_command (command=COM_QUERY, thd=0x7efc68000b00, packet=0x7efc6800b321 "BACKUP STAGE FLUSH /* QNO 10058 CON_ID 15 */ ", packet_length=45, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1851
#12 0x0000555c58884532 in do_command (thd=0x7efc68000b00) at /data/src/10.4/sql/sql_parse.cc:1396
#13 0x0000555c589f06de in do_handle_one_connection (connect=0x555c5cdc7050) at /data/src/10.4/sql/sql_connect.cc:1402
#14 0x0000555c589f0462 in handle_one_connection (arg=0x555c5cdc7050) at /data/src/10.4/sql/sql_connect.cc:1308
#15 0x00007efce3dae494 in start_thread (arg=0x7efccc85e700) at pthread_create.c:333
#16 0x00007efce23af93f in clone () from /lib/x86_64-linux-gnu/libc.so.6



 Comments   
Comment by Michael Widenius [ 2019-10-20 ]

The problem was that "FLUSH TABLES table_name" combined with UNLOCK TABLES
calls MDL_context::set_transaction_duration_for_all_locks(), which
changed backup_locks from MDL_EXPLICT to MDL_TRANSACTION.

Fixed by ensuring that set_transaction_duration_for_all_locks() doesn't
touch BACKUP locks.

Generated at Thu Feb 08 08:41:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.