[MDEV-29504] AddressSanitizer: heap-use-after-free storage/innobase/lock/lock0lock.cc:3558 in lock_table Created: 2022-09-09  Updated: 2022-11-08  Resolved: 2022-11-08

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.11.2, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: mdl, race, rr-profile-analyzed

Attachments: File TBR-1600.cfg     File TBR-1600.yy     File TBR-1600.zz    
Issue Links:
Duplicate
is duplicated by MDEV-29850 Assertion `table->get_ref_count() > 0... Closed
Relates
relates to MDEV-26217 Failing assertion: list.count > 0 in ... Closed
relates to MDEV-26554 Table-rebuilding DDL on parent table ... Closed
relates to MDEV-29849 Assertion `lock->trx == this' failed ... Closed

 Description   

origin/10.6 8ff10969996a5349e906fab44b45c6e3ec53eea0 2022-08-24T17:06:57+03:00
Build with RelWithDebInfo, ASAN, InnoBD with -O0 other stuff mostly -O2
 
The RQG test fiddles with FOREIGN KEYs.
 
# 2022-08-29T12:28:42 [2696739] | ==2700042==ERROR: AddressSanitizer: heap-use-after-free on address 0x616001114f18 at pc 0x55858daec68b bp 0x7fb8b7a9b960 sp 0x7fb8b7a9b950
# 2022-08-29T12:28:42 [2696739] | READ of size 8 at 0x616001114f18 thread T21
# 2022-08-29T12:28:42 [2696739] |     #0 0x55858daec68a in lock_table(dict_table_t*, dict_table_t* const*, lock_mode, que_thr_t*) /data/Server/10.6B/storage/innobase/lock/lock0lock.cc:3558
# 2022-08-29T12:28:42 [2696739] |     #1 0x55858dc1f3ef in row_ins_check_foreign_constraint(unsigned long, dict_foreign_t*, dict_table_t*, dtuple_t*, que_thr_t*) /data/Server/10.6B/storage/innobase/row/row0ins.cc:1557
# 2022-08-29T12:28:42 [2696739] |     #2 0x55858dca9375 in row_upd_check_references_constraints /data/Server/10.6B/storage/innobase/row/row0upd.cc:252
# 2022-08-29T12:28:42 [2696739] |     #3 0x55858dcb29a9 in row_upd_del_mark_clust_rec /data/Server/10.6B/storage/innobase/row/row0upd.cc:2535
# 2022-08-29T12:28:42 [2696739] |     #4 0x55858dcb32de in row_upd_clust_step /data/Server/10.6B/storage/innobase/row/row0upd.cc:2669
# 2022-08-29T12:28:42 [2696739] |     #5 0x55858dcb3b3a in row_upd /data/Server/10.6B/storage/innobase/row/row0upd.cc:2791
# 2022-08-29T12:28:42 [2696739] |     #6 0x55858dcb42a2 in row_upd_step(que_thr_t*) /data/Server/10.6B/storage/innobase/row/row0upd.cc:2933
# 2022-08-29T12:28:42 [2696739] |     #7 0x55858dc54b71 in row_update_for_mysql(row_prebuilt_t*) /data/Server/10.6B/storage/innobase/row/row0mysql.cc:1695
# 2022-08-29T12:28:42 [2696739] |     #8 0x55858d9c9f57 in ha_innobase::delete_row(unsigned char const*) /data/Server/10.6B/storage/innobase/handler/ha_innodb.cc:8786
# 2022-08-29T12:28:42 [2696739] |     #9 0x55858d10b5da in handler::ha_delete_row(unsigned char const*) /data/Server/10.6B/sql/handler.cc:7714
# 2022-08-29T12:28:42 [2696739] |     #10 0x55858d636764 in mysql_delete(THD*, TABLE_LIST*, Item*, SQL_I_List<st_order>*, unsigned long long, unsigned long long, select_result*) /data/Server/10.6B/sql/sql_delete.cc:830
# 2022-08-29T12:28:42 [2696739] |     #11 0x55858c99f210 in mysql_execute_command(THD*, bool) /data/Server/10.6B/sql/sql_parse.cc:4807
# 2022-08-29T12:28:42 [2696739] |     #12 0x55858c962e2a in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6B/sql/sql_parse.cc:8030
# 2022-08-29T12:28:42 [2696739] |     #13 0x55858c962e2a in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6B/sql/sql_parse.cc:7952
# 2022-08-29T12:28:42 [2696739] |     #14 0x55858c98de91 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.6B/sql/sql_parse.cc:1896
# 2022-08-29T12:28:42 [2696739] |     #15 0x55858c9942a2 in do_command(THD*, bool) /data/Server/10.6B/sql/sql_parse.cc:1409
# 2022-08-29T12:28:42 [2696739] |     #16 0x55858cd330d3 in do_handle_one_connection(CONNECT*, bool) /data/Server/10.6B/sql/sql_connect.cc:1418
# 2022-08-29T12:28:42 [2696739] |     #17 0x55858cd338dc in handle_one_connection /data/Server/10.6B/sql/sql_connect.cc:1312
# 2022-08-29T12:28:42 [2696739] |     #18 0x7fb8e9c5c608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
# 2022-08-29T12:28:42 [2696739] |     #19 0x7fb8e982f292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
...
Query (0x62b000126238): DELETE FROM parent WHERE a = 3 /* E_R Thread2 QNO 43346 CON_ID 17 */
Status: NOT_KILLED
 
ssh pluto
gdb -c /data/results/1661771621/TBR-1589/1/data/core /data/Server_bin/10.6B_asan_O2_O0/bin/mysqld
#10 0x00007fb8e9ecc1ab in __asan_report_load8 ()
   from /lib/x86_64-linux-gnu/libasan.so.5
#11 0x000055858daec68b in lock_table (table=0x61700004cd70, 
    fktable=0x616001114f18, mode=LOCK_IS, thr=0x624000214170)
    at /data/Server/10.6B/storage/innobase/lock/lock0lock.cc:3558
#12 0x000055858dc1f3f0 in row_ins_check_foreign_constraint (check_ref=0, 
    foreign=0x616001114ef0, table=0x617000168c70, entry=0x61600121ae10, 
    thr=0x624000214170)
(gdb) p table.name
$2 = {m_name = 0x6170000c3280 "\023", static part_suffix = "#P#"}
 
# GIT_SHOW: HEAD -> experimental, origin/experimental 32e1b1a96b072f6a71c564d3e2d50d9cf2fb3561 2022-08-24T17:04:19+02:00
# rqg.pl  : Version 4.0.6 (2022-05)
#
# $RQG_HOME/rqg.pl \
# --gendata=conf/mariadb/fk_truncate.zz \
# --grammar=conf/mariadb/fk_truncate.yy \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb_file_per_table=0 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=2 \
# --mysqld=--innodb_use_native_aio=1 \
# --mysqld=--loose_innodb_change_buffering=changes \
# --mysqld=--innodb_rollback_on_timeout=OFF \
# --vardir_type=fast \
# --mysqld=--innodb_page_size=64K \
# --mysqld=--innodb-buffer-pool-size=256M \
# <local settings>



 Comments   
Comment by Marko Mäkelä [ 2022-09-12 ]

mleich reproduced this with https://rr-project.org. Here is my analysis.

ssh pluto
rr replay /data/results/1662756815/TBR-1600/1/rr/latest-trace

continue
watch -l *(char*)0x0c2e80036d4c
reverse-continue
reverse-continue
backtrace
thread apply 3 backtrace

Thread 3 would be the thread that reports heap-use-after-free. The output is:

10.6 8ff10969996a5349e906fab44b45c6e3ec53eea0

(rr) backtrace
#0  __memset_avx2_unaligned_erms () at ../sysdeps/x86_64/multiarch/memset-vec-unaligned-erms.S:192
#1  0x00007f35bb78c9d9 in ?? () from /lib/x86_64-linux-gnu/libasan.so.5
#2  0x00007f35bb86d799 in free () from /lib/x86_64-linux-gnu/libasan.so.5
#3  0x000055e6c936dd9e in mem_heap_block_free (heap=0x6110001ec340, block=0x6170001f6800) at /data/Server/10.6B/storage/innobase/mem/mem0mem.cc:416
#4  0x000055e6c9647a56 in mem_heap_free (heap=0x6110001ec340) at /data/Server/10.6B/storage/innobase/include/mem0mem.inl:419
#5  0x000055e6c96496a1 in dict_mem_table_free (table=0x6170001f6870) at /data/Server/10.6B/storage/innobase/dict/dict0mem.cc:234
#6  0x000055e6c9619866 in dict_sys_t::remove (this=0x55e6ca9bc6c0 <dict_sys>, table=0x6170001f6870, lru=false, keep=false) at /data/Server/10.6B/storage/innobase/dict/dict0dict.cc:1941
#7  0x000055e6c968247e in trx_t::commit (this=0x327b1465a440, deleted=std::vector of length 0, capacity 0) at /data/Server/10.6B/storage/innobase/dict/drop.cc:272
#8  0x000055e6c921614b in ha_innobase::create (this=0x61d0007454b8, name=0x61900095de20 "test/child", form=0x61900095d398, create_info=0x19b379082690, file_per_table=false, trx=0x327b1465a440)
    at /data/Server/10.6B/storage/innobase/handler/ha_innodb.cc:13304
#9  0x000055e6c91ebefa in ha_innobase::truncate (this=0x61d0007454b8) at /data/Server/10.6B/storage/innobase/handler/ha_innodb.cc:14031
#10 0x000055e6c85918d0 in Sql_cmd_truncate_table::handler_truncate (this=<optimized out>, thd=<optimized out>, table_ref=<optimized out>, is_tmp_table=<optimized out>)
    at /data/Server/10.6B/sql/sql_truncate.cc:255
#11 0x000055e6c85943e0 in Sql_cmd_truncate_table::truncate_table (this=0x62b0000c4a60, thd=0x62b0000bd218, table_ref=0x62b0000c4368) at /data/Server/10.6B/sql/sql_truncate.cc:507
#12 0x000055e6c8594ace in Sql_cmd_truncate_table::execute (this=0x62b0000c4a60, thd=0x62b0000bd218) at /data/Server/10.6B/sql/sql_truncate.cc:573
#13 0x000055e6c81a2c21 in mysql_execute_command (thd=0x62b0000bd218, is_called_from_prepared_stmt=<optimized out>) at /data/Server/10.6B/sql/sql_parse.cc:5997
(rr) thread apply 3 backtrace
 
Thread 3 (Thread 1804093.1816646):
#11 __pthread_cond_timedwait (cond=0x327b146599d0, mutex=0x55e6cb370ec0 <lock_sys+128>, abstime=0x24a433bc37f0) at pthread_cond_wait.c:656
#12 0x000055e6c92ecae1 in lock_wait (thr=0x624000262170) at /data/Server/10.6B/storage/innobase/lock/lock0lock.cc:1838
#13 0x000055e6c94293b8 in row_ins_check_foreign_constraint (check_ref=0, foreign=0x61600060b1f0, table=0x617000045670, entry=0x616000439f10, thr=0x624000262170) at /data/Server/10.6B/storage/innobase/row/row0ins.cc:1827
#14 0x000055e6c94b2376 in row_upd_check_references_constraints (node=0x61f000171120, pcur=0x6110001006c0, table=0x617000045670, index=0x61600004d4f0, offsets=0x24a433bc4580, thr=0x624000262170, mtr=0x24a433bc4c40) at /data/Server/10.6B/storage/innobase/row/row0upd.cc:252
#15 0x000055e6c94bab44 in row_upd_clust_rec_by_insert (node=0x61f000171120, index=0x61600004d4f0, thr=0x624000262170, referenced=true, foreign=false, mtr=0x24a433bc4c40) at /data/Server/10.6B/storage/innobase/row/row0upd.cc:2314

If the statement TRUNCATE TABLE child in Thread 14 were protected by a metadata lock (MDL) on all that are connected via FOREIGN KEY constraints, then the UPDATE parent SET a = 2217 in Thread 3 could not execute simultaneously.

Comment by Marko Mäkelä [ 2022-09-12 ]

The work-around that was implemented in MDEV-26554 also covered ha_innobase::truncate(), but apparently it is insufficient. It would be great if necessary and sufficient metadata locks were acquired for DDL operations, so that such work-arounds would not be needed.

Comment by Marko Mäkelä [ 2022-09-12 ]

MDEV-26217 was a follow-up to the work-around MDEV-26554.

An exclusive InnoDB table lock was acquired in ha_innobase::truncate() and released as part of committing the InnoDB transaction of the TRUNCATE operation:

void trx_t::commit(std::vector<pfs_os_file_t> &deleted)
{
  ut_ad(dict_operation);
  commit_persist();
  if (dict_operation)
  {
    ut_ad(dict_sys.locked());
    lock_sys.wr_lock(SRW_LOCK_CALL);
    mutex_lock();
    lock_release_on_drop(this);
        dict_sys.remove(table);
    lock_sys.wr_unlock();

After this point of time, the lock_wait() on the child table could resume execution. In the rr replay trace that I analyzed, it would (wrongly) return DB_SUCCESS and a table S lock on the freed table object (referring to the table before TRUNCATE TABLE child) was wrongly created. The heap-use-after-free would be reported once the UPDATE transaction is rolled back due to a duplicate key error on the primary key of the parent table.

I can’t think of any low-performance-overhead InnoDB work-around of the missing MDL on DDL. The costly dict_table_t::n_foreign_key_checks_running was removed as part of MDEV-21175.

Comment by Matthias Leich [ 2022-09-12 ]

Some more data about the simplified test which was used for
ssh pluto
rr replay /data/results/1662756815/TBR-1600/1/rr/latest-trace

origin/HEAD, origin/10.6 8ff10969996a5349e906fab44b45c6e3ec53eea0 2022-08-24T17:06:57+03:00
build with ASAN+RelWithDebInfo, InnoDB with -O0 and most other with -O2

You will need most probably
- serious less than 30 RQG runs for one replay  in case rr is not involved
- 7000 till 30000 RQG runs in case rr is involved
even if the simplified version of the test is used.
 
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental 870d6d8aa40c27a3b53e445e8fc86aae51cba19c 2022-08-30T20:57:51+02:00
# rqg.pl  : Version 4.0.6 (2022-05)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --engine=InnoDB \
# --gendata=TBR-1600.zz \
# --rpl_mode=none \
# --max_gd_duration=1200 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--log-output=none \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--innodb_page_size=64K \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--log-bin \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--loose-innodb_file_per_table=0 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb-buffer-pool-size=256M \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose_innodb_change_buffering=changes \
# --mysqld=--wait_timeout=28800 \
# --reporters=Backtrace,Deadlock1,ErrorLog \
# --validators=None \
# --grammar=TBR-1600.yy \
# --threads=3 \
# <local settings>
 
# ./RR_REPLAY.sh TBR-1600.cfg <path to your MariaDB installation>
 
You might replay a SEGV
# 2022-09-11T16:54:38 [3687192] | Thread 2 (Thread 3689123.3689837):
# 2022-09-11T16:54:38 [3687192] | #0  0x000056019ff401af in init_fts_doc_id_for_ref (table=0x6170001404f0, depth=0x7e41419e42d0) at /data/Server/10.6B/storage/innobase/row/row0mysql.cc:1596
# 2022-09-11T16:54:38 [3687192] | #1  0x000056019ff4065e in row_update_for_mysql (prebuilt=0x61f000151cf0) at /data/Server/10.6B/storage/innobase/row/row0mysql.cc:1645
# 2022-09-11T16:54:38 [3687192] | #2  0x000056019fcb5453 in ha_innobase::update_row (this=0x61d0004326b8, old_row=0x6190004231d0 "\377\001\002", new_row=0x6190004231c8 "\377\333\016") at /data/Server/10.6B/storage/innobase/handler/ha_innodb.cc:8675
# 2022-09-11T16:54:38 [3687192] | #3  0x000056019f3f6ac8 in handler::ha_update_row (this=0x61d0004326b8, old_data=<optimized out>, new_data=0x6190004231c8 "\377\333\016") at /data/Server/10.6B/sql/handler.cc:7635
# 2022-09-11T16:54:38 [3687192] | #4  0x000056019ef42ab9 in mysql_update (thd=thd@entry=0x62b0000af218, table_list=<optimized out>, fields=..., values=..., conds=<optimized out>, order_num=<optimized out>, order=<optimized out>, limit=18446744073709551615, ignore=<optimized out>, found_return=<optimized out>, updated_return=<optimized out>) at /data/Server/10.6B/sql/sql_update.cc:1087
# 2022-09-11T16:54:38 [3687192] | #5  0x000056019ec8de4e in mysql_execute_command (thd=0x62b0000af218, is_called_from_prepared_stmt=<optimized out>) at /data/Server/10.6B/sql/sql_limit.h:85
# 2022-09-11T16:54:38 [3687192] | #6  0x000056019ec4ee2b in mysql_parse (rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>, thd=0x62b0000af218) at /data/Server/10.6B/sql/sql_parse.cc:8030
# 2022-09-11T16:54:38 [3687192] | #7  mysql_parse (thd=0x62b0000af218, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /data/Server/10.6B/sql/sql_parse.cc:7952
# 2022-09-11T16:54:38 [3687192] | #8  0x000056019ec79e92 in dispatch_command (command=<optimized out>, thd=0x62b0000af218, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /data/Server/10.6B/sql/sql_class.h:1358
# 2022-09-11T16:54:38 [3687192] | #9  0x000056019ec802a3 in do_command (thd=0x62b0000af218, blocking=blocking@entry=true) at /data/Server/10.6B/sql/sql_parse.cc:1409
# 2022-09-11T16:54:38 [3687192] | #10 0x000056019f01f0d4 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x6080000029b8, put_in_cache=put_in_cache@entry=true) at /data/Server/10.6B/sql/sql_connect.cc:1418
# 2022-09-11T16:54:38 [3687192] | #11 0x000056019f01f8dd in handle_one_connection (arg=0x6080000029b8) at /data/Server/10.6B/sql/sql_connect.cc:1312
# 2022-09-11T16:54:38 [3687192] | #12 0x00007daa403df609 in start_thread (arg=<optimized out>) at pthread_create.c:477
# 2022-09-11T16:54:38 [3687192] | #13 0x000032f4599cc293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
instead.

Comment by Matthias Leich [ 2022-10-21 ]

Maybe the bug is a sibling of MDEV-29850

Comment by Sergei Golubchik [ 2022-10-22 ]

I've no idea how to repeat it and the "rr analysis" doesn't help. Please, provide a repeatable test case. For example, if you understand what's happening from rr, you should be able to repeat exactly the same configuration with DEBUG_SYNC.

Generated at Thu Feb 08 10:09:06 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.