[MDEV-26979] heap-use-after-free or SIGSEGV when accessing INNODB_SYS_TABLESTATS during DDL Created: 2021-11-05  Updated: 2022-06-28  Resolved: 2022-06-27

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.5
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile-analyzed


 Description   

Bad effect detected by [~elenst] and easy reproduced after hint of [~marko].
origin/10.6 3ab358f0f1c852c71a539bb2b18b3cf828cc0007 2021-11-04T10:22:06+01:00
 
In case something like
   SELECT * FROM information_schema.innodb_sys_tablestats ;
gets frequent executed within some average test running a lot DDL
than the following might show up
 
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 55483][rr 1428002 55487]==1428002==ERROR: AddressSanitizer: heap-use-after-free on address 0x618000078955 at pc 0x55df0e4f1f7a bp 0x7f882ec63b30 sp 0x7f882ec63b20
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 55490][rr 1428002 55492]READ of size 1 at 0x618000078955 thread T30
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 55996]2021-11-05  6:52:55 32 [Note] InnoDB: Online DDL : Start reading clustered index of the table and create temporary files
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 61951]    #0 0x55df0e4f1f79 in dict_sys_t::find(dict_table_t const*) /data/Server/10.6I/storage/innobase/include/dict0dict.h:1513
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 61965]    #1 0x55df0e4f2160 in dict_sys_t::allow_eviction(dict_table_t*) (/data/Server_bin/10.6I_asan/bin/mariadbd+0x28e3160)
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 61975]    #2 0x55df0e4d0eef in i_s_sys_tables_fill_table_stats /data/Server/10.6I/storage/innobase/handler/i_s.cc:5129
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 61993]    #3 0x55df0d4d1d99 in get_schema_tables_result(JOIN*, enum_schema_table_state) /data/Server/10.6I/sql/sql_show.cc:8842
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62032]    #4 0x55df0d3b25e0 in JOIN::exec_inner() /data/Server/10.6I/sql/sql_select.cc:4694
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62034]    #5 0x55df0d3b05c5 in JOIN::exec() /data/Server/10.6I/sql/sql_select.cc:4515
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62042]    #6 0x55df0d3b41c6 in mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*) /data/Server/10.6I/sql/sql_select.cc:4994
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62044]    #7 0x55df0d38928e in handle_select(THD*, LEX*, select_result*, unsigned long) /data/Server/10.6I/sql/sql_select.cc:545
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62078]    #8 0x55df0d303e25 in execute_sqlcom_select /data/Server/10.6I/sql/sql_parse.cc:6256
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62080]    #9 0x55df0d2f2b38 in mysql_execute_command(THD*, bool) /data/Server/10.6I/sql/sql_parse.cc:3946
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62082]    #10 0x55df0d30de8b in mysql_parse(THD*, char*, unsigned int, Parser_state*) /data/Server/10.6I/sql/sql_parse.cc:8030
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62084]    #11 0x55df0d2e6108 in dispatch_command(enum_server_command, THD*, char*, unsigned int, bool) /data/Server/10.6I/sql/sql_parse.cc:1896
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62086]    #12 0x55df0d2e34e0 in do_command(THD*, bool) /data/Server/10.6I/sql/sql_parse.cc:1404
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62092]    #13 0x55df0d6e2f32 in do_handle_one_connection(CONNECT*, bool) /data/Server/10.6I/sql/sql_connect.cc:1418
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62102]    #14 0x55df0d6e27c3 in handle_one_connection /data/Server/10.6I/sql/sql_connect.cc:1312
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62110]    #15 0x7f8863a0f608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62112]    #16 0x7f88635e2292 in __clone (/lib/x86_64-linux-gnu/libc.so.6+0x122292)
....
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 62554]SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6I/storage/innobase/include/dict0dict.h:1513 in dict_sys_t::find(dict_table_t const*)
# 2021-11-05T06:55:57 [1426338] | Query (0x62b000214238): SELECT * FROM information_schema.innodb_sys_tablestats
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 78184]
# 2021-11-05T06:55:57 [1426338] | Connection ID (thread ID): 33
# 2021-11-05T06:55:57 [1426338] | [rr 1428002 78186]Status: KILL_TIMEOUT
 
sdp:/data/Results/1636120301/TBR-1218/dev/shm/vardir/1636120301/56/1/rr
 
RQG
-------
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental, origin/experimental fe8ba343b097bcb54fe31445e5d5699546b53e66 2021-11-03T20:19:53+01:00
# rqg.pl  : Version 3.4 (2021-09)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/runtime/alter_online.yy \
# --gendata=conf/runtime/alter_online.zz \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --redefine=ElenaMarko.yy \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--loose-innodb-sync-debug \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=33 \
# --mysqld=--innodb-use-native-aio=0 \
# --rr=Extended \
# --rr_options=--wait \
# --mysqld=--innodb_undo_tablespaces=3 \
# --mysqld=--innodb_undo_log_truncate=ON \
# --mysqld=--innodb_page_size=8K \
# --mysqld=--innodb-buffer-pool-size=256M \
# --no_mask \
# <certain local settings>
 
ElenaMarko.yy contains
query_add:
    SELECT * FROM information_schema.innodb_sys_tablestats ;



 Comments   
Comment by Alice Sherepa [ 2022-04-05 ]

I am getting probably the same, but without ASAN - there is the assertion `find(table)' in dict_sys_t::allow_eviction

mysqld: /git/10.9/storage/innobase/include/dict0dict.h:1531: void dict_sys_t::allow_eviction(dict_table_t*): Assertion `find(table)' failed.
220405 10:25:19 [ERROR] mysqld got signal 6 ;
 
Server version: 10.9.0-MariaDB-debug-log
 
??:0(__assert_fail)[0x7fd4b5646102]
include/dict0dict.h:1532(dict_sys_t::allow_eviction(dict_table_t*))[0x55976087c994]
handler/i_s.cc:5145(i_s_sys_tables_fill_table_stats(THD*, TABLE_LIST*, Item*))[0x5597608716c9]
sql/sql_show.cc:8883(get_schema_tables_result(JOIN*, enum_schema_table_state))[0x5597600ffb06]
sql/sql_select.cc:4732(JOIN::exec_inner())[0x55976008e76c]
sql/sql_select.cc:4555(JOIN::exec())[0x55976008dad3]
sql/sql_select.cc:5036(mysql_select(THD*, TABLE_LIST*, List<Item>&, Item*, unsigned int, st_order*, st_order*, Item*, st_order*, unsigned long long, select_result*, st_select_lex_unit*, st_select_lex*))[0x55976008f419]
sql/sql_select.cc:570(handle_select(THD*, LEX*, select_result*, unsigned long))[0x55976007e6ae]
sql/sql_parse.cc:6260(execute_sqlcom_select(THD*, TABLE_LIST*))[0x5597600279d4]
sql/sql_parse.cc:3950(mysql_execute_command(THD*, bool))[0x55976001ec3f]
sql/sp_head.cc:3835(sp_instr_stmt::exec_core(THD*, unsigned int*))[0x55975ff216f1]
sql/sp_head.cc:3561(sp_lex_keeper::reset_lex_and_exec_core(THD*, unsigned int*, bool, sp_instr*))[0x55975ff20a6a]
sql/sp_head.cc:3741(sp_instr_stmt::execute(THD*, unsigned int*))[0x55975ff212ad]
sql/sp_head.cc:1438(sp_head::execute(THD*, bool))[0x55975ff1a7a9]
sql/sp_head.cc:2425(sp_head::execute_procedure(THD*, List<Item>*))[0x55975ff1d3a5]
sql/sql_parse.cc:3029(do_execute_sp(THD*, sp_head*))[0x55976001be94]
sql/sql_parse.cc:3275(Sql_cmd_call::execute(THD*))[0x55976001cac8]
sql/sql_parse.cc:5995(mysql_execute_command(THD*, bool))[0x55976002669c]
sql/sql_parse.cc:8035(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x55976002c78f]
sql/sql_parse.cc:1897(dispatch_command(enum_server_command, THD*, char*, unsigned int, bool))[0x559760018ced]
sql/sql_parse.cc:1403(do_command(THD*, bool))[0x5597600175ba]
sql/sql_connect.cc:1418(do_handle_one_connection(CONNECT*, bool))[0x5597601e956c]
sql/sql_connect.cc:1314(handle_one_connection)[0x5597601e9202]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x559760719bcf]
nptl/pthread_create.c:487(start_thread)[0x7fd4b5b03fa3]
x86_64/clone.S:97(clone)[0x7fd4b570eeff]

Comment by Marko Mäkelä [ 2022-06-27 ]

Sorry, I forgot to check that trace before it was removed. Could a new trace be produced?

Comment by Matthias Leich [ 2022-06-27 ]

pluto:/data/results/1656332839/TBR-1218-MDEV-26979$ _RR_TRACE_DIR=./1/rr/ rr replay
 
Maybe a sibling of the problem
SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.6Y/storage/innobase/handler/i_s.cc:5034 in i_s_dict_fill_sys_tablestats
pluto:/data/results/1656332839/TBR-1557$ _RR_TRACE_DIR=./1/rr/ rr replay

Comment by Marko Mäkelä [ 2022-06-27 ]

This is a race condition between a DDL statement that is rebuilding or removing a table, and access to INFORMATION_SCHEMA.INNODB_SYS_TABLESTATS. The call to dict_sys_t::prevent_eviction() will only detach a table from dict_sys.table_LRU and attach it to dict_sys.table_non_LRU. It will not prevent the table from being removed from the cache as part of DDL transaction commit.

The fix is to copy all data from table_rec to table->to_fill->field before releasing dict_sys.latch. In that way, we can also remove the hack to prevent table eviction and acquire dict_sys.latch in shared mode.

Generated at Thu Feb 08 09:49:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.