[MDEV-28462]  AddressSanitizer: use-after-poison dict_index_t::get_n_nullable(unsigned long) const Created: 2022-05-03  Updated: 2023-11-16  Resolved: 2022-11-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Thirunarayanan Balathandayuthapani Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: race, regression

Issue Links:
Relates
relates to MDEV-11369 Instant add column for InnoDB Closed
relates to MDEV-15562 Instant DROP COLUMN or changing the o... Closed
relates to MDEV-24258 Merge dict_sys.mutex into dict_sys.latch Closed
relates to MDEV-27700 SUMMARY: AddressSanitizer: heap-use-a... Closed

 Description   

One thread is adding a new column to the table. Meanwhile, another thread performs adaptive hash index validation: it computes the record offsets and accesses the column that was just
added by the instant operation.
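The class of race described above can be modelled in a few lines of standard C++. This is a hypothetical, heavily simplified sketch, not InnoDB code: `ToyIndex` stands in for dict_index_t, `add_field()` for the instant ADD COLUMN path, and `validate()` for the validation that reads index metadata. The names and the single `std::shared_mutex` are assumptions for illustration. Without the latch, `validate()` could read `fields` while `push_back()` reallocates it; with the latch, the reader always observes a consistent index.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for dict_index_t: field names plus a cached count.
struct ToyIndex {
  mutable std::shared_mutex latch;  // assumed latch guarding the metadata
  std::vector<std::string> fields;
  std::size_t n_fields = 0;
};

// Writer (models instant ADD COLUMN). push_back() may reallocate the vector
// and free the old buffer, which an unlatched reader could still be reading.
void add_field(ToyIndex& index, const std::string& name) {
  std::unique_lock<std::shared_mutex> guard(index.latch);
  index.fields.push_back(name);
  index.n_fields = index.fields.size();
}

// Reader (models hash table validation). Returns true if the metadata it
// observed under the shared latch was self-consistent.
bool validate(const ToyIndex& index) {
  std::shared_lock<std::shared_mutex> guard(index.latch);
  if (index.n_fields != index.fields.size()) return false;
  for (std::size_t i = 0; i < index.n_fields; i++)
    if (index.fields[i].empty()) return false;  // touch every element
  return true;
}

// Runs one writer and one reader concurrently and returns the number of
// inconsistent observations made by the reader.
int run_race(int additions, int validations) {
  ToyIndex index;
  add_field(index, "id");
  int inconsistent = 0;  // written only by the reader thread
  std::thread writer([&] {
    for (int i = 0; i < additions; i++)
      add_field(index, "col" + std::to_string(i));
  });
  std::thread reader([&] {
    for (int i = 0; i < validations; i++)
      if (!validate(index)) inconsistent++;
  });
  writer.join();
  reader.join();
  assert(index.n_fields == static_cast<std::size_t>(additions) + 1);
  return inconsistent;
}
```

With both sides taking the latch, `run_race()` should report zero inconsistent observations regardless of scheduling.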

(gdb) t 120
[Switching to thread 120 (Thread 0x7fea7f813700 (LWP 556985))]
#0  __lll_lock_wait (futex=futex@entry=0x56260051f1a8 <buf_pool+40>, private=0) at lowlevellock.c:52
52	lowlevellock.c: No such file or directory.
(gdb) where
#0  __lll_lock_wait (futex=futex@entry=0x56260051f1a8 <buf_pool+40>, private=0) at lowlevellock.c:52
#1  0x00007feaa4cbe235 in __GI___pthread_mutex_lock (mutex=0x56260051f1a8 <buf_pool+40>) at ../nptl/pthread_mutex_lock.c:135
#2  0x00005625fe88cf1c in safe_mutex_lock (mp=0x56260051f180 <buf_pool>, my_flags=0, 
    file=0x5625ff635340 "/data/Server/bb-10.6-thiru/storage/innobase/buf/buf0lru.cc", line=767)
    at /data/Server/bb-10.6-thiru/mysys/thr_mutex.c:290
#3  0x00005625fe4e99db in inline_mysql_mutex_lock (that=0x56260051f180 <buf_pool>, 
    src_file=0x5625ff635340 "/data/Server/bb-10.6-thiru/storage/innobase/buf/buf0lru.cc", src_line=767)
    at /data/Server/bb-10.6-thiru/include/mysql/psi/mysql_thread.h:750
#4  0x00005625fe4ef60a in buf_page_make_young (bpage=0x7fea97d37ea0) at /data/Server/bb-10.6-thiru/storage/innobase/buf/buf0lru.cc:767
#5  0x00005625fe48865c in buf_page_make_young_if_needed (bpage=0x7fea97d37ea0)
    at /data/Server/bb-10.6-thiru/storage/innobase/include/buf0buf.h:315
#6  0x00005625fe4a46b7 in buf_page_get_low (page_id=..., zip_size=0, rw_latch=1, guess=0x7fea97d37ea0, mode=10, mtr=0x7fea7f80b1d0, 
    err=0x7fea7f809a90, allow_ibuf_merge=false) at /data/Server/bb-10.6-thiru/storage/innobase/buf/buf0buf.cc:2937
#7  0x00005625fe4a5366 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=1, guess=0x7fea97d37ea0, mode=10, mtr=0x7fea7f80b1d0, 
    err=0x7fea7f809a90, allow_ibuf_merge=false) at /data/Server/bb-10.6-thiru/storage/innobase/buf/buf0buf.cc:3045
#8  0x00005625fe43f9f4 in btr_cur_search_to_nth_level_func (index=0x616000038e08, level=0, tuple=0x616000cc5208, mode=PAGE_CUR_LE, latch_mode=2, 
    cursor=0x7fea7f80adc0, ahi_latch=0x0, mtr=0x7fea7f80b1d0, autoinc=0) at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0cur.cc:1602
#9  0x00005625fe222405 in btr_pcur_open_low (index=0x616000038e08, level=0, tuple=0x616000cc5208, mode=PAGE_CUR_LE, latch_mode=2, 
    cursor=0x7fea7f80adc0, autoinc=0, mtr=0x7fea7f80b1d0) at /data/Server/bb-10.6-thiru/storage/innobase/include/btr0pcur.inl:369
#10 0x00005625fe230e12 in row_ins_clust_index_entry_low (flags=0, mode=2, index=0x616000038e08, n_uniq=2, entry=0x616000cc5208, n_ext=0, 
    thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:2584
#11 0x00005625fe234306 in row_ins_clust_index_entry (index=0x616000038e08, entry=0x616000cc5208, thr=0x6280006a0d48, n_ext=0)
    at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:3137
#12 0x00005625fe234c05 in row_ins_index_entry (index=0x616000038e08, entry=0x616000cc5208, thr=0x6280006a0d48)
    at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:3263
#13 0x00005625fe235c75 in row_ins_index_entry_step (node=0x6280006a0a90, thr=0x6280006a0d48)
    at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:3431
#14 0x00005625fe2366c9 in row_ins (node=0x6280006a0a90, thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:3578
#15 0x00005625fe2377f9 in row_ins_step (thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/row/row0ins.cc:3724
#16 0x00005625fe1ad62b in que_thr_step (thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/que/que0que.cc:632
#17 0x00005625fe1adb80 in que_run_threads_low (thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/que/que0que.cc:709
#18 0x00005625fe1add22 in que_run_threads (thr=0x6280006a0d48) at /data/Server/bb-10.6-thiru/storage/innobase/que/que0que.cc:729
#19 0x00005625fe1ae048 in que_eval_sql (info=0x616007048608, 
    sql=0x5625ff3bdf20 "PROCEDURE ADD_COL () IS\nBEGIN\nINSERT INTO SYS_COLUMNS VALUES(:id,:pos,:name,:mtype,:prtype,:len,:base);\nEND;\n", 
    trx=0x7fea98e10c40) at /data/Server/bb-10.6-thiru/storage/innobase/que/que0que.cc:768
#20 0x00005625fdf78dbe in innodb_insert_sys_columns (table_id=43, pos=9, field_name=0x6190013eb37c "col_text_copy", mtype=5, prtype=524540, 
    len=10, n_base=0, trx=0x7fea98e10c40, update=false) at /data/Server/bb-10.6-thiru/storage/innobase/handler/handler0alter.cc:5245
#21 0x00005625fdf7b609 in innobase_instant_try (ha_alter_info=0x7fea7f80db90, ctx=0x62b000199518, altered_table=0x7fea7f80e290, 
    table=0x619000974098, trx=0x7fea98e10c40) at /data/Server/bb-10.6-thiru/storage/innobase/handler/handler0alter.cc:5807
#22 0x00005625fdfcd2be in commit_try_norebuild (ha_alter_info=0x7fea7f80db90, ctx=0x62b000199518, altered_table=0x7fea7f80e290, 
    old_table=0x619000974098, trx=0x7fea98e10c40, table_name=0x61b000214d55 "t3")
    at /data/Server/bb-10.6-thiru/storage/innobase/handler/handler0alter.cc:10436
#23 0x00005625fdfa6d61 in ha_innobase::commit_inplace_alter_table (this=0x61d0015afeb8, altered_table=0x7fea7f80e290, 
    ha_alter_info=0x7fea7f80db90, commit=true) at /data/Server/bb-10.6-thiru/storage/innobase/handler/handler0alter.cc:11162
#24 0x00005625fd60679a in handler::ha_commit_inplace_alter_table (this=0x61d0015afeb8, altered_table=0x7fea7f80e290, 
    ha_alter_info=0x7fea7f80db90, commit=true) at /data/Server/bb-10.6-thiru/sql/handler.cc:5218
#25 0x00005625fd080092 in mysql_inplace_alter_table (thd=0x62b00018f218, table_list=0x62b000196418, table=0x619000974098, 
    altered_table=0x7fea7f80e290, ha_alter_info=0x7fea7f80db90, target_mdl_request=0x7fea7f80dc90, ddl_log_state=0x7fea7f80d9b0, 
    trigger_param=0x7fea7f80e700, alter_ctx=0x7fea7f80f1a0) at /data/Server/bb-10.6-thiru/sql/sql_table.cc:7471
#26 0x00005625fd093a5b in mysql_alter_table (thd=0x62b00018f218, new_db=0x62b000193c18, new_name=0x62b000194030, create_info=0x7fea7f810650, 
    table_list=0x62b000196418, alter_info=0x7fea7f810520, order_num=0, order=0x0, ignore=false, if_exists=false)
    at /data/Server/bb-10.6-thiru/sql/sql_table.cc:10267
#27 0x00005625fd222a62 in Sql_cmd_alter_table::execute (this=0x62b000196c68, thd=0x62b00018f218)
    at /data/Server/bb-10.6-thiru/sql/sql_alter.cc:542
#28 0x00005625fce2448e in mysql_execute_command (thd=0x62b00018f218, is_called_from_prepared_stmt=false)
    at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:6012
#29 0x00005625fce30876 in mysql_parse (thd=0x62b00018f218, 
    rawbuf=0x62b000196238 "ALTER TABLE t3 ADD COLUMN IF NOT EXISTS col_text_copy TEXT, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY  /* E_R Thread6 QNO 4576 CON_ID 968 */", length=135, parser_state=0x7fea7f811b20) at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:8045
#30 0x00005625fce08c6f in dispatch_command (command=COM_QUERY, thd=0x62b00018f218, 
    packet=0x629001f6d219 " ALTER TABLE t3 ADD COLUMN IF NOT EXISTS col_text_copy TEXT, LOCK = EXCLUSIVE, ALGORITHM = NOCOPY  /* E_R Thread6 QNO 4576 CON_ID 968 */ ", packet_length=137, blocking=true) at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:1912
#31 0x00005625fce05ea5 in do_command (thd=0x62b00018f218, blocking=true) at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:1409
#32 0x00005625fd208e98 in do_handle_one_connection (connect=0x6080000dc5b8, put_in_cache=true)
    at /data/Server/bb-10.6-thiru/sql/sql_connect.cc:1418
#33 0x00005625fd208724 in handle_one_connection (arg=0x6080000036b8) at /data/Server/bb-10.6-thiru/sql/sql_connect.cc:1312
#34 0x00007feaa4cbb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#35 0x00007feaa488e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
 
t 1:
 
 
(gdb) t 1
[Switching to thread 1 (Thread 0x7fea7faac700 (LWP 556970))]
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
56	../sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
(gdb) where
#0  __pthread_kill (threadid=<optimized out>, signo=6) at ../sysdeps/unix/sysv/linux/pthread_kill.c:56
#1  0x00005625fe882e57 in my_write_core (sig=6) at /data/Server/bb-10.6-thiru/mysys/stacktrace.c:424
#2  0x00005625fd5e4803 in handle_fatal_signal (sig=6) at /data/Server/bb-10.6-thiru/sql/signal_handler.cc:345
#3  <signal handler called>
#4  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#5  0x00007feaa4791859 in __GI_abort () at abort.c:79
#6  0x00007feaa527e6a2 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#7  0x00007feaa528924c in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#8  0x00007feaa526a8ec in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#9  0x00007feaa526a363 in ?? () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#10 0x00007feaa526b1ab in __asan_report_load8 () from /usr/lib/x86_64-linux-gnu/libasan.so.5
#11 0x00005625fdfad0b8 in dict_index_t::get_n_nullable (this=0x616002113908, n_prefix=11)
    at /data/Server/bb-10.6-thiru/storage/innobase/include/dict0mem.h:1239
#12 0x00005625fe1d65a5 in rec_init_offsets_comp_ordinary<>(const rec_t *, const dict_index_t *, rec_offs *, ulint, const dict_col_t::def_t *, rec_leaf_format) (rec=0x7fea97e21e72 "\200", index=0x616002113908, offsets=0x7fea7faa8740, n_core=8, def_val=0x0, format=REC_LEAF_INSTANT)
    at /data/Server/bb-10.6-thiru/storage/innobase/rem/rem0rec.cc:338
#13 0x00005625fe1c8ec0 in rec_init_offsets (rec=0x7fea97e21e72 "\200", index=0x616002113908, n_core=8, offsets=0x7fea7faa8740)
    at /data/Server/bb-10.6-thiru/storage/innobase/rem/rem0rec.cc:650
#14 0x00005625fe1cb5f1 in rec_get_offsets_func (rec=0x7fea97e21e72 "\200", index=0x616002113908, offsets=0x7fea7faa8740, n_core=8, n_fields=1, 
    file=0x5625ff5f5120 "/data/Server/bb-10.6-thiru/storage/innobase/btr/btr0sea.cc", line=2243, heap=0x7fea7faa84a0)
    at /data/Server/bb-10.6-thiru/storage/innobase/rem/rem0rec.cc:947
#15 0x00005625fe487aa2 in btr_search_hash_table_validate (hash_table_id=3) at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0sea.cc:2243
#16 0x00005625fe48848e in btr_search_validate () at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0sea.cc:2337
#17 0x00005625fdf13be0 in ha_innobase::check (this=0x61d001c926b8, thd=0x62b00016c218, check_opt=0x62b000171618)
    at /data/Server/bb-10.6-thiru/storage/innobase/handler/ha_innodb.cc:15380
#18 0x00005625fd604607 in handler::ha_check (this=0x61d001c926b8, thd=0x62b00016c218, check_opt=0x62b000171618)
    at /data/Server/bb-10.6-thiru/sql/handler.cc:4949
#19 0x00005625fd23c7d2 in mysql_admin_table (thd=0x62b00016c218, tables=0x62b000173370, check_opt=0x62b000171618, 
    operator_name=0x5625ffbf0840 <msg_check>, lock_type=TL_READ_NO_INSERT, org_open_for_modify=false, repair_table_use_frm=false, 
    extra_open_options=32, prepare_func=0x0, operator_func=
    (int (handler::*)(class handler * const, class THD *, HA_CHECK_OPT *)) 0x5625fd60417e <handler::ha_check(THD*, st_ha_check_opt*)>, 
    view_operator_func=0x5625fd11a3aa <view_check(THD*, TABLE_LIST*, st_ha_check_opt*)>, is_cmd_replicated=false)
    at /data/Server/bb-10.6-thiru/sql/sql_admin.cc:875
#20 0x00005625fd2412f9 in Sql_cmd_check_table::execute (this=0x62b000173a68, thd=0x62b00016c218)
    at /data/Server/bb-10.6-thiru/sql/sql_admin.cc:1486
#21 0x00005625fce2448e in mysql_execute_command (thd=0x62b00016c218, is_called_from_prepared_stmt=false)
    at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:6012
#22 0x00005625fce30876 in mysql_parse (thd=0x62b00016c218, 
    rawbuf=0x62b000173238 "CHECK TABLE t2 EXTENDED  /* E_R Thread10 QNO 4385 CON_ID 986 */", length=63, parser_state=0x7fea7faaab20)
    at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:8045
#23 0x00005625fce08c6f in dispatch_command (command=COM_QUERY, thd=0x62b00016c218, packet=0x629001d1f219 "", packet_length=65, blocking=true)
    at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:1912
#24 0x00005625fce05ea5 in do_command (thd=0x62b00016c218, blocking=true) at /data/Server/bb-10.6-thiru/sql/sql_parse.cc:1409
#25 0x00005625fd208e98 in do_handle_one_connection (connect=0x6080000dceb8, put_in_cache=true)
    at /data/Server/bb-10.6-thiru/sql/sql_connect.cc:1418
#26 0x00005625fd208724 in handle_one_connection (arg=0x6080000035b8) at /data/Server/bb-10.6-thiru/sql/sql_connect.cc:1312
#27 0x00007feaa4cbb609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#28 0x00007feaa488e293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
(gdb) f 15
#15 0x00005625fe487aa2 in btr_search_hash_table_validate (hash_table_id=3) at /data/Server/bb-10.6-thiru/storage/innobase/btr/btr0sea.cc:2243
2243				offsets = rec_get_offsets(
 
(gdb) p block->index->table->name
$3 = {m_name = 0x6020000299b0 "test/t3", static part_suffix = "#P#"}
 
 
 
(gdb) f 11
#11 0x00005625fdfad0b8 in dict_index_t::get_n_nullable (this=0x616002113908, n_prefix=11)
    at /data/Server/bb-10.6-thiru/storage/innobase/include/dict0mem.h:1239
1239				const dict_col_t* col = fields[n_prefix].col;
(gdb) p n_prefix
$4 = 11
(gdb) p fields[11]
$5 = {col = 0x6200000f5228, name = {m_name = 0x61c000351e2b "col_text_copy"}, prefix_len = 0, fixed_len = 0}

Thread 120 is adding the new column 'col_text_copy' to the table using the instant ALTER algorithm.
I think this is a race condition between instant ALTER TABLE and adaptive hash index validation.
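Frame #11 stops at dict0mem.h:1239, `const dict_col_t* col = fields[n_prefix].col;`. The sketch below is an assumption about what `dict_index_t::get_n_nullable()` computes (the number of nullable fields among the first `n_prefix` fields), modelled on that line; it is not the InnoDB source, and `ToyField`/`ToyIndex` are hypothetical. It shows why the function reads `fields[n_prefix]` and beyond: if a concurrent instant ALTER replaces the fields array, those reads land in memory that ASan has poisoned.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified field and index types.
struct ToyField { bool nullable; };

struct ToyIndex {
  std::vector<ToyField> fields;
  unsigned n_nullable;  // nullable fields across the whole index
};

// Nullable fields among fields[0..n_prefix). Like the frame above, it starts
// from the total and subtracts nullable fields beyond the prefix, reading
// fields[n_prefix], fields[n_prefix + 1], ... up to the end of the array.
unsigned get_n_nullable(const ToyIndex& index, std::size_t n_prefix) {
  assert(n_prefix > 0 && n_prefix <= index.fields.size());
  unsigned n = index.n_nullable;
  for (std::size_t i = n_prefix; i < index.fields.size(); i++)
    n -= index.fields[i].nullable ? 1u : 0u;
  return n;
}
```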



 Comments   
Comment by Marko Mäkelä [ 2022-11-15 ]

This occurred again on a recent 10.6 build, this time between CHECK TABLE t4 and an instant ALTER TABLE on a different table:

ALTER IGNORE TABLE t3 ADD COLUMN col_text_copy TEXT FIRST, ALGORITHM = NOCOPY, LOCK = DEFAULT;

The heap-use-after-poison is reported in dict_index_t::get_n_nullable() for the PRIMARY KEY of table t3. The ALTER TABLE thread was executing the following frames:

#21 0x0000558a963123de in innodb_insert_sys_columns (table_id=21, pos=pos@entry=0, field_name=0x619000433411 "col_text_copy", mtype=5, prtype=524540, len=10, n_base=0, trx=0x7fb7e5409440, update=false) at /data/Server/bb-10.6-MDEV-29835C/storage/innobase/handler/handler0alter.cc:5384
#22 0x0000558a9634559a in innobase_instant_try (ha_alter_info=ha_alter_info@entry=0x7fb7cdeeeda0, ctx=ctx@entry=0x62b000198d60, altered_table=altered_table@entry=0x7fb7cdeef4a0, table=table@entry=0x619000cc4c98, trx=trx@entry=0x7fb7e5409440) at /data/Server/bb-10.6-MDEV-29835C/storage/innobase/handler/handler0alter.cc:5946
#23 0x0000558a9634c1a7 in commit_try_norebuild (ha_alter_info=ha_alter_info@entry=0x7fb7cdeeeda0, ctx=ctx@entry=0x62b000198d60, altered_table=altered_table@entry=0x7fb7cdeef4a0, old_table=<optimized out>, trx=trx@entry=0x7fb7e5409440, table_name=<optimized out>) at /data/Server/bb-10.6-MDEV-29835C/storage/innobase/handler/handler0alter.cc:10559
#24 0x0000558a9630ed98 in ha_innobase::commit_inplace_alter_table (this=<optimized out>, altered_table=<optimized out>, ha_alter_info=<optimized out>, commit=<optimized out>) at /data/Server/bb-10.6-MDEV-29835C/storage/innobase/handler/handler0alter.cc:11308

I see that btr_search_hash_table_validate() already invokes btr_search_x_lock_all(). Possibly, the part of ALTER TABLE that modifies the dictionary cache should invoke btr_search_s_lock_all(). A simpler alternative would be to invoke dict_sys.freeze() inside the loop in btr_search_hash_table_validate() before dereferencing any buf_block_t::index:

diff --git a/storage/innobase/btr/btr0sea.cc b/storage/innobase/btr/btr0sea.cc
index 140fac851de..4b168d76517 100644
--- a/storage/innobase/btr/btr0sea.cc
+++ b/storage/innobase/btr/btr0sea.cc
@@ -2211,6 +2211,7 @@ btr_search_hash_table_validate(ulint hash_table_id)
 			invokes btr_search_drop_page_hash_index(). */
 			ut_a(block->page.state() == buf_page_t::REMOVE_HASH);
 state_ok:
+			dict_sys.freeze(SRW_LOCK_CALL);
 			ut_ad(!dict_index_is_ibuf(block->index));
 			ut_ad(block->page.id().space()
 			      == block->index->table->space_id);
@@ -2225,6 +2226,7 @@ btr_search_hash_table_validate(ulint hash_table_id)
 				btr_search_get_n_fields(block->curr_n_fields,
 							block->curr_n_bytes),
 				&heap);
+			dict_sys.unfreeze();
 
 			const ulint	fold = rec_fold(
 				node->data, offsets,

Comment by Matthias Leich [ 2022-11-15 ]

For the case mentioned by Marko, the RQG invocation was:
# git clone https://github.com/mleich1/rqg --branch experimental RQG
#
# GIT_SHOW: HEAD -> experimental c0cd00de14dd52daa87b155e44a5e4a6f9e67e4d 2022-09-22T16:32:22+02:00
# rqg.pl  : Version 4.0.6 (2022-09)
#
# $RQG_HOME/rqg.pl \
# --grammar=conf/mariadb/table_stress_innodb_nocopy1.yy \
# --gendata=conf/mariadb/table_stress.zz \
# --gendata_sql=conf/mariadb/table_stress.sql \
# --reporters=RestartConsistency \
# --mysqld=--loose-innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --no-mask \
# --queries=10000000 \
# --seed=random \
# --reporters=Backtrace \
# --reporters=ErrorLog \
# --reporters=Deadlock1 \
# --validators=None \
# --mysqld=--log_output=none \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --engine=InnoDB \
# --restart_timeout=240 \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--loose-file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--plugin-load-add=provider_lzo.so \
# --mysqld=--plugin-load-add=provider_bzip2.so \
# --mysqld=--plugin-load-add=provider_lzma.so \
# --mysqld=--plugin-load-add=provider_snappy.so \
# --mysqld=--plugin-load-add=provider_lz4.so \
# --duration=300 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--innodb_file_per_table=0 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--innodb_stats_persistent=off \
# --mysqld=--innodb_adaptive_hash_index=on \
# --mysqld=--log-bin \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-innodb_evict_tables_on_commit_debug=off \
# --mysqld=--loose-max-statement-time=30 \
# --threads=33 \
# --mysqld=--innodb_use_native_aio=1 \
# --mysqld=--loose_innodb_change_buffering=inserts \
# --mysqld=--innodb_rollback_on_timeout=ON \
# --mysqld=--innodb_page_size=16K \
# --mysqld=--innodb-buffer-pool-size=8M \
# <local settings>
pluto:/data/results/1668176455/TBR-1462/

Comment by Marko Mäkelä [ 2022-11-15 ]

I realized that my suggested patch is incorrect and may cause server hangs: dict_sys.latch should be acquired before any adaptive hash index latches. We could acquire a shared dict_sys.latch at the start of btr_search_hash_table_validate(), but then a single CHECK TABLE could block nearly the entire server whenever the data dictionary cache needs to be modified while it is executing.
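The latch-ordering rule above can be enforced mechanically with ranked latches. This is a minimal sketch under stated assumptions: `RankedMutex`, `Rank` and `held_ranks` are hypothetical, and only the ordering "dict_sys.latch before AHI latches" comes from the comment; placing buf_pool.mutex after the AHI latches is an assumption for illustration. A thread that already holds an AHI latch and then requests dict_sys.latch, as the suggested patch would do, is rejected instead of risking a hang.

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>
#include <vector>

// Hypothetical ranks: latches must be acquired in increasing rank order.
// DICT_SYS < AHI is from the comment above; BUF_POOL's rank is assumed.
enum class Rank { DICT_SYS = 1, AHI = 2, BUF_POOL = 3 };

// Ranks currently held by the calling thread, in acquisition order.
thread_local std::vector<Rank> held_ranks;

class RankedMutex {
 public:
  explicit RankedMutex(Rank rank) : rank_(rank) {}

  void lock() {
    // Refuse to acquire a latch whose rank is not above the last one held.
    if (!held_ranks.empty() && held_ranks.back() >= rank_)
      throw std::logic_error("latch order violation");
    m_.lock();
    held_ranks.push_back(rank_);
  }

  void unlock() {
    // Assumes LIFO release, as with scoped guards.
    assert(!held_ranks.empty() && held_ranks.back() == rank_);
    held_ranks.pop_back();
    m_.unlock();
  }

 private:
  std::mutex m_;
  Rank rank_;
};
```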

A better fix would be to acquire the AHI latch that covers the clustered index during instant ADD, DROP, or reorder column operations.
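The proposed fix can be pictured with a partitioned latch model. This is a hypothetical sketch, not InnoDB's implementation: `ToyAhi`, `N_PARTS`, the `index_id % N_PARTS` partition mapping and `change_clustered_index()` are all assumptions. The validator takes every partition exclusively (similar in spirit to btr_search_x_lock_all()), while instant ALTER takes only the one partition latch covering the clustered index while it changes the index definition, so the two can no longer overlap on that index.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>

constexpr std::size_t N_PARTS = 8;  // assumed number of AHI partitions

// Hypothetical partitioned adaptive hash index latch.
struct ToyAhi {
  std::array<std::shared_mutex, N_PARTS> latches;

  // Assumed mapping from an index id to the partition that covers it.
  std::size_t part_of(std::uint64_t index_id) const { return index_id % N_PARTS; }

  // Validator side: take every partition exclusively, always in array order.
  void x_lock_all() { for (auto& l : latches) l.lock(); }
  void x_unlock_all() { for (auto& l : latches) l.unlock(); }
};

// Instant ALTER side: hold the partition latch covering the clustered index
// for the duration of the change, so a validator holding all partitions
// cannot observe a half-updated field array.
template <typename Change>
void change_clustered_index(ToyAhi& ahi, std::uint64_t clust_index_id, Change change) {
  std::unique_lock<std::shared_mutex> guard(ahi.latches[ahi.part_of(clust_index_id)]);
  change();
}
```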

Comment by Marko Mäkelä [ 2022-11-16 ]

Sure enough, my incorrect fix did cause a deadlock in testing. Stack traces:

Thread 13 (Thread 2270493.2271152 (mariadbd)):
#14 buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/buf/buf0lru.cc:402
#15 0x000055fdcae309e6 in buf_block_alloc () at /data/Server/bb-10.6-MDEV-29835/storage/innobase/include/buf0buf.inl:93
#16 btr_search_check_free_space_in_heap (index=index@entry=0x616000038b08) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/btr/btr0sea.cc:181
#17 0x000055fdcae39956 in btr_search_update_hash_on_insert (cursor=cursor@entry=0x7f423dfe81d0, ahi_latch=<optimized out>) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/btr/btr0sea.cc:1951
#42 0x000055fdc9501db9 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x62b0000fc218, packet=packet@entry=0x629000c0d219 " ALTER TABLE t4 MODIFY COLUMN col_string CHAR(19) NULL  /* E_R Thread1 QNO 90 CON_ID 16 */ ", packet_length=packet_length@entry=91, blocking=blocking@entry=true) at /data/Server/bb-10.6-MDEV-29835/sql/sql_parse.cc:1896
Thread 14 (Thread 2270493.2271153 (mariadbd)):
#15 dict_sys_t::freeze (this=<optimized out>) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/include/dict0dict.h:1547
#16 btr_search_hash_table_validate (hash_table_id=hash_table_id@entry=0) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/btr/btr0sea.cc:2214
#17 0x000055fdcae33d5f in btr_search_validate () at /data/Server/bb-10.6-MDEV-29835/storage/innobase/btr/btr0sea.cc:2318
#18 0x000055fdca8441a4 in ha_innobase::check (this=0x61d000d642b8, thd=<optimized out>, check_opt=0x62b00010f648) at /data/Server/bb-10.6-MDEV-29835/storage/innobase/handler/ha_innodb.cc:15340

Thread 13 holds an exclusive dict_sys.latch and is waiting for buf_pool.mutex, which Thread 14 (CHECK TABLE) holds. Thread 14 is waiting for a shared dict_sys.latch, so neither thread can proceed.
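The two stack traces form a cycle in a wait-for graph. The sketch below is a hypothetical analysis helper, not part of MariaDB: it records which thread owns each latch and which latch each thread waits for, then follows thread, awaited latch, owner edges until it either reaches a runnable thread or revisits a thread. It simplifies shared/exclusive semantics to "a request is blocked by the current holder".

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical snapshot of latch ownership and pending waits.
struct WaitState {
  std::map<std::string, std::string> owner;      // latch -> holding thread
  std::map<std::string, std::string> waits_for;  // thread -> awaited latch
};

// Follows thread -> awaited latch -> owning thread. Returns true if the walk
// revisits a thread, i.e. the start thread is blocked by a deadlock cycle.
bool has_deadlock(const WaitState& s, const std::string& start) {
  std::set<std::string> seen;
  std::string t = start;
  while (seen.insert(t).second) {
    auto w = s.waits_for.find(t);
    if (w == s.waits_for.end()) return false;  // t is not waiting
    auto o = s.owner.find(w->second);
    if (o == s.owner.end()) return false;      // awaited latch is free
    t = o->second;
  }
  return true;
}
```

Encoding the traces above, Thread 13 owns dict_sys.latch and waits for buf_pool.mutex, while Thread 14 owns buf_pool.mutex and waits for dict_sys.latch; `has_deadlock()` then reports a cycle from either thread.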

thiru, can you please try to implement the fix to instant ALTER that I suggested?

Comment by Thirunarayanan Balathandayuthapani [ 2022-11-17 ]

The patch is in bb-10.6-MDEV-28462.

Comment by Thirunarayanan Balathandayuthapani [ 2022-11-21 ]

The 10.3 patch is in bb-10.3-MDEV-28462.

Comment by Marko Mäkelä [ 2022-11-22 ]

Thank you, the patches look good, except that they are missing

#ifdef BTR_CUR_HASH_ADAPT

to ensure that a build with cmake -DWITH_INNODB_AHI=OFF will succeed.

I do not think that it is necessary to check that the adaptive hash index is enabled at runtime. We can just unconditionally acquire the latch.
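The review request can be illustrated with a compile-time guard. This is a hypothetical sketch: `ahi_part_latch` and `instant_alter_commit()` are invented names, and it assumes, as the comment implies, that BTR_CUR_HASH_ADAPT is not defined in a build configured with cmake -DWITH_INNODB_AHI=OFF. The latch is acquired unconditionally when the adaptive hash index is compiled in, with no runtime check of whether it is enabled.

```cpp
#include <cassert>
#include <mutex>

#ifdef BTR_CUR_HASH_ADAPT
// Hypothetical stand-in for the AHI latch covering the clustered index;
// it only exists when the adaptive hash index is compiled in.
static std::mutex ahi_part_latch;
#endif

// Hypothetical instant ALTER commit step that appends one column.
int instant_alter_commit(int n_fields) {
#ifdef BTR_CUR_HASH_ADAPT
  // Acquired unconditionally: no runtime check of whether the AHI is enabled.
  std::lock_guard<std::mutex> guard(ahi_part_latch);
#endif
  assert(n_fields >= 0);
  return n_fields + 1;  // stand-in for adding the new column
}
```

Compiling this sketch with and without `-DBTR_CUR_HASH_ADAPT` both succeed, which is the property the missing `#ifdef` protects.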

Generated at Thu Feb 08 10:00:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.