[MDEV-24011] InnoDB: Failing assertion: index_cache->words == NULL in fts0fts.cc line 551 Created: 2020-10-22  Updated: 2023-04-19  Resolved: 2023-04-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.35, 10.2.38
Fix Version/s: 10.4.29

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: fulltext, not-10.5+, rr-profile-analyzed

Attachments: File TBR-201.zz     File TBR-201_micro.yy     File simp_TBR-201.cfg    

 Description   

Assert hit during RQG testing
[rr 206526 1387520]2020-10-22 07:21:06 0x640000225700[rr 206526 1387523]  InnoDB: Assertion failure in file /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc line 551
[rr 206526 1387525]InnoDB: Failing assertion: index_cache->words == NULL
 
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000023c358907859 in __GI_abort () at abort.c:79
#2  0x0000559c5a27ccee in ut_dbg_assertion_failed (expr=expr@entry=0x559c5b1f8040 "index_cache->words == NULL", file=file@entry=0x559c5b1f7ae0 "/home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc", line=line@entry=551)
    at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/ut/ut0dbg.cc:60
#3  0x0000559c5a5caec9 in fts_index_cache_init (allocator=0x61a000033a00, index_cache=0x623000011dc0) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc:551
#4  0x0000559c5a5edd27 in fts_cache_init (cache=cache@entry=0x61a000033708) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc:578
#5  0x0000559c5a5efaec in fts_sync_commit (sync=sync@entry=0x61a000033a28) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc:4195
#6  0x0000559c5a5f0e9a in fts_sync (sync=0x61a000033a28, unlock_cache=<optimized out>, wait=wait@entry=false) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc:4371
#7  0x0000559c5a5f12e9 in fts_sync_table (table=table@entry=0x618000030d08, wait=wait@entry=false) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0fts.cc:4417
#8  0x0000559c5a6024bc in fts_optimize_sync_table (table=0x618000030d08) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0opt.cc:2773
#9  0x0000559c5a616431 in fts_optimize_thread (arg=0x60c00003aa80) at /home/mleich/Server/bb-10.2-MDEV-23693/storage/innobase/fts/fts0opt.cc:2872
#10 0x000078fc46a47609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x000023c358a04103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
 
origin/bb-10.2-MDEV-23693 8205065535e46a001884379cdf2ca4e48ed91bbf 2020-10-21T21:33:49+05:30 compiled with debug+ASAN+Og
 
RQG
===
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--gendata=conf/engines/innodb/full_text_search.zz \
--short_column_names \
--grammar=conf/engines/innodb/full_text_search.yy \
--redefine=conf/mariadb/alter_table.yy \
--redefine=conf/mariadb/instant_add.yy \
--redefine=conf/mariadb/modules/alter_table_columns.yy \
--redefine=conf/mariadb/sp.yy \
--redefine=conf/mariadb/bulk_insert.yy \
--redefine=conf/mariadb/modules/foreign_keys.yy \
--redefine=conf/mariadb/modules/locks.yy \
--redefine=conf/mariadb/modules/sql_mode.yy \
--redefine=conf/mariadb/redefine_temporary_tables.yy \
--redefine=conf/mariadb/versioning.yy \
--redefine=conf/mariadb/sequences.yy \
--mysqld=--innodb_use_native_aio=1 \
--mysqld=--innodb_lock_schedule_algorithm=fcfs \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--connect_timeout=60 \
--mysqld=--interactive_timeout=28800 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--net_read_timeout=30 \
--mysqld=--net_write_timeout=60 \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--wait_timeout=28800 \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--innodb-lock-wait-timeout=50 \
--no-mask \
--queries=10000000 \
--seed=random \
--reporters=Backtrace \
--reporters=ErrorLog \
--reporters=Deadlock1 \
--validators=None \
--mysqld=--log_output=none \
--mysqld=--log-bin \
--mysqld=--log_bin_trust_function_creators=1 \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--engine=InnoDB \
--restart_timeout=120 \
--duration=400 \
--mysqld=--loose-innodb_fatal_semaphore_wait_threshold=400 \
--mysqld=--innodb_stats_persistent=on \
--mysqld=--loose-max-statement-time=30 \
--threads=33 \
--mysqld=--innodb_page_size=32K \
--mysqld=--innodb-buffer-pool-size=24M \
--duration=400 \
--no_mask \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos



 Comments   
Comment by Matthias Leich [ 2020-10-22 ]

2021-04-12    This trace is invalid.
   rr:/home/mleich/RQG/storage/1603300607/TBR-201/dev/shm/vardir/1603300607/17/1/rr
   _RR_TRACE_DIR="." rr replay --mark-stdio mysqld-0
 
    The archive is /home/mleich/RQG/storage/1603300607//010965.tgz
RQG error pattern
[ 'TBR-201'    , 'InnoDB: Failing assertion: index_cache->words == NULL' ],

Comment by Matthias Leich [ 2021-04-12 ]

The rr trace above is gone or invalid.
New rr trace based on simplified test (one session only)
origin/10.2 5a3151bcda2f279091a1e2f049a7d11cd91065d6 2021-04-09T12:01:42+05:30  10.2.38
pluto:/data/Results/1618235423/TBR-201-MDEV-24011/dev/shm/vardir/1618235423/70/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
 
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x0000777f3c6db859 in __GI_abort () at abort.c:79
#2  0x0000555bd5277374 in ut_dbg_assertion_failed (expr=0x555bd5c17a40 "index_cache->words == NULL", file=0x555bd5c17700 "/data/Server/10.2/storage/innobase/fts/fts0fts.cc", line=551) at /data/Server/10.2/storage/innobase/ut/ut0dbg.cc:60
#3  0x0000555bd5438328 in fts_index_cache_init (allocator=0x616000063bc8, index_cache=0x622000013cd0) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:551
#4  0x0000555bd5438881 in fts_cache_init (cache=0x6160000639f0) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:596
#5  0x0000555bd5449fe2 in fts_sync_commit (sync=0x61a0000324f0) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4199
#6  0x0000555bd544b169 in fts_sync (sync=0x61a0000324f0, unlock_cache=true, wait=false) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4376
#7  0x0000555bd544b59e in fts_sync_table (table=0x6170000421f0, wait=false) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4422
#8  0x0000555bd546d019 in fts_optimize_sync_table (table=0x6170000421f0) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2773
#9  0x0000555bd546d54f in fts_optimize_thread (arg=0x606000001460) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2872
#10 0x00007f95e0b21609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#11 0x0000777f3c7d8293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr)
 
Other bad effects hit when simplifying the RQG test
    # 2021-04-09T17:59:46 [1635157] #3  <signal handler called>
    # 2021-04-09T17:59:46 [1635157] #4  0x0000555f1b931eea in fts_sync (sync=0x61a00034e6f0, unlock_cache=true, wait=false) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4337
    # 2021-04-09T17:59:46 [1635157] #5  0x0000555f1b93259e in fts_sync_table (table=0x617000074a70, wait=false) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4422
    # 2021-04-09T17:59:46 [1635157] #6  0x0000555f1b954019 in fts_optimize_sync_table (table=0x617000074a70) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2773
    # 2021-04-09T17:59:46 [1635157] #7  0x0000555f1b95454f in fts_optimize_thread (arg=0x6060000014c0) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2872
    # 2021-04-09T17:59:46 [1635157] #8  0x00007f001fb70609 in start_thread
    [ 'TBR-986' , '#3  <signal handler called>.+#4  .{1,20}in fts_sync.+#5  .{1,20}in fts_sync_table.+#6  .{1,20}in fts_optimize_sync_table.+#7  .{1,20}in fts_optimize_thread.+#8  .{1,20}in start_thread' ],
    # 2021-04-09T11:58:25 [3251089] | 2021-04-09 11:58:11 0x7f5f480ca700  InnoDB: Assertion failure in file /data/Server/10.2/storage/innobase/fts/fts0fts.cc line 557
    # 2021-04-09T11:58:25 [3251089] | InnoDB: Failing assertion: index_cache->doc_stats == NULL
    [ 'TBR-985' , 'InnoDB: Assertion failure in file .{1,200}fts0fts.cc.+InnoDB: Failing assertion: index_cache->doc_stats == NULL' ],
    # 2021-04-09T10:57:27 [809292] | 2021-04-09 10:57:01 0x7f00f5ee9700  InnoDB: Assertion failure in file /data/Server/10.2/storage/innobase/srv/srv0start.cc line 2955
    # 2021-04-09T10:57:27 [809292] | InnoDB: Failing assertion: table->data_dir_path
    [ 'TBR-984-MDEV-12251' , 'InnoDB: Assertion failure in file .{1,200}srv0start.cc.+InnoDB: Failing assertion: table->data_dir_path' ],
    # 2021-04-09T10:28:10 [4095146] | SUMMARY: AddressSanitizer: heap-use-after-free /data/Server/10.2/storage/innobase/fts/fts0fts.cc:3984 in fts_sync_write_words
    [ 'TBR-983' , 'SUMMARY: AddressSanitizer: heap-use-after-free .{1,200}fts0fts.cc:.{1,10}in fts_sync_write_words' ],
 
I do not claim that all these other bad effects are caused by the same mistake in InnoDB code like the current MDEV-24011.
But maybe a fix here will remove some of them.
 
git clone https://github.com/mleich1/rqg --branch experimental RQG
./RR_REPLAY.sh simp_TBR-201.cfg <path to binaries> TBR-201_micro.yy
 
A build made with
-  ./util/bld_asan_O2_O0.sh replays quite fast
-  ./util/bld_asan.sh did not replay even though having several hundred RQG run
 
 

Comment by Marko Mäkelä [ 2021-04-13 ]

OK to push after mleich has tested it.

Comment by Thirunarayanan Balathandayuthapani [ 2021-04-15 ]

Thread 34 hit Hardware watchpoint 4: -location $12->used
Old value = 18
New value = 19
ib_vector_push (vec=0x61a000032558, elem=0x0) at /data/Server/10.2/storage/innobase/include/ut0vec.ic:270
270		return(last);
(rr) where
#0  ib_vector_push (vec=0x61a000032558, elem=0x0) at /data/Server/10.2/storage/innobase/include/ut0vec.ic:270
#1  0x0000555bd543a5c7 in fts_cache_index_cache_create (table=0x6170000421f0, index=0x61600064e3f0)
    at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:999
#2  0x0000555bd536cbae in dict_index_build_internal_fts (table=0x6170000421f0, index=0x61600064d4f0)
    at /data/Server/10.2/storage/innobase/dict/dict0dict.cc:2850
#3  0x0000555bd536812b in dict_index_add_to_cache (table=0x6170000421f0, index=@0x61600064d828: 0x61600064d4f0, page_no=4294967295, 
    add_v=0x0) at /data/Server/10.2/storage/innobase/dict/dict0dict.cc:2100
#4  0x0000555bd5356553 in dict_create_index_step (thr=0x61a000009e90) at /data/Server/10.2/storage/innobase/dict/dict0crea.cc:1485
#5  0x0000555bd50af0bb in que_thr_step (thr=0x61a000009e90) at /data/Server/10.2/storage/innobase/que/que0que.cc:1051
#6  0x0000555bd50af326 in que_run_threads_low (thr=0x61a000009e90) at /data/Server/10.2/storage/innobase/que/que0que.cc:1103
#7  0x0000555bd50af664 in que_run_threads (thr=0x61a000009e90) at /data/Server/10.2/storage/innobase/que/que0que.cc:1143
#8  0x0000555bd511b009 in row_merge_create_index_graph (trx=0x248304f05928, table=0x6170000421f0, 
    index=@0x7f95d98b1490: 0x61600064d4f0, add_v=0x0) at /data/Server/10.2/storage/innobase/row/row0merge.cc:4337
#9  0x0000555bd511b64f in row_merge_create_index (trx=0x248304f05928, table=0x6170000421f0, index_def=0x61d00046f0f0, add_v=0x0)
    at /data/Server/10.2/storage/innobase/row/row0merge.cc:4408
#10 0x0000555bd4f84022 in prepare_inplace_alter_table_dict (ha_alter_info=0x7f95d98b2830, altered_table=0x61e00028b088, 
    old_table=0x61e00013e088, table_name=0x6190002e94b5 "table100_innodb_int_autoinc", flags=33, flags2=84, fts_doc_id_col=19, 
    add_fts_doc_id=false, add_fts_doc_id_idx=false) at /data/Server/10.2/storage/innobase/handler/handler0alter.cc:4859
#11 0x0000555bd4f8b8b6 in ha_innobase::prepare_inplace_alter_table (this=0x61c0001e00a8, altered_table=0x61e00028b088, 
    ha_alter_info=0x7f95d98b2830) at /data/Server/10.2/storage/innobase/handler/handler0alter.cc:6072
#12 0x0000555bd4648a91 in mysql_inplace_alter_table (thd=thd@entry=0x62a0000f0208, table_list=<optimized out>, 
    table=table@entry=0x61e00013e088, altered_table=altered_table@entry=0x61e00028b088, 
    ha_alter_info=ha_alter_info@entry=0x7f95d98b2830, inplace_supported=<optimized out>, alter_ctx=<optimized out>, 
    target_mdl_request=0x7f95d98b2990) at /data/Server/10.2/sql/sql_table.cc:7425
#13 0x0000555bd466993e in mysql_alter_table (thd=thd@entry=0x62a0000f0208, new_db=<optimized out>, new_name=<optimized out>, 
    create_info=create_info@entry=0x7f95d98b4450, table_list=<optimized out>, table_list@entry=0x62b00001c468, 
    alter_info=alter_info@entry=0x7f95d98b4360, order_num=<optimized out>, order=<optimized out>, ignore=<optimized out>)
    at /data/Server/10.2/sql/sql_table.cc:9627
#14 0x0000555bd475fb6d in Sql_cmd_alter_table::execute (this=<optimized out>, thd=0x62a0000f0208)
    at /data/Server/10.2/sql/sql_alter.cc:333
#15 0x0000555bd4475fef in mysql_execute_command (thd=0x62a0000f0208) at /data/Server/10.2/sql/sql_parse.cc:6020
#16 0x0000555bd44876c5 in mysql_parse (thd=thd@entry=0x62a0000f0208, 
    rawbuf=0x62b00001c228 "ALTER TABLE `table100_innodb_int_autoinc` ADD FULLTEXT INDEX idx_0 (`c16`) /* E_R Thread1 QNO 8 CON_ID 19 */"--Type <RET> for more, q to quit, c to continue without paging--
, length=<optimized out>, parser_state=parser_state@entry=0x7f95d98b76d0, is_com_multi=is_com_multi@entry=false, 
    is_next_command=is_next_command@entry=false) at /data/Server/10.2/sql/sql_parse.cc:7794
#17 0x0000555bd448e36f in dispatch_command (command=COM_QUERY, thd=0x62a0000f0208, packet=<optimized out>, 
    packet_length=<optimized out>, is_com_multi=<optimized out>, is_next_command=<optimized out>)
    at /data/Server/10.2/sql/sql_class.h:1109
#18 0x0000555bd4492293 in do_command (thd=0x62a0000f0208) at /data/Server/10.2/sql/sql_parse.cc:1381
#19 0x0000555bd4756227 in do_handle_one_connection (connect=<optimized out>) at /data/Server/10.2/sql/sql_connect.cc:1336
#20 0x0000555bd47565fd in handle_one_connection (arg=<optimized out>) at /data/Server/10.2/sql/sql_connect.cc:1241
#21 0x00007f95e0b21609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#22 0x0000777f3c7d8293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) t 20
[Switching to thread 20 (Thread 3707564.3708660)]
#0  0x000059f77772cd68 in ?? () from /lib/x86_64-linux-gnu/libasan.so.5
(rr) where
#0  0x000059f77772cd68 in ?? () from /lib/x86_64-linux-gnu/libasan.so.5
#1  0x000059f77770be60 in calloc () from /lib/x86_64-linux-gnu/libasan.so.5
#2  0x0000555bd527a1e7 in rbt_create (sizeof_value=32, compare=0x0) at /data/Server/10.2/storage/innobase/ut/ut0rbt.cc:788
#3  0x0000555bd527a114 in rbt_create_arg_cmp (sizeof_value=32, 
    compare=0x555bd4f07d89 <innobase_fts_text_cmp(void const*, void const*, void const*)>, 
    cmp_arg=0x555bd64f4120 <my_charset_utf8_general_ci>) at /data/Server/10.2/storage/innobase/ut/ut0rbt.cc:764
#4  0x0000555bd543836c in fts_index_cache_init (allocator=0x616000063bc8, index_cache=0x622000013b20)
    at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:553
#5  0x0000555bd5438881 in fts_cache_init (cache=0x6160000639f0) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:596
#6  0x0000555bd5449fe2 in fts_sync_commit (sync=0x61a0000324f0) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4199
#7  0x0000555bd544b169 in fts_sync (sync=0x61a0000324f0, unlock_cache=true, wait=false)
    at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4376
#8  0x0000555bd544b59e in fts_sync_table (table=0x6170000421f0, wait=false) at /data/Server/10.2/storage/innobase/fts/fts0fts.cc:4422
#9  0x0000555bd546d019 in fts_optimize_sync_table (table=0x6170000421f0) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2773
#10 0x0000555bd546d54f in fts_optimize_thread (arg=0x606000001460) at /data/Server/10.2/storage/innobase/fts/fts0opt.cc:2872
#11 0x00007f95e0b21609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#12 0x0000777f3c7d8293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

This issue happens when race condition happens when DDL and fts optimize thread.
DDL adds the new index to fts cache. At the same time, fts optimize thread
clears the cache and reinitialize it. I thought of the solution to
take init lock in cache before reinitializing it.

diff --git a/storage/innobase/fts/fts0fts.cc b/storage/innobase/fts/fts0fts.cc
index a74b5083128..f72b65b3028 100644
--- a/storage/innobase/fts/fts0fts.cc
+++ b/storage/innobase/fts/fts0fts.cc
@@ -4194,9 +4194,7 @@ fts_sync_commit(
 
        /* We need to do this within the deleted lock since fts_delete() can
        attempt to add a deleted doc id to the cache deleted id array. */
-       fts_cache_clear(cache);
-       DEBUG_SYNC_C("fts_deleted_doc_ids_clear");
-       fts_cache_init(cache);
+       fts_cache_reinit(cache);
        rw_lock_x_unlock(&cache->lock);
 
        if (UNIV_LIKELY(error == DB_SUCCESS)) {
@@ -6453,3 +6451,11 @@ fts_check_corrupt(
                dict_table_close(aux_table, FALSE, FALSE);
        }
 }
+
+void fts_cache_reinit(fts_cache_t* cache)
+{
+  rw_lock_x_lock(&cache->init_lock);
+  fts_cache_clear(cache);
+  fts_cache_init(cache);
+  rw_lock_x_unlock(&cache->init_lock);
+}

But the above solution creates dict_sys mutex hang and it leads to crash.

Because fts_cache_clear() needs dict_sys mutex to clear the cache and it holds
the cache init_lock. But dict_index_build_internal_fts() holds dict_sys mutex and
trying to acquire cache init lock. So it leads to hang.

Comment by Thirunarayanan Balathandayuthapani [ 2023-04-10 ]

Patch is in bb-10.4-MDEV-24011. This problem shouldn't exist in 10.5 onwards. Since fts_sync() takes mdl on the table. It shouldn't allow any DDL to happen on the same table.

Comment by Marko Mäkelä [ 2023-04-11 ]

OK to push after this has been tested by mleich.

Generated at Thu Feb 08 09:26:46 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.