[MDEV-25200] Index count mismatch due to aborted FULLTEXT INDEX
Created: 2021-03-19  Updated: 2021-06-17  Resolved: 2021-03-30

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.0, 10.1, 10.2, 10.3, 10.4, 10.5, 10.6
Fix Version/s: 10.2.38, 10.3.29, 10.4.19, 10.5.10, 10.6.0

Type: Bug
Priority: Critical
Reporter: Matthias Leich
Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed
Votes: 1
Labels: affects-tests, rr-profile-analyzed, upstream

Attachments: File CATCH001.yy    
Issue Links:
Duplicate
duplicates MDEV-23994 InnoDB: Table test/t4 contains 4 inde... Closed
Relates
relates to MDEV-25947 innodb_fts.misc_debug fails in buildbot Closed

 Description   

origin/10.6 4903031baa196dfc9a75638d141b515883cd254c 2021-03-18T17:05:31+02:00
Workflow:
1. Start the server and generate some initial data.
2. Two sessions concurrently run a DDL mix:
     Session 1 randomly runs one of
     CHECK TABLE t1 |
     determine processlist id of session 2; KILL SOFT QUERY processlist id of session 2 ;
     Session 2 runs
     ALTER TABLE t1 ADD FULLTEXT KEY ( col_text ) ;
At some point, session 1 runs CHECK TABLE t1 and gets
    Warning InnoDB: Table test/t1 contains 3 indexes inside InnoDB, which is different from the number of indexes 2 defined in the MariaDB
 
RQG grammar for illustration
---------------------------------------------
thread1:
    CHECK TABLE t1 |
    COMMIT ; SELECT MIN(processlist_id) INTO @kill_id FROM rqg . rqg_sessions WHERE rqg_id <> _thread_id AND processlist_id IS NOT NULL ; COMMIT ; KILL SOFT QUERY @kill_id ;
thread1_connect:
    ;
thread1_init:
    CREATE TABLE t1 ( col1 INT, col_text TEXT ) ENGINE = InnoDB ROW_FORMAT = Dynamic ; ALTER TABLE t1 ADD COLUMN col1_derivate INT GENERATED ALWAYS AS (col1 * 2) ;
thread2:
    ALTER TABLE t1 ADD FULLTEXT KEY ( col_text ) ;
thread2_connect:
    REPLACE INTO rqg . rqg_sessions SET rqg_id = _thread_id , processlist_id = CONNECTION_ID(); COMMIT ;   # thread2 publishes its processlist_id
thread2_init:
    ;
query:   # Maybe existing but not needed sessions like thread3, thread4 ... just exit
    { exit 0 };
 
pluto:/data/Results/1616102002/CATCH-001/dev/shm/vardir/1616102002/8/1/rr
_RR_TRACE_DIR="." rr replay --mark-stdio
Hint: The rr trace ends when the server receives the SIGKILL that RQG sent because of the problem above.
 
RQG
====
git clone https://github.com/mleich1/rqg --branch experimental RQG
 
perl rqg.pl \
--duration=100 \
--queries=10000000 \
--no_mask \
--seed=random \
--gendata=conf/mariadb/table_stress.zz \
--gendata_sql=conf/mariadb/table_stress.sql \
--engine=InnoDB \
--rpl_mode=none \
--mysqld=--log-bin \
--mysqld=--loose-debug_assert_on_not_freed_memory=0 \
--mysqld=--innodb-buffer-pool-size=24M \
--mysqld=--loose-idle_write_transaction_timeout=0 \
--mysqld=--loose-idle_readonly_transaction_timeout=0 \
--mysqld=--log-output=none \
--mysqld=--lock-wait-timeout=86400 \
--mysqld=--loose-idle_transaction_timeout=0 \
--mysqld=--innodb-lock-wait-timeout=50 \
--mysqld=--slave_net_timeout=60 \
--mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
--mysqld=--innodb_page_size=64K \
--mysqld=--loose-table_lock_wait_timeout=50 \
--mysqld=--net_read_timeout=30 \
--mysqld=--connect_timeout=60 \
--mysqld=--innodb_stats_persistent=off \
--mysqld=--interactive_timeout=28800 \
--mysqld=--loose-max-statement-time=3 \
--mysqld=--wait_timeout=28800 \
--mysqld=--net_write_timeout=60 \
--mysqld=--plugin-load-add=file_key_management.so \
--mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
--mysqld=--loose_innodb_use_native_aio=0 \
--mysqld=--log_bin_trust_function_creators=1 \
--reporters=Backtrace,Deadlock1,ErrorLog \
--validators=None \
--threads=3 \
--grammar=CATCH-001.yy \
--workdir=<local settings> \
--vardir=<local settings> \
--mtr-build-thread=<local settings> \
--basedir1=<local settings> \
--script_debug=_nix_ \
--rr=Extended \
--rr_options=--chaos



 Comments   
Comment by Matthias Leich [ 2021-03-19 ]

Error pattern
[ 'CATCH-0001' , 'indexes inside InnoDB, which is different from the number of indexes' ],

Comment by Marko Mäkelä [ 2021-03-19 ]

The scenario looks like the following:

  1. An operation to CREATE FULLTEXT INDEX is aborted after a stub has been added to the data dictionary with an index name that starts with the 0xff byte (which is an invalid UTF-8 sequence).
  2. The table definition is reloaded to the dict_sys cache (during a subsequent ALTER TABLE). However, dict_index_build_internal_fts() does not set the dict_index_t::uncommitted flag.
  3. Finally, ha_innobase::open() complains about the differing number of indexes, because the incomplete fulltext index is wrongly counted as an existing one.
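
The three steps above can be illustrated with a small stand-alone model. This is only a sketch and not InnoDB code: IndexSketch, load_index(), count_committed() and the fix_fts switch are hypothetical names invented for the illustration, and it assumes that the non-fulltext load path does set the flag for stubs whose name starts with 0xff.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for dict_index_t; only the fields needed here.
struct IndexSketch {
    std::string name;
    bool is_fts;
    bool uncommitted;  // models dict_index_t::uncommitted
};

// Step 1: an aborted ADD INDEX leaves a stub whose name starts with 0xff.
inline bool has_temp_prefix(const std::string& name) {
    return !name.empty() && static_cast<unsigned char>(name[0]) == 0xff;
}

// Step 2: reload an index definition into the cache.
// With fix_fts == false the fulltext path forgets to set the flag,
// which models the behaviour described for dict_index_build_internal_fts().
// Assumption: the non-fulltext path always sets it for 0xff-prefixed names.
inline IndexSketch load_index(const std::string& name, bool is_fts,
                              bool fix_fts) {
    const bool flag = has_temp_prefix(name) && (!is_fts || fix_fts);
    return IndexSketch{name, is_fts, flag};
}

// Step 3: the number of indexes that is compared with the SQL layer's count.
inline std::size_t count_committed(const std::vector<IndexSketch>& indexes) {
    std::size_t n = 0;
    for (const IndexSketch& index : indexes) {
        if (!index.uncommitted) {
            ++n;
        }
    }
    return n;
}

// One committed fulltext index plus one aborted stub, as in the test case.
inline void demo() {
    std::vector<IndexSketch> buggy = {
        load_index("b", true, false),
        load_index("\xff" "c", true, false)};
    assert(count_committed(buggy) == 2);  // stub wrongly counted

    std::vector<IndexSketch> fixed = {
        load_index("b", true, true),
        load_index("\xff" "c", true, true)};
    assert(count_committed(fixed) == 1);  // stub skipped
}
```

With the flag left unset, the model counts 2 indexes where the SQL layer knows of 1, which has the same shape as the reported warning.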

I will try to create a single-threaded deterministic test case for this.

Comment by Marko Mäkelä [ 2021-03-19 ]

There is some incorrect error handling that we inherited from MySQL 5.6.

--source include/have_innodb.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
 
--source include/count_sessions.inc
 
--echo #
--echo # MDEV-25200 Index count mismatch due to aborted FULLTEXT INDEX
--echo #
 
CREATE TABLE t1(b TEXT, c TEXT, FULLTEXT INDEX(b)) ENGINE=InnoDB;
connect(con1,localhost,root,,test);
let $ID= `SELECT @id := CONNECTION_ID()`;
SET DEBUG_SYNC='innodb_inplace_alter_table_enter SIGNAL s1 WAIT_FOR g1';
SET DEBUG_SYNC='innodb_commit_inplace_alter_table_enter SIGNAL s2 WAIT_FOR g2';
send ALTER TABLE t1 ADD FULLTEXT KEY(c);
 
connection default;
SET DEBUG_SYNC='now WAIT_FOR s1';
let $ignore= `SELECT @id := $ID`;
KILL QUERY @id;
SET DEBUG_SYNC='now SIGNAL g1 WAIT_FOR s2';
START TRANSACTION;
SELECT * FROM t1;
SET DEBUG_SYNC='now SIGNAL s2';
 
connection con1;
--error ER_QUERY_INTERRUPTED
reap;
disconnect con1;
 
connection default;
SET DEBUG_SYNC=RESET;
 
# Exploit MDEV-17468 to force the table definition to be reloaded
ALTER TABLE t1 ADD bl INT AS (LENGTH(b)) VIRTUAL;
CHECK TABLE t1;
DROP TABLE t1;
 
--source include/wait_until_count_sessions.inc

I can repeat the failure with the above test on both 10.2 and 10.6. Sample output from the end of the test:

10.2 4e825b0e8ae07e1e847cbbc3c5b7203ae5b96a89

ALTER TABLE t1 ADD bl INT AS (LENGTH(b)) VIRTUAL;
Warnings:
Warning	1082	InnoDB: Table test/t1 contains 2 indexes inside InnoDB, which is different from the number of indexes 1 defined in the MariaDB 
CHECK TABLE t1;
Table	Op	Msg_type	Msg_text
test.t1	check	status	OK

As far as I can tell, the problem is that row_merge_drop_indexes() invokes dict_index_remove_from_cache() but does not remove the fulltext index stub from SYS_INDEXES and does not set the table->drop_aborted flag. Here is a start of a fix:

diff --git a/storage/innobase/row/row0merge.cc b/storage/innobase/row/row0merge.cc
index 3de85f024a3..c811bf7c467 100644
--- a/storage/innobase/row/row0merge.cc
+++ b/storage/innobase/row/row0merge.cc
@@ -3752,42 +3752,19 @@ row_merge_drop_indexes(
 					/* Do nothing to already
 					published indexes. */
 				} else if (index->type & DICT_FTS) {
-					/* Drop a completed FULLTEXT
-					index, due to a timeout during
-					MDL upgrade for
-					commit_inplace_alter_table().
-					Because only concurrent reads
-					are allowed (and they are not
-					seeing this index yet) we
-					are safe to drop the index. */
-					dict_index_t* prev = UT_LIST_GET_PREV(
-						indexes, index);
-					/* At least there should be
-					the clustered index before
-					this one. */
-					ut_ad(prev);
 					ut_a(table->fts);
 					fts_drop_index(table, index, trx);
-					/* We can remove a DICT_FTS
-					index from the cache, because
-					we do not allow ADD FULLTEXT INDEX
-					with LOCK=NONE. If we allowed that,
-					we should exclude FTS entries from
-					prebuilt->ins_node->entry_list
-					in ins_node_create_entry_list(). */
 #ifdef BTR_CUR_HASH_ADAPT
 					ut_ad(!index->search_info->ref_count);
 #endif /* BTR_CUR_HASH_ADAPT */
-					dict_index_remove_from_cache(
-						table, index);
-					index = prev;
+					index->type |= DICT_CORRUPT;
+					table->drop_aborted = TRUE;
 				} else {
 					rw_lock_x_lock(
 						dict_index_get_lock(index));
 					dict_index_set_online_status(
 						index, ONLINE_INDEX_ABORTED);
 					index->type |= DICT_CORRUPT;
-					table->drop_aborted = TRUE;
 					goto drop_aborted;
 				}
 				continue;

This is not enough, because we will get an assertion failure later, in dict_table_check_for_dup_indexes().
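
To make the intent of the changed branch easier to follow, here is a rough stand-alone model of the two behaviours. It is only a sketch under stated assumptions: TableSketch, abort_fts_old(), abort_fts_patched() and leaks_stub() are hypothetical names, the call to fts_drop_index() is left out, and the "persistent" list merely stands in for the rows of SYS_INDEXES. It does not model the later assertion failure in dict_table_check_for_dup_indexes().

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, heavily simplified view of a table and its indexes.
struct TableSketch {
    std::vector<std::string> cached;      // indexes in the in-memory cache
    std::vector<std::string> persistent;  // stands in for SYS_INDEXES rows
    std::vector<std::string> corrupt;     // names flagged like DICT_CORRUPT
    bool drop_aborted = false;            // models table->drop_aborted
};

inline bool contains(const std::vector<std::string>& v,
                     const std::string& name) {
    return std::find(v.begin(), v.end(), name) != v.end();
}

// Old branch: the fulltext index only disappears from the cache.
// The persistent record stays and nothing remembers that it must go.
inline void abort_fts_old(TableSketch& table, const std::string& name) {
    table.cached.erase(
        std::remove(table.cached.begin(), table.cached.end(), name),
        table.cached.end());
}

// Patched branch from the diff: keep the index cached, flag it as
// corrupt, and record that an aborted index still has to be dropped.
inline void abort_fts_patched(TableSketch& table, const std::string& name) {
    table.corrupt.push_back(name);
    table.drop_aborted = true;
}

// True if a later reload would bring back a stub that nobody plans to drop.
inline bool leaks_stub(const TableSketch& table, const std::string& name) {
    return contains(table.persistent, name) && !table.drop_aborted;
}
```

In this model the old branch leaves leaks_stub() true for the aborted index, while the patched branch leaves it false because the table is marked for a later cleanup pass.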
