[MDEV-29144] ER_TABLE_SCHEMA_MISMATCH or InnoDB: Failing assertion: index != 0 upon IMPORT TABLESPACE Created: 2022-07-20  Updated: 2023-01-09  Resolved: 2022-12-09

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Locking, Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: corruption, rr-profile-analyzed

Issue Links:
Problem/Incident
causes MDEV-30364 Assertion `thd->mdl_context.is_lock_o... Confirmed
Relates
relates to MDEV-11658 Simpler, faster IMPORT of InnoDB tables Open
relates to MDEV-28989 Replacing index using default non-cop... Open

 Description   

The test case is highly non-determinstic, run with a high value of repeat=N. It usually fails for me within several dozen attempts, but it can vary a lot on different machines and builds.

The compound block in the test case is irrelevant to the problem, it is used to get better concurrency within MTR. Instead, the same results can be achieved by running ALTER concurrently with a single UPDATE (send ALTER => run UPDATE => reap ALTER), but then it takes longer to reproduce.

The failure happens only when ALTER ends with ER_LOCK_WAIT_TIMEOUT. However, it doesn't always happen when ALTER ends with the timeout; apparently, a much thiner concurrency is involved, at least I couldn't serialize it neither on the SQL level nor via existing sync points in sql_table/sql_base.

--source include/have_innodb.inc
--source include/have_sequence.inc
 
CREATE TABLE t (pk int, c varchar(1024), primary key (pk)) ENGINE=InnoDB CHARSET latin1;
INSERT INTO t SELECT seq, 'x' FROM seq_1_to_100;
 
--connect (con1,localhost,root,,test)
--delimiter $
--send
  BEGIN NOT ATOMIC
    DECLARE a INT DEFAULT 0;
    REPEAT
      SET a= a+1;
      UPDATE t SET c = 'xx' WHERE pk = a;
    UNTIL a = 100
    END REPEAT;
  END
$
--delimiter ;
 
--connection default  
--error 0,ER_LOCK_WAIT_TIMEOUT
ALTER TABLE t NOWAIT ADD INDEX (c);
 
--connection con1
--reap
 
--connection default
 
--let $datadir= `select @@datadir`
 
FLUSH TABLE t FOR EXPORT;
--let $create= query_get_value(SHOW CREATE TABLE t, Create Table, 1)
--echo $create
--copy_file $datadir/test/t.cfg $MYSQL_TMP_DIR/t.cfg
--copy_file $datadir/test/t.ibd $MYSQL_TMP_DIR/t.ibd
UNLOCK TABLES;
 
DROP TABLE t;
eval $create;
 
ALTER TABLE t DISCARD TABLESPACE;
--move_file $MYSQL_TMP_DIR/t.cfg $datadir/test/t.cfg
--move_file $MYSQL_TMP_DIR/t.ibd $datadir/test/t.ibd
ALTER TABLE t IMPORT TABLESPACE;
 
# Cleanup
DROP TABLE t;

So, the test creates/populates a table, then runs some DML on it and inplace ALTER .. ADD INDEX in parallel. ADD INDEX fails, the table seemingly doesn't get the index (neither show create nor I_S tables show it), but when the tablespace is imported in an identical table, it complains about an extra index in meta-data:

10.4 8911823f, similar on 10.3-10.10

mysqltest: At line 45: query 'ALTER TABLE t IMPORT TABLESPACE' failed: 1808: Schema mismatch (Number of indexes don't match, table has 1 indexes but the tablespace meta-data file has 2 indexes)

If we modify the test case slightly, removing this two lines:

DROP TABLE t;
eval $create;

– that is, instead of re-creating the table from scratch, we re-import the tablespace into the original one – it causes an assertion failure instead:

10.4 8911823f, similar on 10.4-10.10

2022-07-21 00:15:14 0x7f5ae5fdb700  InnoDB: Assertion failure in file /data/src/10.4/storage/innobase/row/row0import.cc line 1403
InnoDB: Failing assertion: index != 0
 
#6  0x000056260b659a5e in ut_dbg_assertion_failed (expr=0x56260bc7c087 "index != 0", file=0x56260bc7b768 "/data/src/10.4/storage/innobase/row/row0import.cc", line=1403) at /data/src/10.4/storage/innobase/ut/ut0dbg.cc:60
No locals.
#7  0x000056260b53d87c in row_import::set_root_by_name (this=0x7f5ae5fd8ec0) at /data/src/10.4/storage/innobase/row/row0import.cc:1403
        index = 0x0
        index_name = 0x7f5a940616d0 "c"
        i = 1
        cfg_index = 0x7f5a94061920
#8  0x000056260b546713 in row_import_for_mysql (table=0x7f5a940d50e0, prebuilt=0x7f5a980326e0) at /data/src/10.4/storage/innobase/row/row0import.cc:4452
        err = DB_SUCCESS
        trx = 0x7f5ae6a74218
        autoinc = 0
        filepath = 0x0
        space_flags = 140028382393816
        __PRETTY_FUNCTION__ = "dberr_t row_import_for_mysql(dict_table_t*, row_prebuilt_t*)"
        cfg = {m_table = 0x7f5a940d50e0, m_version = 1, m_hostname = 0x7f5a941a2dc0 "tomahawk-d11", m_table_name = 0x7f5a940a1560 "test/t", m_autoinc = 0, m_zip_size = 0, m_flags = 33, m_n_cols = 5, m_cols = 0x7f5a94061600, m_col_names = 0x7f5a940d6040, m_n_indexes = 2, m_indexes = 0x7f5a940618a0, m_missing = false}
        converter = {<AbstractCallback> = {_vptr.AbstractCallback = 0x7f5ae5fd9090, m_zip_size = 140028567049672, m_file = {m_file = 0, m_psi = 0x7f5ae5fd8f58}, m_filepath = 0x0, m_trx = 0x56260c0e2638, m_space = 130843, m_size = 3858599840, m_xdes = 0x56260b9f1a2d "MDL_context::acquire_lock", m_xdes_page_no = 0, m_space_flags = 140028382384032}, m_cfg = 0x100000b8b93bc, m_index = 0x7f5ae5fd8fc0, m_current_lsn = 94721402604039, m_page_zip_ptr = 0x7f5ae5fd8fc0, m_rec_iter = {m_cur = {index = 0x7f5a94000cd0, rec = 0x7f5a94000cd0 "\353\255/\250\374\177", offsets = 0x194000cb0, block = 0x7f5ae5fd9200}}, m_offsets_ = {64633, 2957, 0, 0, 61930, 2969, 22054, 0, 0, 0, 0, 0, 36864, 58877, 32602, 0, 37820, 2955, 0, 1, 36896, 58877, 32602, 0, 2567, 2958, 22054, 0, 36896, 58877, 32602, 0, 3280, 37888, 32602, 0, 3280, 37888, 32602, 0, 3248, 37888, 1, 0, 37472, 58877, 32602, 0, 64633, 2957, 0, 0, 19880, 2971, 22054, 0, 0, 0, 0, 0, 36960, 58877, 32602, 0, 37820, 2955, 0, 1, 36992, 58877, 32602, 0, 2567, 2958, 22054, 0, 36992, 58877, 32602, 0, 3280, 37888, 32602, 0, 3280, 37888, 32602, 0, 3248, 37888, 1, 0, 37568, 58877, 32602, 0, 64633, 2957, 22054, 0, 37056, 58877, 32602, 0, 37600, 58877, 0, 0, 54505, 3008, 22054, 0, 0, 0, 0, 0, 37072, 58877, 32602, 0, 37820, 2955, 0, 0, 19880, 2971, 22054, 0, 0, 0, 0, 0, 37104, 58877, 32602, 0, 37820, 2955, 0, 1, 37136, 58877, 32602, 0, 2567, 2958, 22054, 0, 37136, 58877, 32602, 0, 3280, 37888, 32602, 0, 3280, 37888, 32602, 0, 3248, 37888, 1, 0, 37712, 58877, 32602, 0, 64633, 2957, 22054, 0, 37200, 58877, 32602, 0, 37744, 58877, 0, 0, 54505, 3008, 22054, 0, 0, 0, 0, 0, 37216, 58877, 32602, 0, 37820, 2955, 0, 1, 37248, 58877, 32602, 0...}, m_offsets = 0x7f5ae5fd9240, m_heap = 0x100000b8b93bc, m_cluster_index = 0x7f5ae5fd9260}
        fsp_flags = 0
        index = 0x13f008eec457be00
#9  0x000056260b3ddfe5 in ha_innobase::discard_or_import_tablespace (this=0x7f5a98031638, discard=0 '\000') at /data/src/10.4/storage/innobase/handler/ha_innodb.cc:13160
        _db_stack_frame_ = {func = 0x56260b9cfbb0 "mysql_discard_or_import_tablespace", file = 0x56260b9cdd38 "/data/src/10.4/sql/sql_table.cc", level = 2147483654, line = -1, prev = 0x7f5ae5fd9a60}
        err = DB_SUCCESS
#10 0x000056260b0020bd in handler::ha_discard_or_import_tablespace (this=0x7f5a98031638, discard=0 '\000') at /data/src/10.4/sql/handler.cc:4626
        __PRETTY_FUNCTION__ = "int handler::ha_discard_or_import_tablespace(my_bool)"
#11 0x000056260ada94f7 in mysql_discard_or_import_tablespace (thd=0x7f5a94000d90, table_list=0x7f5a94014888, discard=false) at /data/src/10.4/sql/sql_table.cc:6093
        alter_prelocking_strategy = {<Prelocking_strategy> = {_vptr.Prelocking_strategy = 0x56260c12c628 <vtable for Alter_table_prelocking_strategy+16>}, <No data fields>}
        error = 6740
        _db_stack_frame_ = {func = 0x56260b9b1930 "mysql_execute_command", file = 0x56260b9b0cc8 "/data/src/10.4/sql/sql_parse.cc", level = 2147483653, line = -1, prev = 0x7f5ae5fd9c80}
#12 0x000056260ae4f77a in Sql_cmd_discard_import_tablespace::execute (this=0x7f5a94014f58, thd=0x7f5a94000d90) at /data/src/10.4/sql/sql_alter.cc:557
        select_lex = 0x7f5a940054c8
        table_list = 0x7f5a94014888
#13 0x000056260acc6bfe in mysql_execute_command (thd=0x7f5a94000d90) at /data/src/10.4/sql/sql_parse.cc:6192
        res = 0
        up_result = 0
        lex = 0x7f5a94004c00
        select_lex = 0x7f5a940054c8
        first_table = 0x7f5a94014888
        all_tables = 0x7f5a94014888
        unit = 0x7f5a94004cc0
        have_table_map_for_update = false
        rpl_filter = 0x56260b11b07d <inline_mysql_mutex_unlock(mysql_mutex_t*, char const*, uint)+107>
        _db_stack_frame_ = {func = 0x56260b9b2bb8 "mysql_parse", file = 0x56260b9b0cc8 "/data/src/10.4/sql/sql_parse.cc", level = 2147483652, line = -1, prev = 0x7f5ae5fda250}
        __PRETTY_FUNCTION__ = "int mysql_execute_command(THD*)"
        ots = {ctx = 0x7f5a940048e0, traceable = false}
        trace_command = {<Json_writer_struct> = {_vptr.Json_writer_struct = 0x56260c12a510 <vtable for Json_writer_object+16>, my_writer = 0x0, context = {writer = 0x0}, closed = false}, <No data fields>}
        trace_command_steps = {<Json_writer_struct> = {_vptr.Json_writer_struct = 0x56260c12a4f0 <vtable for Json_writer_array+16>, my_writer = 0x0, context = {writer = 0x0}, closed = false}, <No data fields>}
        orig_binlog_format = BINLOG_FORMAT_MIXED
        orig_current_stmt_binlog_format = BINLOG_FORMAT_STMT
#14 0x000056260accbefa in mysql_parse (thd=0x7f5a94000d90, rawbuf=0x7f5a940147a8 "ALTER TABLE t IMPORT TABLESPACE", length=31, parser_state=0x7f5ae5fda410, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:7995
        found_semicolon = 0x0
        error = 32602
        lex = 0x7f5a94004c00
        err = false
        _db_stack_frame_ = {func = 0x56260b9b113a "dispatch_command", file = 0x56260b9b0cc8 "/data/src/10.4/sql/sql_parse.cc", level = 2147483651, line = -1, prev = 0x7f5ae5fda3f0}
        __PRETTY_FUNCTION__ = "void mysql_parse(THD*, char*, uint, Parser_state*, bool, bool)"
#15 0x000056260acb839a in dispatch_command (command=COM_QUERY, thd=0x7f5a94000d90, packet=0x7f5a9400ac01 "ALTER TABLE t IMPORT TABLESPACE", packet_length=31, is_com_multi=false, is_next_command=false) at /data/src/10.4/sql/sql_parse.cc:1857
        packet_end = 0x7f5a940147c7 ""
        parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7f5a94000d90, m_ptr = 0x7f5a940147c8 "\004", m_tok_start = 0x7f5a940147c8 "\004", m_tok_end = 0x7f5a940147c8 "\004", m_end_of_query = 0x7f5a940147c7 "", m_tok_start_prev = 0x7f5a940147c7 "", m_buf = 0x7f5a940147a8 "ALTER TABLE t IMPORT TABLESPACE", m_buf_length = 31, m_echo = true, m_echo_saved = 12, m_cpp_buf = 0x7f5a94014820 "ALTER TABLE t IMPORT TABLESPACE", m_cpp_ptr = 0x7f5a9401483f "", m_cpp_tok_start = 0x7f5a9401483f "", m_cpp_tok_start_prev = 0x7f5a9401483f "", m_cpp_tok_end = 0x7f5a9401483f "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x56260b8e0a07 <DoTrace+150> "\203\370\005\177\063\205\300\017\210\255", m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 1, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = (DISCARD_COMMENT | unknown: 0x5624), m_cpp_text_start = 0x7f5a9401482c "t IMPORT TABLESPACE", m_cpp_text_end = 0x7f5a9401482d " IMPORT TABLESPACE", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 <repeats 12 times>}}, m_lock_type = TL_READ_DEFAULT, m_mdl_type = MDL_SHARED_READ}, m_digest_psi = 0x7f5a940046e0}
        net = 0x7f5a940010b8
        error = false
        do_end_of_statement = true
        _db_stack_frame_ = {func = 0x56260b9b0ec5 "do_command", file = 0x56260b9b0cc8 "/data/src/10.4/sql/sql_parse.cc", level = 2147483650, line = -1, prev = 0x7f5ae5fdacb0}
        drop_more_results = false
        __PRETTY_FUNCTION__ = "bool dispatch_command(enum_server_command, THD*, char*, uint, bool, bool)"
        __FUNCTION__ = "dispatch_command"
        res = <optimized out>
#16 0x000056260acb6c4d in do_command (thd=0x7f5a94000d90) at /data/src/10.4/sql/sql_parse.cc:1378
        return_value = false
        packet = 0x7f5a9400ac00 "\003ALTER TABLE t IMPORT TABLESPACE"
        packet_length = 32
        net = 0x7f5a940010b8
        command = COM_QUERY
        _db_stack_frame_ = {func = 0x56260bd6c368 "?func", file = 0x56260bd6c36e "?file", level = 2147483649, line = -1, prev = 0x0}
        __PRETTY_FUNCTION__ = "bool do_command(THD*)"
        __FUNCTION__ = "do_command"
#17 0x000056260ae45fbd in do_handle_one_connection (connect=0x56260e24e440) at /data/src/10.4/sql/sql_connect.cc:1420
        create_user = true
        thr_create_utime = 4773826449734
        thd = 0x7f5a94000d90
        __PRETTY_FUNCTION__ = "void do_handle_one_connection(CONNECT*)"
#18 0x000056260ae45c65 in handle_one_connection (arg=0x56260e24e440) at /data/src/10.4/sql/sql_connect.cc:1316
        connect = 0x56260e24e440
#19 0x000056260b362f9d in pfs_spawn_thread (arg=0x56260e1a2180) at /data/src/10.4/storage/perfschema/pfs.cc:1869
        typed_arg = 0x56260e1a2180
        user_arg = 0x56260e24e440
        user_start_routine = 0x56260ae45c35 <handle_one_connection(void*)>
        pfs = 0x7f5aef156b40
        klass = 0x56260dc7ac00
#20 0x00007f5af1321ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140028382394112, -738925536457634580, 140723130171934, 140723130171935, 140028382391936, 311296, 795735088686583020, 795779544398187756}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#21 0x00007f5af0f1edef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

On 10.3 the same test case variation causes a different assertion failure:

10.3 18488048

2022-07-21 00:16:41 0x7fefbfe69700  InnoDB: Assertion failure in file /data/src/10.3/storage/innobase/dict/dict0dict.cc line 1802
InnoDB: Failing assertion: table->get_ref_count() == 0
 
#6  0x0000565287c1b0ea in ut_dbg_assertion_failed (expr=0x565288352506 "table->get_ref_count() == 0", file=0x5652883526a0 "/data/src/10.3/storage/innobase/dict/dict0dict.cc", line=1802) at /data/src/10.3/storage/innobase/ut/ut0dbg.cc:60
#7  0x0000565287cf5481 in dict_table_remove_from_cache_low (table=0x7fef6c1902e0, lru_evict=0) at /data/src/10.3/storage/innobase/dict/dict0dict.cc:1802
#8  0x0000565287cf5ea5 in dict_table_remove_from_cache (table=0x7fef6c1902e0) at /data/src/10.3/storage/innobase/dict/dict0dict.cc:1906
#9  0x00005652879a37f7 in ha_innobase::discard_or_import_tablespace (this=0x7fef7002d548, discard=0 '\000') at /data/src/10.3/storage/innobase/handler/ha_innodb.cc:13164
#10 0x000056528776fda7 in handler::ha_discard_or_import_tablespace (this=0x7fef7002d548, discard=0 '\000') at /data/src/10.3/sql/handler.cc:4587
#11 0x0000565287522265 in mysql_discard_or_import_tablespace (thd=0x7fef6c000d90, table_list=0x7fef6c012bb8, discard=false) at /data/src/10.3/sql/sql_table.cc:6070
#12 0x00005652875bdb40 in Sql_cmd_discard_import_tablespace::execute (this=0x7fef6c013220, thd=0x7fef6c000d90) at /data/src/10.3/sql/sql_alter.cc:553
#13 0x0000565287449bc8 in mysql_execute_command (thd=0x7fef6c000d90) at /data/src/10.3/sql/sql_parse.cc:6075
#14 0x000056528744f1aa in mysql_parse (thd=0x7fef6c000d90, rawbuf=0x7fef6c012ad8 "ALTER TABLE t IMPORT TABLESPACE", length=31, parser_state=0x7fefbfe685b0, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:7870
#15 0x000056528743b9df in dispatch_command (command=COM_QUERY, thd=0x7fef6c000d90, packet=0x7fef6c008f31 "ALTER TABLE t IMPORT TABLESPACE", packet_length=31, is_com_multi=false, is_next_command=false) at /data/src/10.3/sql/sql_parse.cc:1852
#16 0x000056528743a39d in do_command (thd=0x7fef6c000d90) at /data/src/10.3/sql/sql_parse.cc:1398
#17 0x00005652875b7956 in do_handle_one_connection (connect=0x565289905b30) at /data/src/10.3/sql/sql_connect.cc:1403
#18 0x00005652875b76c1 in handle_one_connection (arg=0x565289905b30) at /data/src/10.3/sql/sql_connect.cc:1308
#19 0x0000565287f66dd0 in pfs_spawn_thread (arg=0x565289942040) at /data/src/10.3/storage/perfschema/pfs.cc:1869
#20 0x00007fefcae79ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#21 0x00007fefcada9def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95



 Comments   
Comment by Elena Stepanova [ 2022-07-20 ]

On a related note, InnoDB is generally very verbose in the error log upon discard/import tablespace, here is the log of one repetition of the above when it ends normally:

10.10

2022-07-21  0:27:28 64 [Note] InnoDB: Sync to disk of `test`.`t` started.
2022-07-21  0:27:28 64 [Note] InnoDB: Stopping purge
2022-07-21  0:27:28 64 [Note] InnoDB: Writing table metadata to './test/t.cfg'
2022-07-21  0:27:28 64 [Note] InnoDB: Table `test`.`t` flushed to disk
2022-07-21  0:27:28 64 [Note] InnoDB: Deleting the meta-data file './test/t.cfg'
2022-07-21  0:27:28 64 [Note] InnoDB: Resuming purge
2022-07-21  0:27:28 64 [Note] InnoDB: Importing tablespace for table 'test/t' that was exported from host 'tomahawk-d11'
2022-07-21  0:27:28 64 [Note] InnoDB: Phase I - Update all pages
2022-07-21  0:27:28 64 [Note] InnoDB: Sync to disk
2022-07-21  0:27:28 64 [Note] InnoDB: Sync to disk - done!
2022-07-21  0:27:28 64 [Note] InnoDB: Phase III - Flush changes to disk
2022-07-21  0:27:28 64 [Note] InnoDB: Phase IV - Flush complete

But when a problem occurs, it isn't really helpful at all:

2022-07-21  0:27:29 69 [Note] InnoDB: Sync to disk of `test`.`t` started.
2022-07-21  0:27:29 69 [Note] InnoDB: Stopping purge
2022-07-21  0:27:29 69 [Note] InnoDB: Writing table metadata to './test/t.cfg'
2022-07-21  0:27:29 69 [Note] InnoDB: Table `test`.`t` flushed to disk
2022-07-21  0:27:29 69 [Note] InnoDB: Deleting the meta-data file './test/t.cfg'
2022-07-21  0:27:29 69 [Note] InnoDB: Resuming purge
2022-07-21  0:27:29 69 [Note] InnoDB: Importing tablespace for table 'test/t' that was exported from host 'tomahawk-d11'
2022-07-21  0:27:29 69 [Note] InnoDB: Discarding tablespace of table `test`.`t`: Generic error

It could be useful if instead of (or at least in addition to) printing all the lines when things go well, it were more elaborate when things go wrong.

Comment by Marko Mäkelä [ 2022-07-25 ]

elenst, can you please produce one https://rr-project.org trace of a crash?

Comment by Marko Mäkelä [ 2022-07-25 ]

After I removed the two statements from the test, I got a nice crash on 10.3, also with ./mtr --rr. I determined the root cause of it to be a thread that will fail to release a table handle:

continue
up 4
watch -l table.n_ref_count
command 1
when
continue
end
run

From this, I got a long log. In my trace, the interesting part starts when the reference count was initially incremented from 0 to 1 and never back again.

For me, the last thread to increment the count from 0 to 1 correspondended to connection default in the test, executing the INSERT. The table handle was not released, because ha_innobase::close() was never called between that and the next statement of that connection:

--error 0,ER_LOCK_WAIT_TIMEOUT
ALTER TABLE t NOWAIT ADD INDEX (c);

This statement would fail due to a metadata lock timeout:

10.3 8aa37c264f1c67beb9dc5cd4127379f0e596e3f5

#0  my_error (nr=1205, MyFlags=0) at /mariadb/10.3/mysys/my_error.c:109
#1  0x000055cd518dbc93 in MDL_context::acquire_lock (
    this=this@entry=0x7f0ac4017148, 
    mdl_request=mdl_request@entry=0x7f0ad40f1a60, 
    lock_wait_timeout=lock_wait_timeout@entry=0)
    at /mariadb/10.3/sql/mdl.cc:2095
#2  0x000055cd518dc1fc in MDL_context::upgrade_shared_lock (
    this=this@entry=0x7f0ac4017148, 
    mdl_ticket=mdl_ticket@entry=0x7f0ac4089e50, 
    new_type=new_type@entry=MDL_EXCLUSIVE, lock_wait_timeout=0)
    at /mariadb/10.3/sql/mdl.cc:2336
#3  0x000055cd518794b6 in mysql_inplace_alter_table (
    thd=thd@entry=0x7f0ac4017008, table_list=0x7f0ac4022df0, 
    table=0x7f0ac4097bc8, altered_table=altered_table@entry=0x7f0ac409c148, 
    ha_alter_info=ha_alter_info@entry=0x7f0ad40f2450, 
    target_mdl_request=target_mdl_request@entry=0x7f0ad40f25d0, 
    alter_ctx=0x7f0ad40f2b80) at /mariadb/10.3/sql/sql_table.cc:7679
#4  0x000055cd5188169c in mysql_alter_table (thd=thd@entry=0x7f0ac4017008, 
    new_db=new_db@entry=0x7f0ac401b550, 
    new_name=new_name@entry=0x7f0ac401b938, 
    create_info=create_info@entry=0x7f0ad40f3730, table_list=<optimized out>, 
    table_list@entry=0x7f0ac4022df0, 
    alter_info=alter_info@entry=0x7f0ad40f3670, order_num=0, order=0x0, 
    ignore=false) at /mariadb/10.3/sql/sql_table.cc:10111
#5  0x000055cd518d7d6d in Sql_cmd_alter_table::execute (this=<optimized out>, 
    thd=0x7f0ac4017008) at /mariadb/10.3/sql/sql_alter.cc:512
#6  0x000055cd517f7cae in mysql_execute_command (thd=thd@entry=0x7f0ac4017008)
    at /mariadb/10.3/sql/sql_parse.cc:6075
#7  0x000055cd517f8dd3 in mysql_parse (thd=thd@entry=0x7f0ac4017008, 
    rawbuf=<optimized out>, length=<optimized out>, 
    parser_state=parser_state@entry=0x7f0ad40f4550, 
    is_com_multi=is_com_multi@entry=false, 
    is_next_command=is_next_command@entry=false)
    at /mariadb/10.3/sql/sql_parse.cc:7870
#8  0x000055cd517fa492 in dispatch_command (command=command@entry=COM_QUERY, 
    thd=thd@entry=0x7f0ac4017008, 
    packet=packet@entry=0x7f0ac4009ed9 "ALTER TABLE t NOWAIT ADD INDEX (c)", 
    packet_length=packet_length@entry=34, 

I do not know the purpose of the NOWAIT, but I can imagine that it is to request that the operation be refused if a metadata lock is not immediately available. The timeout will occur after the index was already successfully created, when attempting to acquire an exclusive metadata lock for replacing the table definition.

There was no call to ha_innobase::close() in the entire run from the start to the crash.

If I add a FLUSH TABLES statement right after the ALTER TABLE t NOWAIT, the assertion failure on the table reference count would go away, but occasionally the assertion index != 0 would fail during IMPORT TABLESPACE. It took several attempts to reproduce that under rr. The reason why the assertion fails is that dict_table_get_index_on_name() will pretend that the index c does not exist, because the index was marked as uncommitted, in that very same ALTER TABLE t NOWAIT ADD INDEX (c) that had timed out. There was a call handler::ha_commit_inplace_alter_table(…, commit=false) to roll back the change in InnoDB, but that call would not remove the stub for the uncommitted index, because the reference count is nonzero.

The index != 0 assertion fails because row_quiesce_write_indexes() wrongly writes information about the uncommitted index c. I think that it would be cleanest to assert that no references to the table exist at that point, and to remove any stubs for uncommitted indexes before writing to the file. But, I do not think that it can be fixed before the missing handler::close() call has been added.

Comment by Elena Stepanova [ 2022-09-13 ]

I will for now assume that other variations of ER_TABLE_SCHEMA_MISMATCH happening upon IMPORT have the same root cause, e.g.

1808: Schema mismatch (Index fld field fld prefix len 0 doesn't match metadata file value 1)

etc.
It's too expensive to dig into each of them separately, especially since the chance they are going to get fix any time soon seems to be slim.
After this one is fixed, we'll see if any of the other ones are left.

Comment by Sergei Golubchik [ 2022-10-23 ]

With breakpoints on open and close (in rr trace and the original test case), I see

  • ha_innobase::open() for INSERT
  • ha_innobase::open() for UPDATE in the compound statement
  • two ha_innobase::close() for DROP
  • ha_innobase::open() for ALTER TABLE t DISCARD TABLESPACE
    and then the "Schema mismatch" happens.

If the test is modified to trigger an assert (removing the line with DROP) then, indeed, there will be no ha_innobase::close().

I've added this close(), but the behavior hasn't changed, still the error (or the assert) happens as before.
The patch I used was

--- a/sql/sql_table.cc
+++ b/sql/sql_table.cc
@@ -6069,6 +6069,8 @@ int mysql_discard_or_import_tablespace(THD *thd,
   }
 
   error= table_list->table->file->ha_discard_or_import_tablespace(discard);
+  tdc_remove_table(thd, TDC_RT_REMOVE_NOT_OWN, table_list->table->s->db.str,
+                   table_list->table->s->table_name.str, FALSE);
 
   THD_STAGE_INFO(thd, stage_end);

Comment by Marko Mäkelä [ 2022-11-15 ]

If I add the call that serg highlighted (with the last parameter bool kill_delayed_threads=true because I believe that it is more appropriate), then a number of tests will start to fail like this:

10.3 df4c3d96a4aba0e5a9af3bd5158cd2049a978130 (patched)

@@ -65,6 +65,8 @@
 restore: t4 .ibd and .cfg files
 restore: t5 .ibd and .cfg files
 ALTER TABLE t1 IMPORT TABLESPACE;
+Warnings:
+Warning	1814	Tablespace has been discarded for table `t1`
 SHOW CREATE TABLE t1;
 Table	Create Table
 t1	CREATE TABLE `t1` (

On any other form of DDL or DML on the table than ALTER TABLE…IMPORT TABLESPACE or DROP TABLE it makes sense to issue these warnings. How could IMPORT TABLESPACE be detected by ha_innobase::open()?

	if ((ib_table->flags2 & DICT_TF2_DISCARDED)) {
 
		ib_senderrf(thd,
			IB_LOG_LEVEL_WARN, ER_TABLESPACE_DISCARDED,
			table->s->table_name.str);
 
		/* Allow an open because a proper DISCARD should have set
		all the flags and index root page numbers to FIL_NULL that
		should prevent any DML from running but it should allow DDL
		operations. */

Note: There are two problems that we have to fix here: The never-released table reference, and the incorrect metadata written by row_quiesce_write_indexes(). For fixing the table->get_ref_count() assertion, some change outside InnoDB is needed.

Comment by Marko Mäkelä [ 2022-11-15 ]

The following addresses the reference-count problem without affecting the output of too many tests:

diff --git a/mysql-test/suite/encryption/r/innodb-bad-key-change2.result b/mysql-test/suite/encryption/r/innodb-bad-key-change2.result
index 5c28203ecf8..50da3dc7b2b 100644
--- a/mysql-test/suite/encryption/r/innodb-bad-key-change2.result
+++ b/mysql-test/suite/encryption/r/innodb-bad-key-change2.result
@@ -44,7 +44,6 @@ ALTER TABLE t1 DISCARD TABLESPACE;
 restore: t1 .ibd and .cfg files
 ALTER TABLE t1 DISCARD TABLESPACE;
 Warnings:
-Warning	1814	Tablespace has been discarded for table `t1`
 Warning	1812	Tablespace is missing for table 'test/t1'
 restore: t1 .ibd and .cfg files
 ALTER TABLE t1 IMPORT TABLESPACE;
diff --git a/sql/sql_table.cc b/sql/sql_table.cc
index 413d78ae814..c6e36df7224 100644
--- a/sql/sql_table.cc
+++ b/sql/sql_table.cc
@@ -6075,6 +6075,9 @@ int mysql_discard_or_import_tablespace(THD *thd,
   if (unlikely(error))
     goto err;
 
+  tdc_remove_table(thd, TDC_RT_REMOVE_NOT_OWN, table_list->table->s->db.str,
+                   table_list->table->s->table_name.str, true);
+
   /*
     The 0 in the call below means 'not in a transaction', which means
     immediate invalidation; that is probably what we wish here
diff --git a/storage/innobase/handler/ha_innodb.cc b/storage/innobase/handler/ha_innodb.cc
index cd2d36df112..9bcecf9bba4 100644
--- a/storage/innobase/handler/ha_innodb.cc
+++ b/storage/innobase/handler/ha_innodb.cc
@@ -6268,11 +6268,6 @@ ha_innobase::open(const char* name, int, uint)
 	MONITOR_INC(MONITOR_TABLE_OPEN);
 
 	if ((ib_table->flags2 & DICT_TF2_DISCARDED)) {
-
-		ib_senderrf(thd,
-			IB_LOG_LEVEL_WARN, ER_TABLESPACE_DISCARDED,
-			table->s->table_name.str);
-
 		/* Allow an open because a proper DISCARD should have set
 		all the flags and index root page numbers to FIL_NULL that
 		should prevent any DML from running but it should allow DDL
@@ -13631,6 +13626,12 @@ int ha_innobase::truncate()
 	if (ib_table->is_temporary()) {
 		info.options|= HA_LEX_CREATE_TMP_TABLE;
 	} else {
+		if (!ib_table->space) {
+			ib_senderrf(m_user_thd,
+				    IB_LOG_LEVEL_WARN, ER_TABLESPACE_DISCARDED,
+				    table->s->table_name.str);
+		}
+
 		dict_get_and_save_data_dir_path(ib_table, false);
 	}
 
diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index 0cba7d953d1..e0dd6689c70 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -1013,6 +1013,12 @@ ha_innobase::check_if_supported_inplace_alter(
 
 	update_thd();
 
+	if (!m_prebuilt->table->space) {
+		ib_senderrf(m_user_thd, IB_LOG_LEVEL_WARN,
+			    ER_TABLESPACE_DISCARDED,
+			    table->s->table_name.str);
+	}
+
 	if (ha_alter_info->handler_flags
 	    & ~(INNOBASE_INPLACE_IGNORE
 		| INNOBASE_ALTER_INSTANT

We already issue such warnings in many ha_innobase member functions that will be invoked on DML. To compensate for the removal of the warning from ha_innobase::open(), we issue the warning on TRUNCATE and native ALTER TABLE, which would always be preceded by a call to ha_innobase::open().

Comment by Marko Mäkelä [ 2022-11-15 ]

The above patch indeed seems to make the crash go away. Occasionally the test fails as follows, which should be easy to fix:

10.3 df4c3d96a4aba0e5a9af3bd5158cd2049a978130 (patched)

mysqltest: At line 45: query 'ALTER TABLE t IMPORT TABLESPACE' failed: 1808: Schema mismatch (Number of indexes don't match, table has 1 indexes but the tablespace meta-data file has 2 indexes)

The following additional patch fixes this:

diff --git a/storage/innobase/row/row0quiesce.cc b/storage/innobase/row/row0quiesce.cc
index 94a372bd046..5af026ecbe8 100644
--- a/storage/innobase/row/row0quiesce.cc
+++ b/storage/innobase/row/row0quiesce.cc
@@ -106,11 +106,17 @@ row_quiesce_write_indexes(
 	FILE*			file,	/*!< in: file to write to */
 	THD*			thd)	/*!< in/out: session */
 {
+	ulint n_indexes = 0;
+	for (const dict_index_t* index = UT_LIST_GET_FIRST(table->indexes);
+	     index; index = UT_LIST_GET_NEXT(indexes, index)) {
+		n_indexes += index->is_committed();
+	}
+
 	{
 		byte		row[sizeof(ib_uint32_t)];
 
 		/* Write the number of indexes in the table. */
-		mach_write_to_4(row, UT_LIST_GET_LEN(table->indexes));
+		mach_write_to_4(row, n_indexes);
 
 		DBUG_EXECUTE_IF("ib_export_io_write_failure_11",
 				close(fileno(file)););
@@ -132,6 +138,12 @@ row_quiesce_write_indexes(
 	     index != 0 && err == DB_SUCCESS;
 	     index = UT_LIST_GET_NEXT(indexes, index)) {
 
+		if (!index->is_committed()) {
+			continue;
+		}
+
+		ut_ad(n_indexes); ut_d(n_indexes--);
+
 		byte*		ptr;
 		byte		row[sizeof(index_id_t)
 				    + sizeof(ib_uint32_t) * 8];
@@ -202,6 +214,7 @@ row_quiesce_write_indexes(
 		err = row_quiesce_write_index_fields(index, file, thd);
 	}
 
+	ut_ad(!n_indexes);
 	return(err);
 }
 

After this, the only failure seems to be the following (due to the nondeterministic nature of the test):

@@ -18,14 +18,16 @@
 CREATE TABLE `t` (
   `pk` int(11) NOT NULL,
   `c` varchar(1024) DEFAULT NULL,
-  PRIMARY KEY (`pk`)
+  PRIMARY KEY (`pk`),
+  KEY `c` (`c`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci
 UNLOCK TABLES;
 DROP TABLE t;
 CREATE TABLE `t` (
   `pk` int(11) NOT NULL,
   `c` varchar(1024) DEFAULT NULL,
-  PRIMARY KEY (`pk`)
+  PRIMARY KEY (`pk`),
+  KEY `c` (`c`)
 ) ENGINE=InnoDB DEFAULT CHARSET=latin1 COLLATE=latin1_swedish_ci;
 ALTER TABLE t DISCARD TABLESPACE;
 ALTER TABLE t IMPORT TABLESPACE;

After I disabled that part of the test output, it passed 15×1000 rounds:

The servers were restarted 0 times
Spent 1106.831 of 195 seconds executing testcases
 
Completed: All 15000 tests were successful.

Comment by Sergei Golubchik [ 2022-11-24 ]

I don't quite understand why DISCARD TABLESPACE have to do TDC_RT_REMOVE_NOT_OWN there and, for example, IMPORT doesn't.

But anyway, it's InnoDB-only feature at the moment, so let's do what InnoDB needs. If we ever get another storage engine that can discard/import tablespaces — then we can reconsider, but until then it's pointless to try to guess what it'll need.

sql_table.cc change is ok to push

Comment by Marko Mäkelä [ 2022-12-09 ]

Thank you. The call to ha_innobase::close() is only required on ALTER TABLE…DISCARD TABLESPACE. Omitting the condition if (discard) would not cause any failures of the regression test suite (nor repeated runs of the test of this fix), but I wanted to play it safe and only implement the necessary change.

I hope that one day MDEV-11658 will provide an alternative to the problematic InnoDB data file import workflow, which in its current form requires the user to break the referential integrity of the data dictionary.

Generated at Thu Feb 08 10:06:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.