[MDEV-28897] Wrong table.get_ref_count() upon concurrent truncate and backup stage operation Created: 2022-06-19  Updated: 2022-06-28  Resolved: 2022-06-28

Status: Closed
Project: MariaDB Server
Component/s: Backup, Partitioning
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Michael Widenius
Resolution: Fixed Votes: 0
Labels: None

Attachments: Text File threads.txt    

 Description   

The test case is very non-deterministic. Sometimes it fails for me consistently in 20-30 attempts, and suddenly it needs hundreds, on the same build, same machine. So, allow high enough repeat=N and try several times before giving up.

I am not sure whether encryption, partitioning, or backup are necessary there, but I couldn't get rid of any of them. Also, couldn't increase the probability of the failure by, for example, populating the table with a larger amount of data.

Run with

--mem --mysqld=--innodb-encrypt-tables --mysqld=--file-key-management --mysqld=--file-key-management-filename=`pwd`/std_data/keys.txt --mysqld=--plugin-load-add=file_key_management --repeat=1000

--source include/have_innodb.inc
--source include/have_partition.inc
--source include/have_file_key_management.inc
 
CREATE TABLE t (a INT, b INT, c INT, d INT) ENGINE=InnoDB PARTITION BY HASH(a) PARTITIONS 8;
INSERT INTO t (a) VALUES (1),(2),(3),(4),(5),(6),(7),(8);
 
--connect (con1,localhost,root,,test)
BACKUP STAGE START;
--send
  BACKUP STAGE BLOCK_COMMIT;
 
--connection default
--send
  TRUNCATE TABLE t;
 
--connection con1
--reap
 
--connection default
--reap
 
# Cleanup
--connection con1
BACKUP STAGE END;
DROP TABLE t;

10.6 0e4cf497 debug

10.6/storage/innobase/dict/drop.cc:155: dberr_t trx_t::drop_table(const dict_table_t&): Assertion `table.get_ref_count() <= 1' failed.
220619 21:52:47 [ERROR] mysqld got signal 6 ;
 
#7  0x00007ff23df24662 in __GI___assert_fail (assertion=0x55a9d84cd165 "table.get_ref_count() <= 1", file=0x55a9d84cce88 "/data/src/10.6/storage/innobase/dict/drop.cc", line=155, function=0x55a9d84cd120 "dberr_t trx_t::drop_table(const dict_table_t&)") at assert.c:101
No locals.
#8  0x000055a9d7e72cd5 in trx_t::drop_table (this=0x7ff238557680, table=@0x7ff1dc154348: {id = 254, id_hash = 0x0, name = {m_name = 0x7ff1dc0cc948 "test/#sql-ib254"}, name_hash = 0x0, heap = 0x7ff1dc158188, data_dir_path = 0x0, space = 0x7ff1dc247848, space_id = 241, flags = 33, flags2 = 80, skip_alter_undo = 0, file_unreadable = 0, cached = 1, n_def = 7, n_cols = 7, n_t_cols = 7, n_t_def = 7, n_v_def = 0, n_v_cols = 0, persistent_autoinc = 0, can_be_evicted = 1, corrupted = 0, drop_aborted = 0, cols = 0x7ff1dc209ba8, v_cols = 0x7ff1dc209c88, s_cols = 0x0, instant = 0x0, col_names = 0x7ff1dc209ce0 "a", v_col_names = 0x0, vers_start = 0, vers_end = 0, is_system_db = false, dict_frm_mismatch = DICT_FRM_CONSISTENT, fts_doc_id_index = 0x0, indexes = {count = 1, start = 0x7ff1dc23d908, end = 0x7ff1dc23d908, node = &dict_index_t::indexes, init = 51966}, freed_indexes = {count = 0, start = 0x0, end = 0x0, node = &dict_index_t::indexes, init = 51966}, foreign_list = {count = 0, start = 0x0, end = 0x0, node = &dict_foreign_t::heap, init = 0}, referenced_list = {count = 0, start = 0x0, end = 0x0, node = &dict_foreign_t::heap, init = 0}, table_LRU = {prev = 0x7ff1dc20c6a8, next = 0x0}, fk_max_recusive_level = 0 '\000', def_trx_id = 753, bulk_trx_id = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0}, <No data fields>}}, mdl_name = {m_name = 0x7ff1dc209d08 "test/t#P#p0"}, foreign_set = std::set with 0 elements, referenced_set = std::set with 0 elements, stat_initialized = 1, stats_last_recalc = 1655664767, stat_persistent = 0, stats_auto_recalc = 0, stats_sample_pages = 0, stat_n_rows = 1, stat_clustered_index_size = 1, stat_sum_of_other_index_sizes = 0, stat_modified_counter = 0, stats_error_printed = false, autoinc_lock = 0x7ff1dc209c88, autoinc_mutex = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, lock_mutex = {lock = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, lock_mutex_owner = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 0}, <No data fields>}}, autoinc = 0, autoinc_trx = {m = {_M_b = {_M_p = 0x0}}}, n_waiting_or_granted_auto_inc_locks = 0, n_lock_x_or_s = 1, fts = 0x0, quiesce = QUIESCE_NONE, n_rec_locks = {m_counter = {<std::__atomic_base<unsigned int>> = {_M_i = 0}, <No data fields>}}, n_ref_count = {m_counter = {<std::__atomic_base<unsigned int>> = {_M_i = 2}, <No data fields>}}, locks = {count = 1, start = 0x7ff238557b80, end = 0x7ff238557b80, node = &lock_table_t::locks, init = 51966}, update_time = {m = {<std::__atomic_base<long>> = {_M_i = 1655664767}, <No data fields>}}, query_cache_inv_trx_id = {m = {<std::__atomic_base<unsigned long>> = {_M_i = 771}, <No data fields>}}, magic_n = 76333786, vc_templ = 0x0}) at /data/src/10.6/storage/innobase/dict/drop.cc:155
        __PRETTY_FUNCTION__ = "dberr_t trx_t::drop_table(const dict_table_t&)"
        found_x = false
        info = 0x7ff23caaf440
#9  0x000055a9d7b1e317 in ha_innobase::truncate (this=0x7ff1dc1ce7b0) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13999
        _db_stack_frame_ = {func = 0x55a9d8335ba8 "ha_partition::truncate", file = 0x55a9d83340e0 "/data/src/10.6/sql/ha_partition.cc", level = 2147483657, line = -1, prev = 0x7ff238053b90}
        info = {<Table_scope_and_contents_source_st> = {<Table_scope_and_contents_source_pod_st> = {alter_table_convert_to_charset = 0x0, tabledef_version = {str = 0x0, length = 0}, org_tabledef_version = {str = 0x0, length = 0}, connect_string = {str = 0x0, length = 0}, comment = {str = 0x7ff1dc0facf0 "", length = 0}, alias = {str = 0x0, length = 0}, org_storage_engine_name = {str = 0x0, length = 0}, new_storage_engine_name = {str = 0x0, length = 0}, password = 0x0, tablespace = 0x0, data_file_name = 0x0, index_file_name = 0x0, max_rows = 0, min_rows = 0, auto_increment_value = 0, table_options = 8, avg_row_length = 0, used_fields = 0, key_block_size = 0, expression_length = 0, field_check_constraints = 0, stats_sample_pages = 0, null_bits = 0, options = 0, merge_insert_method = 0, extra_size = 0, db_type = 0x0, row_type = ROW_TYPE_DYNAMIC, transactional = HA_CHOICE_UNDEF, storage_media = HA_SM_DEFAULT, page_checksum = HA_CHOICE_UNDEF, option_list = 0x7ff1dc0b49b0, stats_auto_recalc = HA_STATS_AUTO_RECALC_DEFAULT, varchar = false, sequence = false, check_constraint_list = 0x0, option_struct = 0x0, fields_option_struct = 0x0, indexes_option_struct = 0x0, table = 0x0, pos_in_locked_tables = 0x0, merge_list = 0x0, mdl_ticket = 0x0, table_was_deleted = false, seq_create_info = 0x0}, vers_info = {<Table_period_info> = {<Sql_alloc> = {<No data fields>}, name = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x55a9d80ba326 "SYSTEM_TIME", length = 11}, <No data fields>}, <No data fields>}, period = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, create_if_not_exists = false, constr = 0x0, unique_keys = 0}, as_row = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, versioned_fields = false, unversioned_fields = false}, period_info = {<Sql_alloc> = {<No data fields>}, name = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, period = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, create_if_not_exists = false, constr = 0x0, unique_keys = 0}}, <Schema_specification_st> = {default_table_charset = 0x55a9d8b6a600 <my_charset_latin1>, schema_comment = 0x0}, alter_info = 0x0}
        ib_table = 0x7ff1dc154348
        stored_lock = LOCK_X
        trx = 0x7ff238557680
        __PRETTY_FUNCTION__ = "virtual int ha_innobase::truncate()"
        heap = 0x7ff1dc04d7c8
        temp_name = 0x7ff1dc04d848 "test/#sql-ib254"
        name = 0x7ff1dc04d870 "test/t#P#p0"
        table_stats = 0x0
        index_stats = 0x0
        mdl_table = 0x0
        mdl_index = 0x0
        error = DB_SUCCESS
        fts = false
        err = 21929
#10 0x000055a9d7705358 in handler::ha_truncate (this=0x7ff1dc1ce7b0) at /data/src/10.6/sql/handler.cc:5069
        __PRETTY_FUNCTION__ = "int handler::ha_truncate()"
#11 0x000055a9d7a1b9b7 in ha_partition::truncate (this=0x7ff1dc0ec610) at /data/src/10.6/sql/ha_partition.cc:4846
        error = 32753
        file = 0x7ff1dc1936c8
        _db_stack_frame_ = {func = 0x55a9d8133270 "Sql_cmd_truncate_table::handler_truncate", file = 0x55a9d8133248 "/data/src/10.6/sql/sql_truncate.cc", level = 2147483656, line = -1, prev = 0x7ff238053c20}
#12 0x000055a9d7705358 in handler::ha_truncate (this=0x7ff1dc0ec610) at /data/src/10.6/sql/handler.cc:5069
        __PRETTY_FUNCTION__ = "int handler::ha_truncate()"
#13 0x000055a9d755b7ab in Sql_cmd_truncate_table::handler_truncate (this=0x7ff1dc014900, thd=0x7ff1dc000db8, table_ref=0x7ff1dc014218, is_tmp_table=false) at /data/src/10.6/sql/sql_truncate.cc:255
        error = 0
        flags = 2
        table = 0x7ff1dc077cd8
        _db_stack_frame_ = {func = 0x55a9d81334a0 "Sql_cmd_truncate_table::truncate_table", file = 0x55a9d8133248 "/data/src/10.6/sql/sql_truncate.cc", level = 2147483655, line = -1, prev = 0x7ff238053d50}
        __PRETTY_FUNCTION__ = "Sql_cmd_truncate_table::truncate_result Sql_cmd_truncate_table::handler_truncate(THD*, TABLE_LIST*, bool)"
#14 0x000055a9d755c347 in Sql_cmd_truncate_table::truncate_table (this=0x7ff1dc014900, thd=0x7ff1dc000db8, table_ref=0x7ff1dc014218) at /data/src/10.6/sql/sql_truncate.cc:507
        hton_can_recreate = false
        error = 32753
        binlog_stmt = 220
        _db_stack_frame_ = {func = 0x55a9d8133550 "Sql_cmd_truncate_table::execute", file = 0x55a9d8133248 "/data/src/10.6/sql/sql_truncate.cc", level = 2147483654, line = -1, prev = 0x7ff238053db0}
        __PRETTY_FUNCTION__ = "bool Sql_cmd_truncate_table::truncate_table(THD*, TABLE_LIST*)"
#15 0x000055a9d755c598 in Sql_cmd_truncate_table::execute (this=0x7ff1dc014900, thd=0x7ff1dc000db8) at /data/src/10.6/sql/sql_truncate.cc:573
        res = true
        table = 0x7ff1dc014218
        _db_stack_frame_ = {func = 0x55a9d80c7cd0 "mysql_execute_command", file = 0x55a9d80c6fa8 "/data/src/10.6/sql/sql_parse.cc", level = 2147483653, line = -1, prev = 0x7ff238053f70}
#16 0x000055a9d736da62 in mysql_execute_command (thd=0x7ff1dc000db8, is_called_from_prepared_stmt=false) at /data/src/10.6/sql/sql_parse.cc:5996
        res = 0
        up_result = 0
        lex = 0x7ff1dc005058
        select_lex = 0x7ff1dc005918
        first_table = 0x7ff1dc014218
        all_tables = 0x7ff1dc014218
        unit = 0x7ff1dc005120
        have_table_map_for_update = false
        rpl_filter = 0x32
        _db_stack_frame_ = {func = 0x55a9d80c9037 "mysql_parse", file = 0x55a9d80c6fa8 "/data/src/10.6/sql/sql_parse.cc", level = 2147483652, line = -1, prev = 0x7ff238054380}
        __PRETTY_FUNCTION__ = "int mysql_execute_command(THD*, bool)"
        ots = {ctx = 0x7ff1dc004d20, traceable = false}
        orig_binlog_format = BINLOG_FORMAT_MIXED
        orig_current_stmt_binlog_format = BINLOG_FORMAT_STMT
#17 0x000055a9d7373ab0 in mysql_parse (thd=0x7ff1dc000db8, rawbuf=0x7ff1dc014160 "TRUNCATE TABLE t", length=16, parser_state=0x7ff238054500) at /data/src/10.6/sql/sql_parse.cc:8029
        found_semicolon = 0x0
        error = 32753
        lex = 0x7ff1dc005058
        err = false
        _db_stack_frame_ = {func = 0x55a9d80c7521 "dispatch_command", file = 0x55a9d80c6fa8 "/data/src/10.6/sql/sql_parse.cc", level = 2147483651, line = -1, prev = 0x7ff2380544e0}
        __PRETTY_FUNCTION__ = "void mysql_parse(THD*, char*, uint, Parser_state*)"
#18 0x000055a9d7360113 in dispatch_command (command=COM_QUERY, thd=0x7ff1dc000db8, packet=0x7ff1dc1e9799 "TRUNCATE TABLE t", packet_length=16, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1896
        packet_end = 0x7ff1dc014170 ""
        parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7ff1dc000db8, m_ptr = 0x7ff1dc014171 "\004", m_tok_start = 0x7ff1dc014171 "\004", m_tok_end = 0x7ff1dc014171 "\004", m_end_of_query = 0x7ff1dc014170 "", m_tok_start_prev = 0x7ff1dc014170 "", m_buf = 0x7ff1dc014160 "TRUNCATE TABLE t", m_buf_length = 16, m_echo = true, m_echo_saved = false, m_cpp_buf = 0x7ff1dc0141c8 "TRUNCATE TABLE t", m_cpp_ptr = 0x7ff1dc0141d8 "", m_cpp_tok_start = 0x7ff1dc0141d8 "", m_cpp_tok_start_prev = 0x7ff1dc0141d8 "", m_cpp_tok_end = 0x7ff1dc0141d8 "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x1dc000cb8 <error: Cannot access memory at address 0x1dc000cb8>, m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 1, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = (unknown: 0x10000), m_cpp_text_start = 0x7ff1dc0141d7 "t", m_cpp_text_end = 0x7ff1dc0141d8 "", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 <repeats 12 times>}}, m_lock_type = TL_WRITE, m_mdl_type = MDL_EXCLUSIVE}, m_digest_psi = 0x7ff1dc004ab0}
        net = 0x7ff1dc0010f0
        error = false
        do_end_of_statement = true
        _db_stack_frame_ = {func = 0x55a9d80c71a5 "do_command", file = 0x55a9d80c6fa8 "/data/src/10.6/sql/sql_parse.cc", level = 2147483650, line = -1, prev = 0x7ff238054dc0}
        drop_more_results = false
        __PRETTY_FUNCTION__ = "dispatch_command_return dispatch_command(enum_server_command, THD*, char*, uint, bool)"
        __FUNCTION__ = "dispatch_command"
        res = <optimized out>
#19 0x000055a9d735eb0e in do_command (thd=0x7ff1dc000db8, blocking=true) at /data/src/10.6/sql/sql_parse.cc:1409
        return_value = DISPATCH_COMMAND_SUCCESS
        packet = 0x7ff1dc1e9798 "\003TRUNCATE TABLE t"
        packet_length = 17
        net = 0x7ff1dc0010f0
        command = COM_QUERY
        _db_stack_frame_ = {func = 0x55a9d851a480 "?func", file = 0x55a9d851a486 "?file", level = 2147483649, line = -1, prev = 0x0}
        __PRETTY_FUNCTION__ = "dispatch_command_return do_command(THD*, bool)"
        __FUNCTION__ = "do_command"
#20 0x000055a9d7518c68 in do_handle_one_connection (connect=0x55a9db415b08, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1418
        create_user = true
        thr_create_utime = 2086874911299
        thd = 0x7ff1dc000db8
        __PRETTY_FUNCTION__ = "void do_handle_one_connection(CONNECT*, bool)"
#21 0x000055a9d7518907 in handle_one_connection (arg=0x55a9db4fdea8) at /data/src/10.6/sql/sql_connect.cc:1312
        connect = 0x55a9db4fdea8
#22 0x000055a9d7a341ba in pfs_spawn_thread (arg=0x55a9db4fdf88) at /data/src/10.6/storage/perfschema/pfs.cc:2201
        typed_arg = 0x55a9db4fdf88
        user_arg = 0x55a9db4fdea8
        user_start_routine = 0x55a9d75188ac <handle_one_connection(void*)>
        pfs = 0x7ff23d82e9c0
        klass = 0x55a9dad5e900
#23 0x00007ff23e3f0ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {140678298687232, 7731486039672482113, 140732267268766, 140732267268767, 140678298685312, 311296, -7732766744735782591, -7732753226343388863}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#24 0x00007ff23dfeddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

10.6 0e4cf497 non-debug

2022-06-19 21:55:39 0x7f3c17e69700  InnoDB: Assertion failure in file /data/src/10.6/storage/innobase/dict/dict0dict.cc line 1883
InnoDB: Failing assertion: table->get_ref_count() == 0
 
#5  0x00007f3c22b66537 in __GI_abort () at abort.c:79
        save_stage = 1
        act = {__sigaction_handler = {sa_handler = 0x7f3c22d005c0 <_IO_2_1_stderr_>, sa_sigaction = 0x7f3c22d005c0 <_IO_2_1_stderr_>}, sa_mask = {__val = {94102755632040, 139896258827424, 398, 398, 139896257518942, 21, 139896258823616, 398, 398, 1, 94102755632040, 139896258827424, 139896257471933, 725849473024, 139896258823840, 139896258827424}}, sa_flags = 582702978, sa_restorer = 0x55960152b7d0}
        sigs = {__val = {32, 0 <repeats 15 times>}}
#6  0x00005596008d9f1e in ut_dbg_assertion_failed (expr=expr@entry=0x55960152b77e "table->get_ref_count() == 0", file=file@entry=0x55960152b7d0 "/data/src/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=1883) at /data/src/10.6/storage/innobase/ut/ut0dbg.cc:60
No locals.
#7  0x00005596008e679a in dict_sys_t::remove (this=<optimized out>, table=<optimized out>, lru=lru@entry=false, keep=keep@entry=false) at /data/src/10.6/storage/innobase/dict/dict0dict.cc:1883
        foreign = <optimized out>
        index = <optimized out>
        id_hash = <optimized out>
        id_fold = <optimized out>
        freed = <optimized out>
#8  0x00005596010df054 in trx_t::commit (this=this@entry=0x7f3c1d197680, deleted=std::vector of length 0, capacity 0) at /data/src/10.6/storage/innobase/dict/drop.cc:271
        table = <optimized out>
        space = 0x7f3bc4062b58
        p = {first = 0x7f3bc40bb490, second = {first = 3, first_versioned = 9223372036854775808}}
        __for_range = std::map with 4 elements = {[0x559603dc37b0] = {first = 1, first_versioned = 18446744073709551615}, [0x559603dc43e0] = {first = 4, first_versioned = 18446744073709551615}, [0x559603dc5f50] = {first = 8, first_versioned = 18446744073709551615}, [0x7f3bc40bb490] = {first = 3, first_versioned = 9223372036854775808}}
        __for_begin = Python Exception <class 'gdb.error'> value has been optimized out: 
 
        __for_end = Python Exception <class 'gdb.error'> value has been optimized out: 
 
#9  0x0000559600f75f4c in ha_innobase::create (trx=0x7f3c1d197680, file_per_table=true, create_info=0x7f3c17e67650, form=<optimized out>, name=0x7f3bc4144468 "test/t#P#p0", this=0x7f3bc4144890) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:13292
        deleted = std::vector of length 0, capacity 0
        info = {m_thd = 0x7f3bc4000c58, m_trx = 0x7f3c1d197680, m_form = 0x7f3bc4071ea8, m_default_row_format = 2, m_create_info = 0x7f3c17e67650, m_table_name = 0x7f3c17e67890 "test/t#P#p0", m_table = 0x7f3bc41483b0, m_remote_path = 0x7f3c17e67a90 "", m_innodb_file_per_table = true, m_allow_file_per_table = true, m_use_file_per_table = true, m_use_data_dir = false, m_flags = 33, m_flags2 = 80}
        own_trx = false
        norm_name = "test/t#P#p0\000\226U\000\000+q\257b\000\000\000\000\244\251\244*\000\000\000\000\340x\346\027<\177\000\000\302@\353\000\226U\000\000X\f\000\304;\177\000\000 I\000\304;\177\000\000\002\000\000\000\000\000\000\000\310\360\311\003\226U\000\000 y\346\027<\177\000\000\355\017\305\000\226U\000\000py\346\027<\177\000\000\000\000\000\000\000\030\365@0y\346\027<\177\000\000\300=\025\304;\177\000\000X\f\000\304;\177\000\000\200k\031\035<\177\000\000\200y\346\027<\177\000\000\343B\366\000\226U\000\000\200{\346\027<\177\000\000\214P\233\000\226U\000\000`y\346\027<\177\000\000\000\000\000\000\000\000\000\000\360b\000\304;\177\000\000"...
        remote_path = "\000z\346\027<\177\000\000\033\322\351\000\226U\000\000X\f\000\304;\177\000\000\240^\021\304;\177\000\000\220{\346\027<\177\000\000W\243\305\000\226U\000\000\a\000\000\000<\177\000\000\000\242>\300;\177\000\000\201\304\062\001\226U\000\000\000\254J\"<\177\000\000\367Y\177*\376\177\000\000\000\254J\"<\177\000\000\060{\346\027<\177\000\000\340%I\001\226U\000\000\000\000\000\000\000\000\000\000Ak\300\"<\177\000\000`{\346\027<\177\000\000@{\346\027<\177\000\000\000\000\000\000\000\000\000\000\223\265\033\001\226U\000\000\367Y\177*\376\177\000\000\001\000\000\000\000\000\000\000\200{\346\027<\177\000\000X\f\000\304;\177\000\000\001\000\000\000\000\000\000\000"...
        error = 0
        norm_name = {<optimized out> <repeats 512 times>}
        remote_path = {<optimized out> <repeats 512 times>}
        info = {m_thd = <optimized out>, m_trx = <optimized out>, m_form = <optimized out>, m_default_row_format = <optimized out>, m_create_info = <optimized out>, m_table_name = <optimized out>, m_table = <optimized out>, m_remote_path = <optimized out>, m_innodb_file_per_table = <optimized out>, m_allow_file_per_table = <optimized out>, m_use_file_per_table = <optimized out>, m_use_data_dir = <optimized out>, m_flags = <optimized out>, m_flags2 = <optimized out>}
        own_trx = <optimized out>
        error = <optimized out>
        error = <optimized out>
        err = <optimized out>
        deleted = Python Exception <class 'gdb.error'> value has been optimized out: 
 
        d = {m_file = <optimized out>, m_psi = <optimized out>}
        __for_range = <optimized out>
        __for_begin = Python Exception <class 'gdb.error'> value has been optimized out: 
 
        __for_end = Python Exception <class 'gdb.error'> value has been optimized out: 
 
#10 ha_innobase::truncate (this=0x7f3bc4144890) at /data/src/10.6/storage/innobase/handler/ha_innodb.cc:14019
        update_time = {m = {<std::__atomic_base<long>> = {_M_i = 1655664939}, <No data fields>}}
        stored_lock = LOCK_X
        def_trx_id = 2153
        info = {<Table_scope_and_contents_source_st> = {<Table_scope_and_contents_source_pod_st> = {alter_table_convert_to_charset = 0x0, tabledef_version = {str = 0x0, length = 0}, org_tabledef_version = {str = 0x0, length = 0}, connect_string = {str = 0x0, length = 0}, comment = {str = 0x7f3bc4186a90 "", length = 0}, alias = {str = 0x0, length = 0}, org_storage_engine_name = {str = 0x0, length = 0}, new_storage_engine_name = {str = 0x0, length = 0}, password = 0x0, tablespace = 0x0, data_file_name = 0x0, index_file_name = 0x0, max_rows = 0, min_rows = 0, auto_increment_value = 0, table_options = 8, avg_row_length = 0, used_fields = 0, key_block_size = 0, expression_length = 0, field_check_constraints = 0, stats_sample_pages = 0, null_bits = 0, options = 0, merge_insert_method = 0, extra_size = 0, db_type = 0x0, row_type = ROW_TYPE_DYNAMIC, transactional = HA_CHOICE_UNDEF, storage_media = HA_SM_DEFAULT, page_checksum = HA_CHOICE_UNDEF, option_list = 0x0, stats_auto_recalc = HA_STATS_AUTO_RECALC_DEFAULT, varchar = false, sequence = false, check_constraint_list = 0x0, option_struct = 0x0, fields_option_struct = 0x0, indexes_option_struct = 0x0, table = 0x0, pos_in_locked_tables = 0x0, merge_list = 0x0, mdl_ticket = 0x0, table_was_deleted = false, seq_create_info = 0x0}, vers_info = {<Table_period_info> = {<Sql_alloc> = {<No data fields>}, name = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x559601486669 "SYSTEM_TIME", length = 11}, <No data fields>}, <No data fields>}, period = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, create_if_not_exists = false, constr = 0x0, unique_keys = 0}, as_row = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, versioned_fields = false, unversioned_fields = false}, period_info = {<Sql_alloc> = {<No data fields>}, name = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, period = {start = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}, end = {<Lex_cstring> = {<st_mysql_const_lex_string> = {str = 0x0, length = 0}, <No data fields>}, <No data fields>}}, create_if_not_exists = false, constr = 0x0, unique_keys = 0}}, <Schema_specification_st> = {default_table_charset = 0x559601b2f120 <my_charset_latin1>, schema_comment = 0x0}, alter_info = 0x0}
        ib_table = 0x7f3bc40bb490
        stored_lock = <optimized out>
        trx = <optimized out>
        heap = <optimized out>
        temp_name = <optimized out>
        name = <optimized out>
        table_stats = <optimized out>
        index_stats = <optimized out>
        mdl_table = 0x0
        mdl_index = 0x0
        error = <optimized out>
        fts = <optimized out>
        err = <optimized out>
#11 0x0000559600ea0112 in ha_partition::truncate (this=<optimized out>) at /data/src/10.6/sql/ha_partition.cc:4846
        error = <optimized out>
        file = 0x7f3bc404f928
#12 0x0000559600b4e13c in Sql_cmd_truncate_table::handler_truncate (this=<optimized out>, thd=0x7f3bc4000c58, table_ref=0x7f3bc4010948, is_tmp_table=<optimized out>) at /data/src/10.6/sql/sql_truncate.cc:255
        error = 0
        flags = <optimized out>
        table = <optimized out>
#13 0x0000559600b4ee58 in Sql_cmd_truncate_table::truncate_table (this=0x7f3bc4011030, thd=0x7f3bc4000c58, table_ref=0x7f3bc4010948) at /data/src/10.6/sql/sql_truncate.cc:507
        hton_can_recreate = false
        error = <optimized out>
        binlog_stmt = <optimized out>
#14 0x0000559600b4f0ae in Sql_cmd_truncate_table::execute (this=0x7f3bc4011030, thd=0x7f3bc4000c58) at /data/src/10.6/sql/sql_truncate.cc:573
        res = true
        table = 0x7f3bc4010948
#15 0x0000559600a222de in mysql_execute_command (thd=0x7f3bc4000c58, is_called_from_prepared_stmt=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:5996
        multi_delete_error = <optimized out>
        wsrep_error_label = <optimized out>
        error = <optimized out>
        res = 0
        up_result = 0
        lex = 0x7f3bc4004d38
        select_lex = 0x7f3bc40055f8
        first_table = 0x7f3bc4010948
        all_tables = 0x7f3bc4010948
        unit = 0x7f3bc4004e00
        have_table_map_for_update = false
        rpl_filter = <optimized out>
        ots = {ctx = 0x7f3bc4004a08, traceable = false}
        orig_binlog_format = BINLOG_FORMAT_MIXED
        orig_current_stmt_binlog_format = BINLOG_FORMAT_STMT
#16 0x0000559600a234b7 in mysql_parse (thd=0x7f3bc4000c58, rawbuf=<optimized out>, length=<optimized out>, parser_state=<optimized out>) at /data/src/10.6/sql/sql_parse.cc:8029
        found_semicolon = <optimized out>
        error = <optimized out>
        lex = 0x7f3bc4004d38
        err = <optimized out>
#17 0x0000559600a25d7d in dispatch_command (command=COM_QUERY, thd=0x7f3bc4000c58, packet=<optimized out>, packet_length=<optimized out>, blocking=<optimized out>) at /data/src/10.6/sql/sql_class.h:234
        packet_end = 0x7f3bc40108a0 ""
        parser_state = {m_lip = {lookahead_token = -1, lookahead_yylval = 0x0, m_thd = 0x7f3bc4000c58, m_ptr = 0x7f3bc40108a1 "\004", m_tok_start = 0x7f3bc40108a1 "\004", m_tok_end = 0x7f3bc40108a1 "\004", m_end_of_query = 0x7f3bc40108a0 "", m_tok_start_prev = 0x7f3bc40108a0 "", m_buf = 0x7f3bc4010890 "TRUNCATE TABLE t", m_buf_length = 16, m_echo = true, m_echo_saved = false, m_cpp_buf = 0x7f3bc40108f8 "TRUNCATE TABLE t", m_cpp_ptr = 0x7f3bc4010908 "", m_cpp_tok_start = 0x7f3bc4010908 "", m_cpp_tok_start_prev = 0x7f3bc4010908 "", m_cpp_tok_end = 0x7f3bc4010908 "", m_body_utf8 = 0x0, m_body_utf8_ptr = 0x0, m_cpp_utf8_processed_ptr = 0x0, next_state = MY_LEX_END, found_semicolon = 0x0, ignore_space = false, stmt_prepare_mode = false, multi_statements = true, yylineno = 1, m_digest = 0x0, in_comment = NO_COMMENT, in_comment_saved = NO_COMMENT, m_cpp_text_start = 0x7f3bc4010907 "t", m_cpp_text_end = 0x7f3bc4010908 "", m_underscore_cs = 0x0}, m_yacc = {yacc_yyss = 0x0, yacc_yyvs = 0x0, m_set_signal_info = {m_item = {0x0 <repeats 12 times>}}, m_lock_type = TL_WRITE, m_mdl_type = MDL_EXCLUSIVE}, m_digest_psi = 0x7f3bc4004798}
        net = 0x7f3bc4000f08
        error = false
        do_end_of_statement = true
        drop_more_results = false
        __FUNCTION__ = "dispatch_command"
        res = <optimized out>
#18 0x0000559600a27a32 in do_command (thd=0x7f3bc4000c58, blocking=blocking@entry=true) at /data/src/10.6/sql/sql_parse.cc:1409
        return_value = <optimized out>
        packet = 0x7f3bc40cae18 "\003TRUNCATE TABLE t"
        packet_length = <optimized out>
        net = 0x7f3bc4000f08
        command = COM_QUERY
        __FUNCTION__ = "do_command"
#19 0x0000559600b3286f in do_handle_one_connection (connect=<optimized out>, put_in_cache=true) at /data/src/10.6/sql/sql_connect.cc:1418
        create_user = true
        thr_create_utime = <optimized out>
        thd = 0x7f3bc4000c58
#20 0x0000559600b32bbd in handle_one_connection (arg=arg@entry=0x55960417af28) at /data/src/10.6/sql/sql_connect.cc:1312
        connect = 0x55960417af28
#21 0x0000559600eb3f22 in pfs_spawn_thread (arg=0x5596040da0f8) at /data/src/10.6/storage/perfschema/pfs.cc:2201
        typed_arg = 0x5596040da0f8
        user_arg = 0x55960417af28
        user_start_routine = 0x559600b32b60 <handle_one_connection(void*)>
        pfs = <optimized out>
        klass = <optimized out>
#22 0x00007f3c23027ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
        ret = <optimized out>
        pd = <optimized out>
        unwind_buf = {cancel_jmp_buf = {{jmp_buf = {139896075753216, 142484558547500365, 140729611361150, 140729611361151, 139896075751296, 311296, -35435892223787699, -35339439623783091}, mask_was_saved = 0}}, priv = {pad = {0x0, 0x0, 0x0, 0x0}, data = {prev = 0x0, cleanup = 0x0, canceltype = 0}}}
        not_first_call = 0
#23 0x00007f3c22c3edef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Reproducible on 10.6+, couldn't reproduce on 10.5.

All threads' stack trace from 10.6 debug is attached.



 Comments   
Comment by Elena Stepanova [ 2022-06-19 ]

I suppose it may turn out the other case of InnoDB being just a messenger, but since I can't say what it is trying to say and to whom, InnoDB will have to be the first suspect.

Comment by Marko Mäkelä [ 2022-06-20 ]

I reproduced a crash on a -DPLUGIN_PERFSCHEMA=NO -DCMAKE_BUILD_TYPE=Debug build, while threads.txt includes the PERFORMANCE_SCHEMA instrumentation. In my crash, the only occurrence of the dict_table_t pointer was in the crashing thread. I couldn’t get it to crash under rr record easily, but I did not try very long.

In threads.txt, the pointer (dict_table_t*) 0x7ff2c0b0d7f8 for id=1118, corresponding to partition p0 of the table, occurs not only in the crashing Thread 1 but also in Thread 25, which looks strange to me:

10.6 0e4cf497ca11a7298e2bd896cb594bd52085a1d4

#8  0x0000561a9ffe1a43 in dict_sys_t::freeze (this=0x561aa0c6f280 <dict_sys>, file=0x561aa0668fc0 "/data/src/10.6/storage/innobase/dict/dict0dict.cc", line=1060) at /data/src/10.6/storage/innobase/dict/dict0dict.cc:1023
        __PRETTY_FUNCTION__ = "void dict_sys_t::freeze(const char*, unsigned int)"
#9  0x0000561a9ffe1cb9 in dict_table_open_on_name (table_name=0x7ff3200b7450 "test/t#P#p1", dict_locked=false, ignore_err=DICT_ERR_IGNORE_FK_NOKEY) at /data/src/10.6/storage/innobase/dict/dict0dict.cc:1060
#15 0x0000561a9f8ac5d8 in handler::ha_open (this=0x7ff2c40f8660, table_arg=0x7ff2c405ac68, name=0x7ff2c0c5bbb0 "./test/t", mode=2, test_if_locked=8194, mem_root=0x0, partitions_to_open=0x0) at /data/src/10.6/sql/handler.cc:3334
#16 0x0000561a9f669ebb in open_table_from_share (thd=0x7ff2c4000db8, share=0x7ff2c0c5b610, alias=0x561aa0aa8140 <empty_clex_str>, db_stat=1, prgflag=0, ha_open_flags=8194, outparam=0x7ff2c405ac68, is_create_table=false, partitions_to_open=0x0) at /data/src/10.6/sql/table.cc:4413
#17 0x0000561a9f462756 in flush_tables (thd=0x7ff2c4000db8, flag=FLUSH_NON_TRANS_TABLES) at /data/src/10.6/sql/sql_base.cc:609
#18 0x0000561a9f7d6755 in backup_block_ddl (thd=0x7ff2c4000db8) at /data/src/10.6/sql/backup.cc:287
#19 0x0000561a9f7d61b2 in run_backup_stage (thd=0x7ff2c4000db8, stage=BACKUP_LOCK_COMMIT) at /data/src/10.6/sql/backup.cc:121

Why would BACKUP STAGE BLOCK_COMMIT cause a table to be opened? We see that this thread is blocked, waiting to open partition p1. It probably already (wrongly) succeeded in opening an old copy of the partition p0, which TRUNCATE is concurrently trying to drop.

Comment by Michael Widenius [ 2022-06-27 ]

Backup_block_ddl works the following way:

  • Upgrade global backup lock stage to MDL_BACKUP_WAIT_FLUSH. This blocks using non
    transactional and waits until all request to these are done.
  • Call flush_tables() to get rid of all not used tables from the cache and ensure that non transactional
    tables are flushed.
  • Upgrade lock to MDL_BACKUP_WAIT_DDL, which blocks ddl.

In other words, the crash happens before we have blocked TRUNCATE.
It is also very likely not related to BACKUP LOCKS but to FLUSH TABLES and GLOBAL READ LOCK.

What happens is this:

  • Loop over all cached table shares
  • If there is an available open table in the share us it, If not, then open another instant
  • Call FLUSH on the open table
  • Close the table if it was not open before.

The question here is why does innodb get a ref_count_error when calling open_table_from_share().
Is the table locked somehow in InnoDB for multiple opens?

In the flush code we are not trying to take a DDL lock on the table we want to flush as we are not going to do any other operation on the table than call EXTRA_FLUSH and we don't want to have flush wait a long time.

If this is a problem for InnoDB, I can suggest two solutions:

  • Before opening an extra table, try to take a MDL read lock. If it fails, we would not flush the table.
  • Extend InnoDB to detect the flag HA_OPEN_FOR_FLUSH and return an error on open if this
    is used in the wrong context (like truncate) or just always.
Comment by Michael Widenius [ 2022-06-28 ]

Pushed

Generated at Thu Feb 08 10:04:18 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.