[MDEV-16515] InnoDB: Failing assertion: ++retries < 10000 in file dict0dict.cc line 2737 Created: 2018-06-18  Updated: 2020-07-20  Resolved: 2018-06-26

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.1
Fix Version/s: 10.1.35, 10.2.17, 10.3.8

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: regression

Attachments: HTML File threads_full    
Issue Links:
Problem/Incident
causes MDEV-16647 InnoDB fails to drop large temporary ... Closed
causes MDEV-16862 build failure for WITH_INNODB_AHI=0 Closed
causes MDEV-17470 Orphan temporary files after interrup... Closed
causes MDEV-18237 InnoDB: Unable to drop FTS index aux ... Closed
causes MDEV-20102 When the ctas from a big table is int... Closed
Relates
relates to MDEV-14491 rpl.rpl_row_mixing_engines, rpl.rpl_n... Closed
relates to MDEV-14727 innodb_fts.fulltext2 failed in buildb... Closed
relates to MDEV-16283 ALTER TABLE...DISCARD TABLESPACE stil... Closed
relates to MDEV-16759 InnoDB: Assertion failure in thread 1... Closed
relates to MDEV-23223 If you cancel CTAS Query on MariaDB S... Closed
relates to MDEV-9663 InnoDB assertion failure: *cursor->in... Closed

 Description   

10.1 9f848da640dd6c3f44d56eae18204370ae7f835c

2018-06-17 04:31:32 7fb03ad51b00  InnoDB: Assertion failure in thread 140394878016256 in file dict0dict.cc line 2737
InnoDB: Failing assertion: ++retries < 10000
 
#5  0x00007fb0635c3028 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x0000557f3f6b3832 in dict_index_remove_from_cache_low (table=0x557f431fcd28, index=0x557f4308a998, lru_evict=0) at /home/travis/src/storage/xtradb/dict/dict0dict.cc:2737
#7  0x0000557f3f6b24de in dict_table_remove_from_cache_low (table=0x557f431fcd28, lru_evict=0) at /home/travis/src/storage/xtradb/dict/dict0dict.cc:2114
#8  0x0000557f3f6b2895 in dict_table_remove_from_cache (table=0x557f431fcd28) at /home/travis/src/storage/xtradb/dict/dict0dict.cc:2151
#9  0x0000557f3f594b5e in row_drop_table_for_mysql (name=0x557f42f57590 "test/#sql-ib3458-1040254509", trx=0x7fb00000bf98, drop_db=false, create_failed=0, nonatomic=false) at /home/travis/src/storage/xtradb/row/row0mysql.cc:4430
#10 0x0000557f3f58544a in row_merge_drop_table (trx=0x7fb00000bf98, table=0x557f431fcd28) at /home/travis/src/storage/xtradb/row/row0merge.cc:3823
#11 0x0000557f3f4a7768 in ha_innobase::commit_inplace_alter_table (this=0x7fb0000aa158, altered_table=0x7fb0000cac40, ha_alter_info=0x7fb03ad4dad0, commit=true) at /home/travis/src/storage/xtradb/handler/handler0alter.cc:6323
#12 0x0000557f3f13e52f in handler::ha_commit_inplace_alter_table (this=0x7fb0000aa158, altered_table=0x7fb0000cac40, ha_alter_info=0x7fb03ad4dad0, commit=true) at /home/travis/src/sql/handler.cc:4225
#13 0x0000557f3efb471b in mysql_inplace_alter_table (thd=0x557f42fea610, table_list=0x7fb00003b0c8, table=0x7fb0000d7390, altered_table=0x7fb0000cac40, ha_alter_info=0x7fb03ad4dad0, inplace_supported=HA_ALTER_INPLACE_NO_LOCK_AFTER_PREPARE, target_mdl_request=0x7fb03ad4db40, alter_ctx=0x7fb03ad4e6e0) at /home/travis/src/sql/sql_table.cc:7222
#14 0x0000557f3efb9085 in mysql_alter_table (thd=0x557f42fea610, new_db=0x7fb00003b6b8 "test", new_name=0x0, create_info=0x7fb03ad4f2d0, table_list=0x7fb00003b0c8, alter_info=0x7fb03ad4f240, order_num=0, order=0x0, ignore=true) at /home/travis/src/sql/sql_table.cc:9048
#15 0x0000557f3f02b4a8 in Sql_cmd_alter_table::execute (this=0x7fb00003b770, thd=0x557f42fea610) at /home/travis/src/sql/sql_alter.cc:325
#16 0x0000557f3eef9f39 in mysql_execute_command (thd=0x557f42fea610) at /home/travis/src/sql/sql_parse.cc:5696
#17 0x0000557f3eefe6a1 in mysql_parse (thd=0x557f42fea610, rawbuf=0x7fb00003add8 "ALTER ONLINE IGNORE TABLE `table_partitioned` DROP FOREIGN KEY IF EXISTS j, DROP COLUMN IF EXISTS tscol5, ALGORITHM=COPY, FORCE, LOCK=NONE, ALGORITHM=INPLACE, DROP COLUMN IF EXISTS `field4`, ALGORITHM"..., length=246, parser_state=0x7fb03ad50630) at /home/travis/src/sql/sql_parse.cc:7449
#18 0x0000557f3eeed0ed in dispatch_command (command=COM_QUERY, thd=0x557f42fea610, packet=0x557f43149741 "ALTER ONLINE IGNORE TABLE `table_partitioned` DROP FOREIGN KEY IF EXISTS j, DROP COLUMN IF EXISTS tscol5, ALGORITHM=COPY, FORCE, LOCK=NONE, ALGORITHM=INPLACE, DROP COLUMN IF EXISTS `field4`, ALGORITHM"..., packet_length=247) at /home/travis/src/sql/sql_parse.cc:1492
#19 0x0000557f3eeebe6d in do_command (thd=0x557f42fea610) at /home/travis/src/sql/sql_parse.cc:1121
#20 0x0000557f3f026a95 in do_handle_one_connection (thd_arg=0x557f42fea610) at /home/travis/src/sql/sql_connect.cc:1330
#21 0x0000557f3f0267f9 in handle_one_connection (arg=0x557f42fea610) at /home/travis/src/sql/sql_connect.cc:1242
#22 0x00007fb063f64184 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#23 0x00007fb063686ffd in clone () from /lib/x86_64-linux-gnu/libc.so.6

ALTER ONLINE IGNORE TABLE `table_partitioned` DROP FOREIGN KEY IF EXISTS j, DROP COLUMN IF EXISTS tscol5, ALGORITHM=COPY, FORCE, LOCK=NONE, ALGORITHM=INPLACE, DROP COLUMN IF EXISTS `field4`, ALGORITHM=DEFAULT, CHECKSUM 1

Full threads are attached. Datadir, coredump, binary, and logs are available on perro.



 Comments   
Comment by Elena Stepanova [ 2018-06-25 ]

It also happened in buildbot:
http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest-big/builds/1872

rpl.rpl_row_mixing_engines 'row,xtradb'  w2 [ fail ]
        Test ended at 2018-06-14 08:58:04
 
CURRENT_TEST: rpl.rpl_row_mixing_engines
mysqltest: In included file "./extra/rpl_tests/rpl_mixing_engines.inc": 
included from ./extra/rpl_tests/rpl_mixing_engines.test at line 1666:
included from /mnt/buildbot/build/mariadb-10.1.34/mysql-test/suite/rpl/t/rpl_row_mixing_engines.test at line 12:
At line 562: query 'DROP TABLE IF EXISTS tt_xx_$tb_id, nt_xx_$tb_id' failed: 2013: Lost connection to MySQL server during query
 
The result from queries just before the failure was:
< snip >
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	#	Gtid	#	#	BEGIN GTID #-#-#
master-bin.000001	#	Table_map	#	#	table_id: # (test.tt_1)
master-bin.000001	#	Write_rows_v1	#	#	table_id: # flags: STMT_END_F
master-bin.000001	#	Xid	#	#	COMMIT /* XID */
-e-e-e-e-e-e-e-e-e-e-e- >> C << -e-e-e-e-e-e-e-e-e-e-e-
-b-b-b-b-b-b-b-b-b-b-b- >> trunc-CS-T B ISe-T<-N T C << -b-b-b-b-b-b-b-b-b-b-b-
include/show_binlog_events.inc
Log_name	Pos	Event_type	Server_id	End_log_pos	Info
master-bin.000001	#	Gtid	#	#	GTID #-#-#
master-bin.000001	#	Query	#	#	use `test`; TRUNCATE TABLE tt_xx_7
master-bin.000001	#	Gtid	#	#	BEGIN GTID #-#-#
master-bin.000001	#	Table_map	#	#	table_id: # (test.tt_1)
master-bin.000001	#	Write_rows_v1	#	#	table_id: # flags: STMT_END_F
master-bin.000001	#	Xid	#	#	COMMIT /* XID */
-e-e-e-e-e-e-e-e-e-e-e- >> trunc-CS-T B ISe-T<-N T C << -e-e-e-e-e-e-e-e-e-e-e-
 
rpl_mixing_engines.inc [commands=drop-CS]
-b-b-b-b-b-b-b-b-b-b-b- >> drop-CS << -b-b-b-b-b-b-b-b-b-b-b-
DROP TABLE IF EXISTS tt_xx_7, nt_xx_7;
 
More results from queries before failure can be found in /mnt/buildbot/build/mariadb-10.1.34/mysql-test/var/2/log/rpl_row_mixing_engines.log
 
 
Server [mysqld.1 - pid: 27930, winpid: 27930, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2018-06-14 08:58:00 7ff418576b00  InnoDB: Assertion failure in thread 140686357129984 in file dict0dict.cc line 2737
InnoDB: Failing assertion: ++retries < 10000
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
180614  8:58:00 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.1.34-MariaDB
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=7
max_threads=153
thread_count=6
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 62940 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x62a00077a208
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7ff418576030 thread_stack 0x48400
/usr/lib/x86_64-linux-gnu/libasan.so.2(+0x4a077)[0x7ff42e9d3077]
/mnt/buildbot/build/mariadb-10.1.34/sql/mysqld(my_print_stacktrace+0xdc)[0x564d571994d9]
/mnt/buildbot/build/mariadb-10.1.34/sql/mysqld(handle_fatal_signal+0x766)[0x564d56638056]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff42d87a390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7ff42ce4a428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7ff42ce4c02a]
/mnt/buildbot/build/mariadb-10.1.34/sql/mysqld(+0x142b472)[0x564d56f2f472]
/mnt/buildbot/build/mariadb-10.1.34/sql/mysqld(+0x14401ce)[0x564d56f441ce]
/mnt/buildbot/build/mariadb-10.1.34/sql/mysqld(+0x14409b0)[0x564d56f449b0]
dict/dict0dict.cc:2738(dict_index_remove_from_cache_low(dict_table_t*, dict_index_t*, unsigned long))[0x564d56d78c60]
dict/dict0dict.cc:2110(dict_table_remove_from_cache_low(dict_table_t*, unsigned long))[0x564d56bbec8a]
dict/dict0dict.cc:2152(dict_table_remove_from_cache(dict_table_t*))[0x564d5664ecb7]
row/row0mysql.cc:4432(row_drop_table_for_mysql(char const*, trx_t*, bool, unsigned long, bool))[0x564d5664eec5]
handler/ha_innodb.cc:13188(ha_innobase::delete_table(char const*))[0x564d563745fb]
sql/handler.cc:4320(handler::ha_delete_table(char const*))[0x564d563761bf]
sql/handler.cc:2384(ha_delete_table(THD*, handlerton*, char const*, char const*, char const*, bool))[0x564d5620f203]
sql/sql_table.cc:2472(mysql_rm_table_no_locks(THD*, TABLE_LIST*, bool, bool, bool, bool, bool))[0x564d5621a4f9]
sql/sql_table.cc:2084(mysql_rm_table(THD*, TABLE_LIST*, char, char))[0x564d562200ca]
sql/sql_parse.cc:4254(mysql_execute_command(THD*))[0x564d56225dfc]
sql/sql_parse.cc:7450(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x564d564615e5]
sql/sql_parse.cc:1515(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x564d56461ac2]
sql/sql_parse.cc:1121(do_command(THD*))[0x564d56a4c649]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7ff42d8706ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7ff42cf1b82d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x62b000ba6220): DROP TABLE IF EXISTS tt_xx_7, nt_xx_7
Connection ID (thread ID): 1747
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
 
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Marko Mäkelä [ 2018-06-25 ]

In the core dump, we have index->space=3426 and index->page=FIL_NULL, but the adaptive hash index is not empty:

(gdb) p *info
$1 = {ref_count = 56, root_guess = 0x7fb03fd8bb90, hash_analysis = 2292, 
  last_hash_succ = 1, n_hash_potential = 2280, n_fields = 0, n_bytes = 1, 
  left_side = 0, magic_n = 1112765}

The info->root_guess points to a different tablespace:

(gdb) p $1.root_guess.page
$10 = {space = 3268, offset = 3, buf_fix_count = 0, io_fix = 0, state = 5, 
  flush_type = 0, buf_pool_index = 0, zip = {data = 0x0, m_start = 0, 
    m_external = false, m_end = 0, m_nonempty = 0, n_blobs = 0, ssize = 0}, 
  write_size = 0, encrypted = false, real_size = 0, slot = 0x0, hash = 0x0, 
  in_page_hash = 1, in_zip_hash = 0, list = {prev = 0x7fb03ffcd0f0, 
    next = 0x7fb03f6b70c8}, in_flush_list = 0, in_free_list = 0, 
  newest_modification = 161156672, oldest_modification = 0, LRU = {
    prev = 0x7fb03f9df820, next = 0x7fb03fd8c5a0}, in_LRU_list = 1, old = 0, 
  freed_page_clock = 11594, access_time = 201517474, is_corrupt = 0, 
  file_page_was_freed = 0}

Assuming that the info->root_guess points to a page that was valid for the index, the issue would seem to be that table->space points to something else, and the call

buf_LRU_drop_page_hash_for_tablespace(table);

in the loop would have no effect.

In MariaDB before 10.2, TRUNCATE TABLE could reassign dict_table_t::space. row_truncate_table_for_mysql() does call buf_LRU_drop_page_hash_for_tablespace(table) before that, but it does not check if all the pages were dropped, I/O-fixed or buffer-fixed pages would seem to be skipped.

I will try to check the fil_system->space_list and the buffer pool contents, to determine if the problem could have been TRUNCATE TABLE.

Comment by Marko Mäkelä [ 2018-06-25 ]

Only the tablespace 3426 exists:

$16 = (fil_space_t *) 0x7fb00018e738
(gdb) p *$16
$17 = {name = 0x7fb0000b8428 "test/#sql-ib3458-1040254509", id = 3426, 
  tablespace_version = 3427, stop_new_ops = false, purpose = 501, chain = {
    count = 1, start = 0x7fb000115b68, end = 0x7fb000115b68}, size = 6, 
  recv_size = 0, flags = 0, n_reserved_extents = 0, n_pending_flushes = 0, … }

I think that it is plausible that the tablespace (with original ID 3268) was truncated. The size is only 6 pages. The old root_guess page had PAGE_LEVEL=1 and 60 node pointers, and InnoDB tablespaces never shrink outside truncate, so the only way how fil_space_t::size can be only 6 (instead of something bigger than 60) should be TRUNCATE TABLE.

I found exactly 56 blocks in the buffer pool with block->index == index, matching info->ref_count. All block->page.space numbers point to a yet different tablespace 2828, which does not exist either. The tablespace ID and the page number in each block->frame match those of the block->page. So, it looks like the tablespace ID in the info->root_guess is incorrect (because the block was used for some other table that was later dropped or rebuilt), and the original tablespace ID before TRUNCATE TABLE was 2828, not 3268.

I think that the most plausible explanation for this failure is that TRUNCATE TABLE was executed, and it did not drop all of the adaptive hash index because some pages were I/O-fixed (likely for flushing dirty pages) or buffer-fixed at that time.

Comment by Marko Mäkelä [ 2018-06-25 ]

In MariaDB 5.5, we would not drop the adaptive hash indexes at all during ALTER TABLE…DISCARD TABLESPACE or TRUNCATE TABLE; we would only warn that they are not empty. In MariaDB 10.0 we would call fil_discard_tablespace() in both cases.

The active dropping was added recently in MariaDB 10.1.34 as part of the MDEV-16283/MDEV-14727/MDEV-14491 fix.

This failure can be classified as a regression of that fix. I think that the most feasible solution is to fix this in 10.1+ only. I believe that 10.2 could fail in a different way due to TRUNCATE TABLE. Because the tablespace ID would not change, the adaptive hash index would be dropped in the end in this scenario. But we could find false positives when searching for records through the adaptive hash index.

Comment by Marko Mäkelä [ 2018-07-31 ]

As noted in MDEV-16759, this fix appears to have fixed a case of InnoDB adaptive hash index corruption that lead to permanent corruption of InnoDB tables (MDEV-9663). It is not known yet if there are other causes of MDEV-9663 than TRUNCATE TABLE while innodb_adaptive_hash_index is at its default setting (ON).

Generated at Thu Feb 08 08:29:29 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.