[MDEV-29277] On error, fts_sync_table() fails to release a table handle Created: 2022-08-09  Updated: 2023-04-24  Resolved: 2022-09-23

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11
Fix Version/s: 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2

Type: Bug Priority: Critical
Reporter: Roel Van de Paar Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 0
Labels: affects-tests, regression, rr-profile-analyzed, sporadic

Issue Links:
Relates
relates to MDEV-26636 Race conditions due to attempted upda... Closed
relates to MDEV-29053 safe_mutex: Trying to destroy a mutex... Closed
relates to MDEV-29250 InnoDB: Failing assertion: table->get... Closed

 Description   

It looks like there are still several issues around the table->get_ref_count() == 0 assert (ref MDEV-26636, MDEV-29250). Here is a list of UniqueID's/stacks that I have seen in connection with this:

table->get_ref_count() == 0|SIGABRT|ut_dbg_assertion_failed|dict_sys_t::remove|dict_sys_t::close|innodb_shutdown
table->get_ref_count() == 0|SIGABRT|ut_dbg_assertion_failed|dict_sys_t::remove|trx_t::commit|ha_innobase::delete_table
table->get_ref_count() == 0|SIGABRT|ut_dbg_assertion_failed|dict_sys_t::remove|trx_t::commit|ha_innobase::create
table->get_ref_count() > 0|SIGABRT|ut_dbg_assertion_failed|dict_table_close|row_prebuilt_free|ha_innobase::close
table->get_ref_count() == 0|SIGABRT|ut_dbg_assertion_failed|dict_sys_t::remove|row_drop_table_for_mysql|ha_innobase::delete_table

Issues seem to be highly sporadic, and so far I have several testcases just under 100 lines, some around 150 lines and some 400-900 lines. I will attempt to reduce further to ~10 lines or less.
Here is a selection of versions I have seen this bug on (list does not correlate to list above). Note the numbers after MD are DDMMYY.

MD140222-mariadb-10.9.0-linux-x86_64-dbg @ b5852ffbeebc3000982988383daeefb0549e058a
MD030322-mariadb-10.9.0-linux-x86_64-dbg 177345dadc9250387343164be0053b1952fc59c1
MD160322-mariadb-10.5.16-linux-x86_64-opt @ 73fee39ea62037780c59161507e89dd76c10b7a3
MD160322-mariadb-10.8.3-linux-x86_64-dbg @ 9f5a3e568913e0810109554608c56c93f3ec24f8
online-alter_MD080622-mariadb-10.10.0-linux-x86_64-dbg @ d7a7d16713070c7c2902c3df4b4f6d024cd0320f
MDEV-28883_MD280622-mariadb-10.10.0-linux-x86_64-dbg @ 2db18fdb3d68d906fbd188ec570a64502ba55849
MD120722-mariadb-10.10.0-linux-x86_64-dbg @ 88b22356e623fd63aa87273a895521a6e6667bc7

Likely other versions (and especially 10.6, 10.7 given there is 10.5 and 10.8) are affected.



 Comments   
Comment by Roel Van de Paar [ 2022-09-13 ]

One example reduced testcase

CREATE TABLE t2 (c INT);
DROP TABLE t,t2;
CREATE TABLE t2 (s CHAR(255),FULLTEXT (s)) DEFAULT CHARSET=utf8;
INSERT INTO t2 VALUES (10009);
SET GLOBAL innodb_trx_rseg_n_slots_debug=1,@@SESSION.pseudo_slave_mode=ON;
SHUTDOWN;

Leads to:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

InnoDB: Failing assertion: table->get_ref_count() == 0

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

Core was generated by `/test/MD190822-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x14eb78942940 (LWP 40379))]
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x000014eb78b1b859 in __GI_abort () at abort.c:79
#2  0x00005623ed87cd33 in ut_dbg_assertion_failed (expr=expr@entry=0x5623edeb1db0 "table->get_ref_count() == 0", file=file@entry=0x5623edeb2318 "/test/10.11_dbg/storage/innobase/dict/dict0dict.cc", line=line@entry=1854) at /test/10.11_dbg/storage/innobase/ut/ut0dbg.cc:60
#3  0x00005623ed93f986 in dict_sys_t::remove (this=this@entry=0x5623ee530a00 <dict_sys>, table=0x14eaec07c6d0, lru=lru@entry=false, keep=keep@entry=false) at /test/10.11_dbg/storage/innobase/dict/dict0dict.cc:1854
#4  0x00005623ed940305 in dict_sys_t::close (this=0x5623ee530a00 <dict_sys>) at /test/10.11_dbg/storage/innobase/dict/dict0dict.cc:4563
#5  0x00005623ed831529 in innodb_shutdown () at /test/10.11_dbg/storage/innobase/srv/srv0start.cc:1765
#6  0x00005623ed62f622 in innobase_end () at /test/10.11_dbg/storage/innobase/handler/ha_innodb.cc:4308
#7  0x00005623ed32d9a7 in ha_finalize_handlerton (plugin=0x5623efd3cb30) at /test/10.11_dbg/sql/handler.cc:596
#8  0x00005623ed05ff76 in plugin_deinitialize (plugin=0x5623efd3cb30, ref_check=ref_check@entry=true) at /test/10.11_dbg/sql/sql_plugin.cc:1269
#9  0x00005623ed063e55 in reap_plugins () at /test/10.11_dbg/sql/sql_plugin.cc:1343
#10 0x00005623ed064938 in plugin_shutdown () at /test/10.11_dbg/sql/sql_plugin.cc:2051
#11 0x00005623ecf092c0 in clean_up (print_message=print_message@entry=true) at /test/10.11_dbg/sql/mysqld.cc:1988
#12 0x00005623ecf156c6 in mysqld_main (argc=<optimized out>, argv=<optimized out>) at /test/10.11_dbg/sql/mysqld.cc:5940
#13 0x00005623ecf08b76 in main (argc=<optimized out>, argv=<optimized out>) at /test/10.11_dbg/sql/main.cc:34

As well as:

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

InnoDB: Failing assertion: table->get_ref_count() == 0

10.11.0 bc563f1a4b0b38de3b41fd0f0d3d8b7f1aacbd8b (Debug)

Core was generated by `/test/MD190822-mariadb-10.11.0-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGABRT, Aborted.
#0  0x0000149a4e70200b in ?? ()
[Current thread is 1 (LWP 3947348)]
(gdb) bt
#0  0x0000149a4e70200b in ?? ()
Backtrace stopped: Cannot access memory at address 0x7fffa4a84110

Bug confirmed present in:
MariaDB: 10.6.10 (dbg), 10.7.6 (dbg), 10.8.5 (dbg), 10.9.2 (dbg), 10.10.2 (dbg), 10.11.0 (dbg)

Bug (or feature/syntax) confirmed not present in:
MariaDB: 10.3.37 (dbg), 10.3.37 (opt), 10.4.27 (dbg), 10.4.27 (opt), 10.5.18 (dbg), 10.5.18 (opt), 10.6.10 (opt), 10.7.6 (opt), 10.8.5 (opt), 10.9.2 (opt), 10.10.2 (opt), 10.11.0 (opt)
MySQL: 5.5.62 (dbg), 5.5.62 (opt), 5.6.51 (dbg), 5.6.51 (opt), 5.7.38 (dbg), 5.7.38 (opt), 8.0.29 (dbg), 8.0.29 (opt)

The two UniqueID's for this testcase specifically:

table->get_ref_count() == 0|SIGABRT|Backtrace stopped: Cannot access memory at address
table->get_ref_count() == 0|SIGABRT|ut_dbg_assertion_failed|dict_sys_t::remove|dict_sys_t::close|innodb_shutdown

Comment by Marko Mäkelä [ 2022-09-14 ]

I do not agree that this is a race condition or a sporadic failure. Many assertion failures on dict_table_t::n_ref_count are due to a race condition, but not this one.

I can deterministically reproduce this with the following:

--source include/have_innodb.inc
CREATE TABLE t1 (s CHAR(255),FULLTEXT (s)) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO t1 VALUES (10009);
SET GLOBAL innodb_trx_rseg_n_slots_debug=1,@@SESSION.pseudo_slave_mode=ON;
--source include/restart_mysqld.inc
DROP TABLE t1;

Edit: if the 10009 above is changed to '10009', then this will not crash.

10.6 fd0bdd3180a7d5f4b9804d372d6a63b6a202818c

Thread 1 received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
44	./nptl/pthread_kill.c: Tiedostoa tai hakemistoa ei ole.
(rr) up 4
#4  0x0000558e7f9b41c6 in ut_dbg_assertion_failed (
    expr=expr@entry=0x558e7ea06d52 "table->get_ref_count() == 0", 
    file=file@entry=0x558e7eac44c0 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=1860) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
60		abort();
(rr) up
#5  0x0000558e7f7e0733 in dict_sys_t::remove (
    this=this@entry=0x558e7fc43980 <dict_sys>, table=0x7f57b0226b08, 
    lru=lru@entry=false, keep=keep@entry=false)
    at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1860
1860		ut_a(table->get_ref_count() == 0);
(rr) p table.name
$1 = {
  m_name = 0x7f57b0057168 "test/FTS_", '0' <repeats 14 times>, "12_", '0' <repeats 14 times>, "1e_INDEX_1"}
(rr) watch -l table.n_ref_count 
Hardware watchpoint 1: -location table.n_ref_count
(rr) reverse-continue
(rr) reverse-continue
(rr) continue
Thread 4 hit Hardware watchpoint 1: -location table.n_ref_count
 
Old value = {m_counter = std::atomic<unsigned int> = { 0 }}
New value = {m_counter = std::atomic<unsigned int> = { 1 }}
0x0000558e7f74494d in dict_table_t::acquire (this=0x7f57b0226b08) at /mariadb/10.6/storage/innobase/include/dict0dict.inl:1133
1133	}
(rr) backtrace
#0  0x0000558e7f74494d in dict_table_t::acquire (this=0x7f57b0226b08) at /mariadb/10.6/storage/innobase/include/dict0dict.inl:1133
#1  0x0000558e7f7ddf06 in dict_table_open_on_name (table_name=<optimized out>, dict_locked=dict_locked@entry=true, ignore_err=ignore_err@entry=DICT_ERR_IGNORE_NONE)
    at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1111
#2  0x0000558e7f8ec436 in pars_retrieve_table_def (sym_node=sym_node@entry=0x7f57cc007138) at /mariadb/10.6/storage/innobase/pars/pars0pars.cc:768
#3  0x0000558e7f8ee021 in pars_insert_statement (table_sym=0x7f57cc007138, values_list=0x7f57cc007248, select=select@entry=0x0) at /mariadb/10.6/storage/innobase/pars/pars0pars.cc:1274
#4  0x0000558e7f8ef5d0 in yyparse () at /dev/shm/10.6/pars0grm.y:311
#5  0x0000558e7f8edaf0 in pars_sql (info=info@entry=0x7f57cc004838, 
    str=str@entry=0x7f57cc0065c8 "PROCEDURE P() IS\nBEGIN\nINSERT INTO $index_table_name VALUES (:token, :first_doc_id,  :last_doc_id, :doc_count, :ilist);\nEND;\n")
    at /mariadb/10.6/storage/innobase/pars/pars0pars.cc:1986
#6  0x0000558e7f82c969 in fts_parse_sql (fts_table=fts_table@entry=0x7f57d5053410, info=info@entry=0x7f57cc004838, 
    sql=sql@entry=0x558e7eb52ed0 "BEGIN\nINSERT INTO $index_table_name VALUES (:token, :first_doc_id,  :last_doc_id, :doc_count, :ilist);") at /mariadb/10.6/storage/innobase/fts/fts0sql.cc:136
#7  0x0000558e7f7562f6 in fts_write_node (trx=trx@entry=0x7f57d6b60b80, graph=0x7f57b011a3d8, fts_table=fts_table@entry=0x7f57d5053410, word=word@entry=0x7f57b0088158, node=0x7f57b011a620)
    at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:3753
#8  0x0000558e7f75b402 in fts_sync_write_words (trx=trx@entry=0x7f57d6b60b80, index_cache=index_cache@entry=0x7f57b011a388) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:3877
#9  0x0000558e7f75b66e in fts_sync_index (sync=sync@entry=0x7f57b011a338, index_cache=index_cache@entry=0x7f57b011a388) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:3960
#10 0x0000558e7f75c38a in fts_sync (sync=0x7f57b011a338) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:4107
#11 0x0000558e7f75c51a in fts_sync_table (table=table@entry=0x7f57b01359f8) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:4152
#12 0x0000558e7f82967b in fts_optimize_sync_table (table=<optimized out>, process_message=process_message@entry=false) at /mariadb/10.6/storage/innobase/fts/fts0opt.cc:2768
#13 0x0000558e7f82c12a in fts_optimize_callback () at /mariadb/10.6/storage/innobase/fts/fts0opt.cc:2886
#14 0x0000558e7f9b9582 in tpool::task_group::execute (this=0x558e80566b00 <task_group>, t=t@entry=0x558e80566ae0 <task>) at /mariadb/10.6/tpool/task_group.cc:55
(rr) continue
Continuing.
 
Thread 4 hit Hardware watchpoint 1: -location table.n_ref_count
 
Old value = {m_counter = std::atomic<unsigned int> = { 1 }}
New value = {m_counter = std::atomic<unsigned int> = { 2 }}
0x0000558e7f74494d in dict_table_t::acquire (this=0x7f57b0226b08) at /mariadb/10.6/storage/innobase/include/dict0dict.inl:1133
1133	}
(rr) bt
#10 0x0000558e7f9a1358 in trx_t::rollback (this=this@entry=0x7f57d6b60b80, savept=savept@entry=0x0) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:176
#11 0x0000558e7f75bbb3 in fts_sync_commit (sync=sync@entry=0x7f57b011a338) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:4004
(rr) continue
Continuing.
 
Thread 4 hit Hardware watchpoint 1: -location table.n_ref_count
 
Old value = {m_counter = std::atomic<unsigned int> = { 2 }}
New value = {m_counter = std::atomic<unsigned int> = { 1 }}
dict_table_t::release (this=0x7f57b0226b08) at /mariadb/10.6/storage/innobase/include/dict0dict.inl:1142
1142		ut_ad(n > 0);
(rr) bt
#10 0x0000558e7f9a1358 in trx_t::rollback (this=this@entry=0x7f57d6b60b80, savept=savept@entry=0x0) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:176
#11 0x0000558e7f75bbb3 in fts_sync_commit (sync=sync@entry=0x7f57b011a338) at /mariadb/10.6/storage/innobase/fts/fts0fts.cc:4004

The first table handle was never released. The rollback acquired and released a table handle just fine.

Comment by Roel Van de Paar [ 2022-09-15 ]

While the testcase given may be deterministic and yield a correct stack, I have also observed stack smashing (ref example above) and possibly sporadicity (cannot recall). There are also other asserts (ref initial description), so it is likely not all code paths/avenues are covered yet.

Comment by Marko Mäkelä [ 2022-09-16 ]

Actually, it turns out that I was just lucky 2 days ago to get a failure straight away. Today, the same test would only crash after very many retries.

Comment by Thirunarayanan Balathandayuthapani [ 2022-09-20 ]

Given test case fails only in 10.6+ versions.

Comment by Thirunarayanan Balathandayuthapani [ 2022-09-20 ]

Patch is in bb-10.6-MDEV-29277

Comment by Marko Mäkelä [ 2022-09-21 ]

The fix looks OK to me.

Comment by Marko Mäkelä [ 2022-09-21 ]

This bug is likely not the only reason why the assertion would fail. The provided test case was never claimed to reproduce a crash on 10.5.

Comment by Marko Mäkelä [ 2022-09-21 ]

I removed the stack-smashing label because I do not see any evidence of any buffer overflow with the posted test case. Buffer overflows like MDEV-23256 can corrupt InnoDB dict_table_t, including dict_table_t::n_ref_count, but no specific evidence of that has been posted in this ticket.

Generated at Thu Feb 08 10:07:16 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.