[MDEV-29250] InnoDB: Failing assertion: table->get_ref_count() == 0 Created: 2022-08-05  Updated: 2022-08-23  Resolved: 2022-08-21

Status: Closed
Project: MariaDB Server
Component/s: Full-text Search, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10
Fix Version/s: 10.9.2, 10.10.1, 10.6.10, 10.7.6, 10.8.5

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Thirunarayanan Balathandayuthapani
Resolution: Fixed Votes: 1
Labels: regression-10.6

Issue Links:
Blocks
Relates
relates to MDEV-29277 On error, fts_sync_table() fails to r... Closed
relates to MDEV-27783 InnoDB: Failing assertion: table->get... Closed

 Description   

The fix of MDEV-27783 appears to have been incomplete. I just got the following crash:

bb-10.6-release c9803504389e12b79253e12f1d7d78fbdf92395c

CURRENT_TEST: main.alter_table
mysqltest: At line 1636: query 'ALTER TABLE ti1 ADD PRIMARY KEY(a)' failed: <Unknown> (2013): Lost connection to server during query
...
2022-08-05 11:17:44 0x7fd40d683640  InnoDB: Assertion failure in file /mariadb/10.6/storage/innobase/dict/dict0dict.cc line 1844
InnoDB: Failing assertion: table->get_ref_count() == 0
...
#6  0x0000557143293726 in ut_dbg_assertion_failed (
    expr=expr@entry=0x5571437ef3c1 "table->get_ref_count() == 0", 
    file=file@entry=0x5571437ef958 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=1844) at /mariadb/10.6/storage/innobase/ut/ut0dbg.cc:60
#7  0x000055714331011f in dict_sys_t::remove (this=0x557143e9fc40 <dict_sys>, 
    table=table@entry=0x7fd36c11a338, lru=lru@entry=false, 
    keep=keep@entry=false)
    at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:1844
#8  0x0000557143330cb4 in trx_t::commit (this=this@entry=0x7fd40e967680, 
...
Thread 55 (Thread 0x7fd32ffff640 (LWP 2881184)):
...
#8  0x0000557143120b44 in dict_sys_t::lock (this=this@entry=0x557143e9fc40 <dict_sys>, file=file@entry=0x5571437ef958 "/mariadb/10.6/storage/innobase/dict/dict0dict.cc", line=line@entry=225) at /mariadb/10.6/storage/innobase/include/dict0dict.h:1532
#9  0x000055714330d54f in dict_table_close (table=table@entry=0x7fd36c11a338) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:225
#10 0x000055714330d5a6 in dict_table_close (table=0x7fd36c11a338, dict_locked=dict_locked@entry=false, thd=0x557145271ac8, mdl=0x7fd31c0016c0) at /mariadb/10.6/storage/innobase/dict/dict0dict.cc:253
#11 0x0000557143235002 in purge_node_t::close_table (this=this@entry=0x557144f2d6f8) at /mariadb/10.6/storage/innobase/include/row0purge.h:229
#12 0x000055714323509b in purge_node_t::retain_mdl (this=this@entry=0x557144f2d6f8, table_id=57) at /mariadb/10.6/storage/innobase/include/row0purge.h:249
#13 0x0000557143232392 in row_purge_parse_undo_rec (node=node@entry=0x557144f2d6f8, undo_rec=undo_rec@entry=0x557144f3dc18 "", thr=thr@entry=0x557144f2d508, updated_extern=updated_extern@entry=0x7fd32fffea57) at /mariadb/10.6/storage/innobase/row/row0purge.cc:932
#14 0x0000557143234c7c in row_purge (node=node@entry=0x557144f2d6f8, undo_rec=undo_rec@entry=0x557144f3dc18 "", thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/row/row0purge.cc:1120
#15 0x0000557143234d1b in row_purge_step (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/row/row0purge.cc:1172
#16 0x00005571431da5e1 in que_thr_step (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:653
#17 0x00005571431da7c2 in que_run_threads_low (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:709
#18 0x00005571431da87f in que_run_threads (thr=thr@entry=0x557144f2d508) at /mariadb/10.6/storage/innobase/que/que0que.cc:729
#19 0x00005571432610ac in srv_task_execute () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1656
#20 0x0000557143261794 in purge_worker_callback () at /mariadb/10.6/storage/innobase/srv/srv0srv.cc:1899

I think that a similar problem (crash of a non-debug server) may affect DROP TABLE as well as any table rebuild operation (including OPTIMIZE TABLE and TRUNCATE TABLE) on tables that contain FULLTEXT INDEX.



 Comments   
Comment by Marko Mäkelä [ 2022-08-05 ]

So far, I failed to reproduce the crash with the following test, which is based on the failing section of the main.alter_table test:

--source include/have_innodb.inc
 
CREATE TABLE ti1(a INT NOT NULL, b INT, c INT) engine=InnoDB;
ALTER TABLE ti1 ADD COLUMN d VARCHAR(200);
ALTER TABLE ti1 ADD COLUMN d2 VARCHAR(200);
ALTER TABLE ti1 ADD COLUMN e ENUM('a', 'b') FIRST;
ALTER TABLE ti1 ADD COLUMN f INT AFTER a;
ALTER TABLE ti1 ADD FULLTEXT INDEX ii3 (d), ALGORITHM=INPLACE;
ALTER TABLE ti1 ADD FULLTEXT INDEX ii4 (d2);
ALTER TABLE ti1 ADD PRIMARY KEY(a);
DROP TABLE ti1;

I think that purge is running on the table because of the hidden metadata records for instant ADD COLUMN. I believe that the following test should be equivalent, but also it fails to fail for me:

--source include/have_innodb.inc
 
CREATE TABLE ti1(a SERIAL, d VARCHAR(200), FULLTEXT INDEX(d)) ENGINE=InnoDB;
 
let $N=10;
while ($N) {
INSERT INTO ti1 SET a=NULL;
dec $N;
}
let $N=10;
while ($N) {
DELETE FROM ti1 LIMIT 1;
dec $N;
}
ALTER TABLE ti1 FORCE, ALGORITHM=COPY;
DROP TABLE ti1;

Comment by Marko Mäkelä [ 2022-08-05 ]

With the following code patch, I finally had some success, on the 38th round when running 115 concurrent tests and --innodb-purge-threads=20. This was with the last test variant (CREATE, INSERT, DELETE, ALTER, DROP):

mysqltest: At line 16: query 'DROP TABLE ti1' failed: <Unknown> (2013): Lost connection to server during query

I think that it could have failed in a similar way during the ALTER TABLE, but I may be mistaken.
The patch introduces a sleep and disables a condition that could prevent the problem from reproducing with the reduced test.

diff --git a/storage/innobase/include/row0purge.h b/storage/innobase/include/row0purge.h
index 34af658cb12..8681b64cee9 100644
--- a/storage/innobase/include/row0purge.h
+++ b/storage/innobase/include/row0purge.h
@@ -238,7 +238,8 @@ struct purge_node_t{
   bool retain_mdl(table_id_t table_id)
   {
     ut_ad(table_id);
-    if (last_table_id == table_id && mdl_hold_recs < 100)
+    std::this_thread::sleep_for(std::chrono::milliseconds(1));
+    if (0 && last_table_id == table_id && mdl_hold_recs < 100)
     {
       ut_ad(table);
       mdl_hold_recs++;

Comment by Thirunarayanan Balathandayuthapani [ 2022-08-05 ]

Patch is in bb-10.6-MDEV-29250

Comment by Marko Mäkelä [ 2022-08-09 ]

The revised patch looks correct to me. OK to push after addressing my comments, and after this has passed stress testing.

Comment by Matthias Leich [ 2022-08-18 ]

origin/bb-10.6-MDEV-29250 d3d8e91f51ed289d43ac91eb1d8ba9d11c904cb5 2022-08-18T16:24:26+05:30
performed quite well in RQG testing.
Especially remarkable is the clear smaller amount of failing crash recovery tests compared to
origin/10.6 af552f2903b9764e3bc0634a0037c39d1a837f4f 2022-08-16T21:33:45+05:30
In case the first time seen problem
     mysqld: /data/Server/bb-10.6-MDEV-29250/storage/innobase/include/dict0mem.h:1348: void dict_index_t::init_change_cols(unsigned int): Assertion `n_fields > n_cols' failed.
is not caused by the modifications for MDEV-29250 I vote for pushing MDEV-29250.

Generated at Thu Feb 08 10:07:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.