[MDEV-26936] Recovery crash on rolling back DELETE FROM SYS_INDEXES Created: 2021-10-29  Updated: 2021-11-01  Resolved: 2021-10-29

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: None
Fix Version/s: 10.6.5

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, recovery

Issue Links:
Problem/Incident
is caused by MDEV-25180 Atomic ALTER TABLE Closed

 Description   

In yesterday’s stress tests by mleich we got a crash in a kill+restart test. I started a server on the saved data directory and got the following crash:

10.6 dbd6c6dc01228fe6e63f3f7dc695eb56ca8cd28d

2021-10-29 13:52:47 0 [Note] InnoDB: 128 rollback segments are active.
mariadbd: /mariadb/10.6m/storage/innobase/trx/trx0trx.cc:1221: void trx_t::evict_table(table_id_t, bool): Assertion `!locked || (table->locks).start->trx == this' failed.

This occurred while we were rolling back a DELETE operation of a SYS_INDEXES record for which NAME='\xffuidx'. The special byte '\xff' indicates that it is a stub for ADD INDEX uidx. At the same time, we had recovered a DML transaction that is holding a lock on the same user table (t1). The assertion fails, because tables must never be evicted if other transactions are holding locks on them.

With some effort, I created a repeatable test case for this:

--source include/have_innodb.inc
# The embedded server tests do not support restarting.
--source include/not_embedded.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
 
connection default;
CREATE TABLE t1(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES(1,1);
 
connect ddl, localhost, root;
SET DEBUG_SYNC = 'row_merge_after_scan SIGNAL scanned WAIT_FOR commit';
SET DEBUG_SYNC = 'before_commit_rollback_inplace SIGNAL c WAIT_FOR ever';
send ALTER TABLE t1 ADD UNIQUE INDEX(b), ALGORITHM=INPLACE;
 
connection default;
SET DEBUG_SYNC = 'now WAIT_FOR scanned';
BEGIN;
INSERT INTO t1 VALUES(2,1);
SET DEBUG_SYNC = 'now SIGNAL commit';
SET DEBUG_SYNC = 'now WAIT_FOR c';
SET GLOBAL innodb_flush_log_at_trx_commit=1;
INSERT INTO t1 VALUES(3,3);
sleep 1;
 
--source include/kill_mysqld.inc
disconnect ddl;
--source include/start_mysqld.inc
 
CHECK TABLE t1;
SHOW CREATE TABLE t1;
SELECT * FROM t1;
DROP TABLE t1;

Note: I have no idea why that sleep 1 is needed. I suspect MDEV-26789 or some related changes. Is our durability broken now?
The test requires the following synchronization point:

diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index adeaf87f7fe..79a308a20a6 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -8764,6 +8764,7 @@ inline bool rollback_inplace_alter_table(Alter_inplace_info *ha_alter_info,
       ut_d(dict_table_check_for_dup_indexes(ctx->old_table, CHECK_ABORTED_OK));
     }
 
+    DEBUG_SYNC(ctx->trx->mysql_thd, "before_commit_rollback_inplace");
     commit_unlock_and_unlink(ctx->trx);
     if (fts_exist)
       purge_sys.resume_FTS();



 Comments   
Comment by Marko Mäkelä [ 2021-10-29 ]

I do not think that there is anything wrong with durability after all. The sleep 1 is sloppy, but if I change it to COMMIT (which is what would request durability), then the recovery would not crash.

Comment by Marko Mäkelä [ 2021-10-29 ]

I can force a redo log write (and provoke the recovery crash) with a debug variable:

connection default;
SET DEBUG_SYNC = 'now WAIT_FOR scanned';
BEGIN;
INSERT INTO t1 VALUES(2,1);
SET DEBUG_SYNC = 'now SIGNAL commit';
SET DEBUG_SYNC = 'now WAIT_FOR c';
# Force a redo log flush.
SET GLOBAL innodb_fil_make_page_dirty_debug=0;
 
--source include/kill_mysqld.inc

Generated at Thu Feb 08 09:49:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.