[MDEV-26936] Recovery crash on rolling back DELETE FROM SYS_INDEXES

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 10.6.5
Component/s: Storage Engine - InnoDB
Labels:
- crash
- recovery

Description

In yesterday’s stress tests by mleich we got a crash in a kill+restart test. I started a server on the saved data directory and got the following crash:

10.6 dbd6c6dc01228fe6e63f3f7dc695eb56ca8cd28d
…
2021-10-29 13:52:47 0 [Note] InnoDB: 128 rollback segments are active.
mariadbd: /mariadb/10.6m/storage/innobase/trx/trx0trx.cc:1221: void trx_t::evict_table(table_id_t, bool): Assertion `!locked || (table->locks).start->trx == this' failed.

This occurred while we were rolling back the DELETE of a SYS_INDEXES record for which NAME='\xffuidx'. The special byte '\xff' indicates that the record is a stub for ADD INDEX uidx. At the same time, we had recovered a DML transaction that was holding a lock on the same user table (t1). The assertion fails because a table must never be evicted while other transactions are holding locks on it.
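
The failing condition can be illustrated with a minimal sketch. The types and functions below (`Trx`, `Lock`, `Table`, `may_evict`, `evict`) are hypothetical simplifications and not InnoDB's actual data structures. The sketch only mirrors the asserted predicate, under the assumption that `locked` means the table's lock list is non-empty.

```cpp
#include <cassert>
#include <vector>

// Hypothetical, simplified stand-ins; not InnoDB's real structures.
struct Trx { int id; };

struct Lock { const Trx* trx; };

struct Table {
  std::vector<Lock> locks;  // table locks, oldest first
  bool cached = true;       // still present in the (toy) dictionary cache
};

// Mirrors the asserted predicate: !locked || first lock's trx == evictor.
// Assumption: "locked" means that the table's lock list is non-empty.
bool may_evict(const Table& table, const Trx& evictor) {
  const bool locked = !table.locks.empty();
  return !locked || table.locks.front().trx == &evictor;
}

// Debug-style check analogous to the failing assertion: aborts if violated.
void evict(Table& table, const Trx& evictor) {
  assert(may_evict(table, evictor));
  table.cached = false;
}

// Scenario from this report: a recovered DML transaction holds a lock on t1
// while the recovered DDL transaction is rolling back and wants to evict t1.
bool report_scenario_may_evict() {
  Trx ddl{1}, dml{2};
  Table t1;
  t1.locks.push_back(Lock{&dml});
  return may_evict(t1, ddl);  // the predicate does not hold in this case
}
```

In this model, `report_scenario_may_evict()` returns false, which corresponds to the situation in which the real assertion fires during recovery.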

With some effort, I created a repeatable test case for this:

--source include/have_innodb.inc
# The embedded server tests do not support restarting.
--source include/not_embedded.inc
--source include/have_debug.inc
--source include/have_debug_sync.inc
connection default;
CREATE TABLE t1(a INT PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES(1,1);
connect ddl, localhost, root;
SET DEBUG_SYNC = 'row_merge_after_scan SIGNAL scanned WAIT_FOR commit';
SET DEBUG_SYNC = 'before_commit_rollback_inplace SIGNAL c WAIT_FOR ever';
send ALTER TABLE t1 ADD UNIQUE INDEX(b), ALGORITHM=INPLACE;
connection default;
SET DEBUG_SYNC = 'now WAIT_FOR scanned';
BEGIN;
INSERT INTO t1 VALUES(2,1);
SET DEBUG_SYNC = 'now SIGNAL commit';
SET DEBUG_SYNC = 'now WAIT_FOR c';
SET GLOBAL innodb_flush_log_at_trx_commit=1;
INSERT INTO t1 VALUES(3,3);
sleep 1;
--source include/kill_mysqld.inc
disconnect ddl;
--source include/start_mysqld.inc
CHECK TABLE t1;
SHOW CREATE TABLE t1;
SELECT * FROM t1;
DROP TABLE t1;

Note: I have no idea why that sleep 1 is needed. I suspect MDEV-26789 or some related changes. Is our durability broken now?

The test requires the following synchronization point:

diff --git a/storage/innobase/handler/handler0alter.cc b/storage/innobase/handler/handler0alter.cc
index adeaf87f7fe..79a308a20a6 100644
--- a/storage/innobase/handler/handler0alter.cc
+++ b/storage/innobase/handler/handler0alter.cc
@@ -8764,6 +8764,7 @@ inline bool rollback_inplace_alter_table(Alter_inplace_info *ha_alter_info,
       ut_d(dict_table_check_for_dup_indexes(ctx->old_table, CHECK_ABORTED_OK));
+    DEBUG_SYNC(ctx->trx->mysql_thd, "before_commit_rollback_inplace");
     commit_unlock_and_unlink(ctx->trx);
     if (fts_exist)
       purge_sys.resume_FTS();
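
The ordering that the test enforces through its SIGNAL and WAIT_FOR actions can be sketched with a toy model. The class `SyncSignals` and the function `replay_test_ordering` are hypothetical and are not MariaDB's DEBUG_SYNC implementation; the sketch assumes only that a signal stays raised once sent and that a waiter blocks until its signal has been raised. The signal names are taken from the test case above.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Toy model of named signals; hypothetical, not MariaDB's DEBUG_SYNC code.
class SyncSignals {
 public:
  void signal(const std::string& name) {
    assert(!name.empty());
    {
      std::lock_guard<std::mutex> guard(mutex_);
      raised_.insert(name);
    }
    cond_.notify_all();
  }

  void wait_for(const std::string& name) {
    std::unique_lock<std::mutex> lock(mutex_);
    cond_.wait(lock, [&] { return raised_.count(name) != 0; });
  }

 private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::set<std::string> raised_;
};

// Replays the handshake between the "ddl" and "default" connections and
// returns the events in the order in which they were recorded.
std::vector<std::string> replay_test_ordering() {
  SyncSignals sync;
  std::vector<std::string> events;
  std::mutex events_mutex;
  auto record = [&](const char* event) {
    std::lock_guard<std::mutex> guard(events_mutex);
    events.emplace_back(event);
  };

  std::thread ddl([&] {
    record("ddl: row_merge_after_scan");
    sync.signal("scanned");   // SIGNAL scanned
    sync.wait_for("commit");  // WAIT_FOR commit
    record("ddl: before_commit_rollback_inplace");
    sync.signal("c");         // SIGNAL c
    // The real test then waits for "ever", which is never signalled;
    // the server is killed while the ALTER TABLE is parked there.
  });

  sync.wait_for("scanned");   // now WAIT_FOR scanned
  record("default: INSERT (2,1)");
  sync.signal("commit");      // now SIGNAL commit
  sync.wait_for("c");         // now WAIT_FOR c
  record("default: INSERT (3,3), then kill");
  ddl.join();
  return events;
}
```

Each recorded event happens before the signal that releases the other side, so the four events come back in the same order on every run.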

Issue Links

is caused by: MDEV-25180 Atomic ALTER TABLE (Closed)

People

Assignee: Marko Mäkelä
Reporter: Marko Mäkelä
Votes: 0
Watchers: 1

Dates

Created: 2021-10-29 12:43
Updated: 2021-11-01 18:02
Resolved: 2021-10-29 13:33
