[MDEV-25662] Atomic DDL: InnoDB: Expected tablespace id ... but found 0 upon crash recovery Created: 2021-05-12  Updated: 2021-06-10  Resolved: 2021-06-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: N/A
Fix Version/s: 10.6.2

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: rr-profile

Issue Links:
Relates
relates to MDEV-17567 Atomic DDL Closed

 Description   

The failure comes in two flavors.

One looks like this:

bb-10.6-monty-innodb f179fdefd1

2021-05-11 19:14:17 0 [ERROR] InnoDB: Expected tablespace id 67 but found 0 in the file ./test/tt13.ibd
2021-05-11 19:14:17 0 [Note] DDL_LOG: Crash recovery executed 2 entries

That is, the error message is thrown once, and then the recovery finishes. I'm not sure whether this indicates a real problem or is just a bogus message.

Another one looks like this:

2021-05-12  3:39:01 0 [Note] InnoDB: Creating #2 encryption thread id 140169671862016 total threads 4.
2021-05-12  3:39:01 0 [Note] InnoDB: Creating #3 encryption thread id 140169663469312 total threads 4.
2021-05-12  3:39:01 0 [Note] InnoDB: Creating #4 encryption thread id 140169655076608 total threads 4.
2021-05-12  3:39:01 0 [Note] InnoDB: Buffer pool(s) load completed at 210512  3:39:01
2021-05-12  3:39:01 0 [ERROR] InnoDB: Expected tablespace id 10 but found 0 in the file ./test/D.ibd
2021-05-12  3:39:01 0 [ERROR] InnoDB: Expected tablespace id 10 but found 0 in the file ./test/D.ibd
2021-05-12  3:39:01 0 [ERROR] InnoDB: Expected tablespace id 10 but found 0 in the file ./test/D.ibd
...
2021-05-12  3:39:01 0 [ERROR] InnoDB: Expected tablespace id 10 but found 0 in the file ./test/D.ibd

Thousands of identical messages are produced within a second. I don't know whether it's an endless loop or whether it would eventually finish, as the test killed the server after encountering those messages.

In the first case, the server was running DDL in 2 threads when it was killed with SIGKILL. According to the general log, the last queries were

27 Query    DROP /* QNO 16561 CON_ID 27 */ TABLE IF EXISTS vv11, tt13
26 Query    ALTER /* QNO 507 CON_ID 26 */ TABLE t3 ENGINE=InnoDB, ADD COLUMN IF NOT EXISTS icol3 MEDIUMINT UNSIGNED ZEROFILL DEFAULT 2, LOCK=SHARED

In the second case, a previously stored backup from MariaBackup was being restored.

Logs and data directories are available for both cases. The rr profiles are available for the first one.

At least the recovery failure on the previously created datadir from case 1 is reproducible on bb-10.6-monty-innodb 98ff31770.



 Comments   
Comment by Marko Mäkelä [ 2021-05-12 ]

Neither trace in the -1 directory works for me. The more interesting one (the recovery) ends prematurely:

[ERROR /home/roc/rr/rr/src/TraceStream.cc:1122:read_mapped_region()] Metadata of /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/sql/mariadbd changed: replay divergence likely, but continuing anyway. inode: 81948310/83127419; mode: 33277/33277; uid: 1001/1001; gid: 1001/1001; size: 181508408/181508408; mtime: 1620826716/1620752489
Version: '10.6.1-MariaDB-debug-log'  socket: '/home/mdbe/atomic_ddl/workdirs/worker1/job14/current1_1/mysql.sock'  port: 10510  Source distribution
[FATAL /home/roc/rr/rr/src/ReplaySession.cc:762:guard_unexpected_signal()] 
 (task 36504 (rec:19071) at time 136822)
 -> Assertion `false' failed to hold. Replay got unrecorded signal SIGSEGV while awaiting signal

In the -2 directory I see no rr replay traces.
But the backup directories, or a copy of the data directory, could be enough to diagnose this.

Comment by Elena Stepanova [ 2021-05-12 ]

marko,

I've checked the traces in -1 directory.
The first one (before recovery) seems to work, the plugin error flood aside. At least it goes all the way to the SIGKILL, as it should. I've also created a packed copy, which seems to get rid of the annoying errors.
The second one (recovery) indeed doesn't work, but that shouldn't matter. Recovery fails reliably on the provided datadir, with the new binaries too, so you can always record a new trace. I've created one for your convenience, rr_upon_recovery, along with its packed copy.

Comment by Marko Mäkelä [ 2021-05-14 ]

Thank you, the -1 directory works for me now. The server was killed in the middle of DROP /* QNO 16561 CON_ID 27 */ TABLE IF EXISTS vv11, tt13. The transaction for removing persistent statistics was durably committed:

#12 0x00005606c4e5840a in dict_stats_drop_table (
    db_and_table=0x7fe7c523c630 "test/tt13", 
    errstr=0x7fe7c523c190 "\300\301#\305\347\177", errstr_sz=1024)
    at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/dict/dict0stats.cc:3606
#13 0x00005606c4cb9ba3 in row_drop_table_for_mysql (
    name=0x7fe7c523c630 "test/tt13", trx=0x49327cd686d8, 
    sqlcom=SQLCOM_DROP_TABLE, create_failed=false, nonatomic=true)
    at /home/mdbe/atomic_ddl/bb-10.6-monty-for-rr/storage/innobase/row/row0mysql.cc:3332

The call returned to row_drop_table_for_mysql(), which did not finish before the SIGKILL arrived; the signal arrived during the execution of fil_delete_tablespace().

This seems to be an omission in MDEV-18518: We are attempting to delete the file before the transaction has been committed! In reality, only the deletion of persistent statistics for the table had been committed (in a separate transaction, due to bad design that we inherited from MySQL), but the actual DROP TABLE was not.

Comment by Marko Mäkelä [ 2021-05-14 ]

Correction: the problem is an omission in MDEV-25506, which I reopened. We must not delete files before the transaction is committed.

Comment by Marko Mäkelä [ 2021-06-10 ]

During the testing of the third part of MDEV-25506 (which was pushed to 10.6 yesterday), this type of failure was no longer found. Before that fix, we would delete .ibd files before the DDL transaction was committed. So, if the server had been killed between the file deletion and the redo log write of the commit, the DDL transaction would be rolled back on recovery, but the file would already be gone.
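The ordering requirement described above (commit the DDL transaction durably, only then delete the file) can be illustrated with a toy model. This is not InnoDB code: the `ToyEngine` class, its `journal`, and the method names are hypothetical stand-ins for the redo log, data dictionary, and DROP TABLE code paths, used only to show why the pre-fix ordering leaves an inconsistent state after a crash.

```python
# Toy model of crash-safe DROP TABLE ordering. Hypothetical code, not
# InnoDB: "journal" stands in for the durable redo log, "dict" for the
# data dictionary, "files" for the filesystem.

class ToyEngine:
    def __init__(self):
        self.files = {"t1.ibd"}        # simulated filesystem
        self.journal = []              # simulated durable redo log
        self.dict = {"t1": "t1.ibd"}   # simulated data dictionary

    def drop_delete_first(self, table, crash_before_commit=False):
        # Buggy ordering (pre-MDEV-25506 fix): delete the file first,
        # then write the commit record.
        self.files.discard(self.dict[table])
        if crash_before_commit:
            return  # simulated SIGKILL: commit record never written
        self.journal.append(("commit_drop", table))

    def drop_commit_first(self, table, crash_before_delete=False):
        # Fixed ordering: write the commit record durably, then delete.
        self.journal.append(("commit_drop", table))
        if crash_before_delete:
            return  # simulated SIGKILL: deletion is redone on recovery
        self.files.discard(self.dict[table])

    def recover(self):
        # Replay the journal: redo deletions for committed drops.
        committed = {t for op, t in self.journal if op == "commit_drop"}
        for table, f in list(self.dict.items()):
            if table in committed:
                self.files.discard(f)
                del self.dict[table]
        # Consistent iff every surviving table still has its file.
        return all(f in self.files for f in self.dict.values())
```

With the buggy ordering, a crash between the deletion and the commit leaves a table in the dictionary whose file is missing, so `recover()` reports an inconsistent state; with the fixed ordering, a crash at the analogous point is harmless because recovery simply redoes the deletion.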

Generated at Thu Feb 08 09:39:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.