[MDEV-28779] ALTER TABLE IMPORT TABLESPACE fails with Data structure corruption Created: 2022-06-08  Updated: 2023-03-10  Resolved: 2022-06-09

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.3, 10.4
Fix Version/s: 10.3.36, 10.4.26

Type: Bug Priority: Major
Reporter: Sasha Pachev Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: not-10.5

Attachments: File t2.opt     File t2.patch     File t2.test    
Issue Links:
Relates
relates to MDEV-21132 Remove buf_page_t::newest_modification Closed

 Description   

Attached is the MTR test that demonstrates the bug. Also a patch with a proposed fix.

The problem is that during ALTER TABLE IMPORT TABLESPACE two things happen:

  • Regardless of whether the tablespace is encrypted, the current LSN (8 bytes) gets written at FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION
  • Then if the page is encrypted the first 4 bytes at that offset are being interpreted as key_version while the remainder as the CRC32 checksum. If the key_version is non-zero, CRC32 check of the page is performed (fil0import.cc:3599) . So if the LSN >= 1 << 32 we perform the check, which will most likely fail as we are comparing the lowest 4 bytes of the LSN against the computed CRC-32 checksum of the page thus resulting in the error message to the client.

I believe the right thing to do is to not write the LSN at FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION for encrypted pages. At least it fixes the problem at hand.



 Comments   
Comment by Marko Mäkelä [ 2022-06-09 ]

Thank you, spachev! I think that it is even simpler than that: the field should not be written at all. The history is:

  • On every page, InnoDB reserved 8 bytes at FIL_PAGE_FILE_FLUSH_LSN and excluded it from page checksums.
  • On shutdown, InnoDB used to update those 8 bytes in the first page of each file of the system tablespace, without using the doublewrite buffer or redo log. This risky write was removed (and this field lost all of its original meaning) in MDEV-27199.
  • Later, those 8 bytes were repurposed for various things, such as the SPATIAL INDEX R-tree "split sequence number" and the MariaDB encryption key ID. At that time, the logic was revised so that only the first page of the system tablespace would be updated on shutdown.
  • A conflict in the repurposing (inability to encrypt SPATIAL INDEX) motivated a file format change (innodb_checksum_algorithm=full_crc32, MDEV-12026).
  • In .ibd files, the field never had any special meaning.
Comment by Marko Mäkelä [ 2022-06-09 ]

If I got it right, the time-consuming operations on the table db1.t1 in t2.test are necessary for advancing the LSN. I tried a different way of reproducing this, by extending an existing test that uses a special trick to advance the LSN:

diff --git a/mysql-test/suite/mariabackup/huge_lsn.test b/mysql-test/suite/mariabackup/huge_lsn.test
index 3349ef40df8..98aa73cec4f 100644
--- a/mysql-test/suite/mariabackup/huge_lsn.test
+++ b/mysql-test/suite/mariabackup/huge_lsn.test
@@ -56,6 +56,12 @@ exec $XTRABACKUP  --prepare --target-dir=$targetdir;
 --source include/restart_and_restore.inc
 --enable_result_log
 SELECT * FROM t;
+FLUSH TABLE t FOR EXPORT;
+copy_file $_datadir/test/t.ibd $_datadir/test/t_copy.ibd;
+UNLOCK TABLES;
+ALTER TABLE t DISCARD TABLESPACE;
+move_file $_datadir/test/t_copy.ibd $_datadir/test/t.ibd;
+ALTER TABLE t IMPORT TABLESPACE;
 DROP TABLE t;
 rmdir $targetdir;
 let $targetdir= $targetdir_old;

With or without the patch, the IMPORT TABLESPACE would crash:

10.3 44ab6cba762e23a992e7c3bfd8f4319e56e08fe8

mysqltest: At line 64: query 'ALTER TABLE t IMPORT TABLESPACE' failed: 2013: Lost connection to MySQL server during query
mysqld: /mariadb/10.3/storage/innobase/fil/fil0crypt.cc:2696: bool fil_space_verify_crypt_checksum(const byte*, const page_size_t&): Assertion `mach_read_from_4(page + 26U)' failed.

In the current 10.6, the same modification to the test would cause a different type of failure:

10.6 892c426371b4be558d32fdeba7d1d56f46b40f2b

mysqltest: At line 66: query 'ALTER TABLE t IMPORT TABLESPACE' failed: ER_NOT_KEYFILE (1034): Index for table 't' is corrupt; try to repair it

I will try to fix that as well.

Comment by Marko Mäkelä [ 2022-06-09 ]

The problem with the above test is that DISCARD TABLESPACE forgets that the tablespace was default-encrypted. If I add explicit ENCRYPTED=YES to the CREATE TABLE statement, the file can be imported. Preserving the .cfg file did not help. This finally reproduced the problem as reported.

Comment by Marko Mäkelä [ 2022-06-09 ]

It turns out that as part of MDEV-21132 in 10.5.0 the function PageConverter::update_header() had already been rewritten to zero-initialize the field FIL_PAGE_FILE_FLUSH_LSN_OR_KEY_VERSION in the first page.

Generated at Thu Feb 08 10:03:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.