[MDEV-20931] ALTER...IMPORT can crash the server Created: 2019-10-30  Updated: 2021-09-30  Resolved: 2021-08-17

Status: Closed
Project: MariaDB Server
Component/s: Data Definition - Alter Table, Storage Engine - InnoDB
Affects Version/s: 10.2.21, 10.3.17
Fix Version/s: 10.2.41, 10.3.32, 10.4.22, 10.5.13, 10.6.5

Type: Bug
Priority: Critical
Reporter: Hartmut Holzgraefe
Assignee: Eugene Kosov (Inactive)
Resolution: Fixed
Votes: 2
Labels: None

Issue Links:
Blocks
is blocked by MDEV-21513 Fix some crashes in ALTER TABLE…IMPOR... Closed
Duplicate
duplicates MDEV-14342 Importing partial backup from XtraBac... Closed
Relates
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-20974 Don't require .cfg files to import In... Closed

 Description   

When importing a tablespace, especially when only the .ibd tablespace file is available and there is no .cfg file, the import may crash the server with messages like:

2019-10-14 15:35:43 250 [Note] InnoDB: Phase I - Update all pages
2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk
2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk - done!
2019-10-14 15:35:44 250 [Note] InnoDB: Phase II - Purge records from index `Index_name`
2019-10-14 15:35:44 250 [ERROR] [FATAL] InnoDB: Trying to read page number 65200640 in space 574155, space name db/table, which is outside the tablespace bounds. Byte offset 0, len 16384
191014 15:35:44 [ERROR] mysqld got signal 6 ;

In such cases the import should simply be aborted, and the table left in DISCARD state.

Crashing the server because of a failed import is not a good idea, since the detected corruption is clearly local to the table that has not yet been successfully imported.



 Comments   
Comment by Geoff Montee (Inactive) [ 2019-10-30 ]

I submitted the same problem as MDEV-14342 a while back, and it was closed as "Not a Bug", but it would probably be better to fix the crash.

Comment by Eugene Kosov (Inactive) [ 2021-07-16 ]

https://github.com/mariadb/server/commits/bb-10.2-MDEV-20931-import-crash

Comment by Marko Mäkelä [ 2021-07-20 ]

I like the idea, but I would suggest 2 things:

  • Add the warn_unused_result attribute to btr_level_list_remove_func() and adjust all callers.
  • Provide a 10.5 or 10.6 version of the fix for stress testing. The main challenge should be porting the fil_io() change; the code was heavily refactored in MDEV-23855.
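
To illustrate the first suggestion, here is a minimal sketch of how the GCC/Clang warn_unused_result attribute makes the compiler flag callers that drop an error code. The names Err, remove_from_level_list() and discard_page() are hypothetical stand-ins and are not the actual InnoDB declarations; the real btr_level_list_remove_func() has a different signature.

```cpp
#include <cstdio>

// Hypothetical error codes; InnoDB's real dberr_t has many more values.
enum class Err { Success, Corruption };

// GCC/Clang attribute: the compiler warns if a caller discards the result.
// (C++17 offers the standard [[nodiscard]] with the same intent.)
__attribute__((warn_unused_result))
Err remove_from_level_list(bool neighbours_consistent)
{
  // Stand-in for a consistency check on the sibling pointers of a page.
  return neighbours_consistent ? Err::Success : Err::Corruption;
}

// A caller that propagates the error instead of ignoring it.
Err discard_page(bool neighbours_consistent)
{
  Err err = remove_from_level_list(neighbours_consistent);
  if (err != Err::Success) {
    std::fprintf(stderr, "level list is corrupted, aborting operation\n");
    return err;
  }
  return Err::Success;
}
```

With the attribute in place, a bare call such as remove_from_level_list(true); with the result ignored produces a compiler warning, which is what "adjust all callers" would have to resolve.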
Comment by Eugene Kosov (Inactive) [ 2021-07-26 ]

mleich found no issues in https://github.com/MariaDB/server/tree/bb-10.5-MDEV-20931-import-crash

Other branches are:
https://github.com/MariaDB/server/tree/bb-10.2-MDEV-20931-import-crash
https://github.com/MariaDB/server/tree/bb-10.3-MDEV-20931-import-crash
https://github.com/MariaDB/server/tree/bb-10.4-MDEV-20931-import-crash

Comment by Marko Mäkelä [ 2021-07-27 ]

The 10.5 version looks fairly good (only the test innodb.import_corrupted is failing on Windows, possibly due to the .exe suffix).

But, please check and fix the failures on other versions. We seem to be missing some error handling cleanup here, possibly from an earlier test that ran on the same worker:

bb-10.2-MDEV-20931-import-crash

innodb.innodb-agregate 'innodb'          w4 [ fail ]
        Test ended at 2021-07-16 01:30:55
 
CURRENT_TEST: innodb.innodb-agregate
mysqltest: At line 7: query 'create table t2 (a smallint(6) not null, b int(10) not null, name varchar(20), primary key(a,b), key(name)) engine=InnoDB' failed: 1813: Tablespace for table '`test`.`t2`' exists. Please DISCARD the tablespace before IMPORT

In the 10.2 version, the return value of btr_level_list_remove_func() is not being checked, like it is in the other versions. Please double-check that the ports to the older versions correspond to the 10.5 version.

In buf_page_get_low(), I do not know if it is safe to acquire mutexes. I would expect at least the dict_sys mutexes to be above the buffer pool mutexes in the latching order.
Please run at least the InnoDB test suites with

./mtr --mysqld=--loose-innodb-sync-debug

to confirm this.

I do not see a need to look up the table in buf_page_get_low(). It should suffice to look up the tablespace. If the tablespace is missing or if space->purpose == FIL_TYPE_IMPORT, then we can be more lenient about out-of-bounds access. Could we let buf_read_page() return more information? That function should already look up the tablespace by itself.
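
A rough sketch of the idea, under simplified assumptions: the decision depends only on the tablespace, not on the table. Purpose, ReadResult, Tablespace and check_page_bounds() are hypothetical names invented for illustration; they are not the real fil_space_t or buf_read_page() interfaces.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins; not the real InnoDB definitions.
enum class Purpose { Normal, Import };
enum class ReadResult { Ok, PageCorrupted, Fatal };

struct Tablespace {
  Purpose purpose;              // Import while ALTER TABLE...IMPORT is running
  std::uint32_t size_in_pages;  // number of pages in the data file
};

// Decide how to treat a page read, looking only at the tablespace.
// A null pointer models a tablespace that could not be found.
ReadResult check_page_bounds(const Tablespace *space, std::uint32_t page_no)
{
  if (space != nullptr && page_no < space->size_in_pages)
    return ReadResult::Ok;
  // Missing tablespace, or out-of-bounds access during import:
  // report an error to the caller instead of aborting the server.
  if (space == nullptr || space->purpose == Purpose::Import)
    return ReadResult::PageCorrupted;
  // Out-of-bounds access in a regular tablespace keeps the strict behavior.
  return ReadResult::Fatal;
}
```

In this sketch the page number from the reported log (65200640) in an importing tablespace would yield PageCorrupted, so the import could be aborted without killing the server.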

Comment by Eugene Kosov (Inactive) [ 2021-07-30 ]

File cleanups are fixed.

--loose-innodb-sync-debug indeed showed a regression. Luckily, I found FIL_TYPE_IMPORT, and the locking problem is now fixed. This also simplified the code.

I did nothing with `btr_level_list_remove_func()` because it already has the necessary attribute in a header file, and the compiler ensures (it really does for me) that the return value is checked.

Comment by Marko Mäkelä [ 2021-07-30 ]

Thank you, these look OK to me, and the different branches bb-10.{2,3,4,5}-MDEV-20931-import-crash now seem to correspond to each other.

But I think that mleich needs to run a stress test on one or more of these branches, because the fix affects functionality outside IMPORT. This is OK to push after such testing.
