Details

    Description

      When importing a tablespace, especially when only the .ibd tablespace file is available and there is no .cfg file, the import may fail with messages like:

      2019-10-14 15:35:43 250 [Note] InnoDB: Phase I - Update all pages
      2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk
      2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk - done!
      2019-10-14 15:35:44 250 [Note] InnoDB: Phase II - Purge records from index `Index_name`
      2019-10-14 15:35:44 250 [ERROR] [FATAL] InnoDB: Trying to read page number 65200640 in space 574155, space name db/table, which is outside the tablespace bounds. Byte offset 0, len 16384
      191014 15:35:44 [ERROR] mysqld got signal 6 ;
      

      In such cases the import should simply be aborted, and the table left in the DISCARD state.

      Crashing the server because of a failed import does not look like a good idea, as the detected corruption is clearly local to the table that has not yet been successfully imported.
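
To put the error message in perspective: assuming the `len 16384` in the log reflects a 16 KiB page size, page number 65200640 would start roughly 995 GiB into the tablespace file. The following is a minimal sketch of that arithmetic and of a bounds check that reports failure instead of aborting. The helper names are hypothetical and are not InnoDB functions.

```cpp
#include <cstdint>

// Hypothetical helper (not InnoDB code): byte offset at which a page starts.
constexpr std::uint64_t page_byte_offset(std::uint64_t page_no,
                                         std::uint64_t page_size)
{
  return page_no * page_size;
}

// Hypothetical helper: true if the whole page lies inside a file of
// file_size bytes. Dividing first avoids overflow for huge page numbers.
constexpr bool page_within_file(std::uint64_t page_no,
                                std::uint64_t page_size,
                                std::uint64_t file_size)
{
  return page_size != 0 && page_no < file_size / page_size;
}
```

For example, `page_byte_offset(65200640, 16384)` yields 1068247285760 bytes, so `page_within_file()` returns false for any file smaller than that.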

          Activity

            GeoffMontee Geoff Montee (Inactive) added a comment - I submitted the same problem as MDEV-14342 a while back, and it was closed as "Not a Bug", but it would probably be better to fix the crash.
            kevg Eugene Kosov (Inactive) added a comment - https://github.com/mariadb/server/commits/bb-10.2-MDEV-20931-import-crash

            marko Marko Mäkelä added a comment - I like the idea, but I would suggest two things:

            • Add the warn_unused_result attribute to btr_level_list_remove_func() and adjust all callers.
            • Provide a 10.5 or 10.6 version of the fix for stress testing. The main challenge should be porting the fil_io() change; the code was heavily refactored in MDEV-23855.
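
As an illustration of the first suggestion, here is a minimal sketch of how the GCC/Clang `warn_unused_result` attribute makes the compiler flag callers that drop a return value. The error enum and both functions are hypothetical stand-ins; this is not the real signature of btr_level_list_remove_func().

```cpp
#include <cstdio>

// Hypothetical error codes; InnoDB's real error type has many more values.
enum class err_t { SUCCESS, CORRUPTION };

// Hypothetical stand-in for a function such as btr_level_list_remove_func().
// With this GCC/Clang attribute, a call that ignores the result triggers
// a -Wunused-result warning.
__attribute__((warn_unused_result))
err_t remove_from_level_list(bool sibling_page_readable)
{
  return sibling_page_readable ? err_t::SUCCESS : err_t::CORRUPTION;
}

// A caller that propagates the error instead of ignoring it.
err_t purge_page(bool sibling_page_readable)
{
  const err_t err = remove_from_level_list(sibling_page_readable);
  if (err != err_t::SUCCESS) {
    std::fprintf(stderr, "level list removal failed; giving up\n");
    return err;
  }
  return err_t::SUCCESS;
}
```

Writing `remove_from_level_list(true);` as a bare statement would produce the warning, which is how unchecked callers can be found when adjusting them.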
            kevg Eugene Kosov (Inactive) added a comment (edited) - mleich found no issues in https://github.com/MariaDB/server/tree/bb-10.5-MDEV-20931-import-crash

            Other branches are:

            • https://github.com/MariaDB/server/tree/bb-10.2-MDEV-20931-import-crash
            • https://github.com/MariaDB/server/tree/bb-10.3-MDEV-20931-import-crash
            • https://github.com/MariaDB/server/tree/bb-10.4-MDEV-20931-import-crash

            marko Marko Mäkelä added a comment - The 10.5 version looks fairly good (only the test innodb.import_corrupted is failing on Windows, possibly due to the .exe suffix).

            But please check and fix the failures on other versions. We seem to be missing some error handling cleanup here, possibly from an earlier test that ran on the same worker:

            bb-10.2-MDEV-20931-import-crash

            innodb.innodb-agregate 'innodb'          w4 [ fail ]
                    Test ended at 2021-07-16 01:30:55

            CURRENT_TEST: innodb.innodb-agregate
            mysqltest: At line 7: query 'create table t2 (a smallint(6) not null, b int(10) not null, name varchar(20), primary key(a,b), key(name)) engine=InnoDB' failed: 1813: Tablespace for table '`test`.`t2`' exists. Please DISCARD the tablespace before IMPORT

            In the 10.2 version, the return value of btr_level_list_remove_func() is not being checked, unlike in the other versions. Please double-check that the ports to the older versions correspond to the 10.5 version.

            In buf_page_get_low(), I do not know if it is safe to acquire mutexes. I would expect at least the dict_sys mutexes to be above the buffer pool mutexes in the latching order.
            Please run at least the InnoDB test suites with

            ./mtr --mysqld=--loose-innodb-sync-debug

            to confirm this.

            I do not see a need to look up the table in buf_page_get_low(). It should suffice to look up the tablespace. If the tablespace is missing or if space->purpose == FIL_TYPE_IMPORT, then we can be more lenient about out-of-bounds access. Could we let buf_read_page() return more information? That function should already look up the tablespace by itself.
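
The leniency proposed in the last paragraph could be sketched roughly as follows. This is a simplified, hypothetical model: the types, the enum values, and check_page_bounds() are invented for illustration and do not reflect the actual fil_space_t, buf_page_get_low(), or buf_read_page() code.

```cpp
#include <cstdint>
#include <cstdlib>

// Hypothetical, simplified model; the real tablespace object is far richer.
enum class purpose_t { TABLESPACE, IMPORT };
enum class read_result { OK, OUT_OF_BOUNDS };

struct space_t {
  purpose_t purpose;
  std::uint32_t size_in_pages;
};

// Out-of-bounds access is reported as an error when the tablespace is
// missing (space == nullptr) or is still being imported. For a regular
// tablespace it remains fatal, as in the log in the description.
read_result check_page_bounds(const space_t *space, std::uint32_t page_no)
{
  if (space != nullptr && page_no < space->size_in_pages)
    return read_result::OK;
  if (space == nullptr || space->purpose == purpose_t::IMPORT)
    return read_result::OUT_OF_BOUNDS;
  std::abort();
}
```

With a result like this, the import code could abort the import and leave the table discarded rather than crashing the server, which is the behavior requested in the description.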

            kevg Eugene Kosov (Inactive) added a comment - File cleanups are fixed.

            --loose-innodb-sync-debug indeed showed a regression. Fortunately, I found FIL_TYPE_IMPORT, and the locking problem was fixed. This also simplified the code.

            I did nothing with `btr_level_list_remove_func()` because it already has the necessary attribute in a header file, and the compiler does enforce (it really does for me) that the return value is checked.

            marko Marko Mäkelä added a comment - Thank you, these look OK to me, and the different branches bb-10.{2,3,4,5}-MDEV-20931-import-crash now seem to correspond to each other.

            But I think that mleich needs to run a stress test on one or more of these branches, because the fix affects functionality outside IMPORT. This is OK to push after such testing.

            People

              kevg Eugene Kosov (Inactive)
              hholzgra Hartmut Holzgraefe