Details

    Description

      When importing a tablespace, especially when only the .ibd tablespace file is available and there is no .cfg file, the import may fail and crash the server with messages like:

      2019-10-14 15:35:43 250 [Note] InnoDB: Phase I - Update all pages
      2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk
      2019-10-14 15:35:44 250 [Note] InnoDB: Sync to disk - done!
      2019-10-14 15:35:44 250 [Note] InnoDB: Phase II - Purge records from index `Index_name`
      2019-10-14 15:35:44 250 [ERROR] [FATAL] InnoDB: Trying to read page number 65200640 in space 574155, space name db/table, which is outside the tablespace bounds. Byte offset 0, len 16384
      191014 15:35:44 [ERROR] mysqld got signal 6 ;
      

      In such cases the import should simply be aborted and the table left in the DISCARD state.

      Crashing the server because of a failed import is not a good idea: the detected corruption is clearly local to the table that has not yet been successfully imported.

          Activity

            hholzgra Hartmut Holzgraefe created issue -
            GeoffMontee Geoff Montee (Inactive) made changes -
            Field Original Value New Value
            Assignee Marko Mäkelä [ marko ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            Affects Version/s 10.2.21 [ 23213 ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]

            GeoffMontee Geoff Montee (Inactive) added a comment - I submitted the same problem as MDEV-14342 a while back, and it was closed as "Not a Bug", but it would be better to fix the crash.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            serg Sergei Golubchik made changes -
            Summary ALTER...IMPORT can crash the serve ALTER...IMPORT can crash the server
            julien.fritsch Julien Fritsch made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow
            rob.schwyzer@mariadb.com Rob Schwyzer (Inactive) made changes -
            Labels ServiceNow 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            kevg Eugene Kosov (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            kevg Eugene Kosov (Inactive) added a comment - https://github.com/mariadb/server/commits/bb-10.2-MDEV-20931-import-crash
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            marko Marko Mäkelä added a comment - I like the idea, but I would suggest two things:

            • Add the warn_unused_result attribute to btr_level_list_remove_func() and adjust all callers.
            • Provide a 10.5 or 10.6 version of the fix for stress testing. The main challenge should be porting the fil_io() change; the code was heavily refactored in MDEV-23855.
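
            As a rough illustration of the first suggestion, the sketch below shows how the GCC/Clang warn_unused_result attribute lets the compiler flag callers that ignore a returned error code. The names Err, Page, WARN_UNUSED and level_list_remove are hypothetical stand-ins, not the actual InnoDB declarations; the real btr_level_list_remove_func() is not reproduced here.

```cpp
#include <cassert>

// Hypothetical error codes, standing in for InnoDB's dberr_t.
enum class Err { Success, Corruption };

// Hypothetical page in a doubly linked list of pages on one B-tree level.
struct Page {
  Page* prev = nullptr;
  Page* next = nullptr;
  bool corrupted = false;
};

// GCC and Clang warn when a caller discards the result of a function
// declared with warn_unused_result. Other compilers get an empty macro.
#if defined(__GNUC__) || defined(__clang__)
# define WARN_UNUSED __attribute__((warn_unused_result))
#else
# define WARN_UNUSED
#endif

// Unlink `page` from its neighbours. Inconsistent links are reported to
// the caller as an error instead of aborting the process.
WARN_UNUSED Err level_list_remove(Page& page) {
  // Precondition of this sketch: a page never links to itself.
  assert(page.prev != &page && page.next != &page);

  if (page.corrupted ||
      (page.prev && page.prev->next != &page) ||
      (page.next && page.next->prev != &page)) {
    return Err::Corruption;
  }
  if (page.prev) page.prev->next = page.next;
  if (page.next) page.next->prev = page.prev;
  page.prev = page.next = nullptr;
  return Err::Success;
}
```

            With the attribute in place, a bare call such as level_list_remove(p); produces a compiler warning under GCC and Clang, so every caller has to test or propagate the result.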
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            kevg Eugene Kosov (Inactive) added a comment (edited) - mleich found no issues in https://github.com/MariaDB/server/tree/bb-10.5-MDEV-20931-import-crash

            Other branches are:

            • https://github.com/MariaDB/server/tree/bb-10.2-MDEV-20931-import-crash
            • https://github.com/MariaDB/server/tree/bb-10.3-MDEV-20931-import-crash
            • https://github.com/MariaDB/server/tree/bb-10.4-MDEV-20931-import-crash
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            marko Marko Mäkelä added a comment - The 10.5 version looks fairly good (only the test innodb.import_corrupted is failing on Windows, possibly due to the .exe suffix).

            But please check and fix the failures on the other versions. We seem to be missing some error-handling cleanup here, possibly from an earlier test that ran on the same worker:

            bb-10.2-MDEV-20931-import-crash

            innodb.innodb-agregate 'innodb'          w4 [ fail ]
                    Test ended at 2021-07-16 01:30:55
             
            CURRENT_TEST: innodb.innodb-agregate
            mysqltest: At line 7: query 'create table t2 (a smallint(6) not null, b int(10) not null, name varchar(20), primary key(a,b), key(name)) engine=InnoDB' failed: 1813: Tablespace for table '`test`.`t2`' exists. Please DISCARD the tablespace before IMPORT
            

            In the 10.2 version, the return value of btr_level_list_remove_func() is not being checked, as it is in the other versions. Please double-check that the ports to the older versions correspond to the 10.5 version.

            In buf_page_get_low(), I do not know if it is safe to acquire mutexes. I would expect at least the dict_sys mutexes to be above the buffer pool mutexes in the latching order.
            Please run at least the InnoDB test suites with

            ./mtr --mysqld=--loose-innodb-sync-debug
            

            to confirm this.

            I do not see a need to look up the table in buf_page_get_low(). It should suffice to look up the tablespace. If the tablespace is missing or if space->purpose == FIL_TYPE_IMPORT, then we can be more lenient about out-of-bounds access. Could we let buf_read_page() return more information? That function should already look up the tablespace by itself.
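
            The leniency rule proposed in the last paragraph of the comment above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the InnoDB implementation: Purpose, Err, Space and check_page_read are hypothetical names, with Purpose::Import standing in for FIL_TYPE_IMPORT, and the real buf_page_get_low() and buf_read_page() signatures are not reproduced.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical model of the tablespace purpose; Import stands in for
// FIL_TYPE_IMPORT.
enum class Purpose { Tablespace, Import };

// Hypothetical error codes returned to the caller of a page read.
enum class Err { Success, TablespaceMissing, PageOutOfBounds };

// Hypothetical, heavily reduced tablespace descriptor.
struct Space {
  Purpose purpose;
  std::uint32_t size_in_pages;
};

// Decide how to react to a page read request.
// - Missing tablespace: report an error.
// - Out-of-bounds page in a tablespace that is being imported: report an
//   error, so the import can be aborted and the table stays discarded.
// - Out-of-bounds page in a regular tablespace: still treated as fatal.
Err check_page_read(const Space* space, std::uint32_t page_no) {
  if (space == nullptr) {
    return Err::TablespaceMissing;
  }
  assert(space->size_in_pages > 0);  // assumption of this sketch
  if (page_no < space->size_in_pages) {
    return Err::Success;
  }
  if (space->purpose == Purpose::Import) {
    return Err::PageOutOfBounds;
  }
  std::abort();  // corruption in an already active tablespace
}
```

            Only the tablespace is consulted here, not the table, which matches the point that a tablespace lookup should suffice; how the error then propagates up to ALTER TABLE ... IMPORT TABLESPACE is outside the scope of this sketch.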
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]

            kevg Eugene Kosov (Inactive) added a comment - File cleanups are fixed.

            --loose-innodb-sync-debug indeed showed a regression. Luckily, I found FIL_TYPE_IMPORT, and using it fixed the locking problem. This also simplified the code.

            I did not change btr_level_list_remove_func() because it already has the necessary attribute in a header file, and the compiler does enforce (it really does for me) that the return value is checked.
            kevg Eugene Kosov (Inactive) made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            marko Marko Mäkelä added a comment - Thank you, these look OK to me, and the different branches bb-10.{2,3,4,5}-MDEV-20931-import-crash now seem to correspond to each other.

            However, I think that mleich needs to run a stress test on one or more of these branches, because the fix affects functionality outside IMPORT. This is OK to push after such testing.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            serg Sergei Golubchik made changes -
            Labels 76qDvLB8Gju6Hs7nk3VY3EX42G795W5z
            kevg Eugene Kosov (Inactive) made changes -
            Fix Version/s 10.2.41 [ 26032 ]
            Fix Version/s 10.3.32 [ 26029 ]
            Fix Version/s 10.4.22 [ 26031 ]
            Fix Version/s 10.5.13 [ 26026 ]
            Fix Version/s 10.6.5 [ 26034 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 100702 ] MariaDB v4 [ 156921 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 201658 120960 201896
            Zendesk active tickets 201658 201896
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk active tickets 201658 201896 201896

            People

              kevg Eugene Kosov (Inactive)
              hholzgra Hartmut Holzgraefe
