MariaDB Server / MDEV-13407

innodb.drop_table_background failed in buildbot with "Tablespace for table exists"

Details

    Description

      http://buildbot.askmonty.org/buildbot/builders/work-amd64-valgrind/builds/10084/steps/test/logs/stdio

      innodb.drop_table_background 'xtradb'    w3 [ fail ]
              Test ended at 2017-06-09 06:04:25
       
      CURRENT_TEST: innodb.drop_table_background
      mysqltest: At line 29: query 'CREATE TABLE t (a INT) ENGINE=InnoDB' failed: 1813: Tablespace for table '`test`.`t`' exists. Please DISCARD the tablespace before IMPORT.
       
      The result from queries just before the failure was:
      CREATE TABLE t(c0 SERIAL, c1 INT, c2 INT, c3 INT, c4 INT,
      KEY(c1), KEY(c2), KEY(c2,c1),
      KEY(c3), KEY(c3,c1), KEY(c3,c2), KEY(c3,c2,c1),
      KEY(c4), KEY(c4,c1), KEY(c4,c2), KEY(c4,c2,c1),
      KEY(c4,c3), KEY(c4,c3,c1), KEY(c4,c3,c2), KEY(c4,c3,c2,c1)) ENGINE=InnoDB;
      SET DEBUG_DBUG='+d,row_drop_table_add_to_background';
      DROP TABLE t;
      CREATE TABLE t (a INT) ENGINE=InnoDB;
      

      Note: The failure happened on the valgrind builder. Whether because of Valgrind or for some other reason, the server did not shut down properly on the restart that precedes the failed command; that may be the cause.

          Activity

            marko Marko Mäkelä added a comment (edited)

            The background DROP TABLE queue is not persistent and thus not crash-safe.

            I think that the likely explanation is that with Valgrind, the shutdown_server timeout was exceeded and the server process was killed before the DROP TABLE queue was persistently emptied. Or the test was run multiple times, and the process was killed before table t from the previous test was actually dropped.

            A fix would be to rename the table to an intermediate name before adding it to the queue and, on server startup, to drop the tables that match the intermediate name pattern. That is what I am going to do in bb-10.2-ext and 10.3. The intermediate name prefix that I am going to use is #sql-ib, which starting with MDEV-14378 is only used for InnoDB internal tables. In MariaDB 10.x before MDEV-14378, an ALTER TABLE…ALGORITHM=INPLACE that rebuilds the table can leave both copies of the table temporarily carrying a name that starts with #sql-ib.

            A more comprehensive fix would be to remove the background DROP TABLE queue and the concept of "DDL transactions" from InnoDB altogether. This would require replacing the dict_operation_lock with meta-data locks (MDL), also in internal InnoDB operations.
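
            The failure mode and the rename-based fix described in this comment can be illustrated with a toy model. This is only a sketch under stated assumptions, not InnoDB code: `ToyServer`, `drop_table_deferred`, `kill_and_restart` and `run_scenario` are hypothetical names, the "disk" is a plain set of table names, and the temporary names are formed by appending a counter to the #sql-ib prefix purely for illustration.

```python
PREFIX = "#sql-ib"  # intermediate name prefix named in the comment


class ToyServer:
    """Toy model: `disk` survives a kill, the in-memory `queue` does not."""

    def __init__(self, disk, rename_before_queue):
        self.disk = disk                    # names that still have a tablespace
        self.rename = rename_before_queue   # True models the proposed fix
        self.queue = []                     # background DROP TABLE queue (memory only)
        self.next_temp = 0
        if self.rename:
            # Startup cleanup: drop leftovers that carry the intermediate prefix.
            for name in [n for n in self.disk if n.startswith(PREFIX)]:
                self.disk.discard(name)

    def create_table(self, name):
        if name in self.disk:
            raise RuntimeError(f"1813: Tablespace for table '{name}' exists")
        self.disk.add(name)

    def drop_table_deferred(self, name):
        """DROP TABLE that defers the real work to the background queue."""
        if self.rename:
            temp = f"{PREFIX}{self.next_temp}"   # hypothetical naming scheme
            self.next_temp += 1
            self.disk.remove(name)
            self.disk.add(temp)
            name = temp
        self.queue.append(name)


def kill_and_restart(server):
    """Simulate a killed process: memory is lost, disk contents are kept."""
    return ToyServer(server.disk, server.rename)


def run_scenario(rename_before_queue):
    server = ToyServer(set(), rename_before_queue)
    server.create_table("t")
    server.drop_table_deferred("t")
    server = kill_and_restart(server)   # the queue was never processed
    try:
        server.create_table("t")
        return "ok"
    except RuntimeError as exc:
        return str(exc)
```

            In this model, `run_scenario(False)` ends with the "Tablespace for table 't' exists" error because the only record of the pending drop was in memory, while `run_scenario(True)` succeeds because the renamed leftover is recognizable by its prefix at startup.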


            marko Marko Mäkelä added a comment

            I plan to implement the following solution:

            1. Never break locks on DROP TABLE. Instead, use the background DROP TABLE queue.
            2. Rename the table to a temporary name and add the table to the background queue.
            3. On non-slow shutdown, stop processing the background queue.
            4. On server restart, drop all tables that match the temporary name prefix #sql-ib. (This is also part of MDEV-14585.)

            It seems to me that there could be cases where the background DROP TABLE gives up on the first failed attempt when it is trying to drop a table. I would rather keep the tables in the queue until they are dropped. And I will identify the tables by id, not name.
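
            The queue behaviour in this plan (do not break locks, keep an entry until the drop succeeds, identify tables by id, stop on non-slow shutdown) might look roughly like the following sketch. It is an illustration under assumptions, not the actual implementation: `Table`, `BackgroundDropQueue` and the `locks` counter are hypothetical, and the renaming and startup cleanup steps are not modelled here.

```python
from collections import deque


class Table:
    def __init__(self, table_id, name):
        self.table_id = table_id
        self.name = name
        self.locks = 0   # hypothetical count of transactions still holding locks


class BackgroundDropQueue:
    def __init__(self, tables):
        self.tables = tables      # hypothetical dictionary: table id -> Table
        self.pending = deque()    # table ids waiting to be dropped
        self.stopped = False

    def enqueue(self, table_id):
        if table_id not in self.pending:
            self.pending.append(table_id)

    def stop(self):
        """Non-slow shutdown: stop processing; leftovers wait for restart."""
        self.stopped = True

    def run_once(self):
        """One pass over the queue; returns the ids dropped in this pass."""
        dropped = []
        for _ in range(len(self.pending)):
            if self.stopped:
                break
            table_id = self.pending.popleft()
            table = self.tables.get(table_id)
            if table is None:
                continue                        # already gone, nothing to retry
            if table.locks > 0:
                self.pending.append(table_id)   # still locked: keep it queued
                continue
            del self.tables[table_id]
            dropped.append(table_id)
        return dropped
```

            For example, with two queued tables of which one is still locked, a first `run_once()` drops only the unlocked table and leaves the other id in `pending`; once the lock count reaches zero, a later pass drops it. A failed attempt never removes an entry, which is the property the comment asks for.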

            marko Marko Mäkelä added a comment

            My fix depends on MDEV-14378, because ALTER TABLE must not rename both copies of the table to names that start with #sql-ib, which would be removed at server startup.

            elenst Elena Stepanova added a comment

            The test failure still needs to be fixed in previous versions, even if it's by disabling the test or parts of it.

            marko Marko Mäkelä added a comment

            In 10.0, 10.1 and 10.2, we can make the background DROP TABLE queue more robust by using table IDs instead of names, and by ensuring that tables will never be removed from the queue until they have been dropped. (That is, do not implement the renaming or the dropping on startup.)

            marko Marko Mäkelä added a comment

            I ported a minimal version of the fix to 10.0. In GA versions (before MariaDB 10.3) we will not rename tables before adding them to the background drop queue. But we should keep tables in the queue until dropping them has succeeded.

            marko Marko Mäkelä added a comment

            As part of MDEV-14585, tables will be renamed before they are added to the queue, and thus the background DROP TABLE becomes crash-safe and as transactional as it can get. That fix was backported to MariaDB Server 10.2.19.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova
