Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25852

Orphan #sql*.ibd files are occasionally left behind after killed ALTER TABLE

Details

    Description

      Even with the third fix related to MDEV-25506 (to never delete .ibd files before the DDL operation has been committed), we still have occasional result difference of the following test, which is based on innodb_fts.crash_recovery,release:

      --source include/have_innodb.inc
      --source include/not_embedded.inc
      FLUSH TABLES;
       
      --connect(ddl1, localhost, root,,)
      CREATE TABLE t1(a TEXT,b TEXT,INDEX(a(5))) ENGINE=InnoDB;
      send ALTER TABLE t1 ADD INDEX(b(5));
      --connect(ddl2, localhost, root,,)
      CREATE TABLE t2(a TEXT,b TEXT,INDEX(a(5))) ENGINE=InnoDB;
      send ALTER TABLE t2 DROP INDEX a, ADD INDEX(b(5)), FORCE;
      --connect(ddl3, localhost, root,,)
      CREATE TABLE t3(a TEXT,b TEXT,INDEX(a(5))) ENGINE=InnoDB;
      send ALTER TABLE t3 DROP INDEX a, ADD INDEX(b(5)), ALGORITHM=COPY;
       
      --connection default
      let $shutdown_timeout=0;
      --source include/shutdown_mysqld.inc
      --exec tar czf $datadir/../dd.tar.gz -C $datadir .
      --source include/start_mysqld.inc
       
      disconnect ddl1;
      disconnect ddl2;
      disconnect ddl3;
       
      DROP TABLE t1,t2,t3;
       
      SELECT * FROM information_schema.innodb_sys_tables WHERE name LIKE 'test/%';
      let $datadir=`select @@datadir`;
      list_files $datadir/test *.ibd;
      

      Occasionally, the list_files will list a file #sql-ib*.ibd or #sql-backup-*.ibd. If that last line is commented out, the test will continue, and the discrepancy will be reported by check-testcase (which does not count as a failure for mtr). Here are a couple of examples:

      --- /dev/shm/10.6/mysql-test/var/13/tmp/check-mysqld_1.result	2021-06-03 16:38:44.877394351 +0300
      +++ /dev/shm/10.6/mysql-test/var/13/tmp/check-mysqld_1.reject	2021-06-03 16:38:45.473401026 +0300
      @@ -1097,3 +1097,4 @@
       partition	1.0	DISABLED	STORAGE ENGINE	100602.0	NULL	NULL	Mikael Ronstrom, MySQL AB	Partition Storage Engine Helper	GPL	OFF	Stable	1.0
       VARIABLE_NAME	VARIABLE_VALUE
       DEBUG_SYNC	ON - current signal: ''
      +#sql-ib168.ibd
       
      mysqltest: Result length mismatch
      

      and

      --- /dev/shm/10.6/mysql-test/var/25/tmp/check-mysqld_1.result	2021-06-03 16:38:31.953249791 +0300
      +++ /dev/shm/10.6/mysql-test/var/25/tmp/check-mysqld_1.reject	2021-06-03 16:38:32.509256004 +0300
      @@ -1097,3 +1097,4 @@
       partition	1.0	DISABLED	STORAGE ENGINE	100602.0	NULL	NULL	Mikael Ronstrom, MySQL AB	Partition Storage Engine Helper	GPL	OFF	Stable	1.0
       VARIABLE_NAME	VARIABLE_VALUE
       DEBUG_SYNC	ON - current signal: ''
      +#sql-backup-387caf-a.ibd
       
      mysqltest: Result length mismatch
      

      Nothing else is failing. Attached is a copy of a data directory where we end up with an orphan file. The following patch would fix that, but as collateral damage, we would fail in a different way, by prematurely deleting a file t2.ibd during the test:

      innodb_fts.cr 'innodb'                   w2 [ fail ]  Found warnings/errors in server log file!
              Test ended at 2021-06-03 17:53:36
      line
      2021-06-03 17:53:36 3 [ERROR] InnoDB: Operating system error number 2 in a file operation.
      2021-06-03 17:53:36 3 [ERROR] InnoDB: The error means the system cannot find the path specified.
      2021-06-03 17:53:36 3 [ERROR] InnoDB: File ./test/t2.ibd: 'delete' returned OS error 71.
      

      Here is my incorrect patch:

      diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
      index 12ec2e29bfb..21f9ea86acc 100644
      --- a/storage/innobase/log/log0recv.cc
      +++ b/storage/innobase/log/log0recv.cc
      @@ -597,6 +597,8 @@ static struct
           lsn_t lsn;
           /** File name from the FILE_ record */
           std::string file_name;
      +    /** whether a FILE_DELETE record was encountered */
      +    mutable bool deleted;
         };
       
         using map= std::map<const uint32_t, item, std::less<const uint32_t>,
      @@ -638,7 +640,7 @@ static struct
       
           char *fil_path= fil_make_filepath(nullptr, {filename, strlen(filename)},
                                             IBD, false);
      -    const item defer= {lsn, fil_path};
      +    const item defer= {lsn, fil_path, false};
           auto p= defers.emplace(space, defer);
           if (!p.second && p.first->second.lsn <= defer.lsn)
             p.first->second= defer;
      @@ -1034,9 +1036,10 @@ fil_name_process(char* name, ulint len, ulint space_id,
       
       	if (deleted) {
       		/* Got FILE_DELETE */
      -
      -		deferred_spaces.remove(
      -			static_cast<uint32_t>(space_id));
      +		if (auto d = deferred_spaces.find(static_cast<uint32_t>(
      +							  space_id))) {
      +			d->deleted = true;
      +		}
       		if (!p.second && f.status != file_name_t::DELETED) {
       			f.status = file_name_t::DELETED;
       			if (f.space != NULL) {
      

      Attachments

        1. data_copy.tar.bz2
          957 kB
        2. dd.tar.gz
          625 kB
        3. dd-t2.tar.gz
          588 kB

        Issue Links

          Activity

            dd.tar.gz is a data directory for which the orphan file would remain if you do not apply my patch to the bb-10.6-MDEV-25506 branch.
            dd-t2.tar.gz is a data directory where recovery would wrongly delete the file test/t2.ibd if you apply my patch.

            marko Marko Mäkelä added a comment - dd.tar.gz is a data directory for which the orphan file would remain if you do not apply my patch to the bb-10.6- MDEV-25506 branch. dd-t2.tar.gz is a data directory where recovery would wrongly delete the file test/t2.ibd if you apply my patch.

            I debugged dd-t2.tar.gz with the patch a little further.- We delete the wrong file here:

            #3  fil_delete_tablespace (id=id@entry=6)
                at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1699
            #4  0x000055d4764adc9a in row_purge_remove_clust_if_poss_low (
                node=node@entry=0x55d478d32db8, mode=<optimized out>, mode@entry=2)
                at /mariadb/10.6/storage/innobase/row/row0purge.cc:188
            #5  0x000055d4764a8494 in row_purge_remove_clust_if_poss (node=0x55d478d32db8)
                at /mariadb/10.6/storage/innobase/row/row0purge.cc:275
            #6  row_purge_del_mark (node=0x55d478d32db8)
                at /mariadb/10.6/storage/innobase/row/row0purge.cc:734
            

            That file name belongs to another tablespace, with id=7:

            (rr) p *fil_system.spaces.array@fil_system.spaces.n_cells
            $3 = {{node = 0x0} <repeats 6376 times>, {node = 0x55d478db5468}, {node = 0x0}, {node = 0x55d478d0c9e8}, {node = 0x55d478d0c7e8}, {node = 0x55d478d0c5e8}, {node = 0x0}, {node = 0x55d478d78b08}, {
                node = 0x55d478d785e8}, {node = 0x55d478d78828}, {node = 0x55d478d1faf8}, {node = 0x0} <repeats 15558 times>, {node = 0x55d478d74198}, {node = 0x0} <repeats 28822 times>}
            (rr) p ((fil_space_t*)0x55d478db5468).chain.start.name
            $4 = 0x55d478d0c448 "./test/t3.ibd"
            (rr) p ((fil_space_t*)0x55d478d0c9e8).chain.start.name
            $5 = 0x55d478d0c398 "./test/t2.ibd"
            (rr) p ((fil_space_t*)0x55d478d0c9e8).id
            $6 = 7
            

            The incorrect name for tablespace 6 was assigned here:

            #0  fil_space_t::add (this=0x55d478d0c7e8, name=0x55d478d0a5d8 "./test/t2.ibd", handle={m_file = -1, m_psi = <synthetic pointer>}, size=7, is_raw=false, atomic_write=<optimized out>, max_pages=4294967295)
                at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:312
            #1  0x000055d4763bdb6b in $_0::create (it=<optimized out>, name=<optimized out>, flags=<optimized out>, crypt_data=<optimized out>, size=7) at /mariadb/10.6/storage/innobase/log/log0recv.cc:733
            #2  recv_sys_t::recover_deferred (this=<optimized out>, p=<optimized out>, name=<optimized out>, free_block=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:781
            #3  0x000055d4763cbb20 in $_0::reinit_all (this=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:700
            #4  recv_recovery_from_checkpoint_start (flush_lsn=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4065
            

            I think that we must make the deferred_spaces aware of FILE_RENAME records. I made an attempt at that, and managed to fix both dd.tar.gz and dd-t2.tar.gz, but the above test case would occasionally end up with orphan files, and at lest 10 existing regression tests would fail:

            Failing test(s): encryption.innodb-encryption-alter encryption.innodb_encrypt_freed innodb.innodb-index innodb.innodb-alter-tempfile parts.debug_innodb_crash innodb.truncate_crash innodb.innodb-blob innodb.instant_alter_crash atomic.create_table innodb.innodb_bulk_create_index_debug
            

            marko Marko Mäkelä added a comment - I debugged dd-t2.tar.gz with the patch a little further.- We delete the wrong file here: #3 fil_delete_tablespace (id=id@entry=6) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:1699 #4 0x000055d4764adc9a in row_purge_remove_clust_if_poss_low ( node=node@entry=0x55d478d32db8, mode=<optimized out>, mode@entry=2) at /mariadb/10.6/storage/innobase/row/row0purge.cc:188 #5 0x000055d4764a8494 in row_purge_remove_clust_if_poss (node=0x55d478d32db8) at /mariadb/10.6/storage/innobase/row/row0purge.cc:275 #6 row_purge_del_mark (node=0x55d478d32db8) at /mariadb/10.6/storage/innobase/row/row0purge.cc:734 That file name belongs to another tablespace, with id=7: (rr) p *fil_system.spaces.array@fil_system.spaces.n_cells $3 = {{node = 0x0} <repeats 6376 times>, {node = 0x55d478db5468}, {node = 0x0}, {node = 0x55d478d0c9e8}, {node = 0x55d478d0c7e8}, {node = 0x55d478d0c5e8}, {node = 0x0}, {node = 0x55d478d78b08}, { node = 0x55d478d785e8}, {node = 0x55d478d78828}, {node = 0x55d478d1faf8}, {node = 0x0} <repeats 15558 times>, {node = 0x55d478d74198}, {node = 0x0} <repeats 28822 times>} (rr) p ((fil_space_t*)0x55d478db5468).chain.start.name $4 = 0x55d478d0c448 "./test/t3.ibd" (rr) p ((fil_space_t*)0x55d478d0c9e8).chain.start.name $5 = 0x55d478d0c398 "./test/t2.ibd" (rr) p ((fil_space_t*)0x55d478d0c9e8).id $6 = 7 The incorrect name for tablespace 6 was assigned here: #0 fil_space_t::add (this=0x55d478d0c7e8, name=0x55d478d0a5d8 "./test/t2.ibd", handle={m_file = -1, m_psi = <synthetic pointer>}, size=7, is_raw=false, atomic_write=<optimized out>, max_pages=4294967295) at /mariadb/10.6/storage/innobase/fil/fil0fil.cc:312 #1 0x000055d4763bdb6b in $_0::create (it=<optimized out>, name=<optimized out>, flags=<optimized out>, crypt_data=<optimized out>, size=7) at /mariadb/10.6/storage/innobase/log/log0recv.cc:733 #2 recv_sys_t::recover_deferred (this=<optimized out>, p=<optimized out>, name=<optimized out>, free_block=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:781 #3 0x000055d4763cbb20 in $_0::reinit_all (this=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:700 #4 recv_recovery_from_checkpoint_start (flush_lsn=<optimized out>) at /mariadb/10.6/storage/innobase/log/log0recv.cc:4065 I think that we must make the deferred_spaces aware of FILE_RENAME records. I made an attempt at that, and managed to fix both dd.tar.gz and dd-t2.tar.gz , but the above test case would occasionally end up with orphan files, and at lest 10 existing regression tests would fail: Failing test(s): encryption.innodb-encryption-alter encryption.innodb_encrypt_freed innodb.innodb-index innodb.innodb-alter-tempfile parts.debug_innodb_crash innodb.truncate_crash innodb.innodb-blob innodb.instant_alter_crash atomic.create_table innodb.innodb_bulk_create_index_debug

            thiru, thank you, your refinement of my fix does the trick.

            There is also a race condition in the test. We must wait for purge, so that any #sql-ib.ibd files for dropped tables will be removed before the test ends. With that fixed, a debug build of the server test failed due to an unrelated message:

            innodb_fts.cr 'innodb'                   w5 [ 631 fail ]  Found warnings/errors in server log file!
                    Test ended at 2021-06-04 17:12:17
            line
            2021-06-04 17:12:16 0 [Warning] InnoDB: Failed to load the clustered index for table `test`.`#sql-ib3120` because of TABLE_ID mismatch. Refusing to load the rest of the indexes (if any) and the whole table altogether.
            2021-06-04 17:12:16 0 [Warning] InnoDB: Failed to load table `test`.`#sql-ib3120`:Data structure corruption
            

            The release build of the same completed successfully:

            innodb_fts.cr 'innodb'                   w3 [ 1000 pass ]   1480
            --------------------------------------------------------------------------
            The servers were restarted 0 times
            Spent 55298.590 of 2408 seconds executing testcases
             
            Completed: All 27000 tests were successful.
            

            I plan to repeat another variant of the test, with FULLTEXT INDEX instead of a normal INDEX, so that more files will be created and deleted. But I think that this problem is solved now, and I will take care of pushing the fix. Thank you for finishing it!

            marko Marko Mäkelä added a comment - thiru , thank you, your refinement of my fix does the trick. There is also a race condition in the test. We must wait for purge, so that any #sql-ib.ibd files for dropped tables will be removed before the test ends. With that fixed, a debug build of the server test failed due to an unrelated message: innodb_fts.cr 'innodb' w5 [ 631 fail ] Found warnings/errors in server log file! Test ended at 2021-06-04 17:12:17 line 2021-06-04 17:12:16 0 [Warning] InnoDB: Failed to load the clustered index for table `test`.`#sql-ib3120` because of TABLE_ID mismatch. Refusing to load the rest of the indexes (if any) and the whole table altogether. 2021-06-04 17:12:16 0 [Warning] InnoDB: Failed to load table `test`.`#sql-ib3120`:Data structure corruption The release build of the same completed successfully: innodb_fts.cr 'innodb' w3 [ 1000 pass ] 1480 -------------------------------------------------------------------------- The servers were restarted 0 times Spent 55298.590 of 2408 seconds executing testcases   Completed: All 27000 tests were successful. I plan to repeat another variant of the test, with FULLTEXT INDEX instead of a normal INDEX , so that more files will be created and deleted. But I think that this problem is solved now, and I will take care of pushing the fix. Thank you for finishing it!

            The following worked too, for 27000 runs of the release build. However, if I replace either --echo with send, it would occasionally report orphan FTS_*.ibd files for the common tables (but no orphan tables). That should be fixed by MDEV-25850.

            --source include/have_innodb.inc
            --source include/not_embedded.inc
            FLUSH TABLES;
            let $datadir=`select @@datadir`;
             
            --connect(ddl1, localhost, root,,)
            CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            send ALTER TABLE t1 ADD INDEX(b(5));
            --connect(ddl2, localhost, root,,)
            CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            --echo ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX(b), FORCE;
            --connect(ddl3, localhost, root,,)
            CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX(a)) ENGINE=InnoDB;
            --echo ALTER TABLE t3 DROP INDEX a, ADD FULLTEXT INDEX(b), ALGORITHM=COPY;
             
            --connection default
            let $shutdown_timeout=0;
            --source include/shutdown_mysqld.inc
            #--exec tar czf $datadir/../dd.tar.gz -C $datadir .
            --source include/start_mysqld.inc
             
            # Ensure that the history list length will actually be decremented by purge.
            SET @saved_frequency = @@GLOBAL.innodb_purge_rseg_truncate_frequency;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = 1;
            source ../../innodb/include/wait_all_purged.inc;
            SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;
             
            disconnect ddl1;
            disconnect ddl2;
            disconnect ddl3;
             
            DROP TABLE t1,t2,t3;
             
            SELECT * FROM information_schema.innodb_sys_tables WHERE name LIKE 'test/%';
            list_files $datadir/test *.ibd;
            

            marko Marko Mäkelä added a comment - The following worked too, for 27000 runs of the release build. However, if I replace either --echo with send , it would occasionally report orphan FTS_*.ibd files for the common tables (but no orphan tables). That should be fixed by MDEV-25850 . --source include/have_innodb.inc --source include/not_embedded.inc FLUSH TABLES; let $datadir=` select @@datadir`;   --connect(ddl1, localhost, root,,) CREATE TABLE t1(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; send ALTER TABLE t1 ADD INDEX (b(5)); --connect(ddl2, localhost, root,,) CREATE TABLE t2(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; --echo ALTER TABLE t2 DROP INDEX a, ADD FULLTEXT INDEX(b), FORCE; --connect(ddl3, localhost, root,,) CREATE TABLE t3(a TEXT,b TEXT,FULLTEXT INDEX (a)) ENGINE=InnoDB; --echo ALTER TABLE t3 DROP INDEX a, ADD FULLTEXT INDEX(b), ALGORITHM=COPY;   --connection default let $shutdown_timeout=0; --source include/shutdown_mysqld.inc # --exec tar czf $datadir/../dd.tar.gz -C $datadir . --source include/start_mysqld.inc   # Ensure that the history list length will actually be decremented by purge. SET @saved_frequency = @@ GLOBAL .innodb_purge_rseg_truncate_frequency; SET GLOBAL innodb_purge_rseg_truncate_frequency = 1; source ../../innodb/include/wait_all_purged.inc; SET GLOBAL innodb_purge_rseg_truncate_frequency = @saved_frequency;   disconnect ddl1; disconnect ddl2; disconnect ddl3;   DROP TABLE t1,t2,t3;   SELECT * FROM information_schema.innodb_sys_tables WHERE name LIKE 'test/%' ; list_files $datadir/test *.ibd;

            Unfortunately, this still needs more work. data_copy.tar.bz2 fails to recover if the current fix is present:

            2021-06-07 14:02:30 0 [ERROR] InnoDB: Tablespace 402 was not found at ./test/#sql-alter-26a6cc-2f.ibd.
            2021-06-07 14:02:30 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace.
            2021-06-07 14:02:30 0 [ERROR] InnoDB: Tablespace 404 was not found at ./test/#sql-alter-26a6cc-2f.ibd.
            2021-06-07 14:02:30 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1433] with error Tablespace not found
            

            Furthermore, I started to think that it is useless to open the same files over and over again:

            		case FIL_LOAD_DEFER:
            			/** Skip the deferred spaces
            			when lsn is already processed */
            			if (*store != store_t::STORE_IF_EXISTS) {
            				deferred_spaces.add(
            					static_cast<uint32_t>(space_id),
            					name, lsn);
            			}
            			break;
            

            We should already have filtered out duplicated name at this point. Currently, we are paying the overhead of several system calls in fil_ibd_load(). If some file t.ibd did not contain a valid tablespace ID, it will not contain it on a subsequent call either, because we are checking the consistency of the files before we are applying any changes, right?

            Maybe we should have a mapping from file names to tablespace IDs. If the first page of the file was not correctly written before recovery, then we would store 0 as the tablespace ID for that file name.

            Last, I do not see why Datafile::read_first_page() has to be so complicated. We could simply attempt to read innodb_page_size bytes once. If the file is too short or the data is all-zero, we will add it to deferred_spaces. If its FSP_SPACE_FLAGS do not match innodb_page_size we will reject the file (normally, refuse recovery). Finally, on checksum mismatch we might also simply add the file to deferred_spaces. Page 0 will never be reinitialized, and when it is initialized during tablespace creation, also all other pages will be reinitialized. So it should be of no help to try to find the tablespace ID in subsequent pages of the file.

            marko Marko Mäkelä added a comment - Unfortunately, this still needs more work. data_copy.tar.bz2 fails to recover if the current fix is present: 2021-06-07 14:02:30 0 [ERROR] InnoDB: Tablespace 402 was not found at ./test/#sql-alter-26a6cc-2f.ibd. 2021-06-07 14:02:30 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore this and to permanently lose all changes to the tablespace. 2021-06-07 14:02:30 0 [ERROR] InnoDB: Tablespace 404 was not found at ./test/#sql-alter-26a6cc-2f.ibd. 2021-06-07 14:02:30 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1433] with error Tablespace not found Furthermore, I started to think that it is useless to open the same files over and over again: case FIL_LOAD_DEFER: /** Skip the deferred spaces when lsn is already processed */ if (*store != store_t::STORE_IF_EXISTS) { deferred_spaces.add( static_cast <uint32_t>(space_id), name, lsn); } break ; We should already have filtered out duplicated name at this point. Currently, we are paying the overhead of several system calls in fil_ibd_load() . If some file t.ibd did not contain a valid tablespace ID, it will not contain it on a subsequent call either, because we are checking the consistency of the files before we are applying any changes, right? Maybe we should have a mapping from file names to tablespace IDs. If the first page of the file was not correctly written before recovery, then we would store 0 as the tablespace ID for that file name. Last, I do not see why Datafile::read_first_page() has to be so complicated. We could simply attempt to read innodb_page_size  bytes once. If the file is too short or the data is all-zero, we will add it to deferred_spaces . If its FSP_SPACE_FLAGS do not match innodb_page_size we will reject the file (normally, refuse recovery). Finally, on checksum mismatch we might also simply add the file to deferred_spaces . Page 0 will never be reinitialized, and when it is initialized during tablespace creation, also all other pages will be reinitialized. So it should be of no help to try to find the tablespace ID in subsequent pages of the file.

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.