Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25899

intermediate files operations are not protected by backup locks

Details

    • Bug
    • Status: Closed (View Workflow)
    • Blocker
    • Resolution: Fixed
    • 10.6
    • 10.6.2
    • Backup, Server
    • None

    Description

      After MDEV-25666 fix mariabackup.innodb_ddl_on_intermediate_table fails with the following way:

      1. mariabackup holds "BACKUP STAGE START" (see MDEV-5336 for details).
      2. The server starts "ALTER TABLE" with "COPY" algorithm. copy_data_between_tables() is invoked and holds MDL_BACKUP_ALTER_COPY lock and copies data to intermediate table. InnoDB writes FILE_CREATE redo log record.
      3. mariabackup requests BLOCK_COMMIT lock, and the lock is granted. mariabackup remembers LSN just after the lock is granted.
      4. The server finishes data copying in copy_data_between_tables() and tries to upgrade the lock to MDL_BACKUP_DDL, which conflicts with the lock held by mariabackup. The lock request finishes by timeout and copy_data_between_tables() returns error.
      5. mysql_alter_table() drops intermediate table, this drop is not protected with MDL_BACKUP_DDL, so the table is dropped and InnoDB writes FILE_DELETE redo log record.
      6. mariabackup waits while redo log is read to the LSN remembered on the step 3. So FILE_CREATE for the intermediate table is read, but FILE_DELETE has not been read yet.
      7. mariabackup executes backup_fix_ddl(), and as FILE_CREATE was read, but FILE_DELETE was not read for the intermediate table, it tries to backup that table, and fails trying to open it, as it was deleted on step 5.

      Just for the note, mariabackup parses InnoDB redo log during backup and remembers all file operations read from the log, and then copies newly created files or creates special files for renamed and deleted files, which are processed during backup prepare phase.

      Before MDEV-25666 fix mariabackup ignored temp and intermediate tables, and backup_fix_ddl() worked well. But after this fix it does not work because intermediate file's operations are not protected with MDL_BACKUP_DDL lock.

      It's not obvious for me how to fix it or if this is bug or not. The transformation of #sql-* file into a real tablespace should not happen, and some BACKUP STAGE , or FTWRL should prevent that last rename during backup(see MDEV-5336). It might be that errors in MDEV-25666 log should not be treated as errors.

      The test: https://github.com/MariaDB/server/tree/10.6-MDEV-25899

      Attachments

        Issue Links

          Activity

            For native ALTER TABLE we also have the problem that rollback is not protected by backup locks at all. The original reasoning why the rollback is not necessarily protected by MDL is that during the final phase of ALTER TABLE…LOCK=NONE the lock upgrade to MDL_EXCLUSIVE may time out, and as a result we would want to roll back the operation. This rollback may delete and rename files if the operation was a table-rebuilding ALTER.
            Relevant code in sql_table.cc:

              /* Set MDL_BACKUP_DDL */
              if (backup_reset_alter_copy_lock(thd))
                goto rollback;
            

            We would invoke the rollback if there is a conflict with a backup lock, instead of waiting for the backup lock!

            marko Marko Mäkelä added a comment - For native ALTER TABLE we also have the problem that rollback is not protected by backup locks at all. The original reasoning why the rollback is not necessarily protected by MDL is that during the final phase of ALTER TABLE…LOCK=NONE the lock upgrade to MDL_EXCLUSIVE may time out, and as a result we would want to roll back the operation. This rollback may delete and rename files if the operation was a table-rebuilding ALTER . Relevant code in sql_table.cc : /* Set MDL_BACKUP_DDL */ if (backup_reset_alter_copy_lock(thd)) goto rollback; We would invoke the rollback if there is a conflict with a backup lock, instead of waiting for the backup lock!

            I expect that MDEV-25854 can fully address most problems related to restoring a backup where a DDL operation on the server was blocked by backup locks. But, I expect that if the server was running a rollback of native ALTER TABLE during the last phase of backup, we can end up with a corrupted backup.

            marko Marko Mäkelä added a comment - I expect that MDEV-25854 can fully address most problems related to restoring a backup where a DDL operation on the server was blocked by backup locks. But, I expect that if the server was running a rollback of native ALTER TABLE during the last phase of backup, we can end up with a corrupted backup.

            It seems that "rollback of native ALTER TABLE" isn't in the scope of this bug report. This was specifically about #sql temporary files created by ALTER TABLE ... ALGORITHM=COPY. And this was, apparently, fixed by MDEV-25854.

            serg Sergei Golubchik added a comment - It seems that "rollback of native ALTER TABLE " isn't in the scope of this bug report. This was specifically about #sql temporary files created by ALTER TABLE ... ALGORITHM=COPY . And this was, apparently, fixed by MDEV-25854 .

            serg, I don’t think that it is appropriate to call the #sql files that are created during DDL "temporary", because they eventually would become the persistent, durable copy of the data when the operation completes. "Intermediate file name" would seem to be a less confusing term.

            There seems to be some confusion around the unfortunately named ALGORITHM=INPLACE again. As you can read in MDEV-11424, the native ALTER TABLE may very well rebuild tables. Starting with MDEV-14378, the intermediate file names in InnoDB would use the same pattern as the .frm files, for files that are being created, but not for files that are in the process of being removed.

            In MDEV-25854 I had written the following:

            I think that files with names #sql-ib*.ibd must always be copied.

            In ha_innobase::commit_inplace_alter_table() as well as in ha_innobase::truncate() and fts_drop_table() we invoke dict_mem_create_temporary_tablename() in order to generate an #sql-ib name for being-dropped tables. At that point, we cannot possibly know whether the operation will be durably committed. If the server was killed and restarted before the durable commit, InnoDB must be able to roll back an ALTER TABLE or OPTIMIZE TABLE or TRUNCATE TABLE. The renamed file would be removed after commit, either by the DDL thread, or in case the server was killed and restarted, by the purge logic that was implemented in the third part of MDEV-25506.

            marko Marko Mäkelä added a comment - serg , I don’t think that it is appropriate to call the #sql files that are created during DDL "temporary", because they eventually would become the persistent, durable copy of the data when the operation completes. "Intermediate file name" would seem to be a less confusing term. There seems to be some confusion around the unfortunately named ALGORITHM=INPLACE again. As you can read in MDEV-11424 , the native ALTER TABLE may very well rebuild tables. Starting with MDEV-14378 , the intermediate file names in InnoDB would use the same pattern as the .frm files, for files that are being created , but not for files that are in the process of being removed . In MDEV-25854 I had written the following: I think that files with names #sql-ib*.ibd must always be copied. In ha_innobase::commit_inplace_alter_table() as well as in ha_innobase::truncate() and fts_drop_table() we invoke dict_mem_create_temporary_tablename() in order to generate an #sql-ib name for being-dropped tables. At that point, we cannot possibly know whether the operation will be durably committed. If the server was killed and restarted before the durable commit, InnoDB must be able to roll back an ALTER TABLE or OPTIMIZE TABLE or TRUNCATE TABLE . The renamed file would be removed after commit, either by the DDL thread, or in case the server was killed and restarted, by the purge logic that was implemented in the third part of MDEV-25506 .

            It was somewhat unclear to me why this bug had been closed, or what the fix was.

            I revisited this because of an issue MDEV-35257 that affects the 10.6-enterprise release and anything starting with MariaDB Server 10.11.

            marko Marko Mäkelä added a comment - It was somewhat unclear to me why this bug had been closed, or what the fix was. I revisited this because of an issue MDEV-35257 that affects the 10.6-enterprise release and anything starting with MariaDB Server 10.11.

            People

              Unassigned Unassigned
              vlad.lesin Vladislav Lesin
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.