Details

      Description

      Streaming, Faster Exporting and Instant Importing of Tables

      Currently, tables are exported like this in MySQL 5.6 and 5.7:

      SHOW CREATE TABLE t;
      FLUSH TABLE t FOR EXPORT;
      /* access the file system to copy t.ibd */
      UNLOCK TABLES;
      

      The export syntax is somewhat awkward, and it assumes that the user has access to the file system. We might support an alternative that streams an IMPORT statement with the metadata and encoded file contents to the client connection.

      The import procedure is even more awkward:

      CREATE TABLE t…; /* using the output from SHOW CREATE TABLE */
      ALTER TABLE t DISCARD TABLESPACE; /* effectively corrupts the database! */
      /* copy the file t.ibd to the file system */
      ALTER TABLE t IMPORT TABLESPACE;
      

      As an alternative to this, we can simply support the following:

      SHOW CREATE TABLE t;/* finds user-copied *.frm and *.ibd files */
      

      To allow importing of files in a streaming-friendly way that does not encourage direct manipulation of the data directory, we could also support the following:

      CREATE TABLE t…IMPORT DATA [base64 or base85 dump];
      

      Such statements could be the output of a new streaming-friendly EXPORT statement.

      What currently necessitates a rewrite of the data file at IMPORT?

      The InnoDB data files contain some identifiers that bind the file to the containing instance:

      • Redo log sequence number (LSN) on each page (needed for crash recovery)
      • The DB_TRX_ID transaction identifier in each clustered index record
      • The PAGE_MAX_TRX_ID transaction identifier in each secondary index leaf page
      • The 32-bit tablespace ID (space_id) on every page and in every BLOB pointer

      At IMPORT we must ensure that all these identifiers are compatible with the importing InnoDB instance.

      Instant IMPORT

      Ever since MySQL 5.6 implemented WL#5522, IMPORT reads and writes every page in the to-be-imported tablespace. Before MySQL 5.7 introduced WL#7142 and the MLOG_FILE_NAME record, the IMPORT procedure was also not crash-safe, because crash recovery could mistake a not-yet-imported file for a real data file and even attempt to apply mismatching redo log records to it.

      In MariaDB we could relatively easily allow an instant variant of import that would not adjust anything in the data file. Only in the case when the space_id is already used by the importing system, we would refuse a quick import.

      The trick is that each *.ibd file would keep track of the maximum contained TRX_ID and LSN as dynamic persistent metadata, persisted in a header page on EXPORT or shutdown. If the current mysql.transactions.start_id (MDEV-11657, trx_sys->max_trx_id) or the current redo log sequence number (log_sys->lsn) in the importing server instance is less than the values found near the start of the .ibd file, we would simply move the server’s settings to a larger value. No need to adjust anything in the file! (Some caution is needed when MAX_LSN or MAX_TRX_ID of the being-imported file is within an ‘active window’ of the importing server.)

      With some more effort, we can also remove the requirement to avoid space_id collision on import. This would require that space_id be replaced with fil_space_t* in various parts of InnoDB code and all space_id lookups be removed. Then, the space_id in the data files would essentially become "don’t care" fields, or fil_space_t::id could become a non-unique ‘nonce’ that could be used in consistency checks. We might also continue to require the space_id to be unique, but only write the unique space_id to the first page of the tablespace file, not to each page header.

      Thanks to MDEV-12288, the DB_TRX_ID could be reset when the history is no longer needed. After a proper EXPORT, all DB_TRX_ID columns would be 0, and would not pose any problem. The PAGE_MAX_TRX_ID is trickier, because it cannot be easily be reset due to the complexity that is involved (before resetting, we would have to look up every clustered index record that the secondary index page is pointing to).

      Maybe we should replace the PAGE_MAX_TRX_ID with per-record DB_TRX_ID in secondary indexes and let purge reset those fields as well. This would also properly fix MySQL Bug#67044/Oracle Bug#14704286 Secondary index updates make consistent reads do O(N^2) undo page lookups), which Oracle ‘fixed’ by merely making the lengthy operations interruptible.

      No instant EXPORT for InnoDB

      Normal FLUSH TABLES will not ensure that no pending transactions exist on a table, and it will also not prevent the InnoDB purge operation (or XA ROLLBACK) from modifying the *.ibd file while it is being copied.

      We definitely want an EXPORT that ensures that there are no active or XA PREPARE transactions on the table, and that any delete-marked garbage has been purged. This (apart from server shutdown) is the only way to guarantee that InnoDB cannot modify the *.ibd file while the user is copying it.

      To allow quick import after a normal server shutdown (innodb_fast_shutdown=1), InnoDB could keep track of the number of uncommitted and unpurged records as persistent dynamic metadata. If both counts are zero, an instant import is possible. If the ‘uncommitted’ count is not zero, we cannot safely import anything (the data would essentially be READ UNCOMMITTED, and secondary indexes could be out of sync with the clustered index). If the ‘unpurged’ count is nonzero, import will have to delete all delete-marked records.

      Optional offline clean-up step after EXPORT

      If it is not desirable to move the LSN or TRX_ID forward in the importing server instance, an external utility can be developed to reset the fields in the data file.
      This external utility could also clear (or punch_hole) unallocated pages to reduce the physical size of the file.
      This utility could set a bit in FSP_SPACE_FLAGS to indicate whether the file was completely cleaned.

      Maybe we should write the server GUID somewhere in the first 3 pages of an InnoDB file. When a file is being imported from the same server instance, it can be assumed that no adjustments will be necessary.

        Attachments

          Issue Links

            Activity

              People

              • Assignee:
                ralf.gebhardt@mariadb.com Ralf Gebhardt
                Reporter:
                marko Marko Mäkelä
              • Votes:
                10 Vote for this issue
                Watchers:
                15 Start watching this issue

                Dates

                • Created:
                  Updated: