Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-11424

Instant ALTER TABLE of failure-free record format changes

Details

    • 10.4.0-1

    Description

      This is an umbrella task for allowing ALTER TABLE to be instantaneous in cases that cannot fail due to existing records being incompatible with the altered table definition. Later, MDEV-16356 and MDEV-16291 could extend this to support ALGORITHM=NOCOPY for operations that can avoid rebuilding tables, but need to validate the data.

      Introduction

      Traditionally, ALTER TABLE would be roughly equivalent to the following SQL statements:

      CREATE TABLE `#sql-…` (…);
      INSERT INTO `#sql-…` SELECTFROM t;
      RENAME TABLE t TO `#sql2-…`, `#sql-…` TO t;
      DROP TABLE `#sql2-…`;
      

      This mode of operation is still available by specifying ALGORITHM=COPY or SET old_alter_table=1 (or starting with MDEV-13134 in MariaDB 10.3, SET alter_algorithm=copy).

      Copying a table and rebuilding all its indexes can be a very expensive operation. While InnoDB mostly allows tables to be rebuilt online since MariaDB Server version 10.0, the temporary files can occupy a significant amount of space and I/O capacity.

      There are cases where the data format is not affected by the ALTER TABLE operation; only metadata needs to be updated. Examples include renaming columns, changing the default values of columns, and changing the maximum length of a VARCHAR column such that the storage format does not change.

      A goal for MariaDB Server is to allow instantaneous execution of any ALTER TABLE operation where data conversions cannot fail, and indexes do not need to be rebuilt. Even in cases where some affected indexes have to be rebuilt, it will be more efficient to only rebuild some indexes than to copy the whole table.

      The goal can be reformulated as: Avoid rebuilding the table.

      How to avoid rebuilding the table, if the underlying storage format would be affected by the ALTER TABLE operation? By extending the storage format in a way that allows the data to be in ‘non-canonical’ format. The main examples of this are MDEV-11369 and MDEV-15562, which implement ADD COLUMN, DROP COLUMN and changing the order of columns.

      The original InnoDB storage format (retroactively named ROW_FORMAT=REDUNDANT) is very generic, basically allowing NULL values and arbitrary length for every column. For it, MDEV-15562 would be the only storage format change needed to avoid unnecessary rebuild of the table.

      Note: Whenever the PRIMARY KEY is changed, all indexes will have to be rebuilt. Likewise, some operations on indexed columns may require the indexes to be rebuilt.

      The space-optimized row formats COMPACT and DYNAMIC omit ‘is null’ flags for NOT NULL columns and length information for fixed-length columns. MDEV-17520 could extend MDEV-15563 to these ROW_FORMAT by using a more flexible encoding for all clustered index pages that have been modified since the ALTER TABLE operation.

      Operations that involve adding or dropping indexes (also DROP COLUMN can imply this) will not be supported for ALGORITHM=INSTANT; they will be supported with ALGORITHM=NOCOPY. ALTER TABLE…ADD [UNIQUE] INDEX supports concurrent modifications to the table since MariaDB 10.0. MDEV-16223 could defer ADD INDEX to a background operation.

      Operations that will continue to be refused by ALGORITHM=INSTANT (and ALGORITHM=NOCOPY even after MDEV-16291) include:

      • Changing ROW_FORMAT or ENGINE
      • Altering a table that is in ROW_FORMAT=COMPRESSED
      • Dropping, adding or changing PRIMARY KEY columns, or ADD/DROP PRIMARY KEY

      Any ALTER TABLE that would be refused with ALGORITHM=NOCOPY (anything that rebuilds the clustered index) will drop any ‘instant ALTER TABLE’ metadata. The metadata would also be deleted if a rebuild is explicitly requested by the use of the FORCE keyword.

      Metadata format changes

      In InnoDB, instant ALTER TABLE affects clustered index page leaf records only. The data dictionary will reflect the most recent table definition. Additional metadata for interpreting records that correspond to an earlier version of the table definition will be stored in the clustered index tree as follows.

      • MDEV-11369 in MariaDB 10.3.4 changed the root page type code to FIL_PAGE_TYPE_INSTANT to indicate that instant ALTER TABLE has been used. It also introduced a hidden metadata record at the start of the clustered index, on the leftmost leaf page. The new page type code prevents older MariaDB versions from opening the table.
      • MDEV-15562 slightly modifies the format of the metadata record to represent dropped and reordered index fields. (All columns will internally be added last in the user records in the clustered index leaf pages.) MariaDB 10.3 will refuse to open such tables, because new info_bits will be set in the metadata record.

      Data format changes

      User records in the clustered index leaf pages will have to indicate which format they correspond to.

      • For other than ROW_FORMAT=REDUNDANT, MDEV-11369 introduced REC_STATUS_COLUMNS_ADDED that indicates the presence of an optional record header that encodes the number of 'instantly added' columns that are present in the record.
      • For ROW_FORMAT=REDUNDANT, MDEV-11369 simply stores the number of fields in the record header.
      • In MDEV-11369 and MDEV-15562, any 'instantly added' columns whose values are missing from the end of the clustered index record will be substituted with the values stored in the metadata record.
      • MDEV-15562 will not change the user record format in any way. Instantly added columns are always added as last fields in the clustered index leaf page records.
      • MDEV-17520 would allow clustered index leaf pages to be in a format where the metadata version of each record is identified.

      A note on MVCC

      Because ha_innobase::commit_inplace_alter_table() will be invoked while holding MDL_EXCLUSIVE, any transactions that read or modified the table must finish before the ALTER TABLE can commit. But it is possible that some old transaction tries to do its first access to the table after the ALTER TABLE committed. Such transactions may receive an error message 'table definition changed', as noted in MySQL Bug#28432. It would be too much effort to support MVCC if a transaction after ALTER modified a record (converting it to newer dictionary version) that would otherwise be visible to the old transaction.
      Here is the scenario in SQL:

      connection con1;
      START TRANSACTION WITH CONSISTENT SNAPSHOT; -- creates read view
      connection con2;
      ALTER TABLE t CHANGE COLUMN b b INT NULL;
      UPDATE t SET b=1 WHERE a=100;
      connection con1;
      SELECT * FROM t WHERE a=1; -- might be OK, if the record is still in old format
      SELECT * FROM t WHERE a=100; -- error: record is in to new format
      

      For simplicity and consistency, we could always return an error to the SELECT statements (after any ALTER TABLE).

      ALTER TABLE operations that potentially affect the format of a row

      In MariaDB Server 10.2, the following alter_table_operations might require a table to be rebuilt:

      alter_table_operations SQL Remarks
      ALTER_ADD_STORED_BASE_COLUMN ADD COLUMN MDEV-11369 (10.3)
      ALTER_ADD_VIRTUAL_COLUMN ADD COLUMN…AS Virtual columns are always added instantly
      ALTER_ADD_STORED_GENERATED_COLUMN ADD COLUMN…PERSISTENT AS cannot be instant; until MDEV-16354 requires rebuild with ALGORITHM=COPY
      ALTER_ADD_PK_INDEX ADD PRIMARY KEY Requires all indexes to be rebuilt.
      ALTER_DROP_PK_INDEX DROP PRIMARY KEY Requires all indexes to be rebuilt. Without ADD PRIMARY KEY, cannot even support online rebuilding (LOCK=NONE).
      ALTER_CHANGE_CREATE_OPTION ROW_FORMAT, KEY_BLOCK_SIZE, encryption Requires rebuild; see MDEV-16291.
      ALTER_CHANGE_CREATE_OPTION page_compressed, page_compression_level MDEV-16328
      ALTER_COLUMN_NULLABLE NULL MDEV-15563 for ROW_FORMAT=REDUNDANT
      ALTER_COLUMN_NULLABLE NULL MDEV-17520 for COMPACT and DYNAMIC
      ALTER_COLUMN_NOT_NULLABLE NOT NULL MDEV-16291
      ALTER_STORED_COLUMN_ORDER FIRST, LAST, AFTER MDEV-15562
      ALTER_DROP_STORED_COLUMN DROP COLUMN MDEV-15562
      ALTER_RECREATE_TABLE FORCE the sole purpose of this keyword is to explicitly request rebuild
      ALTER_STORED_COLUMN_TYPE CHANGE to wider type MDEV-15564,MDEV-17520,MDEV-27864
      ALTER_STORED_COLUMN_TYPE CHANGE type MDEV-16291
      ALTER_VIRTUAL_COLUMN_TYPE CHANGE type MDEV-16332
      ALTER_STORED_GCOL_EXPR CHANGE expr MDEV-17035
      ALTER_COLUMN_UNVERSIONED CHANGE…WITH[OUT] SYSTEM VERSIONING MDEV-16330
      ALTER_ADD_SYSTEM_VERSIONING ADD SYSTEM VERSIONING Must rebuild the PRIMARY KEY and thus the full table
      ALTER_DROP_SYSTEM_VERSIONING DROP SYSTEM VERSIONING Must rebuild the PRIMARY KEY and thus the full table
      ALTER_ADD_PERIOD ADD PERIOD FOR SYSTEM TIME must be combined with ADD SYSTEM VERSIONING
      ALTER_DROP_PERIOD DROP PERIOD FOR SYSTEM TIME must be combined with DROP SYSTEM VERSIONING
      ALTER_ADD_CHECK_CONSTRAINT ADD [CONSTRAINT] CHECK MDEV-16356
      ALTER_DROP_CHECK_CONSTRAINT DROP CONSTRAINT MDEV-16331

      Legend:

      • can be performed instantly
      • can be performed instantly, except if any secondary indexes need to be rebuilt
      • not instantaneous; could later be performed without rebuild, with validation
      • will continue to require full table rebuild

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä added a comment - - edited

            I filed the following prerequisite tasks:
            MDEV-15562 Instant DROP COLUMN or changing the order of columns
            MDEV-15563 Instant NOT NULL removal and CHAR or VARCHAR extension for ROW_FORMAT=REDUNDANT

            This task would address the remaining ALTER TABLE operations that currently require the table to be rebuilt. The solution would involve storing a format identifier in each record and storing multiple versions of the table definition by somehow extending the hidden ‘default row’ record that was introduced in MDEV-11369 and will be extended by MDEV-15562.

            Secondary indexes of full columns (not column prefixes) in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC will have to be rebuilt when the indexed columns are modified as follows:

            1. Changing the length of a fixed-length column (CHAR, BINARY). (For ROW_FORMAT=REDUNDANT, this will be determined in MDEV-15563.)
            2. Changing a VARCHAR maximum length from up to 255 bytes to more than 255 bytes.

            Lifting these restrictions on secondary indexes would require a change to the secondary index format and a rewrite of the InnoDB change buffer (MDEV-11634) so that the full table metadata would be available on change buffer merge operations.

            marko Marko Mäkelä added a comment - - edited I filed the following prerequisite tasks: MDEV-15562 Instant DROP COLUMN or changing the order of columns MDEV-15563 Instant NOT NULL removal and CHAR or VARCHAR extension for ROW_FORMAT=REDUNDANT This task would address the remaining ALTER TABLE operations that currently require the table to be rebuilt. The solution would involve storing a format identifier in each record and storing multiple versions of the table definition by somehow extending the hidden ‘default row’ record that was introduced in MDEV-11369 and will be extended by MDEV-15562 . Secondary indexes of full columns (not column prefixes) in ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC will have to be rebuilt when the indexed columns are modified as follows: Changing the length of a fixed-length column ( CHAR , BINARY ). (For ROW_FORMAT=REDUNDANT , this will be determined in MDEV-15563 .) Changing a VARCHAR maximum length from up to 255 bytes to more than 255 bytes. Lifting these restrictions on secondary indexes would require a change to the secondary index format and a rewrite of the InnoDB change buffer ( MDEV-11634 ) so that the full table metadata would be available on change buffer merge operations.

            The following were filed as follow-up work for MDEV-15562:
            MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists
            MDEV-17468 Avoid table rebuild on operations on generated columns
            MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large

            marko Marko Mäkelä added a comment - The following were filed as follow-up work for MDEV-15562 : MDEV-17459 Allow instant ALTER TABLE even if FULLTEXT INDEX exists MDEV-17468 Avoid table rebuild on operations on generated columns MDEV-17494 Refuse ALGORITHM=INSTANT when the row size is too large

            Fixing MDEV-12836 would allow instantaneous changes of AUTO_INCREMENT settings.

            marko Marko Mäkelä added a comment - Fixing MDEV-12836 would allow instantaneous changes of AUTO_INCREMENT settings.

            MDEV-17520 will not be in MariaDB Server 10.4 due to the size overhead of the current implementation (of using a format like ROW_FORMAT=REDUNDANT on clustered index leaf pages). We could implement it later in a different form, using a per-page or per-record format identifier.

            marko Marko Mäkelä added a comment - MDEV-17520 will not be in MariaDB Server 10.4 due to the size overhead of the current implementation (of using a format like ROW_FORMAT=REDUNDANT on clustered index leaf pages). We could implement it later in a different form, using a per-page or per-record format identifier.

            Some related bugs will be fixed later. The MariaDB 10.4 tasks related to instant ALTER TABLE (mainly MDEV-15562, MDEV-15563, MDEV-15564) have been completed.

            marko Marko Mäkelä added a comment - Some related bugs will be fixed later. The MariaDB 10.4 tasks related to instant ALTER TABLE (mainly MDEV-15562 , MDEV-15563 , MDEV-15564 ) have been completed.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              7 Vote for this issue
              Watchers:
              13 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.