Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-17520

Instant ALTER TABLE for failure-free column type changes

Details

    Description

      The original InnoDB ROW_FORMAT=REDUNDANT is essentially storing each field as variable-length and possibly NULL. For that format, we can trivially allow an instantaneous change of a column from NOT NULL to NULL.

      The space-optimized row formats COMPACT and DYNAMIC do not allow NULL values to be represented for columns that were originally declared as NOT NULL. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for ROW_FORMAT=REDUNDANT in MDEV-15563 would require the table to be rebuilt when it is in the COMPACT or DYNAMIC format.

      Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:

      1. To prevent downgrade or IMPORT TABLESPACE to older MariaDB releases, add a MDEV-15562 metadata record.
      2. For ROW_FORMAT=COMPACT or ROW_FORMAT=REDUNDANT, if the metadata record is present, allow clustered index leaf pages to be in ROW_FORMAT=REDUNDANT with the differences noted below.

      The clustered index leaf pages of the table would be gradually converted into something that resembles ROW_FORMAT=REDUNDANT as a result of modifications. This will increase the size usage a little. Also, ROW_FORMAT=REDUNDANT limits the maximum in-page record size to 16,383 bytes, which for innodb_page_size=64k is less than the limit for COMPACT or DYNAMIC.

      Because secondary index records would remain in the original ROW_FORMAT, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from NOT NULL to NULL will require the indexes to be rebuilt if ROW_FORMAT is not REDUNDANT.

      Any INSERT or UPDATE after an instant ALTER that removes a NOT NULL constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles ROW_FORMAT=REDUNDANT, with the following differences:

      • NULL columns will occupy 0 bytes of storage, also when fixed-length.
      • CHAR(n) (unless NULL) will occupy n*mbminlenn*mbmaxlen bytes, instead of n*mbmaxlen.
      • For ROW_FORMAT=DYNAMIC, no local prefix of off-page columns will be stored. For ROW_FORMAT=COMPACT, we will continue to write the 768-byte prefixes.

      In ROW_FORMAT=REDUNDANT, the record header will store the length of each column (including fixed-length and NULL columns), using n_fields or 2·n_fields bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·n_fields+1 bytes.

      The metadata BLOB that was introduced in MDEV-15562 will be augmented. The flag 1U << 14 will be set in dict_instant_t::non_pk_col_map[] for ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC columns that were originally created as NOT NULL but no longer carry this attribute. Based on this information, we will initialize n_nullable and n_core_null_bytes for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and ROW_FORMAT=REDUNDANT will use the latest definition.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            marko Marko Mäkelä made changes -
            Description The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade to earlier servers, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}}, if the metadata record is present and, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}}.
            # For {{ROW_FORMAT=DYNAMIC}}, if the metadata record is present and, allow clustered index leaf pages to be like in {{ROW_FORMAT=REDUNDANT}}, but do not store any local prefix for off-page columns.

            The clustered index leaf pages of the table would be gradually converted into {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. For example, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt.
            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade to earlier servers, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}}, if the metadata record is present and, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}}.
            # For {{ROW_FORMAT=DYNAMIC}}, if the metadata record is present and, allow clustered index leaf pages to be like in {{ROW_FORMAT=REDUNDANT}}, but do not store any local prefix for off-page columns.

            The clustered index leaf pages of the table would be gradually converted into {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. For example, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint or changes a column to a wider type will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            marko Marko Mäkelä made changes -
            Description The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade to earlier servers, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}}, if the metadata record is present and, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}}.
            # For {{ROW_FORMAT=DYNAMIC}}, if the metadata record is present and, allow clustered index leaf pages to be like in {{ROW_FORMAT=REDUNDANT}}, but do not store any local prefix for off-page columns.

            The clustered index leaf pages of the table would be gradually converted into {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. For example, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint or changes a column to a wider type will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            This is an umbrella task for allowing {{ALTER TABLE}} to be instantaneous in cases that cannot fail due to existing records being incompatible with the altered table definition. Later, MDEV-16356 and MDEV-16291 could extend this to support {{ALGORITHM=NOCOPY}} for operations that can avoid rebuilding tables, but need to validate the data.

            h2. Introduction

            Traditionally, {{ALTER TABLE}} would be roughly equivalent to the following SQL statements:
            {code:sql}
            CREATE TABLE `#sql-…` (…);
            INSERT INTO `#sql-…` SELECT … FROM t;
            RENAME TABLE t TO `#sql2-…`, `#sql-…` TO t;
            DROP TABLE `#sql2-…`;
            {code}
            This mode of operation is still available by specifying {{ALGORITHM=COPY}} or {{SET old_alter_table=1}} (or starting with MDEV-13134 in MariaDB 10.3, {{SET alter_algorithm=copy}}).

            Copying a table and rebuilding all its indexes can be a very expensive operation. While InnoDB mostly allows tables to be rebuilt online since MariaDB Server version 10.0, the temporary files can occupy a significant amount of space and I/O capacity.

            There are cases where the data format is not affected by the {{ALTER TABLE}} operation; only metadata needs to be updated. Examples include renaming columns, changing the default values of columns, and changing the maximum length of a {{VARCHAR}} column such that the storage format does not change.

            A goal for MariaDB Server is to allow instantaneous execution of any {{ALTER TABLE}} operation where data conversions cannot fail, and indexes do not need to be rebuilt. Even in cases where some affected indexes have to be rebuilt, it will be more efficient to only rebuild some indexes than to copy the whole table.

            The goal can be reformulated as: Avoid rebuilding the table.

            How to avoid rebuilding the table, if the underlying storage format would be affected by the {{ALTER TABLE}} operation? By extending the storage format in a way that allows the data to be in ‘non-canonical’ format. The main examples of this are MDEV-11369 and MDEV-15562, which implement {{ADD COLUMN}}, {{DROP COLUMN}} and changing the order of columns.

            The original InnoDB storage format (retroactively named {{ROW_FORMAT=REDUNDANT}}) is very generic, basically allowing {{NULL}} values and arbitrary length for every column. For it, MDEV-15562 would be the only storage format change needed to avoid unnecessary rebuild of the table.

            Note: Whenever the {{PRIMARY KEY}} is changed, all indexes will have to be rebuilt. Likewise, some operations on indexed columns may require the indexes to be rebuilt.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} omit ‘is null’ flags for {{NOT NULL}} columns and length information for fixed-length columns. MDEV-17520 will extend MDEV-15563 to these {{ROW_FORMAT}} by using the {{ROW_FORMAT=REDUNDANT}} encoding for all clustered index pages that have been modified since the {{ALTER TABLE}} operation.

            Operations that involve adding or dropping indexes (also {{DROP COLUMN}} can imply this) will not be supported for {{ALGORITHM=INSTANT}}; they will be supported with {{ALGORITHM=NOCOPY}}. {{ALTER TABLE…ADD \[UNIQUE\] INDEX}} supports concurrent modifications to the table since MariaDB 10.0. MDEV-16223 could defer {{ADD INDEX}} to a background operation.

            Operations that will continue to be refused by {{ALGORITHM=INSTANT}} (and {{ALGORITHM=NOCOPY}} even after MDEV-16291) include:
            * Changing {{ROW_FORMAT}} or {{ENGINE}}
            * Altering a table that is in {{ROW_FORMAT=COMPRESSED}}
            * Dropping, adding or changing {{PRIMARY KEY}} columns, or {{ADD}}/{{DROP PRIMARY KEY}}

            Any {{ALTER TABLE}} that would be refused with {{ALGORITHM=NOCOPY}} (anything that rebuilds the clustered index) will drop any ‘instant {{ALTER TABLE}}’ metadata. The metadata would also be deleted if a rebuild is explicitly requested by the use of the {{FORCE}} keyword.

            h2. Metadata format changes
            In InnoDB, instant {{ALTER TABLE}} affects clustered index page leaf records only. The data dictionary will reflect the most recent table definition. Additional metadata for interpreting records that correspond to an earlier version of the table definition will be stored in the clustered index tree as follows.

            * MDEV-11369 in MariaDB 10.3.4 changed the root page type code to {{FIL_PAGE_TYPE_INSTANT}} to indicate that instant {{ALTER TABLE}} has been used. It also introduced a hidden metadata record at the start of the clustered index, on the leftmost leaf page. The new page type code prevents older MariaDB versions from opening the table.
            * MDEV-15562 slightly modifies the format of the metadata record to represent dropped and reordered index fields. (All columns will internally be added last in the user records in the clustered index leaf pages.) MariaDB 10.3 will refuse to open such tables, because new {{info_bits}} will be set in the metadata record.

            h2. Data format changes
            User records in the clustered index leaf pages will have to indicate which format they correspond to.
            * For other than {{ROW_FORMAT=REDUNDANT}}, MDEV-11369 introduced {{REC_STATUS_COLUMNS_ADDED}} that indicates the presence of an optional record header that encodes the number of 'instantly added' columns that are present in the record.
            * For {{ROW_FORMAT=REDUNDANT}}, MDEV-11369 simply stores the number of fields in the record header.
            * In MDEV-11369 and MDEV-15562, any 'instantly added' columns whose values are missing from the end of the clustered index record will be substituted with the values stored in the metadata record.
            * MDEV-15562 will not change the user record format in any way. Instantly added columns are always added as last fields in the clustered index leaf page records.
            * MDEV-17520 would allow clustered index leaf pages of {{ROW_FORMAT=COMPACT}} and {{ROW_FORMAT=DYNAMIC}} to be in a format that resembles {{ROW_FORMAT=REDUNDANT}}.

            h2. A note on MVCC
            Because {{ha_innobase::commit_inplace_alter_table()}} will be invoked while holding {{MDL_EXCLUSIVE}}, any transactions that read or modified the table must finish before the {{ALTER TABLE}} can commit. But it is possible that some old transaction tries to do its first access to the table after the {{ALTER TABLE}} committed. Such transactions may receive an error message 'table definition changed', as noted in [MySQL Bug#28432|https://bugs.mysql.com/bug.php?id=28432]. It would be too much effort to support MVCC if a transaction after {{ALTER}} modified a record (converting it to newer dictionary version) that would otherwise be visible to the old transaction.
            Here is the scenario in SQL:
            {code:SQL}
            connection con1;
            START TRANSACTION WITH CONSISTENT SNAPSHOT; -- creates read view
            connection con2;
            ALTER TABLE t CHANGE COLUMN b b INT NULL;
            UPDATE t SET b=1 WHERE a=100;
            connection con1;
            SELECT * FROM t WHERE a=1; -- might be OK, if the record is still in old format
            SELECT * FROM t WHERE a=100; -- error: record is in to new format
            {code}
            For simplicity and consistency, we could always return an error to the SELECT statements (after any {{ALTER TABLE}}).

            h1. {{ALTER TABLE}} operations that potentially affect the format of a row
            In MariaDB Server 10.2, the following {{alter_table_operations}} might require a table to be rebuilt:
            ||alter_table_operations||SQL||Remarks||
            |ALTER_ADD_STORED_BASE_COLUMN|{color:green}ADD COLUMN{color}|MDEV-11369 (10.3)|
            |ALTER_ADD_VIRTUAL_COLUMN|{color:green}ADD COLUMN…AS{color}|Virtual columns are always added instantly|
            |ALTER_ADD_STORED_GENERATED_COLUMN|{color:red}ADD COLUMN…PERSISTENT AS{color}|cannot be instant; until MDEV-16354 requires rebuild with {{ALGORITHM=COPY}}|
            |ALTER_ADD_PK_INDEX|{color:red}ADD PRIMARY KEY{color}|Requires all indexes to be rebuilt.|
            |ALTER_DROP_PK_INDEX|{color:red}DROP PRIMARY KEY{color}|Requires all indexes to be rebuilt. Without {{ADD PRIMARY KEY}}, cannot even support online rebuilding ({{LOCK=NONE}}).|
            |ALTER_CHANGE_CREATE_OPTION|{color:red}ROW_FORMAT, KEY_BLOCK_SIZE, encryption{color}|Requires rebuild; see MDEV-16291.|
            |ALTER_CHANGE_CREATE_OPTION|{color:green}page_compressed, page_compression_level{color}|MDEV-16328|
            |ALTER_COLUMN_NULLABLE|{color:green}NULL{color}|MDEV-15563 for {{ROW_FORMAT=REDUNDANT}}|
            |ALTER_COLUMN_NULLABLE|{color:darkgreen}NULL{color}|MDEV-17520 for {{COMPACT}} and {{DYNAMIC}}|
            |ALTER_COLUMN_NOT_NULLABLE|{color:blue}NOT NULL{color}|MDEV-16291|
            |ALTER_STORED_COLUMN_ORDER|{color:green}FIRST, LAST, AFTER{color}|MDEV-15562|
            |ALTER_DROP_STORED_COLUMN|{color:darkgreen}DROP COLUMN{color}|MDEV-15562|
            |ALTER_RECREATE_TABLE|{color:red}FORCE{color}|the sole purpose of this keyword is to explicitly request rebuild|
            |ALTER_STORED_COLUMN_TYPE|{color:green}CHANGE to wider type{color}|MDEV-15563 for {{ROW_FORMAT=REDUNDANT}}|
            |ALTER_STORED_COLUMN_TYPE|{color:darkgreen}CHANGE to wider type{color}|MDEV-17520 for {{COMPACT}} and {{DYNAMIC}}|
            |ALTER_ST
            ORED_COLUMN_TYPE|{color:blue}CHANGE type{color}|MDEV-16291|
            |ALTER_VIRTUAL_COLUMN_TYPE|{color:blue}CHANGE type{color}|MDEV-16332|
            |ALTER_STORED_GCOL_EXPR|{color:blue}CHANGE expr{color}|MDEV-17035|
            |ALTER_COLUMN_UNVERSIONED|{color:green}CHANGE…WITH\[OUT\] SYSTEM VERSIONING{color}|MDEV-16330|
            |ALTER_ADD_SYSTEM_VERSIONING|{color:red}ADD SYSTEM VERSIONING{color}|Must rebuild the {{PRIMARY KEY}} and thus the full table|
            |ALTER_DROP_SYSTEM_VERSIONING|{color:red}DROP SYSTEM VERSIONING{color}|Must rebuild the {{PRIMARY KEY}} and thus the full table|
            |ALTER_ADD_PERIOD|{color:red}ADD PERIOD FOR SYSTEM TIME{color}|must be combined with {{ADD SYSTEM VERSIONING}}|
            |ALTER_DROP_PERIOD|{color:red}DROP PERIOD FOR SYSTEM TIME{color}|must be combined with {{DROP SYSTEM VERSIONING}}|
            |ALTER_ADD_CHECK_CONSTRAINT|{color:blue}ADD \[CONSTRAINT\] CHECK{color}|MDEV-16356|
            |ALTER_DROP_CHECK_CONSTRAINT|{color:green}DROP CONSTRAINT{color}|MDEV-16331|

            Legend:
            * {color:green}can be performed instantly{color}
            * {color:darkgreen}can be performed instantly, except if any secondary indexes need to be rebuilt{color}
            * {color:blue}not instantaneous; could later be performed without rebuild, with validation{color}
            * {color:red}will continue to require full table rebuild{color}
            marko Marko Mäkelä made changes -
            Description This is an umbrella task for allowing {{ALTER TABLE}} to be instantaneous in cases that cannot fail due to existing records being incompatible with the altered table definition. Later, MDEV-16356 and MDEV-16291 could extend this to support {{ALGORITHM=NOCOPY}} for operations that can avoid rebuilding tables, but need to validate the data.

            h2. Introduction

            Traditionally, {{ALTER TABLE}} would be roughly equivalent to the following SQL statements:
            {code:sql}
            CREATE TABLE `#sql-…` (…);
            INSERT INTO `#sql-…` SELECT … FROM t;
            RENAME TABLE t TO `#sql2-…`, `#sql-…` TO t;
            DROP TABLE `#sql2-…`;
            {code}
            This mode of operation is still available by specifying {{ALGORITHM=COPY}} or {{SET old_alter_table=1}} (or starting with MDEV-13134 in MariaDB 10.3, {{SET alter_algorithm=copy}}).

            Copying a table and rebuilding all its indexes can be a very expensive operation. While InnoDB mostly allows tables to be rebuilt online since MariaDB Server version 10.0, the temporary files can occupy a significant amount of space and I/O capacity.

            There are cases where the data format is not affected by the {{ALTER TABLE}} operation; only metadata needs to be updated. Examples include renaming columns, changing the default values of columns, and changing the maximum length of a {{VARCHAR}} column such that the storage format does not change.

            A goal for MariaDB Server is to allow instantaneous execution of any {{ALTER TABLE}} operation where data conversions cannot fail, and indexes do not need to be rebuilt. Even in cases where some affected indexes have to be rebuilt, it will be more efficient to only rebuild some indexes than to copy the whole table.

            The goal can be reformulated as: Avoid rebuilding the table.

            How to avoid rebuilding the table, if the underlying storage format would be affected by the {{ALTER TABLE}} operation? By extending the storage format in a way that allows the data to be in ‘non-canonical’ format. The main examples of this are MDEV-11369 and MDEV-15562, which implement {{ADD COLUMN}}, {{DROP COLUMN}} and changing the order of columns.

            The original InnoDB storage format (retroactively named {{ROW_FORMAT=REDUNDANT}}) is very generic, basically allowing {{NULL}} values and arbitrary length for every column. For it, MDEV-15562 would be the only storage format change needed to avoid unnecessary rebuild of the table.

            Note: Whenever the {{PRIMARY KEY}} is changed, all indexes will have to be rebuilt. Likewise, some operations on indexed columns may require the indexes to be rebuilt.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} omit ‘is null’ flags for {{NOT NULL}} columns and length information for fixed-length columns. MDEV-17520 will extend MDEV-15563 to these {{ROW_FORMAT}} by using the {{ROW_FORMAT=REDUNDANT}} encoding for all clustered index pages that have been modified since the {{ALTER TABLE}} operation.

            Operations that involve adding or dropping indexes (also {{DROP COLUMN}} can imply this) will not be supported for {{ALGORITHM=INSTANT}}; they will be supported with {{ALGORITHM=NOCOPY}}. {{ALTER TABLE…ADD \[UNIQUE\] INDEX}} supports concurrent modifications to the table since MariaDB 10.0. MDEV-16223 could defer {{ADD INDEX}} to a background operation.

            Operations that will continue to be refused by {{ALGORITHM=INSTANT}} (and {{ALGORITHM=NOCOPY}} even after MDEV-16291) include:
            * Changing {{ROW_FORMAT}} or {{ENGINE}}
            * Altering a table that is in {{ROW_FORMAT=COMPRESSED}}
            * Dropping, adding or changing {{PRIMARY KEY}} columns, or {{ADD}}/{{DROP PRIMARY KEY}}

            Any {{ALTER TABLE}} that would be refused with {{ALGORITHM=NOCOPY}} (anything that rebuilds the clustered index) will drop any ‘instant {{ALTER TABLE}}’ metadata. The metadata would also be deleted if a rebuild is explicitly requested by the use of the {{FORCE}} keyword.

            h2. Metadata format changes
            In InnoDB, instant {{ALTER TABLE}} affects clustered index page leaf records only. The data dictionary will reflect the most recent table definition. Additional metadata for interpreting records that correspond to an earlier version of the table definition will be stored in the clustered index tree as follows.

            * MDEV-11369 in MariaDB 10.3.4 changed the root page type code to {{FIL_PAGE_TYPE_INSTANT}} to indicate that instant {{ALTER TABLE}} has been used. It also introduced a hidden metadata record at the start of the clustered index, on the leftmost leaf page. The new page type code prevents older MariaDB versions from opening the table.
            * MDEV-15562 slightly modifies the format of the metadata record to represent dropped and reordered index fields. (All columns will internally be added last in the user records in the clustered index leaf pages.) MariaDB 10.3 will refuse to open such tables, because new {{info_bits}} will be set in the metadata record.

            h2. Data format changes
            User records in the clustered index leaf pages will have to indicate which format they correspond to.
            * For other than {{ROW_FORMAT=REDUNDANT}}, MDEV-11369 introduced {{REC_STATUS_COLUMNS_ADDED}} that indicates the presence of an optional record header that encodes the number of 'instantly added' columns that are present in the record.
            * For {{ROW_FORMAT=REDUNDANT}}, MDEV-11369 simply stores the number of fields in the record header.
            * In MDEV-11369 and MDEV-15562, any 'instantly added' columns whose values are missing from the end of the clustered index record will be substituted with the values stored in the metadata record.
            * MDEV-15562 will not change the user record format in any way. Instantly added columns are always added as last fields in the clustered index leaf page records.
            * MDEV-17520 would allow clustered index leaf pages of {{ROW_FORMAT=COMPACT}} and {{ROW_FORMAT=DYNAMIC}} to be in a format that resembles {{ROW_FORMAT=REDUNDANT}}.

            h2. A note on MVCC
            Because {{ha_innobase::commit_inplace_alter_table()}} will be invoked while holding {{MDL_EXCLUSIVE}}, any transactions that read or modified the table must finish before the {{ALTER TABLE}} can commit. But it is possible that some old transaction tries to do its first access to the table after the {{ALTER TABLE}} committed. Such transactions may receive an error message 'table definition changed', as noted in [MySQL Bug#28432|https://bugs.mysql.com/bug.php?id=28432]. It would be too much effort to support MVCC if a transaction after {{ALTER}} modified a record (converting it to newer dictionary version) that would otherwise be visible to the old transaction.
            Here is the scenario in SQL:
            {code:SQL}
            connection con1;
            START TRANSACTION WITH CONSISTENT SNAPSHOT; -- creates read view
            connection con2;
            ALTER TABLE t CHANGE COLUMN b b INT NULL;
            UPDATE t SET b=1 WHERE a=100;
            connection con1;
            SELECT * FROM t WHERE a=1; -- might be OK, if the record is still in old format
            SELECT * FROM t WHERE a=100; -- error: record is in to new format
            {code}
            For simplicity and consistency, we could always return an error to the SELECT statements (after any {{ALTER TABLE}}).

            h1. {{ALTER TABLE}} operations that potentially affect the format of a row
            In MariaDB Server 10.2, the following {{alter_table_operations}} might require a table to be rebuilt:
            ||alter_table_operations||SQL||Remarks||
            |ALTER_ADD_STORED_BASE_COLUMN|{color:green}ADD COLUMN{color}|MDEV-11369 (10.3)|
            |ALTER_ADD_VIRTUAL_COLUMN|{color:green}ADD COLUMN…AS{color}|Virtual columns are always added instantly|
            |ALTER_ADD_STORED_GENERATED_COLUMN|{color:red}ADD COLUMN…PERSISTENT AS{color}|cannot be instant; until MDEV-16354 requires rebuild with {{ALGORITHM=COPY}}|
            |ALTER_ADD_PK_INDEX|{color:red}ADD PRIMARY KEY{color}|Requires all indexes to be rebuilt.|
            |ALTER_DROP_PK_INDEX|{color:red}DROP PRIMARY KEY{color}|Requires all indexes to be rebuilt. Without {{ADD PRIMARY KEY}}, cannot even support online rebuilding ({{LOCK=NONE}}).|
            |ALTER_CHANGE_CREATE_OPTION|{color:red}ROW_FORMAT, KEY_BLOCK_SIZE, encryption{color}|Requires rebuild; see MDEV-16291.|
            |ALTER_CHANGE_CREATE_OPTION|{color:green}page_compressed, page_compression_level{color}|MDEV-16328|
            |ALTER_COLUMN_NULLABLE|{color:green}NULL{color}|MDEV-15563 for {{ROW_FORMAT=REDUNDANT}}|
            |ALTER_COLUMN_NULLABLE|{color:darkgreen}NULL{color}|MDEV-17520 for {{COMPACT}} and {{DYNAMIC}}|
            |ALTER_COLUMN_NOT_NULLABLE|{color:blue}NOT NULL{color}|MDEV-16291|
            |ALTER_STORED_COLUMN_ORDER|{color:green}FIRST, LAST, AFTER{color}|MDEV-15562|
            |ALTER_DROP_STORED_COLUMN|{color:darkgreen}DROP COLUMN{color}|MDEV-15562|
            |ALTER_RECREATE_TABLE|{color:red}FORCE{color}|the sole purpose of this keyword is to explicitly request rebuild|
            |ALTER_STORED_COLUMN_TYPE|{color:green}CHANGE to wider type{color}|MDEV-15563 for {{ROW_FORMAT=REDUNDANT}}|
            |ALTER_STORED_COLUMN_TYPE|{color:darkgreen}CHANGE to wider type{color}|MDEV-17520 for {{COMPACT}} and {{DYNAMIC}}|
            |ALTER_ST
            ORED_COLUMN_TYPE|{color:blue}CHANGE type{color}|MDEV-16291|
            |ALTER_VIRTUAL_COLUMN_TYPE|{color:blue}CHANGE type{color}|MDEV-16332|
            |ALTER_STORED_GCOL_EXPR|{color:blue}CHANGE expr{color}|MDEV-17035|
            |ALTER_COLUMN_UNVERSIONED|{color:green}CHANGE…WITH\[OUT\] SYSTEM VERSIONING{color}|MDEV-16330|
            |ALTER_ADD_SYSTEM_VERSIONING|{color:red}ADD SYSTEM VERSIONING{color}|Must rebuild the {{PRIMARY KEY}} and thus the full table|
            |ALTER_DROP_SYSTEM_VERSIONING|{color:red}DROP SYSTEM VERSIONING{color}|Must rebuild the {{PRIMARY KEY}} and thus the full table|
            |ALTER_ADD_PERIOD|{color:red}ADD PERIOD FOR SYSTEM TIME{color}|must be combined with {{ADD SYSTEM VERSIONING}}|
            |ALTER_DROP_PERIOD|{color:red}DROP PERIOD FOR SYSTEM TIME{color}|must be combined with {{DROP SYSTEM VERSIONING}}|
            |ALTER_ADD_CHECK_CONSTRAINT|{color:blue}ADD \[CONSTRAINT\] CHECK{color}|MDEV-16356|
            |ALTER_DROP_CHECK_CONSTRAINT|{color:green}DROP CONSTRAINT{color}|MDEV-16331|

            Legend:
            * {color:green}can be performed instantly{color}
            * {color:darkgreen}can be performed instantly, except if any secondary indexes need to be rebuilt{color}
            * {color:blue}not instantaneous; could later be performed without rebuild, with validation{color}
            * {color:red}will continue to require full table rebuild{color}
              The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade to earlier servers, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}}, if the metadata record is present and, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}}.
            # For {{ROW_FORMAT=DYNAMIC}}, if the metadata record is present and, allow clustered index leaf pages to be like in {{ROW_FORMAT=REDUNDANT}}, but do not store any local prefix for off-page columns.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. For example, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint or changes a column to a wider type will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            ralf.gebhardt Ralf Gebhardt made changes -
            Epic Link PT-80 [ 68561 ]
            marko Marko Mäkelä made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Summary Instant NOT NULL removal and conversion to a wider type for ROW_FORMAT=COMPACT or DYNAMIC Instant NOT NULL removal
            marko Marko Mäkelä made changes -
            Description   The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade to earlier servers, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}}, if the metadata record is present and, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}}.
            # For {{ROW_FORMAT=DYNAMIC}}, if the metadata record is present and, allow clustered index leaf pages to be like in {{ROW_FORMAT=REDUNDANT}}, but do not store any local prefix for off-page columns.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. For example, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint or changes a column to a wider type will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Description The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.
            The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.

            The metadata BLOB that was introduced in MDEV-15562 will be augmented. The flag {{1U << 14}} will be set in {{dict_instant_t::non_pk_col_map[]}} for {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=DYNAMIC}} columns that were originally created as {{NOT NULL}} but no longer carry this attribute. Based on this information, we will initialize {{n_nullable}} and {{n_core_null_bytes}} for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and {{ROW_FORMAT=REDUNDANT}} will use the latest definition.
            ralf.gebhardt Ralf Gebhardt made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            Rank Ranked higher
            marko Marko Mäkelä made changes -
            Attachment MDEV-17520.patch [ 46750 ]
            marko Marko Mäkelä made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.4 [ 22408 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            Epic Link PT-80 [ 68561 ]
            ralf.gebhardt Ralf Gebhardt made changes -
            NRE Projects NRE-310017 RM_104_NRE RM_removed_104
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Summary Instant NOT NULL removal Instant ALTER TABLE for failure-free column type changes
            marko Marko Mäkelä made changes -
            NRE Projects RM_removed_104 RM_removed_104 RM_long_term
            midenok Aleksey Midenkov made changes -
            Attachment MDEV-17520_sec_indexes.10.4.3.diff [ 47873 ]
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Comment [ For format versioning, we could use the hidden system fields {{DB_TRX_ID}} and {{DB_ROLL_PTR}} as follows:
            * If {{DB_TRX_ID=0}}, we will have {{DB_ROLL_PTR&1ULL<<56}}, and the low-order 55 bits of {{DB_ROLL_PTR}} will be the dictionary format version (starting from 0).
            * Else if {{DB_TRX_ID!=0}}, the metadata will remember the {{DB_TRX_ID}} of each instant {{ALTER TABLE}} operation, so the nonzero {{DB_TRX_ID}} allows up to look up the corresponding table definition. (This is for the rare case that when instant {{ALTER TABLE}} was executed, some purgeable history existed for the table.) We might skip this, and do a full purge before continuing with the {{ALTER TABLE}}.
            * When purge is resetting {{DB_TRX_ID}} to 0 (MDEV-12288), it will also update the dictionary format version in {{DB_ROLL_PTR}}. ]
            julien.fritsch Julien Fritsch made changes -
            NRE Approved Yes [ 10304 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 90210 ] MariaDB v4 [ 131701 ]
            AirFocus AirFocus made changes -
            Description The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV-15563:
            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:
            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed-length.
            * {{CHAR\(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.

            The metadata BLOB that was introduced in MDEV-15562 will be augmented. The flag {{1U << 14}} will be set in {{dict_instant_t::non_pk_col_map[]}} for {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=DYNAMIC}} columns that were originally created as {{NOT NULL}} but no longer carry this attribute. Based on this information, we will initialize {{n_nullable}} and {{n_core_null_bytes}} for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and {{ROW_FORMAT=REDUNDANT}} will use the latest definition.
            The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable\-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV\-15563:

            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV\-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in\-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV\-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:

            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed\-length.
            * {{CHAR(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.

            The metadata BLOB that was introduced in MDEV\-15562 will be augmented. The flag {{1U << 14}} will be set in {{dict_instant_t::non_pk_col_map[]}} for {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=DYNAMIC}} columns that were originally created as {{NOT NULL}} but no longer carry this attribute. Based on this information, we will initialize {{n_nullable}} and {{n_core_null_bytes}} for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and {{ROW_FORMAT=REDUNDANT}} will use the latest definition.
            Roel Roel Van de Paar made changes -
            Description The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable\-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV\-15563:

            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV\-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in\-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV\-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:

            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed\-length.
            * {{CHAR(n)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.

            The metadata BLOB that was introduced in MDEV\-15562 will be augmented. The flag {{1U << 14}} will be set in {{dict_instant_t::non_pk_col_map[]}} for {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=DYNAMIC}} columns that were originally created as {{NOT NULL}} but no longer carry this attribute. Based on this information, we will initialize {{n_nullable}} and {{n_core_null_bytes}} for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and {{ROW_FORMAT=REDUNDANT}} will use the latest definition.
            The original InnoDB {{ROW_FORMAT=REDUNDANT}} is essentially storing each field as variable\-length and possibly {{NULL}}. For that format, we can trivially allow an instantaneous change of a column from {{NOT NULL}} to {{NULL}}.

            The space-optimized row formats {{COMPACT}} and {{DYNAMIC}} do not allow {{NULL}} values to be represented for columns that were originally declared as {{NOT NULL}}. They also do not store any length for fixed-length columns. Because of this, some failure-free conversions that are instantaneous for {{ROW_FORMAT=REDUNDANT}} in MDEV-15563 would require the table to be rebuilt when it is in the {{COMPACT}} or {{DYNAMIC}} format.

            Let us create a hybrid format that allows us to avoid rebuilding the table in the cases covered by MDEV\-15563:

            # To prevent downgrade or {{IMPORT TABLESPACE}} to older MariaDB releases, add a MDEV\-15562 metadata record.
            # For {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=REDUNDANT}}, if the metadata record is present, allow clustered index leaf pages to be in {{ROW_FORMAT=REDUNDANT}} with the differences noted below.

            The clustered index leaf pages of the table would be gradually converted into something that resembles {{ROW_FORMAT=REDUNDANT}} as a result of modifications. This will increase the size usage a little. Also, {{ROW_FORMAT=REDUNDANT}} limits the maximum in\-page record size to 16,383 bytes, which for {{innodb_page_size=64k}} is less than the limit for {{COMPACT}} or {{DYNAMIC}}.

            Because secondary index records would remain in the original {{ROW_FORMAT}}, secondary indexes may have to be rebuilt when an indexed column is changed. That is, changing an indexed column from {{NOT NULL}} to {{NULL}} will require the indexes to be rebuilt if {{ROW_FORMAT}} is not {{REDUNDANT}}.

            Any {{INSERT}} or {{UPDATE}} after an instant {{ALTER}} that removes a {{NOT NULL}} constraint (or changes a column to a wider type later in MDEV\-15563) will cause all records in the affected clustered index leaf page to be rewritten in a format that resembles {{ROW_FORMAT=REDUNDANT}}, with the following differences:

            * {{NULL}} columns will occupy 0 bytes of storage, also when fixed\-length.
            * {{CHAR\(n\)}} (unless {{NULL}}) will occupy {{n*mbminlen}}‥{{n*mbmaxlen}} bytes, instead of {{n*mbmaxlen}}.
            * For {{ROW_FORMAT=DYNAMIC}}, no local prefix of off-page columns will be stored. For {{ROW_FORMAT=COMPACT}}, we will continue to write the 768-byte prefixes.

            In {{ROW_FORMAT=REDUNDANT}}, the record header will store the length of each column (including fixed-length and {{NULL}} columns), using {{n_fields}} or 2·{{n_fields}} bytes. The fixed-length record header is 6 instead of 5 bytes. This will increase the size of each record by at least 1 byte, up to 2·{{n_fields}}+1 bytes.

            The metadata BLOB that was introduced in MDEV\-15562 will be augmented. The flag {{1U << 14}} will be set in {{dict_instant_t::non_pk_col_map[]}} for {{ROW_FORMAT=COMPACT}} or {{ROW_FORMAT=DYNAMIC}} columns that were originally created as {{NOT NULL}} but no longer carry this attribute. Based on this information, we will initialize {{n_nullable}} and {{n_core_null_bytes}} for the clustered index based on the original column definition, instead of the latest one. Secondary indexes and {{ROW_FORMAT=REDUNDANT}} will use the latest definition.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.