Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-15562

Instant DROP COLUMN or changing the order of columns



      MDEV-11369 implemented instant ALTER TABLE…ADD COLUMN by introducing a hidden metadata record (originally called ‘default row’) that stores metadata of the original table structure. Based on this record, we know the original structure of the clustered index (where the data is stored together with the PRIMARY KEY). The latest structure is persisted in the data dictionary.

      We can extend the format of the hidden record to store a mapping between clustered index fields and table columns. Any ADD COLUMN would be instantly added as last field in the clustered index record, but the field-to-column map would allow it to be positioned anywhere in the table. Any DROP COLUMN will simply detach the clustered index field from the table columns.

      Note: DROP COLUMN of an indexed column would imply DROP INDEX (and in the case of a non-UNIQUE multi-column index, possibly ADD INDEX). These will no be allowed with ALGORITHM=INSTANT, but unlike before this work, they can be allowed with ALGORITHM=NOCOPY. Also, MDEV-16223 could defer the ADD INDEX, allowing the DROP COLUMN operation to complete faster.

      Note: While this work would enable the ordering of columns in the SELECT * FROM t to be changed instantaneously, changing the ordering of PRIMARY KEY columns (such as changing PRIMARY KEY(a,b) to PRIMARY KEY(b,a)) would continue to require a full table rebuild. The syntax for this is:


      Data format changes

      Instantly added columns are added last in the user records, in the format that was introduced by MDEV-11369.

      For instantly dropped columns, INSERT and UPDATE will continue to write dummy values as follows:

      • In ROW_FORMAT=REDUNDANT, write 0-byte fields.
      • In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, for columns that allow NULL, write NULL.
      • In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, for variable-length NOT NULL columns, write 0-byte fields.
      • In ROW_FORMAT=COMPACT or ROW_FORMAT=DYNAMIC, for fixed-length NOT NULL columns, zero-fill the fixed length of bytes.

      Metadata format changes

      We will change the format of the metadata record that was introduced in MDEV-11369 as follows:

      • In info_bits, we will set the REC_INFO_DELETED_FLAG to indicate the presence of column reordering information (a BLOB field).
      • If REC_INFO_DELETED_FLAG is set, then the 'number of instantly added columns' will be encoded as starting from 1, so that 0 will mean that only the new BLOB field is present.
      • If REC_INFO_DELETED_FLAG is set, then after the dummy PRIMARY KEY value and the DB_TRX_ID,DB_ROLL_PTR we will store a BLOB pointer to a page that stores information about dropped and reordered columns.
      • After the BLOB pointer, we will include dummy or default values of all non-dropped columns.
      • In COMPACT or DYNAMIC ROW_FORMAT, for dropped columns, the 'null bits' bitmap will indicate if the dropped column allowed NULL values.
      • In COMPACT or DYNAMIC ROW_FORMAT, for dropped columns, the 'length' array will include 1 or 2 lengths for each dropped column, using variable-length encoding: the fixed length (0 if variable-length), and the maximum length if variable-length. Both lengths will use the variable-length encoding (1 byte for 0..127, else 2 bytes).
      • In ROW_FORMAT=REDUNDANT, for any subsequently updated or inserted records, instantly dropped columns will be stored as 0 bytes. (In this format, each column is basically stored as variable-length.)

      The metadata BLOB format

      The payload of the BLOB will contain the following:

      • 32 bits: Number of fields in the clustered index, including any instantly added or dropped non-virtual columns
      • An array of 16 bits * n_fields: The location of each field in the physical clustered index record.
      • If the most significant bit is set, it means that the column was dropped, and the remaining bits contain the NOT NULL flag and the fixed_len (0 if variable-length).
      • Else, the entry will be 0‥n_fields-1, mapping the dict_index_t::fields[] of the clustered index of the latest table definition to fields in the clustered index records.
      • This BLOB will always be stored off-page. Only the 20-byte BLOB pointer will be added to the metadata record, in any supported ROW_FORMAT.
      • The unused trailer of the BLOB page must be zero-initialized. If there are nonzero bytes in the unknown area, we will refuse to open the table. This area is reserved for future extension.


      The maximum number of fields in a clustered index is a little less than 1023 fields. This limit is imposed by the redo log format as well as some main-memory data structures. This number includes any instantly dropped columns in the history of the table.

      The minimum record length will include instantly dropped columns that were NOT NULL and fixed-length. It must be possible to insert at least 2 minimum-length records into a clustered index leaf page.

      If these limits would be violated by ADD COLUMN after instant DROP COLUMN, then the operation cannot be performed as an instant operation, but it can be executed as a table-rebuild (ALGORITHM=INSTANT or ALGORITHM=COPY, but not ALGORITHM=NOCOPY or ALGORITHM=INSTANT).


          Issue Links



              • Assignee:
                marko Marko Mäkelä
                marko Marko Mäkelä
              • Votes:
                4 Vote for this issue
                11 Start watching this issue


                • Created: