MDEV-11424 introduces support for ALGORITHM=INSTANT for failure-free ALTER TABLE operations that do not involve adding or dropping indexes.
Operations that may fail require validation by reading all data in the table, and thus they cannot be instantaneous. While such operations cannot support ALGORITHM=INSTANT, they can support ALGORITHM=NOCOPY: perform the validation, and use the MDEV-11424 mechanism for representing the operation, without modifying any data.
Examples of such operations include:
- Adding or removing the UNSIGNED attribute of an integer column
- Adding the NOT NULL attribute
- Shortening CHAR or VARCHAR (to produce errors or warnings for data truncation)
- Changing the character encoding of a CHAR or VARCHAR
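The NOCOPY validation pass for operations like these can be sketched as a full scan of the clustered index that checks every existing value against the new column definition. The following is a minimal illustrative sketch; all names (`Value`, `validate_nocopy`) are hypothetical and do not match the InnoDB sources.

```cpp
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical representation of one value read from the clustered index.
struct Value {
    bool is_null;
    int64_t int_val;      // meaningful for integer columns
    std::string str_val;  // meaningful for string columns
};

// Validate a column's existing values against the new definition.
// Returns an error message for the first offending value, or nothing
// if the change can proceed with ALGORITHM=NOCOPY.
std::optional<std::string>
validate_nocopy(const std::vector<Value>& column_values,
                bool add_not_null, bool add_unsigned,
                size_t new_max_char_len)
{
    for (const Value& v : column_values) {
        if (v.is_null) {
            if (add_not_null)
                return "NULL found; cannot add NOT NULL without rebuild";
            continue;
        }
        if (add_unsigned && v.int_val < 0)
            return "negative value found; cannot add UNSIGNED";
        if (v.str_val.size() > new_max_char_len)
            return "value exceeds new length; data would be truncated";
    }
    return std::nullopt;  // all rows validate; no data is modified
}
```

If validation succeeds, only the table metadata is updated; the stored records remain untouched.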
Examples of changes that will require the table to be rebuilt (that will not be supported with ALGORITHM=NOCOPY) include:
- Adding, dropping or changing PRIMARY KEY columns
- Changing the ROW_FORMAT or ENGINE
- Altering a table that is in ROW_FORMAT=COMPRESSED
This work will extend the 'default row' record introduced in MDEV-11369 further, to store 'conversion recipes' that make it possible to process records that correspond to an earlier definition of the table. To indicate the presence of the recipes in the 'default row', a new bit in info_bits will be set, and an additional BLOB column is appended to the record for storing the recipes.
TO BE DEFINED: What metadata exactly needs to be stored and how? At least the following:
- Character set encoding, so that data can be converted to the newest encoding
- Whether integer columns are signed or unsigned
- The original fixed length of string columns
- Do we want to support conversions between string and integer columns? ENUM? BIT? SET?
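One possible shape for this metadata, covering the items listed above, is a per-column entry in each conversion recipe. This is only an illustrative sketch; the field names and layout are assumptions, not the eventual on-disk format.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical per-column entry of a conversion recipe.
struct col_recipe_t {
    uint16_t charset_id;     // source character set, for CHAR/VARCHAR conversion
    bool     is_unsigned;    // signedness of the stored integer representation
    uint16_t old_fixed_len;  // original fixed length of a string column; 0 if none
};

// A recipe describes how to bring records of one older table definition
// (identified by its format version) up to the latest definition.
struct recipe_t {
    uint32_t from_version;           // clustered index record format version
    std::vector<col_recipe_t> cols;  // one entry per stored column
};
```

Whatever the final encoding, it would be serialized into the BLOB column that is appended to the 'default row' record.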
Since this work is a refinement of MDEV-11424, any INSERT or UPDATE after an ALTER TABLE operation that changes the data format would cause the clustered index leaf page to be rewritten in ROW_FORMAT=REDUNDANT.
In this work, we must store a 'conversion recipe number' in the record or page header.
- We could repurpose the 64-bit page header field PAGE_MAX_TRX_ID. MDEV-6076 repurposed it for PAGE_ROOT_AUTO_INC in the root page. If the table consists of a single root page, or later shrinks to a single page, we can simply convert the table to the non-instant format. Having a per-page format would require that all records of the page be converted to the newest format on any write.
- In the ROW_FORMAT=REDUNDANT record header, there are 3 spare info_bits that we could use to support a small maximum number of conversion recipes.
We may choose to limit the number of conversion recipes that can be supported. If an ALTER TABLE operation would require a conversion recipe, but there is no space to store one, the operation would be refused with ALGORITHM=NOCOPY. After any ALTER TABLE that rebuilds the table, the maximum number of conversion recipes will be available again.
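The 3 spare info_bits in the ROW_FORMAT=REDUNDANT record header would allow 8 distinct values, e.g. 0 for "latest format" plus 7 conversion recipes. A sketch of such an encoding, assuming an illustrative bit position (the actual position within info_bits is not specified here):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical use of the 3 spare info_bits in a ROW_FORMAT=REDUNDANT
// record header to tag each record with a conversion recipe number.
// Value 0 means "latest format", leaving room for 7 stored recipes.
constexpr uint8_t  RECIPE_BITS  = 3;
constexpr uint8_t  RECIPE_SHIFT = 4;  // illustrative bit position
constexpr uint8_t  RECIPE_MASK  = ((1 << RECIPE_BITS) - 1) << RECIPE_SHIFT;
constexpr unsigned MAX_RECIPES  = (1 << RECIPE_BITS) - 1;  // 7

uint8_t set_recipe_no(uint8_t info_bits, unsigned recipe_no)
{
    // If recipe_no would exceed MAX_RECIPES, the ALTER TABLE would
    // have to be refused with ALGORITHM=NOCOPY instead.
    assert(recipe_no <= MAX_RECIPES);
    return (info_bits & ~RECIPE_MASK)
        | static_cast<uint8_t>(recipe_no << RECIPE_SHIFT);
}

unsigned get_recipe_no(uint8_t info_bits)
{
    return (info_bits & RECIPE_MASK) >> RECIPE_SHIFT;
}
```

A table rebuild would reset every record to recipe number 0, making all 7 recipe slots available again.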
The 'default row' record needs to store the conversion recipes that allow each clustered index leaf page record format version to be converted to the latest table definition. These recipes will be cached in dict_table_t. The 'default row' record will only be read when the table definition is first loaded to the data dictionary cache.
Adding or dropping indexes or renaming columns or indexes would not touch the 'default row' record.
If we store recipes for converting from version 1 to latest, 2 to latest, and so on, then each instant ALTER (MDEV-11424, MDEV-11369, MDEV-15562) may have to rewrite the old recipes, so that they will work from 1 to latest+1, 2 to latest+1 and so on, and finally add a recipe from latest to latest+1. Because we will store the recipes in a BLOB, and because BLOBs are copy-on-write in InnoDB, all conversion recipe pages will have to be rewritten in any case.
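The recipe rewrite on each instant ALTER amounts to composing every stored "version N to latest" recipe with the new "latest to latest+1" step, and then adding a recipe for the previous latest version. A minimal sketch, treating recipes as opaque field-vector transformations (a deliberate simplification of whatever the real recipe encoding will be):

```cpp
#include <cstdint>
#include <functional>
#include <map>
#include <vector>

// Hypothetical: a recipe maps the field vector of one format version
// to the latest version.
using step_fn = std::function<std::vector<int64_t>(std::vector<int64_t>)>;

// On an instant ALTER, rewrite all stored recipes so they target the
// new latest version, and add a recipe for the previous latest version.
void apply_instant_alter(std::map<uint32_t, step_fn>& recipes,
                         uint32_t& latest, step_fn new_step)
{
    for (auto& [from, fn] : recipes) {
        step_fn old_fn = fn;  // compose: old recipe, then the new step
        fn = [old_fn, new_step](std::vector<int64_t> v) {
            return new_step(old_fn(std::move(v)));
        };
    }
    recipes[latest] = new_step;  // previous latest now also needs converting
    ++latest;
}
```

In the actual implementation the composition could instead be flattened into a single stored recipe per version, since every recipe page must be rewritten anyway.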
Read and write access to the clustered index would be affected as follows:
Writes would always use the latest dictionary version. If we used a per-page identifier (instead of a per-record one) and the page being written to contained other old-format records, we could convert all of them while we are modifying the page.
Reads would have to be prepared to convert records from old format to the latest one.
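Because each stored recipe converts directly from one older version to the latest one, the read path needs at most a single conversion per record, using the recipes cached in dict_table_t. A sketch under those assumptions; the type and function names here are illustrative, not the InnoDB ones:

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical record: its format version plus its field values.
struct rec_t { uint32_t format_version; std::vector<int64_t> fields; };

// A recipe converts records of one older format version to the latest one.
using recipe_fn = rec_t (*)(const rec_t&);

// Recipes cached by source format version, as they might be held in
// dict_table_t after the 'default row' record has been parsed once.
struct table_cache_t {
    uint32_t latest_version;
    std::unordered_map<uint32_t, recipe_fn> recipes;
};

// Read path: records already in the latest format pass through;
// older records are converted via the cached recipe for their version.
rec_t read_record(const table_cache_t& cache, const rec_t& rec)
{
    if (rec.format_version == cache.latest_version)
        return rec;  // common case: no conversion needed
    return cache.recipes.at(rec.format_version)(rec);
}
```

Note that the conversion happens only in memory; the stored record is left in its old format unless a write touches its page.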
There could be a separate online operation for clean-up, such as OPTIMIZE INDEX PRIMARY ON t1, which would also defragment the index. This would be useful if writes to the table are rare, because reads would not write back any converted records.
The clean-up operation could also remove old version history from the 'default row' record. After MDEV-16264, we could also have a background cleanup operation that would (with some granularity) remember the last processed PRIMARY KEY values, by storing them in the (otherwise ignored) keys of the 'default row' records.