Details
Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Description
Some character encodings are proper subsets of others. ASCII is a proper subset of latin1, latin2, and UTF-8. The 3-byte UTF-8 (utf8mb3) is a proper subset of the 4-byte UTF-8 (utf8mb4).
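The subset relationships above can be checked at the byte level. The following is a minimal sketch, assuming that Python's `latin-1`, `iso8859_2` and `utf-8` codecs are close enough stand-ins for the server's latin1, latin2 and utf8mb4 character sets; the helper name `fits_utf8mb3` is hypothetical and not part of any server API.

```python
# Sketch only: Python codecs stand in for the server character sets (assumption).

def same_bytes_everywhere(text: str) -> bool:
    """True if pure-ASCII text encodes to identical bytes in each superset."""
    ascii_bytes = text.encode("ascii")
    return all(
        text.encode(codec) == ascii_bytes
        for codec in ("latin-1", "iso8859_2", "utf-8")
    )

def fits_utf8mb3(text: str) -> bool:
    """True if every character needs at most 3 bytes in UTF-8.

    Such text is stored as the same bytes whether the column is
    declared with 3-byte or 4-byte UTF-8.
    """
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)

bmp_text = "\u0444\u044b\u0432\u0430"  # Cyrillic letters, 2 bytes each
astral_text = "\U0001F600"             # outside the BMP, needs 4 bytes
```

Existing data that passes such a check does not need to be rewritten when the column is converted to the larger encoding, which is the reason a rebuild can be avoided.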
When the character encoding of a column changes in such a way that the stored representation of the existing data does not change, we should avoid rebuilding the table.
Likewise, when the collation of a column changes such that the encoding stays the same, we should avoid rebuilding the table.
Collations do not matter if the column is not indexed.
If the collation of an indexed column changes, then the affected indexes may have to be rebuilt. For certain collations we might avoid that as well. For example, changing the encoding and collation from binary 3-byte to binary 4-byte UTF-8 might not require any change.
When the collation change involves a column that is part of the PRIMARY KEY and we have determined that an index rebuild is necessary, then the whole table will have to be rebuilt.
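The rules in this description can be summarized as a small decision function. This is only a sketch of the reasoning, not the server's implementation: the `Action` enum, the function name, and all parameter names are hypothetical, and `order_compatible` stands for the "certain collations" case (such as binary 3-byte to binary 4-byte UTF-8) where no index change would be needed.

```python
from enum import Enum

class Action(Enum):
    NO_REBUILD = "no rebuild"
    REBUILD_INDEXES = "rebuild affected indexes"
    REBUILD_TABLE = "rebuild table"

def required_action(storage_unchanged: bool,
                    collation_changed: bool,
                    indexed: bool,
                    in_primary_key: bool,
                    order_compatible: bool = False) -> Action:
    """Hypothetical summary of the rules described above."""
    if not storage_unchanged:
        # Existing data would have to be converted.
        return Action.REBUILD_TABLE
    if not collation_changed or not indexed:
        # Collations do not matter if the column is not indexed.
        return Action.NO_REBUILD
    if order_compatible:
        # The old and new collations sort existing data identically.
        return Action.NO_REBUILD
    if in_primary_key:
        # A PRIMARY KEY column forces a rebuild of the whole table.
        return Action.REBUILD_TABLE
    return Action.REBUILD_INDEXES
```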
Issue Links
- blocks
  - MDEV-11424 Instant ALTER TABLE of failure-free record format changes (Closed)
- causes
  - MDEV-18584 Avoid copying when altering CHAR column in InnoDB table (Confirmed)
  - MDEV-19284 INSTANT ALTER with ucs2-to-utf16 conversion produces bad data (Closed)
  - MDEV-19285 INSTANT ALTER from ascii_general_ci to latin1_general_ci produces corrupt data (Closed)
  - MDEV-19524 Server crashes in Bitmap<64u>::is_clear_all / Field_longstr::csinfo_change_allows_instant_alter (Closed)
  - MDEV-20565 Assertion failure on CHANGE COLUMN…SYSTEM VERSIONING (Closed)
  - MDEV-22333 Assertion `len <= fixed_len' failed in rec_get_converted_size_comp_prefix_low on ALTER + INSERT (Confirmed)
  - MDEV-22334 Assertion `col->len == len' failed in innobase_rename_or_enlarge_column_try on ALTER (Closed)
  - MDEV-27280 server crashes on CHECK TABLE after COLLATE change for utf8mb4 (Closed)
- is blocked by
  - MDEV-15563 Instant failure-free data type conversions (Closed)
  - MDEV-17965 Allow instant VARCHAR increase of indexed fields (Closed)
- relates to
  - MDEV-17773 Avoid table rebuild in ALTER TABLE on collation or charset changes for ENUM and SET (Open)
  - MDEV-27859 Instant change of ENUM is refused because of COLLATE mismatch (Confirmed)
  - MDEV-17301 Change of COLLATE unnecessarily requires ALGORITHM=COPY (Closed)
  - MDEV-26294 Duplicate entries in unique index not detected when changing collation with INPLACE / NOCOPY algoritm (Closed)
  - MDEV-27864 Alter table modify column for same data type copying table (Closed)
  - MDEV-28323 Redundant Item_func_conv_charset on WHERE utf8mb4_field=utf8mb3_field (Open)
-
Correct: ROW_FORMAT=REDUNDANT will reserve n*mbmaxlen bytes for CHAR(n). It will do that even when the column is NULL. The idea is to allow update-in-place. The later ROW_FORMATs optimize the space and allocate between n*mbminlen and n*mbmaxlen bytes; trailing spaces are 'compressed away'. For example, 'фыва' is 4×2 bytes. If the column were CHAR(8), we would allocate only 8 bytes for it, because the trailing spaces of the padded value 'фыва    ' do not need to be stored. If there were one more letter before the trailing spaces, we would have to allocate more space.
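The arithmetic in this comment can be illustrated with a short sketch. It models only what the paragraph above describes, not the actual InnoDB record code; the function name and parameters are hypothetical, and it assumes an encoding such as UTF-8 in which the space character occupies a single byte (mbminlen = 1).

```python
def char_storage_bytes(value: str, n: int, mbminlen: int, mbmaxlen: int,
                       row_format: str = "DYNAMIC",
                       encoding: str = "utf-8") -> int:
    """Bytes used for a CHAR(n) value, following the description above."""
    if len(value) > n:
        raise ValueError("value does not fit in CHAR(n)")
    if row_format == "REDUNDANT":
        # Always reserve the maximum so that updates can be done in place.
        return n * mbmaxlen
    # CHAR(n) is padded with spaces to n characters.
    data = value.ljust(n).encode(encoding)
    # Trailing spaces are dropped, but never below n*mbminlen bytes.
    floor = n * mbminlen
    while len(data) > floor and data.endswith(b" "):
        data = data[:-1]
    return len(data)

# The example from the comment: four Cyrillic letters, 2 bytes each.
four_letters = "\u0444\u044b\u0432\u0430"
five_letters = four_letters + "\u0430"
```

With 3-byte UTF-8 (mbminlen = 1, mbmaxlen = 3) and CHAR(8), the four-letter value takes 8 bytes in the later row formats and 24 bytes in ROW_FORMAT=REDUNDANT, while the five-letter value takes 10 bytes.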
It could be that CHAR(n) with a variable-length character encoding is actually stored internally as VARCHAR, that is, with an explicit length. I do not remember.
For ROW_FORMAT=REDUNDANT, I think that it should be OK to bend the existing rules and allow an instant extension of the maximum length of the column, that is, allow CHAR(200) to be converted to an encoding with a bigger mbmaxlen without changing the existing data. This would be a step towards MDEV-15563, which would allow CHAR or VARCHAR to be instantaneously extended arbitrarily in ROW_FORMAT=REDUNDANT.