[MDEV-15564] Avoid table rebuild in ALTER TABLE on collation or charset changes Created: 2018-03-14 Updated: 2022-05-10 Resolved: 2019-02-14 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Fix Version/s: | 10.4.3 |
| Type: | Task | Priority: | Critical |
| Reporter: | Marko Mäkelä | Assignee: | Eugene Kosov (Inactive) |
| Resolution: | Fixed | Votes: | 3 |
| Labels: | instant, online-ddl | ||
| Issue Links: |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Description |
|
Some character encodings are proper subsets of others. ASCII is a proper subset of latin1 or latin2 or UTF-8. The 3-byte UTF-8 is a proper subset of the 4-byte UTF-8. When the character encoding of a column changes such that the storage encoding does not change, we should avoid rebuilding the table. Likewise, when the collation of a column changes such that the encoding stays the same, we should avoid rebuilding the table. Collations do not matter if the column is not indexed. If the collation of an indexed column changes, then the affected indexes may have to be rebuilt. For certain collations we might avoid that as well. For example, changing the encoding and collation from binary 3-byte to binary 4-byte UTF-8 might not require any change. When the collation change involves a column that is part of the PRIMARY KEY and we have determined that a collation change is necessary, then the whole table will have to be rebuilt. |
| Comments |
| Comment by Eugene Kosov (Inactive) [ 2018-09-05 ] | ||||||||||||||||||||||||
|
Not only SYS_COLUMNS.PRTYPE changes on charset/collation change but also SYS_COLUMNS.LEN could increase f.ex. on ascii -> utf8mb3 or utf8mb3 -> utf8mb4. | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-09-13 ] | ||||||||||||||||||||||||
|
Right. 10.2 already supports instant changes of the maximum length of VARCHAR (in bytes) except when the old length was less than 256 bytes and the new length is at least 256 bytes. Other changes to wider type would be covered by As part of this task, we can keep the existing limitation regarding table rebuilds, but still support ALGORITHM=INPLACE, LOCK=NONE instead of the currently only choice ALGORITHM=COPY. | ||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2018-09-13 ] | ||||||||||||||||||||||||
|
Current open questions: | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-09-13 ] | ||||||||||||||||||||||||
|
First of all, ROW_FORMAT=REDUNDANT basically treats every field as if it is variable-length and possibly NULL. I think that we can allow any size extension with that format. For other formats (COMPACT, DYNAMIC, COMPRESSED) the following holds: 1) Internally, long CHAR (maybe the limit is 255 bytes) is treated as VARCHAR, and I think that it should be possible to instantly extend this. As an optimization (which is not available in ROW_FORMAT=REDUNDANT) for variable-length character sets, CHAR | ||||||||||||||||||||||||
| Comment by Eugene Kosov (Inactive) [ 2018-09-17 ] | ||||||||||||||||||||||||
|
So, CHAR | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-09-18 ] | ||||||||||||||||||||||||
|
Correct: ROW_FORMAT=REDUNDANT will reserve n*mbmaxlen bytes for {{CHAR(n)}}. It will do that even when the column is NULL. The idea is to allow update-in-place. The later ROW_FORMAT optimize the space and allocate n*mbminlen to n*mbmaxlen bytes. Trailing space is ‘compressed away’. For example, 'фыва' is 4×2 bytes. If the column were CHAR(8), we would allocate only 8 bytes for it, because the trailing space 'фыва ' does not need to be stored. If there was one more letter before the trailing space, we would have to allocate more space. It could be that {{CHAR(n)}} with a variable-length character encoding is actually internally stored as VARCHAR, that is, an explicit length is stored. I cannot remember it. For ROW_FORMAT=REDUNDANT, I think that it should be OK to bend the existing rules and allow an instant extension of the maximum length of the column, that is, allow CHAR(200) to be converted to a bigger mbmaxlen without changing the existing data. This would be a step towards | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2018-11-09 ] | ||||||||||||||||||||||||
|
Thanks, this looks generally good. I am mainly requesting some code refactoring. Please document (with examples) in which way the file format is being effectively changed, as witnessed by the change to the CHECK TABLE function btr_index_rec_validate(). We must prevent older MariaDB versions from accessing tables where the file format has been changed. | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-01-16 ] | ||||||||||||||||||||||||
|
Some necessary code changes are common between this and | ||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2019-02-11 ] | ||||||||||||||||||||||||
|
I've taken a look into a number of commits: @kevgs pushed 6 commits. 20cb37e add GBK There is no need to list all ASCII compatible character set names and check them using strcmp().
to know if a character set is not ASCII compatible. Note, SJIS is not ASCII compatible. | ||||||||||||||||||||||||
| Comment by Marko Mäkelä [ 2019-02-14 ] | ||||||||||||||||||||||||
|
Due to conflicts with The following bugs were filed for limitations that I think we should try to fix soon: |