[MDEV-19284] INSTANT ALTER with ucs2-to-utf16 conversion produces bad data Created: 2019-04-19 Updated: 2022-04-15 Resolved: 2019-05-16 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets, Storage Engine - InnoDB |
| Affects Version/s: | 10.4 |
| Fix Version/s: | 10.4.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Alexander Barkov | Assignee: | Alexander Barkov |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Issue Links: |
|
| Description |
|
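utf16 is not a superset of ucs2, because the two character sets treat high surrogate codes (0xD800..0xDBFF) and low surrogate codes (0xDC00..0xDFFF) differently: ucs2 accepts each of them as an ordinary standalone character, whereas utf16 accepts them only as a well-formed pair (a high surrogate immediately followed by a low surrogate) that encodes one supplementary character.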
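The difference can be illustrated with a small Python sketch. It models only the encoding rules, not MariaDB's implementation; the function names are hypothetical, and it assumes big-endian 16-bit code units and that ucs2 counts every 16-bit unit as one character.

```python
def utf16_is_well_formed(data: bytes) -> bool:
    """Return True if the big-endian 16-bit units in data form well-formed UTF-16."""
    if len(data) % 2:
        return False
    units = [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be immediately followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate without a preceding high surrogate is invalid.
            return False
        else:
            i += 1
    return True


def ucs2_char_length(data: bytes) -> int:
    """Character count under the assumption that every 16-bit unit is one character."""
    return len(data) // 2


lone_high = bytes.fromhex("D800")
print(ucs2_char_length(lone_high))      # one character when read as ucs2
print(utf16_is_well_formed(lone_high))  # not well-formed when read as utf16
```

A value such as 0xD800 is therefore a one-character string under the ucs2 model but has no valid interpretation as utf16, which is why reinterpreting the stored bytes without validation can leave bad data behind.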
Non-instant ALTER catches such bad conversion attempts:
Instant ALTER does not catch surrogates and alters the table silently, so bad data is possible after ALTER:
Notice that in the last line OCTET_LENGTH(a) is greater than 0 while CHAR_LENGTH(a) is 0, which is normally impossible.
There are two ways to fix this:
The former is probably preferable, but it may cause compatibility issues with previous versions. If we ever disallow surrogates in ucs2, we should probably also disallow them in all other character sets, e.g. utf8, utf8mb4, utf32. |
| Comments |
| Comment by Marko Mäkelä [ 2019-04-23 ] | |||||||||||||
|
I see one test failure that demonstrates a regression:
For ROW_FORMAT=REDUNDANT, we can and should allow an instantaneous conversion of a VARCHAR column from 50*3 bytes to 200*3 bytes. For other InnoDB ROW_FORMAT, we cannot allow this, because the maximum length is growing from 128‥255 bytes to more than 255 bytes.
It seems that the logic that is present in Field_varstring::is_equal() is not being correctly applied in all cases. Note: table->file->ha_table_flags() & HA_EXTENDED_TYPES_CONVERSION will distinguish ROW_FORMAT=REDUNDANT.
On a related note (see MDEV-18584), for CHAR where mbminlen<mbmaxlen, InnoDB would internally use a variable-length encoding of n*mbminlen‥n*mbmaxlen bytes except when ROW_FORMAT=REDUNDANT. Such CHAR columns can allow instantaneous changes, say, from utf8mb3 to utf8mb4, provided that the n*mbmaxlen is not growing from 128‥255 bytes to more than 255 bytes. |
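The length rule described in this comment can be sketched in Python as follows. This is only an illustration of the single condition stated above, not the server's actual logic in Field_varstring::is_equal(); the function name and parameters are hypothetical, and treating a shrinking maximum length as non-instant is an assumption made here for completeness.

```python
def can_extend_instantly(row_format: str, old_max_bytes: int, new_max_bytes: int) -> bool:
    """Illustrative check: may the maximum byte length grow without a table rebuild?"""
    if new_max_bytes < old_max_bytes:
        # Assumption: shrinking is not discussed in the comment, so treat it as not instant.
        return False
    if row_format.upper() == "REDUNDANT":
        # ROW_FORMAT=REDUNDANT is described as allowing the extension.
        return True
    # Other row formats: refuse growth from 128..255 bytes to more than 255 bytes.
    return not (128 <= old_max_bytes <= 255 and new_max_bytes > 255)


# The example from the comment: a VARCHAR growing from 50*3 to 200*3 bytes.
print(can_extend_instantly("REDUNDANT", 50 * 3, 200 * 3))
print(can_extend_instantly("DYNAMIC", 50 * 3, 200 * 3))
```

With these inputs the first call reports that the change is allowed and the second that it is refused, matching the behaviour the comment asks for.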