[MDEV-24857] CHARACTER_MAXIMUM_LENGTH is equal to CHARACTER_OCTET_LENGTH for utf8mb4 TEXT columns Created: 2021-02-13 Updated: 2021-02-18 Resolved: 2021-02-17 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Data types |
| Fix Version/s: | N/A |
| Type: | Task | Priority: | Major |
| Reporter: | Nuno | Assignee: | Sergei Golubchik |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
| Description |
|
My understanding is that CHARACTER_MAXIMUM_LENGTH is how many characters a field can have, while CHARACTER_OCTET_LENGTH is the actual size in bytes (x3, x4) based on the charset. Below is a screenshot of information_schema.COLUMNS: VARCHAR(80) utf8mb4 allows for 80 characters, but the contents may take longer than 80 bytes in storage. However, a TEXT utf8mb4 doesn't seem to allow for contents longer than 65535 bytes. Accepts 65535 ASCII characters, but not Emoji, for example. Is this related to the inconsistency of the numbers in the screenshot? |
| Comments |
| Comment by Sergei Golubchik [ 2021-02-17 ] |
|
This is not a bug, but how TEXT/BLOB field work. They can store at most 65536 bytes. There is no limit on the number of characters, you can put any number of characters that can fit in the 65536 bytes. CHAR/VARCHAR fields, on the other hand, have a limit measured in characters. CHAR(10) can store only 10 characters, it does not matter how many bytes each individual character will occupy. |
| Comment by Nuno [ 2021-02-17 ] |
|
serg Thank you very much for your explanation. Would you mind if I ask you one more question, please? If I have a VARCHAR(80) that is utf8mb4, How much space is the field taking in that row? Thank you very much. |
| Comment by Sergei Golubchik [ 2021-02-18 ] |
|
See https://mariadb.com/kb/en/data-type-storage-requirements/ 80*4 > 255, so your field will take 3 bytes (len + 2, where len is the actual length in bytes of the value) |
| Comment by Nuno [ 2021-02-18 ] |
|
Thank you very much serg ! Really appreciate your time to answer this. |