[MDEV-24857] CHARACTER_MAXIMUM_LENGTH is equal to CHARACTER_OCTET_LENGTH for utf8mb4 TEXT columns Created: 2021-02-13  Updated: 2021-02-18  Resolved: 2021-02-17

Status: Closed
Project: MariaDB Server
Component/s: Data types
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Nuno Assignee: Sergei Golubchik
Resolution: Not a Bug Votes: 0
Labels: None

Attachments: PNG File image-2021-02-13-01-27-19-176.png    

 Description   

My understanding is that CHARACTER_MAXIMUM_LENGTH is how many characters a field can have, while CHARACTER_OCTET_LENGTH is the actual size in bytes (x3, x4) based on the charset.

Below is a screenshot of information_schema.COLUMNS:

VARCHAR(80) utf8mb4 allows for 80 characters, but the contents may take longer than 80 bytes in storage.

However, a TEXT utf8mb4 doesn't seem to allow for contents longer than 65535 bytes. Accepts 65535 ASCII characters, but not Emoji, for example.

Is this related to the inconsistency of the numbers in the screenshot?



 Comments   
Comment by Sergei Golubchik [ 2021-02-17 ]

This is not a bug, but how TEXT/BLOB field work. They can store at most 65536 bytes. There is no limit on the number of characters, you can put any number of characters that can fit in the 65536 bytes.

CHAR/VARCHAR fields, on the other hand, have a limit measured in characters. CHAR(10) can store only 10 characters, it does not matter how many bytes each individual character will occupy.

Comment by Nuno [ 2021-02-17 ]

serg Thank you very much for your explanation. Would you mind if I ask you one more question, please?

If I have a VARCHAR(80) that is utf8mb4,
and in a row I put a simple "a" on the field,

How much space is the field taking in that row?
Are the 80x4 bytes reserved on every row by default (taking 320 bytes on every row), no matter how many characters are in the field,
or will the field only take 1 byte of disk space?

Thank you very much.

Comment by Sergei Golubchik [ 2021-02-18 ]

See https://mariadb.com/kb/en/data-type-storage-requirements/

80*4 > 255, so your field will take 3 bytes (len + 2, where len is the actual length in bytes of the value)

Comment by Nuno [ 2021-02-18 ]

Thank you very much serg ! Really appreciate your time to answer this.
Have a very great day.

Generated at Thu Feb 08 09:33:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.