[MDEV-15938] TINYTEXT CHARACTER SET utf8 COMPRESSED truncates data Created: 2018-04-20  Updated: 2018-04-30  Resolved: 2018-04-30

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.3
Fix Version/s: 10.3.7

Type: Bug
Priority: Major
Reporter: Alexander Barkov
Assignee: Sergey Vojtovich
Resolution: Fixed
Votes: 0
Labels: None


 Description   

I insert a latin1 string consisting of 250 ASCII characters into two TINYTEXT columns (one compressed, one non-compressed), both declared with CHARACTER SET utf8. The compressed column truncates the data. The non-compressed column stores the entire value, as expected:

SET sql_mode='';
CREATE OR REPLACE TABLE t1(
  a TINYTEXT CHARACTER SET utf8 COMPRESSED,
  b TINYTEXT CHARACTER SET utf8
);
INSERT INTO t1 VALUES (REPEAT(_latin1'a',250), REPEAT(_latin1'a',250));
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;

+----------------+----------------+
| CHAR_LENGTH(a) | CHAR_LENGTH(b) |
+----------------+----------------+
|             84 |            250 |
+----------------+----------------+

This looks wrong: the compressed column a keeps only 84 of the 250 characters. The expected behavior is to store all 250 characters in both columns.



 Comments   
Comment by Alexander Barkov [ 2018-04-20 ]

The same problem is repeatable with utf16:

SET sql_mode='';
CREATE OR REPLACE TABLE t1(
  a TINYTEXT CHARACTER SET utf16 COMPRESSED,
  b TINYTEXT CHARACTER SET utf16
);
INSERT INTO t1 VALUES (REPEAT(_latin1'a',100), REPEAT(_latin1'a',100));
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;

+----------------+----------------+
| CHAR_LENGTH(a) | CHAR_LENGTH(b) |
+----------------+----------------+
|             63 |            100 |
+----------------+----------------+

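The problem should be repeatable when both of the following hold:

  • The inserted value requires character set conversion (for example, a latin1 string stored into a utf8 column)
  • The column character set has a variable-width encoding (mbminlen < mbmaxlen): euckr, gb2312, gbk, utf8, utf8mb4, utf16, utf16le, eucjpms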
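
As a sketch of how one of the other listed character sets could be checked, the script below repeats the test with utf8mb4. The choice of utf8mb4 and the 250-character length are assumptions made for illustration. No result is shown because this variant is not part of the original report.

```sql
-- Hypothetical variant of the tests above; not taken from the report.
-- Assumption: utf8mb4 is used as one more variable-width character set
-- from the list, with the same latin1 source string as the utf8 case.
SET sql_mode='';
CREATE OR REPLACE TABLE t1(
  a TINYTEXT CHARACTER SET utf8mb4 COMPRESSED,
  b TINYTEXT CHARACTER SET utf8mb4
);
-- The _latin1 introducer forces a latin1 -> utf8mb4 conversion on insert.
INSERT INTO t1 VALUES (REPEAT(_latin1'a',250), REPEAT(_latin1'a',250));
-- Compare the stored character counts of the compressed and plain columns.
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
DROP TABLE t1;
```

If the two conditions above are sufficient, CHAR_LENGTH(a) would come back smaller than CHAR_LENGTH(b) on an affected 10.3 build. The expected behavior described in this report is 250 for both columns.
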
Comment by Sergey Vojtovich [ 2018-04-25 ]

bar, please review the fix for this bug:
https://github.com/MariaDB/server/commit/9d8d3b848142a684a6fdcddf49eccab9c905fb78

Comment by Alexander Barkov [ 2018-04-27 ]

svoj, the patch is fine.
Please move the utf16 part to a separate test file with:

-- source include/have_utf16.inc

It is probably a good idea to create a new file, ctype_utf16_compressed.test, for this.
