[MDEV-24797] Column Compression - ERROR 1265 (01000): Data truncated for column - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Duplicate
Affects Version/s: 10.3(EOL), 10.4(EOL), 10.5, 10.6, 10.7(EOL)
Fix Version/s: 10.4.32, 10.5.23, 10.6.16, 10.10.7, 10.11.6, 11.0.4
Component/s: Data Definition - Alter Table, Data Manipulation - Insert
Labels:
None

Description

Hello,

Testing MariaDB 10.5, and first time implementing column compression as part of upgrade.

Converting normal columns to COMPRESSED columns seem to work ok.

However, when I want to do an ALTER TABLE later, it appears that I have problems with the compressed columns, even if I'm not changing them.

Example:

MariaDB [mydb]> ALTER TABLE `mytable`
CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,
CHANGE `col2` `col2` VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

Query OK, 3708285 rows affected (1 min 8.067 sec)
Records: 3708285 Duplicates: 0 Warnings: 0

MariaDB [mydb]> ALTER TABLE `mytable`
CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

ERROR 1265 (01000): Data truncated for column 'question' at row 8362

MariaDB [mydb]> ALTER TABLE `mytable`
CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,
CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

ERROR 1265 (01000): Data truncated for column 'question' at row 8362

–

Any ideas why?

Attachments

Issue Links

duplicates

MDEV-31724 Compressed varchar values lost on joins when sorting on columns from joined table(s)

Closed

is caused by

MDEV-11371 Big column compressed

Closed

relates to

MDEV-22367 Remove write support for ROW_FORMAT=COMPRESSED

Closed

MDEV-22839 ROW_FORMAT=COMPRESSED vs PAGE_COMPRESSION=1 - size comparison

Open

Activity

Ascending order - Click to sort in descending order

View 7 older comments

Nuno added a comment - 2021-09-23 17:35

marko

Just tried this again and got the error on the same line (693).

ALTER TABLE `mytable`

CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,

CHANGE `col2` `col2` VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

ALTER TABLE `mytable`

CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

> ERROR 1265 (01000): Data truncated for column 'question' at row 693

I tried this too:

SELECT tid, question FROM mytable LIMIT 5 OFFSET 690;

So I can see more rows around 693.

CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);

INSERT INTO mytable2

SELECT col2, question FROM mytable LIMIT 5 OFFSET 690;

SELECT * FROM mytable2;

ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

> ERROR 1265 (01000): Data truncated for column 'question' at row 3

I'll come back to you soon.

Nuno added a comment - 2021-09-23 17:35 marko Just tried this again and got the error on the same line (693). ALTER TABLE `mytable` CHANGE `question` `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL , CHANGE `col2` `col2` VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; ALTER TABLE `mytable` CHANGE `col2` `col2` VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > ERROR 1265 (01000): Data truncated for column 'question' at row 693 I tried this too: SELECT tid, question FROM mytable LIMIT 5 OFFSET 690; So I can see more rows around 693. CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL ); INSERT INTO mytable2 SELECT col2, question FROM mytable LIMIT 5 OFFSET 690; SELECT * FROM mytable2; ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > ERROR 1265 (01000): Data truncated for column 'question' at row 3 I'll come back to you soon.

Nuno added a comment - 2021-09-23 17:48 - edited

Ok, so this gives me (one of) the trouble row:

SELECT tid, question FROM forums_topics LIMIT 1 OFFSET 692;

Has to be N-1.

It has 129 characters, so there's no reason for it to be "truncated".
And the characters are normal latin characters. No emojis or anything...

So I did the test in another way:

DROP TABLE mytable2;

CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);

INSERT INTO mytable2

set col2='...',

question='...';

ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

And I get the same:

> #1265 - Data truncated for column 'question' at row 1

Even converting all to "xxx" does it:

DROP TABLE mytable2;

CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);

INSERT INTO mytable2

set col2='...',

question='Xxx xxx xxx xxxxx xxxxx xxxx xxx xxxxx xxx xxxxx xxxx xx xxxx xxxx xxxx xxxxx? .X Xxxx Xxx Xxxxx Xxx Xxxxxx Xxxxxxxxx Xx X Xxx.';

ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

> #1265 - Data truncated for column 'question' at row 1

(you can use the last one as a test to try to reproduce on your side)

MariaDB 10.5.12 on CentOS 7.
(this test was on a Docker container with CentOS 7, but when I reproduced it the first time was on a baremetal CentOS 7)

Nuno added a comment - 2021-09-23 17:48 - edited Ok, so this gives me (one of) the trouble row: SELECT tid, question FROM forums_topics LIMIT 1 OFFSET 692; Has to be N-1. It has 129 characters, so there's no reason for it to be "truncated". And the characters are normal latin characters. No emojis or anything... So I did the test in another way: DROP TABLE mytable2; CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL ); INSERT INTO mytable2 set col2= '...' , question= '...' ; ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; And I get the same: > #1265 - Data truncated for column 'question' at row 1 Even converting all to "xxx" does it: DROP TABLE mytable2; CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL ); INSERT INTO mytable2 set col2= '...' , question= 'Xxx xxx xxx xxxxx xxxxx xxxx xxx xxxxx xxx xxxxx xxxx xx xxxx xxxx xxxx xxxxx? .X Xxxx Xxx Xxxxx Xxx Xxxxxx Xxxxxxxxx Xx X Xxx.' ; ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > #1265 - Data truncated for column 'question' at row 1 (you can use the last one as a test to try to reproduce on your side) MariaDB 10.5.12 on CentOS 7. (this test was on a Docker container with CentOS 7, but when I reproduced it the first time was on a baremetal CentOS 7)

Marko Mäkelä added a comment - 2021-10-25 10:32

Here is a simpler test case:

CREATE TABLE t1 (a VARCHAR(500) COMPRESSED CHARACTER SET utf8mb3);

INSERT INTO t1 SET a=REPEAT('x',127);

ALTER TABLE t1 FORCE;

DROP TABLE t1;

The warning is being issued in:

10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d
#1 0x0000555c2cc7d492 in Field::set_warning (this=0x7f9ac4072fc8,
level=level@entry=Sql_state_errno_level::WARN_LEVEL_WARN,
code=code@entry=1265, cut_increment=cut_increment@entry=1,
current_row=current_row@entry=0) at /mariadb/10.6/sql/field.cc:11107
#2 0x0000555c2cc905f8 in do_varstring2_mb (copy=0x7f9ac4017c60)
at /mariadb/10.6/sql/field_conv.cc:597
#3 0x0000555c2cc8feba in do_copy_null (copy=<optimized out>)
at /mariadb/10.6/sql/field_conv.cc:246
#4 0x0000555c2ca900b2 in copy_data_between_tables

The problem appears to be an incorrect condition in do_varstring2_mb() that does not take column compression into account:

  uint char_length= (copy->to_length - HA_KEY_BLOB_LENGTH) / cs->mbmaxlen;

  uint from_length= uint2korr(copy->from_ptr);

  const uchar *from_beg= copy->from_ptr + HA_KEY_BLOB_LENGTH;

  Well_formed_prefix prefix(cs, (char*) from_beg, from_length, char_length);

  if (prefix.length() < from_length)

    if (current_thd->count_cuted_fields > CHECK_FIELD_EXPRESSION)

      copy->to_field->set_warning(Sql_condition::WARN_LEVEL_WARN,

                                  WARN_DATA_TRUNCATED, 1);

Note: copy->to_length is 1503 and from_length (apparently, the length of the compressed column) is 8 in this case. It was 40 for nunop’s example.

Marko Mäkelä added a comment - 2021-10-25 10:32 Here is a simpler test case: CREATE TABLE t1 (a VARCHAR (500) COMPRESSED CHARACTER SET utf8mb3); INSERT INTO t1 SET a=REPEAT( 'x' ,127); ALTER TABLE t1 FORCE ; DROP TABLE t1; The warning is being issued in: 10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d #1 0x0000555c2cc7d492 in Field::set_warning (this=0x7f9ac4072fc8, level=level@entry=Sql_state_errno_level::WARN_LEVEL_WARN, code=code@entry=1265, cut_increment=cut_increment@entry=1, current_row=current_row@entry=0) at /mariadb/10.6/sql/field.cc:11107 #2 0x0000555c2cc905f8 in do_varstring2_mb (copy=0x7f9ac4017c60) at /mariadb/10.6/sql/field_conv.cc:597 #3 0x0000555c2cc8feba in do_copy_null (copy=<optimized out>) at /mariadb/10.6/sql/field_conv.cc:246 #4 0x0000555c2ca900b2 in copy_data_between_tables The problem appears to be an incorrect condition in do_varstring2_mb() that does not take column compression into account: uint char_length= (copy->to_length - HA_KEY_BLOB_LENGTH) / cs->mbmaxlen; uint from_length= uint2korr(copy->from_ptr); const uchar *from_beg= copy->from_ptr + HA_KEY_BLOB_LENGTH; Well_formed_prefix prefix(cs, ( char *) from_beg, from_length, char_length); if (prefix.length() < from_length) { if (current_thd->count_cuted_fields > CHECK_FIELD_EXPRESSION) copy->to_field->set_warning(Sql_condition::WARN_LEVEL_WARN, WARN_DATA_TRUNCATED, 1); } Note: copy->to_length is 1503 and from_length (apparently, the length of the compressed column) is 8 in this case. It was 40 for nunop ’s example.

Rick James added a comment - 2022-01-20 19:09

Another theory: Could it be that zlib (or the wrapper around it) is returning 0 when the string cannot be compressed? That is, the (500) is not really what it is testing against.

Meanwhile, `REPEAT('x',127)` is highly compressible, hence not useful for borderline tests.

Note that .pdf and .jpg files are already compressed, thereby making them good candidates for 'large' test cases that are likely to expand when compressed.

Rick James added a comment - 2022-01-20 19:09 Another theory: Could it be that zlib (or the wrapper around it) is returning 0 when the string cannot be compressed? That is, the (500) is not really what it is testing against. Meanwhile, `REPEAT('x',127)` is highly compressible, hence not useful for borderline tests. Note that .pdf and .jpg files are already compressed, thereby making them good candidates for 'large' test cases that are likely to expand when compressed.

Marko Mäkelä added a comment - 2022-09-02 08:09

For the record, zlib provides the function compressBound(sourceLen) that gives the worst-case compressed size (the expanded size) corresponding to an input data length sourceLen. In ~~MDEV-27634~~ I learned that a zlib fork exists that makes use of the IBM System Z (s390x) instruction dfltcc. In that version, also compressBound() was changed. That could explain why some non-InnoDB compression tests are failing on s390x.

Marko Mäkelä added a comment - 2022-09-02 08:09 For the record, zlib provides the function compressBound(sourceLen) that gives the worst-case compressed size (the expanded size) corresponding to an input data length sourceLen . In MDEV-27634 I learned that a zlib fork exists that makes use of the IBM System Z (s390x) instruction dfltcc . In that version, also compressBound() was changed. That could explain why some non-InnoDB compression tests are failing on s390x.

MariaDB Server

Column Compression - ERROR 1265 (01000): Data truncated for column

Details

Description

Attachments

Issue Links

Activity

People

Dates

Git Integration