Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-24797

Column Compression - ERROR 1265 (01000): Data truncated for column

Details

    Description

      Hello,

      Testing MariaDB 10.5, and first time implementing column compression as part of upgrade.

      Converting normal columns to COMPRESSED columns seem to work ok.

      However, when I want to do an ALTER TABLE later, it appears that I have problems with the compressed columns, even if I'm not changing them.

      Example:

      MariaDB [mydb]> ALTER TABLE `mytable`
      CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,
      CHANGE `col2` `col2` VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

      Query OK, 3708285 rows affected (1 min 8.067 sec)
      Records: 3708285 Duplicates: 0 Warnings: 0

      MariaDB [mydb]> ALTER TABLE `mytable`
      CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

      ERROR 1265 (01000): Data truncated for column 'question' at row 8362

      MariaDB [mydb]> ALTER TABLE `mytable`
      CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,
      CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;

      ERROR 1265 (01000): Data truncated for column 'question' at row 8362

      Any ideas why?

      Attachments

        Issue Links

          Activity

            nunop Nuno added a comment -

            marko

            Just tried this again and got the error on the same line (693).

            ALTER TABLE `mytable`
            CHANGE `question` `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL,
            CHANGE `col2` `col2` VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
             
            ALTER TABLE `mytable`
            CHANGE `col2` `col2` VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
            

            > ERROR 1265 (01000): Data truncated for column 'question' at row 693

            I tried this too:

            SELECT tid, question FROM mytable LIMIT 5 OFFSET 690;
            

            So I can see more rows around 693.

            CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);
             
            INSERT INTO mytable2
            SELECT col2, question FROM mytable LIMIT 5 OFFSET 690;
             
            SELECT * FROM mytable2;
             
            ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
            

            > ERROR 1265 (01000): Data truncated for column 'question' at row 3

            I'll come back to you soon.

            nunop Nuno added a comment - marko Just tried this again and got the error on the same line (693). ALTER TABLE `mytable` CHANGE `question` `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL , CHANGE `col2` `col2` VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ;   ALTER TABLE `mytable` CHANGE `col2` `col2` VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > ERROR 1265 (01000): Data truncated for column 'question' at row 693 I tried this too: SELECT tid, question FROM mytable LIMIT 5 OFFSET 690; So I can see more rows around 693. CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL );   INSERT INTO mytable2 SELECT col2, question FROM mytable LIMIT 5 OFFSET 690;   SELECT * FROM mytable2;   ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > ERROR 1265 (01000): Data truncated for column 'question' at row 3 I'll come back to you soon.
            nunop Nuno added a comment - - edited

            Ok, so this gives me (one of) the trouble row:

            SELECT tid, question FROM forums_topics LIMIT 1 OFFSET 692;
            

            Has to be N-1.

            It has 129 characters, so there's no reason for it to be "truncated".
            And the characters are normal latin characters. No emojis or anything...

            So I did the test in another way:

            DROP TABLE mytable2;
             
            CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);
             
            INSERT INTO mytable2
            set col2='...',
            question='...';
             
            ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
            

            And I get the same:

            > #1265 - Data truncated for column 'question' at row 1

            Even converting all to "xxx" does it:

            DROP TABLE mytable2;
             
            CREATE TABLE mytable2 (col2 VARCHAR(50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL, `question` VARCHAR(500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL);
             
            INSERT INTO mytable2
            set col2='...',
            question='Xxx xxx xxx xxxxx xxxxx xxxx xxx xxxxx xxx xxxxx xxxx xx xxxx xxxx xxxx xxxxx? .X Xxxx Xxx Xxxxx Xxx Xxxxxx Xxxxxxxxx Xx X Xxx.';
             
            ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL;
            

            > #1265 - Data truncated for column 'question' at row 1

            (you can use the last one as a test to try to reproduce on your side)

            MariaDB 10.5.12 on CentOS 7.
            (this test was on a Docker container with CentOS 7, but when I reproduced it the first time was on a baremetal CentOS 7)

            nunop Nuno added a comment - - edited Ok, so this gives me (one of) the trouble row: SELECT tid, question FROM forums_topics LIMIT 1 OFFSET 692; Has to be N-1. It has 129 characters, so there's no reason for it to be "truncated". And the characters are normal latin characters. No emojis or anything... So I did the test in another way: DROP TABLE mytable2;   CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL );   INSERT INTO mytable2 set col2= '...' , question= '...' ;   ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; And I get the same: > #1265 - Data truncated for column 'question' at row 1 Even converting all to "xxx" does it: DROP TABLE mytable2;   CREATE TABLE mytable2 (col2 VARCHAR (50) COMPRESSED CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL , `question` VARCHAR (500) COMPRESSED CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci NULL DEFAULT NULL );   INSERT INTO mytable2 set col2= '...' , question= 'Xxx xxx xxx xxxxx xxxxx xxxx xxx xxxxx xxx xxxxx xxxx xx xxxx xxxx xxxx xxxxx? .X Xxxx Xxx Xxxxx Xxx Xxxxxx Xxxxxxxxx Xx X Xxx.' ;   ALTER TABLE mytable2 CHANGE col2 col2 VARCHAR (50) CHARACTER SET utf8 COLLATE utf8_general_ci NOT NULL ; > #1265 - Data truncated for column 'question' at row 1 (you can use the last one as a test to try to reproduce on your side) MariaDB 10.5.12 on CentOS 7. (this test was on a Docker container with CentOS 7, but when I reproduced it the first time was on a baremetal CentOS 7)

            Here is a simpler test case:

            CREATE TABLE t1 (a VARCHAR(500) COMPRESSED CHARACTER SET utf8mb3);
            INSERT INTO t1 SET a=REPEAT('x',127);
            ALTER TABLE t1 FORCE;
            DROP TABLE t1;
            

            The warning is being issued in:

            10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d

            #1  0x0000555c2cc7d492 in Field::set_warning (this=0x7f9ac4072fc8, 
                level=level@entry=Sql_state_errno_level::WARN_LEVEL_WARN, 
                code=code@entry=1265, cut_increment=cut_increment@entry=1, 
                current_row=current_row@entry=0) at /mariadb/10.6/sql/field.cc:11107
            #2  0x0000555c2cc905f8 in do_varstring2_mb (copy=0x7f9ac4017c60)
                at /mariadb/10.6/sql/field_conv.cc:597
            #3  0x0000555c2cc8feba in do_copy_null (copy=<optimized out>)
                at /mariadb/10.6/sql/field_conv.cc:246
            #4  0x0000555c2ca900b2 in copy_data_between_tables
            

            The problem appears to be an incorrect condition in do_varstring2_mb() that does not take column compression into account:

              uint char_length= (copy->to_length - HA_KEY_BLOB_LENGTH) / cs->mbmaxlen;
              uint from_length= uint2korr(copy->from_ptr);
              const uchar *from_beg= copy->from_ptr + HA_KEY_BLOB_LENGTH;
              Well_formed_prefix prefix(cs, (char*) from_beg, from_length, char_length);
              if (prefix.length() < from_length)
              {
                if (current_thd->count_cuted_fields > CHECK_FIELD_EXPRESSION)
                  copy->to_field->set_warning(Sql_condition::WARN_LEVEL_WARN,
                                              WARN_DATA_TRUNCATED, 1);
              }  
            

            Note: copy->to_length is 1503 and from_length (apparently, the length of the compressed column) is 8 in this case. It was 40 for nunop’s example.

            marko Marko Mäkelä added a comment - Here is a simpler test case: CREATE TABLE t1 (a VARCHAR (500) COMPRESSED CHARACTER SET utf8mb3); INSERT INTO t1 SET a=REPEAT( 'x' ,127); ALTER TABLE t1 FORCE ; DROP TABLE t1; The warning is being issued in: 10.6 1193a793c40b806c6f1f007bbd87f4d9a73e686d #1 0x0000555c2cc7d492 in Field::set_warning (this=0x7f9ac4072fc8, level=level@entry=Sql_state_errno_level::WARN_LEVEL_WARN, code=code@entry=1265, cut_increment=cut_increment@entry=1, current_row=current_row@entry=0) at /mariadb/10.6/sql/field.cc:11107 #2 0x0000555c2cc905f8 in do_varstring2_mb (copy=0x7f9ac4017c60) at /mariadb/10.6/sql/field_conv.cc:597 #3 0x0000555c2cc8feba in do_copy_null (copy=<optimized out>) at /mariadb/10.6/sql/field_conv.cc:246 #4 0x0000555c2ca900b2 in copy_data_between_tables The problem appears to be an incorrect condition in do_varstring2_mb() that does not take column compression into account: uint char_length= (copy->to_length - HA_KEY_BLOB_LENGTH) / cs->mbmaxlen; uint from_length= uint2korr(copy->from_ptr); const uchar *from_beg= copy->from_ptr + HA_KEY_BLOB_LENGTH; Well_formed_prefix prefix(cs, ( char *) from_beg, from_length, char_length); if (prefix.length() < from_length) { if (current_thd->count_cuted_fields > CHECK_FIELD_EXPRESSION) copy->to_field->set_warning(Sql_condition::WARN_LEVEL_WARN, WARN_DATA_TRUNCATED, 1); } Note: copy->to_length is 1503 and from_length (apparently, the length of the compressed column) is 8 in this case. It was 40 for nunop ’s example.
            rjasdfiii Rick James added a comment -

            Another theory: Could it be that zlib (or the wrapper around it) is returning 0 when the string cannot be compressed? That is, the (500) is not really what it is testing against.

            Meanwhile, `REPEAT('x',127)` is highly compressible, hence not useful for borderline tests.

            Note that .pdf and .jpg files are already compressed, thereby making them good candidates for 'large' test cases that are likely to expand when compressed.

            rjasdfiii Rick James added a comment - Another theory: Could it be that zlib (or the wrapper around it) is returning 0 when the string cannot be compressed? That is, the (500) is not really what it is testing against. Meanwhile, `REPEAT('x',127)` is highly compressible, hence not useful for borderline tests. Note that .pdf and .jpg files are already compressed, thereby making them good candidates for 'large' test cases that are likely to expand when compressed.

            For the record, zlib provides the function compressBound(sourceLen) that gives the worst-case compressed size (the expanded size) corresponding to an input data length sourceLen. In MDEV-27634 I learned that a zlib fork exists that makes use of the IBM System Z (s390x) instruction dfltcc. In that version, also compressBound() was changed. That could explain why some non-InnoDB compression tests are failing on s390x.

            marko Marko Mäkelä added a comment - For the record, zlib provides the function compressBound(sourceLen) that gives the worst-case compressed size (the expanded size) corresponding to an input data length sourceLen . In MDEV-27634 I learned that a zlib fork exists that makes use of the IBM System Z (s390x) instruction dfltcc . In that version, also compressBound() was changed. That could explain why some non-InnoDB compression tests are failing on s390x.

            People

              bar Alexander Barkov
              nunop Nuno
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.