MariaDB Server: MDEV-15938

TINYTEXT CHARACTER SET utf8 COMPRESSED truncates data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.3(EOL)
    • Fix Version/s: 10.3.7
    • Component/s: Server
    • Labels: None

    Description

      I insert a latin1 string of 250 ASCII characters into two TINYTEXT columns with CHARACTER SET utf8, one compressed and one not. The compressed column truncates the data, while the non-compressed column stores the entire value, as expected:

      SET sql_mode='';
      CREATE OR REPLACE TABLE t1(
        a TINYTEXT CHARACTER SET utf8 COMPRESSED,
        b TINYTEXT CHARACTER SET utf8
      );
      INSERT INTO t1 VALUES (REPEAT(_latin1'a',250), REPEAT(_latin1'a',250));
      SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
      

      +----------------+----------------+
      | CHAR_LENGTH(a) | CHAR_LENGTH(b) |
      +----------------+----------------+
      |             84 |            250 |
      +----------------+----------------+
      

      Looks wrong. The expected behavior would be to write all 250 characters into both columns.

        Activity

          bar Alexander Barkov added a comment (edited)

          The same problem is repeatable with utf16:

          SET sql_mode='';
          CREATE OR REPLACE TABLE t1(
            a TINYTEXT CHARACTER SET utf16 COMPRESSED,
            b TINYTEXT CHARACTER SET utf16
          );
          INSERT INTO t1 VALUES (REPEAT(_latin1'a',100), REPEAT(_latin1'a',100));
          SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
          

          +----------------+----------------+
          | CHAR_LENGTH(a) | CHAR_LENGTH(b) |
          +----------------+----------------+
          |             63 |            100 |
          +----------------+----------------+
          

          It should be repeatable if:

          • There is a conversion
          • The column character set has a variable encoding (mbminlen < mbmaxlen): euckr, gb2312, gbk, utf8, utf8mb4, utf16, utf16le, eucjpms
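
The two conditions above can be checked with a sketch like the following. It reuses the pattern from the report with utf8mb4, one of the listed character sets, and adds a control case where the inserted string is already utf8mb4, so no conversion takes place. The choice of utf8mb4 and the 100-character length are assumptions made for illustration; no output is shown because this issue does not report results for this case.

```sql
SET sql_mode='';

CREATE OR REPLACE TABLE t1(
  a TINYTEXT CHARACTER SET utf8mb4 COMPRESSED,
  b TINYTEXT CHARACTER SET utf8mb4
);

-- Case 1: latin1 -> utf8mb4 conversion into a variable-width character set.
-- If the conditions above hold, an affected build would report different
-- lengths for a and b here.
INSERT INTO t1 VALUES (REPEAT(_latin1'a',100), REPEAT(_latin1'a',100));
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;

-- Case 2 (control): the source string is already utf8mb4, so there is no
-- conversion and both columns would be expected to report the same length.
TRUNCATE TABLE t1;
INSERT INTO t1 VALUES (REPEAT(_utf8mb4'a',100), REPEAT(_utf8mb4'a',100));
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;

DROP TABLE t1;
```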
          svoj Sergey Vojtovich added a comment

          bar, please review the fix for this bug: https://github.com/MariaDB/server/commit/9d8d3b848142a684a6fdcddf49eccab9c905fb78

          bar Alexander Barkov added a comment

          svoj, the patch is fine.
          Please move the utf16 part to a separate test file with:

          -- source include/have_utf16.inc

          It's probably a good idea to create a new file, ctype_utf16_compressed.test, for this.
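
For illustration, a separate utf16 test file along the lines suggested above might look roughly like this. It is a hypothetical sketch built from the utf16 example earlier in this issue, not the content of the file that was actually committed; the echo header and the cleanup statements are assumptions.

```sql
# Hypothetical sketch of ctype_utf16_compressed.test (not the committed file)
--source include/have_utf16.inc

--echo #
--echo # MDEV-15938 TINYTEXT CHARACTER SET utf8 COMPRESSED truncates data
--echo #

SET sql_mode='';
CREATE OR REPLACE TABLE t1(
  a TINYTEXT CHARACTER SET utf16 COMPRESSED,
  b TINYTEXT CHARACTER SET utf16
);
INSERT INTO t1 VALUES (REPEAT(_latin1'a',100), REPEAT(_latin1'a',100));
# With the fix applied, both columns are expected to hold all 100 characters.
SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
DROP TABLE t1;
SET sql_mode=DEFAULT;
```

Keeping the utf16 case in its own file means the include guard skips only that file on builds without utf16 support, while the utf8 case still runs.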

          People

            Assignee: svoj Sergey Vojtovich
            Reporter: bar Alexander Barkov