MariaDB Server: MDEV-15938

TINYTEXT CHARACTER SET utf8 COMPRESSED truncates data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 10.3 (EOL)
    • Fix Version/s: 10.3.7
    • Component/s: Server
    • Labels: None

    Description

      I insert a latin1 string consisting of 250 ASCII characters into two TINYTEXT columns (compressed and non-compressed) with CHARACTER SET utf8. The compressed column truncates the data, while the non-compressed column stores the entire value, as expected:

      SET sql_mode='';
      CREATE OR REPLACE TABLE t1(
        a TINYTEXT CHARACTER SET utf8 COMPRESSED,
        b TINYTEXT CHARACTER SET utf8
      );
      INSERT INTO t1 VALUES (REPEAT(_latin1'a',250), REPEAT(_latin1'a',250));
      SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
      

      +----------------+----------------+
      | CHAR_LENGTH(a) | CHAR_LENGTH(b) |
      +----------------+----------------+
      |             84 |            250 |
      +----------------+----------------+
      

      Looks wrong. The expected behavior would be to write all 250 characters into both columns.
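      To make the expectation concrete: TINYTEXT holds at most 255 bytes, and in utf8 each ASCII character takes one byte, so the converted 250-character value needs only 250 bytes and should be stored in full. The minimal Python sketch below counts those bytes. It assumes Python's "utf-8" codec encodes pure ASCII exactly as MariaDB's utf8 does (one byte per character); the helper name fits_in_tinytext is hypothetical.

```python
# Sketch: check that the converted test value fits into a TINYTEXT column.
# Assumption: Python's "utf-8" codec produces the same bytes as MariaDB's
# utf8 for pure ASCII input, i.e. one byte per character.
TINYTEXT_MAX_BYTES = 255  # maximum length of a TINYTEXT value, in bytes


def fits_in_tinytext(value: str, codec: str) -> bool:
    """Return True if value, encoded with codec, fits into a TINYTEXT."""
    return len(value.encode(codec)) <= TINYTEXT_MAX_BYTES


value = "a" * 250  # same content as REPEAT(_latin1'a', 250)
print(len(value.encode("utf-8")), fits_in_tinytext(value, "utf-8"))  # 250 True
```

      So the non-compressed result (250) is what the compressed column should also return.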


        Activity

          bar Alexander Barkov created issue
          bar Alexander Barkov added a comment (edited)

          The same problem is repeatable with utf16:

          SET sql_mode='';
          CREATE OR REPLACE TABLE t1(
            a TINYTEXT CHARACTER SET utf16 COMPRESSED,
            b TINYTEXT CHARACTER SET utf16
          );
          INSERT INTO t1 VALUES (REPEAT(_latin1'a',100), REPEAT(_latin1'a',100));
          SELECT CHAR_LENGTH(a), CHAR_LENGTH(b) FROM t1;
          

          +----------------+----------------+
          | CHAR_LENGTH(a) | CHAR_LENGTH(b) |
          +----------------+----------------+
          |             63 |            100 |
          +----------------+----------------+
          

          It should be repeatable when both of the following hold:

          • There is a character set conversion (here, from latin1 to the column character set)
          • The column character set has a variable-length encoding (mbminlen < mbmaxlen): euckr, gb2312, gbk, utf8, utf8mb4, utf16, utf16le, eucjpms
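          Both truncated lengths happen to match a simple arithmetic pattern. This is a hypothesis, not something taken from the fix: if a compressed TINYTEXT reserves one of its 255 bytes for a compression header, and the conversion step limited the result to the number of worst-case characters that fit into the remaining 254 bytes (254 / mbmaxlen, rounded down), the result is 84 for utf8 (mbmaxlen 3) and 63 for utf16 (mbmaxlen 4), exactly the values reported above. The one-byte header and the helper names in this Python sketch are assumptions for illustration.

```python
# Hypothesis sketch (not the server's code): reproduce the truncated lengths
# reported in this issue with floor(usable_bytes / mbmaxlen).
TINYTEXT_MAX_BYTES = 255
COMPRESSION_HEADER_BYTES = 1  # assumption: one header byte per compressed value

# mbmaxlen (maximum bytes per character) of the two column character sets
MBMAXLEN = {"utf8": 3, "utf16": 4}


def suspected_char_cap(charset: str) -> int:
    """Characters kept if the limit were computed from the worst-case width."""
    usable = TINYTEXT_MAX_BYTES - COMPRESSION_HEADER_BYTES
    return usable // MBMAXLEN[charset]


def actual_bytes(n_chars: int, charset: str) -> int:
    """Bytes the ASCII test value really needs after conversion."""
    codec = {"utf8": "utf-8", "utf16": "utf-16-be"}[charset]  # utf16 is big-endian
    return len(("a" * n_chars).encode(codec))


for charset, n_chars in (("utf8", 250), ("utf16", 100)):
    # prints: charset, suspected character cap, bytes actually needed
    print(charset, suspected_char_cap(charset), actual_bytes(n_chars, charset))
```

          If this reading is right, the real values (250 and 200 bytes) fit in the column, and the truncation comes from a character-count limit based on mbmaxlen rather than from a lack of space.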
          svoj Sergey Vojtovich made changes
          Status: Open → Confirmed
          svoj Sergey Vojtovich added a comment: bar, please review the fix for this bug: https://github.com/MariaDB/server/commit/9d8d3b848142a684a6fdcddf49eccab9c905fb78
          svoj Sergey Vojtovich made changes
          Assignee: Sergey Vojtovich → Alexander Barkov
          Status: Confirmed → In Review
          bar Alexander Barkov added a comment

          svoj, the patch is fine.
          Please move the utf16 part to a separate test file with:

          -- source include/have_utf16.inc

          Probably it's a good idea to make a new file ctype_utf16_compressed.test for this.
          bar Alexander Barkov made changes
          Status: In Review → Stalled
          bar Alexander Barkov made changes
          Assignee: Alexander Barkov → Sergey Vojtovich
          svoj Sergey Vojtovich made changes
          Resolution date: 2018-04-30 15:35:04.0 → 2018-04-30 15:35:04.471
          svoj Sergey Vojtovich made changes
          Component/s: Server
          Fix Version/s: 10.3.7
          Fix Version/s: 10.3
          Resolution: Fixed
          Status: Stalled → Closed
          serg Sergei Golubchik made changes
          Workflow: MariaDB v3 → MariaDB v4

          People

            Assignee: svoj Sergey Vojtovich
            Reporter: bar Alexander Barkov
            Votes: 0
            Watchers: 3
