MariaDB Server / MDEV-9823

LOAD DATA INFILE silently truncates incomplete byte sequences

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL)
    • Fix Version/s: 10.2.0
    • Component/s: Character Sets
    • Labels: None

    Description

      If I insert an incomplete multi-byte character into a table:

      DROP TABLE IF EXISTS t1;
      CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET ujis);
      INSERT INTO t1 VALUES (0x8FA1);
      SHOW WARNINGS;
      SELECT HEX(a) FROM t1;
      

      it correctly returns a warning:

      +---------+------+------------------------------------------------------------+
      | Level   | Code | Message                                                    |
      +---------+------+------------------------------------------------------------+
      | Warning | 1366 | Incorrect string value: '\x8F\xA1' for column 'a' at row 1 |
      +---------+------+------------------------------------------------------------+
      

      and translates every byte in the incomplete sequence to QUESTION MARK:

      +--------+
      | HEX(a) |
      +--------+
      | 3F3F   |
      +--------+
      

      If I put the same sequence into a file:

      printf "\x8F\xA1"> /tmp/test.txt
      

      and load it:

      DELETE FROM t1;
      LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET ujis;
      SHOW WARNINGS;
      SELECT HEX(a) FROM t1;
      

      it returns no warnings and silently drops the incomplete sequence, storing an empty string:

      +--------+
      | HEX(a) |
      +--------+
      |        |
      +--------+
      

      LOAD DATA INFILE should be fixed to behave consistently with INSERT.
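
The file-creation step in the report relies on printf understanding \x escapes, which POSIX does not require (the printf builtin in dash, for example, does not accept them). The following is a minimal sketch, assuming a POSIX shell with od, tr and wc available, that writes the same two bytes with octal escapes and checks that the file holds exactly 0x8F 0xA1 before it is loaded. In ujis (EUC-JP), 0x8F introduces a three-byte sequence, so these two bytes on their own form an incomplete character. The file name below is hypothetical; the report itself uses /tmp/test.txt.

```shell
# Write 0x8F 0xA1 using octal escapes (\217 = 0x8F, \241 = 0xA1),
# which every POSIX printf is required to understand.
f="${TMPDIR:-/tmp}/mdev9823_test.txt"   # hypothetical path, chosen for illustration
printf '\217\241' > "$f"

# Dump the file as hex bytes with no offsets, then strip spaces and newlines.
hex=$(od -An -tx1 "$f" | tr -d ' \n')

# Count the bytes in the file (some wc implementations pad the number with spaces).
size=$(wc -c < "$f" | tr -d ' ')

echo "bytes=$hex size=$size"
```

If the dump shows anything other than the two expected bytes, the printf in use did not interpret the escapes, and the LOAD DATA INFILE result would not be comparable with the INSERT case.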

          Activity

            bar Alexander Barkov created issue -
            bar Alexander Barkov made changes -
            Field Original Value New Value
            bar Alexander Barkov made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            bar Alexander Barkov made changes -
            Assignee Alexander Barkov [ bar ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Alexander Barkov [ bar ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            bar Alexander Barkov made changes -
            Fix Version/s 10.2.0 [ 20700 ]
            Fix Version/s 10.2 [ 14601 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 74691 ] MariaDB v4 [ 150270 ]

            People

              Assignee: Alexander Barkov (bar)
              Reporter: Alexander Barkov (bar)
              Votes: 0
              Watchers: 2
