MariaDB Server, MDEV-11217

Regression: LOAD DATA INFILE started to fail with an error



      I create a file with a UTF8MB4 string:

      SELECT CONCAT('aaa',0xF09F988E,'bbb') INTO OUTFILE '/tmp/test.txt';

      where 0xF09F988E is the UTF8MB4 encoding of the character "U+1F60E SMILING FACE WITH SUNGLASSES".
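      As a quick sanity check outside the server, the byte sequence can be reproduced in Python. This is only an illustrative sketch; the variable names are made up for the example, and the 3-byte limit refers to MariaDB's utf8 (utf8mb3) character set, which covers code points up to U+FFFF.

```python
# U+1F60E SMILING FACE WITH SUNGLASSES, the character used in the report.
ch = "\U0001F60E"
encoded = ch.encode("utf-8")

print(encoded.hex().upper())  # hex form of the UTF-8 bytes
print(len(encoded))           # number of bytes in the encoded sequence

# MariaDB's `utf8` (utf8mb3) stores at most 3 bytes per character,
# which means only code points up to U+FFFF fit into it.
fits_in_utf8mb3 = ord(ch) <= 0xFFFF
print(fits_in_utf8mb3)
```

      The character needs four bytes, so it can only be represented in utf8mb4.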

      Now I load this file:

      LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
      SELECT * FROM t1;

      Note that the CHARACTER SET utf8 clause is wrong; it should be CHARACTER SET utf8mb4. The problem, however, is that 10.0.24 loaded the data with a warning, whereas 10.0.28 fails with an error.
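      One way to decide which CHARACTER SET clause a file needs is to scan it for characters above U+FFFF before loading. The following is a minimal sketch under the assumption that the file is valid UTF-8; the function name first_supplementary_char is hypothetical and not part of any MariaDB tool.

```python
def first_supplementary_char(data: bytes):
    """Return (char_index, char) for the first character above U+FFFF, or None.

    Assumes `data` is valid UTF-8; raises UnicodeDecodeError otherwise.
    """
    text = data.decode("utf-8")
    for index, ch in enumerate(text):
        if ord(ch) > 0xFFFF:  # needs 4 bytes in UTF-8, so utf8mb4 is required
            return index, ch
    return None


# Same content as the field written by the SELECT ... INTO OUTFILE above.
sample = b"aaa\xF0\x9F\x98\x8Ebbb"
hit = first_supplementary_char(sample)
charset = "utf8mb4" if hit else "utf8"
print(hit, charset)
```

      For the sample above the check points at the character right after 'aaa', so the file should be loaded with CHARACTER SET utf8mb4.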

      Results in 10.0.24:

      MariaDB [test]> DROP TABLE IF EXISTS t1;
      Query OK, 0 rows affected (0.01 sec)
      MariaDB [test]> CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      Query OK, 0 rows affected (0.01 sec)
      MariaDB [test]> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
      Query OK, 1 row affected, 1 warning (0.00 sec)       
      Records: 1  Deleted: 0  Skipped: 0  Warnings: 1
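      MariaDB [test]> SHOW WARNINGS;
      +---------+------+-------------------------------------------------------------------------+
      | Level   | Code | Message                                                                 |
      +---------+------+-------------------------------------------------------------------------+
      | Warning | 1366 | Incorrect string value: '\xF0\x9F\x98\x8Ebb...' for column 'a' at row 1 |
      +---------+------+-------------------------------------------------------------------------+
      1 row in set (0.00 sec)
      MariaDB [test]> SELECT * FROM t1;
      +------+
      | a    |
      +------+
      | aaa  |
      +------+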

      Results in 10.0.28:

      MariaDB [test]> DROP TABLE IF EXISTS t1;
      Query OK, 0 rows affected (0.02 sec)
      MariaDB [test]> CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET utf8);
      Query OK, 0 rows affected (0.05 sec)
      MariaDB [test]> LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET utf8;
      ERROR 1300 (HY000): Invalid utf8 character string: 'aaa'
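      MariaDB [test]> SHOW WARNINGS;
      +-------+------+--------------------------------------+
      | Level | Code | Message                              |
      +-------+------+--------------------------------------+
      | Error | 1300 | Invalid utf8 character string: 'aaa' |
      +-------+------+--------------------------------------+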
      1 row in set (0.00 sec)
      MariaDB [test]> SELECT * FROM t1;
      Empty set (0.00 sec)
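      The difference between the two transcripts can be summarized with a toy model. This is not the server's code: load_lenient, load_strict and InvalidCharacterString are hypothetical names, and the model only mimics the observable outcome (truncate at the first character that does not fit into 3-byte utf8 and warn, versus reject the whole statement).

```python
class InvalidCharacterString(Exception):
    """Stand-in for ERROR 1300; a hypothetical name, not a server API."""


def good_prefix(data: bytes) -> str:
    """Longest prefix of `data` whose characters fit into 3-byte utf8."""
    prefix = []
    for ch in data.decode("utf-8"):  # assumes the input is valid UTF-8
        if ord(ch) > 0xFFFF:         # a 4-byte sequence, outside utf8mb3
            break
        prefix.append(ch)
    return "".join(prefix)


def load_lenient(data: bytes):
    """Mimics the 10.0.24 transcript: keep the good prefix, count a warning."""
    prefix = good_prefix(data)
    warnings = 0 if prefix.encode("utf-8") == data else 1
    return prefix, warnings


def load_strict(data: bytes) -> str:
    """Mimics the 10.0.28 transcript: fail instead of storing a prefix."""
    prefix = good_prefix(data)
    if prefix.encode("utf-8") != data:
        raise InvalidCharacterString(
            f"Invalid utf8 character string: {prefix!r}")
    return prefix


field = b"aaa\xF0\x9F\x98\x8Ebbb"
print(load_lenient(field))
try:
    load_strict(field)
except InvalidCharacterString as exc:
    print("ERROR:", exc)
```

      Under this model the same input yields a stored value in one version and no row at all in the other, which is why a statement that succeeds on one server can fail on the other.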

      As a result, replication from a 10.0.24 master to a 10.0.28 slave stops with an error.

      The bug was likely introduced after merging this commit from MySQL:

      commit 9f7288e2e0179db478d20c74f57b5c7d6c95f793
      Author: Thayumanavar S <thayumanavar.x.sachithanantha@oracle.com>
      Date:   Mon Jun 20 11:35:43 2016 +0530
          BUG#23080148 - BACKPORT BUG 14653594 AND BUG 20683959 TO
          The bug asks for a backport of bug#1463594 and bug#20682959. This
          is required because of the fact that if replication is enabled, master
          transaction can commit whereas slave can't commit due to not exact
          'enviroment'. This manifestation is seen in bug#22024200.




            jeanfrancois.gagne Jean-François Gagné added a comment (edited)

            I think this is not a bug. The failure in 10.0.28 is the right behavior, and the 10.0.24 behavior is wrong. As for [1], I think MariaDB is doing the right thing in making this fail in 10.0.28. Documenting this incompatible change is important, however, and I do not think it is well documented.

            [1]: https://bugs.mysql.com/bug.php?id=78758

            bar Alexander Barkov added a comment

            The best behavior is implemented in 10.2.
            Instead of giving up on a broken byte, it keeps loading more data.
            Please see here for details: https://lists.launchpad.net/maria-developers/msg10047.html

            As for 10.0.x, there are now multiple options:
            a. just document it
            b. fix it to be closer to what 10.2 does
            c. fix it to 10.0.24 behavior

            Perhaps "just document it" is the best solution.
            This code used to be quite fragile. In 10.2 we rewrote it in a much clearer way.
            Unfortunately, backporting the 10.2 changes to 10.0 is not possible (too much dependent code).
            And fixing it on top of the old fragile code is not a good idea (there is a risk of adding more bugs).

