MariaDB Server: MDEV-11079

Regression: LOAD DATA INFILE lost BLOB support using utf8 load files

    Details

    • Sprint:
      5.5.54

      Description

      https://github.com/mysql/mysql-server/commit/774e6ffd0897dd763763b69e15028c1fbd44c4e7 changed the way LOAD DATA INFILE parses the data.

      The commit starts validating the whole load file against the file character set. BLOBs have always been copied 1:1 (no character set translation; only escape sequences are processed). Even if the load file is declared as UTF-8, the content of BLOB columns cannot be expected to be valid UTF-8: binary data can contain byte sequences that are invalid in UTF-8, and no charset conversion is applied to them.

      Starting with this commit, LOAD DATA rejects byte sequences that are not valid UTF-8, even in BLOB fields.
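
      To illustrate why binary data fails such validation, the bytes 0xAA 0xAB 0xAC (the payload of the test file in this report) can be passed through a UTF-8 check. This is only a sketch, assuming a POSIX shell with `iconv` installed; the helper name `is_valid_utf8` is hypothetical and has nothing to do with the server's own validation code:

```shell
# Hypothetical helper: succeeds only if stdin is valid UTF-8.
# iconv exits non-zero when it meets an invalid input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1
}

# 0xAA 0xAB 0xAC (octal 252 253 254) are UTF-8 continuation bytes
# that appear here without a preceding lead byte.
if printf '\252\253\254' | is_valid_utf8; then
    echo "valid UTF-8"
else
    echo "invalid UTF-8"
fi
```

      Continuation bytes (0x80 to 0xBF) are only legal after a lead byte, so a sequence that starts with 0xAA cannot be valid UTF-8.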

      Create a test file:

      $ hexdump -C test
      00000000  22 25 aa ab ac 22 0a                              |"%ª«¬".|
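
      The file shown in the dump can be recreated byte for byte. A minimal sketch, assuming a POSIX shell; octal escapes are used because `\x` escapes in `printf` are not portable across shells:

```shell
# Write the 7 bytes 22 25 aa ab ac 22 0a to ./test.
# %% emits a literal '%'; \252 \253 \254 are octal for 0xAA 0xAB 0xAC.
printf '"%%\252\253\254"\n' > test

# Print the file content as hex bytes to compare with the dump above.
od -An -tx1 test
```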
      

      CREATE TABLE `x` (`y` MEDIUMBLOB NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
      LOAD DATA LOCAL INFILE 'test' INTO TABLE x CHARSET utf8 FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
      

      This will lead to:

      ERROR 1300 (HY000): Invalid utf8 character string: '"%'
      

      Older MariaDB 10.0.X releases can load this file.

              People

              • Assignee:
                serg Sergei Golubchik
                Reporter:
                koeglermar Martin Koegler