  1. MariaDB Server
  2. MDEV-11079

Regression: LOAD DATA INFILE lost BLOB support using utf8 load files

Details

    • 5.5.54

    Description

      The commit https://github.com/mysql/mysql-server/commit/774e6ffd0897dd763763b69e15028c1fbd44c4e7 changed how LOAD DATA INFILE parses its input.

      With this commit, the whole load file is validated against the file character set. BLOB values have always been copied 1:1: no character set conversion is applied, and only escape sequences are processed. A BLOB value in a UTF-8 load file therefore cannot be expected to be valid UTF-8, because binary data can contain byte sequences that are invalid in UTF-8.

      Since this commit, LOAD DATA rejects byte sequences that are not valid UTF-8 even when they occur in BLOB fields.
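
To illustrate why such binary data trips a UTF-8 validator, here is a minimal sketch that feeds the payload bytes from the test case below through `iconv`. It assumes an `iconv` utility is installed; it is not part of the original report. The byte 0xAA has the bit pattern 10xxxxxx, which UTF-8 reserves for continuation bytes, so it cannot start a character.

```shell
# Payload bytes from the report's test file: '%' followed by 0xAA 0xAB 0xAC.
# Octal escapes are used because \xHH is not supported by every printf.
# iconv exits non-zero when its input is not valid in the source encoding.
if printf '%%\252\253\254' | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
    result="valid UTF-8"
else
    result="invalid UTF-8"
fi
echo "$result"
```

Any UTF-8 validation applied to the raw field bytes would be expected to fail in the same way, regardless of the target column type.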

      Create a test file:

      $ hexdump -C test
      00000000  22 25 aa ab ac 22 0a                              |"%ª«¬".|
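
The report shows only the hex dump of the file. One possible way to create a file with exactly those seven bytes is sketched below. The use of `printf` with octal escapes and of `od` for checking is an assumption, not taken from the report.

```shell
# Write: '"' '%' 0xAA 0xAB 0xAC '"' newline into the file named "test".
# %% emits a literal percent sign; \252 \253 \254 are octal for 0xAA 0xAB 0xAC.
printf '"%%\252\253\254"\n' > test

# Dump the bytes as hex to compare against the hexdump output above.
od -An -tx1 test
```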
      

      CREATE TABLE `x` ( `y` mediumblob NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
      load data local infile 'test'  into table x charset utf8 FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
      

      The LOAD DATA statement then fails with:

      ERROR 1300 (HY000): Invalid utf8 character string: '"%'
      

      Older MariaDB 10.0.X releases can load this file.

            People

              serg Sergei Golubchik
              koeglermar Martin Koegler