Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 5.5.51, 10.0.27, 5.5(EOL), 10.0(EOL), 10.1(EOL)
- Fix Version/s: 5.5.54
Description
The commit https://github.com/mysql/mysql-server/commit/774e6ffd0897dd763763b69e15028c1fbd44c4e7 changed the way LOAD DATA INFILE parses its input.
The commit starts validating the whole load file against the file character set. BLOBs have always been copied 1:1 (no character set translation; only escape sequences are processed). So even when the load file is declared as UTF-8, the contents of its blob columns are generally not valid UTF-8: binary data can contain byte sequences that are invalid in UTF-8, and no charset conversion is applied to them.
Starting with this commit, LOAD DATA rejects non-UTF8 sequences in blob fields.
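To illustrate why raw blob bytes fail a UTF-8 check, here is a minimal shell sketch. It assumes an `iconv` binary is on the PATH. The helper name `is_valid_utf8` and the file names `blob_sample.bin` and `text_sample.txt` are hypothetical and are not part of this report or of MariaDB.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the file is well-formed UTF-8.
# Assumes iconv is installed; it exits non-zero on an illegal input sequence.
is_valid_utf8() {
    iconv -f UTF-8 -t UTF-8 "$1" >/dev/null 2>&1
}

# Raw bytes 0xAA 0xAB 0xAC: UTF-8 continuation bytes with no lead byte.
printf '\252\253\254' > blob_sample.bin

# The characters U+00AA, U+00AB, U+00AC encoded as UTF-8 (C2 AA, C2 AB, C2 AC).
printf '\302\252\302\253\302\254' > text_sample.txt

for f in blob_sample.bin text_sample.txt; do
    if is_valid_utf8 "$f"; then
        echo "$f: valid UTF-8"
    else
        echo "$f: invalid UTF-8"
    fi
done
```

The raw bytes only pass validation after being re-encoded, which changes them; a blob copied 1:1 cannot be re-encoded without altering its content.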
Create a test file:
$ hexdump -C test
00000000 22 25 aa ab ac 22 0a |"%ª«¬".|
CREATE TABLE `x` ( `y` mediumblob NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
load data local infile 'test' into table x charset utf8 FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
This will lead to:
ERROR 1300 (HY000): Invalid utf8 character string: '"%'
Older MariaDB 10.0.X releases can load this file.
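The report shows only the hex dump of the test file, not how it was produced. One possible way to recreate it byte for byte is sketched below, using octal escapes in `printf`; the choice of `printf` and `od` is an assumption and may differ from what the reporter used.

```shell
#!/bin/sh
# Write the seven bytes from the hex dump: 22 25 aa ab ac 22 0a
#   "   %   0xAA 0xAB 0xAC   "   newline
# '%%' emits a literal percent sign; \252 \253 \254 are 0xAA 0xAB 0xAC in octal.
printf '"%%\252\253\254"\n' > test

# Print the bytes in hex to compare them against the dump in the report.
od -An -tx1 test
```

The resulting file can then be used with the `load data local infile 'test'` statement from the steps above.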
Issue Links
- relates to MDEV-12240 LOAD DATA INFILE binary blobs failing for UTF8 (Confirmed)