Details
- Type: Bug
- Status: Closed
- Priority: Major
- Resolution: Fixed
- Affects Version/s: 5.5.51, 10.0.27, 5.5(EOL), 10.0(EOL), 10.1(EOL)
- Fix Version/s: 5.5.54
Description
The commit https://github.com/mysql/mysql-server/commit/774e6ffd0897dd763763b69e15028c1fbd44c4e7 changed the way LOAD DATA INFILE parses its input.
Since that commit, the whole load file is validated against the file character set. BLOB values, however, have always been copied 1:1: no character set conversion is applied, and only escape sequences are processed. A load file declared as UTF-8 therefore cannot be expected to contain valid UTF-8 in its BLOB fields, because binary data can include byte sequences that are invalid in UTF-8 and nothing converts them.
Starting with this commit, LOAD DATA rejects non-UTF8 sequences in blob fields.
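As a rough illustration of why the sample bytes in the test file below fail a strict UTF-8 check, here is a minimal Python sketch. It is not MariaDB code; the helper name `first_invalid_utf8_offset` is made up for this example, and Python's strict decoder only stands in for the server's validation.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict decoding raises on malformed input
    except UnicodeDecodeError as exc:
        return exc.start
    return None


# The seven bytes of the report's test file: '"', '%', 0xAA 0xAB 0xAC, '"', newline.
sample = bytes.fromhex("2225aaabac220a")

print(first_invalid_utf8_offset(sample))    # 2
print(first_invalid_utf8_offset(b'"%"\n'))  # None
```

The first invalid byte is 0xAA at offset 2, directly after the `"%` prefix that the error message quotes: 0xAA is a continuation byte with no lead byte in front of it.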
Create a test file:
$ hexdump -C test
00000000 22 25 aa ab ac 22 0a |"%ª«¬".|
CREATE TABLE `x` ( `y` mediumblob NOT NULL) ENGINE=MyISAM DEFAULT CHARSET=latin1 COLLATE=latin1_general_ci;
load data local infile 'test' into table x charset utf8 FIELDS TERMINATED BY ';' ENCLOSED BY '"' ESCAPED BY '\\' LINES TERMINATED BY '\n';
This will lead to:
ERROR 1300 (HY000): Invalid utf8 character string: '"%'
Older MariaDB 10.0.X releases can load this file.
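One possible way to produce the seven-byte test file from the hexdump above, sketched in Python. Only the file name `test` and the byte values come from the report; the rest is an illustrative assumption.

```python
from pathlib import Path

# Bytes taken from the hexdump in the report: 22 25 aa ab ac 22 0a
payload = bytes.fromhex("22 25 aa ab ac 22 0a")

# Write the file in binary mode so no text encoding is applied.
path = Path("test")
path.write_bytes(payload)

# Read it back and print the bytes in a hexdump-like form to confirm the content.
print(" ".join(f"{b:02x}" for b in path.read_bytes()))
```

Writing the bytes directly avoids any editor or terminal re-encoding the 0xAA 0xAB 0xAC sequence on the way to disk.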
Issue Links
- relates to MDEV-12240 LOAD DATA INFILE binary blobs failing for UTF8 (Confirmed)
--- sql/sql_load.cc.orig 2016-10-18 17:45:32.156615718 +0200
+++ sql/sql_load.cc 2016-10-18 17:49:19.990569542 +0200
@@ -90,7 +90,7 @@
String &field_term,String &line_start,String &line_term,
String &enclosed,int escape,bool get_it_from_net, bool is_fifo);
~READ_INFO();
- int read_field();
+ int read_field(CHARSET_INFO *field_charset);
int read_fixed_length(void);
int next_line(void);
char unescape(char chr);
@@ -1040,7 +1040,7 @@
uchar *pos;
Item *real_item;
- if (read_info.read_field())
+ if (read_info.read_field(item->real_item()->collation.collation))
break;
/* If this line is to be skipped we don't want to fill field or var */
@@ -1527,10 +1527,13 @@
}
-int READ_INFO::read_field()
+int READ_INFO::read_field(CHARSET_INFO *field_charset)
{
int chr,found_enclosed_char;
uchar *to,*new_buffer;
+ CHARSET_INFO *read_charset = this->read_charset;
+ if (field_charset == &my_charset_bin)
+ read_charset = &my_charset_bin;
found_null=0;
if (found_end_of_line)
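
The idea behind the patch above, reduced to a standalone Python sketch: a field is validated against the load file's character set unless the target column is binary, in which case the bytes pass through untouched. This is an illustration only. `check_field`, `column_is_binary`, and the use of Python codecs are assumptions made for the example and are not part of the MariaDB sources.

```python
def check_field(raw: bytes, file_charset: str, column_is_binary: bool) -> bytes:
    """Validate one field's raw bytes, skipping validation for binary targets."""
    # Binary (BLOB) targets are copied byte for byte, so nothing is validated.
    if column_is_binary:
        return raw
    # Text targets must be well formed in the load file's character set.
    raw.decode(file_charset)  # raises UnicodeDecodeError on malformed input
    return raw


# Field content from the report's test file, without the enclosing quotes.
field = bytes.fromhex("25aaabac")

# Binary target: accepted unchanged.
print(check_field(field, "utf-8", column_is_binary=True))

# Text target: rejected because 0xAA cannot start a UTF-8 sequence.
try:
    check_field(field, "utf-8", column_is_binary=False)
except UnicodeDecodeError as exc:
    print("rejected:", exc.reason)
```

In the diff, the same decision is made by substituting `&my_charset_bin` for `read_charset` when the field's collation is binary.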