[MDEV-11079] Regression: LOAD DATA INFILE lost BLOB support using utf8 load files Created: 2016-10-18 Updated: 2017-03-14 Resolved: 2017-01-09 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Character Sets |
| Affects Version/s: | 5.5.51, 10.0.27, 5.5, 10.0, 10.1 |
| Fix Version/s: | 5.5.55, 10.0.29, 10.1.21 |
| Type: | Bug | Priority: | Major |
| Reporter: | Martin Koegler | Assignee: | Sergei Golubchik |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | contribution, regression, upstream | ||
| Issue Links: |
|
||||||||
| Sprint: | 5.5.54 | ||||||||
| Description |
|
https://github.com/mysql/mysql-server/commit/774e6ffd0897dd763763b69e15028c1fbd44c4e7 changed the way LOAD DATA INFILE parses its input: since that commit, the whole load file is validated against the file character set. BLOBs have always been copied 1:1 (no character set translation; only escape sequences are processed). If the load file uses UTF-8, the contents of BLOB columns still cannot be expected to be valid UTF-8, because binary data can contain byte sequences that are invalid in UTF-8 and no charset conversion is applied to them. Starting with this commit, LOAD DATA rejects non-UTF-8 sequences in BLOB fields. Create a test file:
This will lead to:
Older MariaDB 10.0.X releases can load this file. |
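To illustrate why validating an entire load file as UTF-8 conflicts with BLOB fields, here is a minimal Python sketch. It is not the reporter's test file and does not call the server; the helper `is_valid_utf8` and both sample values are hypothetical and chosen only for illustration. It shows that a text field encoded in UTF-8 passes strict validation while an arbitrary binary value does not.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Hypothetical sample values, not taken from the original report.
text_field = "Grüße".encode("utf-8")                 # text column, valid UTF-8
blob_field = bytes([0x00, 0x80, 0xC3, 0x28, 0xFF])   # arbitrary binary data

print(is_valid_utf8(text_field))
print(is_valid_utf8(blob_field))
```

Under these assumptions, a loader that validates every field against the file character set would accept the first value and reject the second, even though a BLOB column is meant to store the second value byte for byte.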
| Comments |
| Comment by Martin Koegler [ 2016-10-18 ] | |||||||||||||||||||||||||||||||||||
|
| |||||||||||||||||||||||||||||||||||
| Comment by Elena Stepanova [ 2016-10-19 ] | |||||||||||||||||||||||||||||||||||
|
Thanks for the report. In server trees, it's this revision:
koeglermar, | |||||||||||||||||||||||||||||||||||
| Comment by Martin Koegler [ 2016-10-28 ] | |||||||||||||||||||||||||||||||||||
|
I filed no upstream bug, because I have not tested with MySQL. PS: JIRA didn't send an email notification, so I did not notice the new comment earlier. | |||||||||||||||||||||||||||||||||||
| Comment by Sergey Vojtovich [ 2016-12-20 ] | |||||||||||||||||||||||||||||||||||
|
This change was reverted in the latest MySQL 5.5 release. Most probably we'll do the same. | |||||||||||||||||||||||||||||||||||
| Comment by Martin Koegler [ 2016-12-27 ] | |||||||||||||||||||||||||||||||||||
|
What about newer versions? MySQL 5.7 does not contain a revert. | |||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-01-09 ] | |||||||||||||||||||||||||||||||||||
|
In the current 5.5.55 tree it seems to work fine.
and run this SQL script:
It loads the data without errors and returns this result as expected:
10.2 also works fine. | |||||||||||||||||||||||||||||||||||
| Comment by Alexander Barkov [ 2017-01-09 ] | |||||||||||||||||||||||||||||||||||
|
Added a test case into 5.5. | |||||||||||||||||||||||||||||||||||
| Comment by Bill Nokes [ 2017-03-14 ] | |||||||||||||||||||||||||||||||||||
|
Hi, I believe that MariaDB is still treating binary data incorrectly when using LOAD DATA INFILE with UTF8. I have a test case that seems to introduce an extra escape character into saved binary BLOB data when using UTF8. Please see the linked Jira issue, MDEV-12240. Kind regards, |