[MDEV-26338] BINARY data differences after SELECT INTO OUTFILE / LOAD DATA INFILE Created: 2021-08-10 Updated: 2021-11-01 Resolved: 2021-10-25 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | N/A |
| Affects Version/s: | 10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Hartmut Holzgraefe | Assignee: | Unassigned |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | need_feedback |
| Issue Links: |
|
| Description |
|
When exporting the contents of a BINARY or VARBINARY column with SELECT INTO OUTFILE and then reading them back with LOAD DATA INFILE, the data can become mangled depending on character set / encoding. E.g.:
I would expect both SELECT statements to return the same result, but instead I'm getting:
On the 2nd table, the one filled via LOAD DATA, I would expect to see the same result, with a byte value of E0 in the HEX(b1) column. Instead I'm seeing an empty b1 column in MariaDB versions up to 10.0, and a value of 3F ('?') starting with MariaDB 10.1. Result on MariaDB 5.5 and 10.0:
Result on MariaDB 10.1 and beyond:
The problem here is that the E0 byte is interpreted as the beginning of a UTF-8 encoded multi-byte sequence. With the b1 column being of type VARBINARY, no UTF-8 sequence handling should be applied; the E0 byte should be interpreted verbatim instead. |
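To make the byte-level argument concrete, here is a minimal Python sketch of why a lone E0 byte cannot survive UTF-8 text handling. It is plain Python, not MariaDB's conversion code, and it does not settle whether the substitution happens on export or on import. The helper name utf8_sequence_length and the '?' substitution step are assumptions for illustration only.

```python
# Illustration only: plain Python, not MariaDB's actual conversion logic.

raw = bytes([0xE0])  # the single byte stored in the VARBINARY column b1


def utf8_sequence_length(lead: int) -> int:
    """Return the sequence length a UTF-8 lead byte announces (0 if not a lead byte)."""
    if lead < 0x80:
        return 1  # ASCII, single byte
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0  # continuation byte or invalid lead byte


# E0 announces a 3-byte sequence, but no continuation bytes follow it.
announced = utf8_sequence_length(raw[0])

try:
    raw.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False

# Hypothetical text-oriented handling: replace undecodable input with '?'.
substituted = raw if decodes_cleanly else b"?"

print(announced, decodes_cleanly, substituted.hex().upper(), raw.hex().upper())
```

A binary-safe path would carry the raw byte through unchanged (HEX value E0), whereas the substituting path produces 3F, which matches the value reported for MariaDB 10.1 and later.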
| Comments |
| Comment by Elena Stepanova [ 2021-08-18 ] |
|
Isn't it expected though? When you do SELECT .. INTO OUTFILE .. CHARACTER SET utf8, you explicitly enforce conversion into utf8 upon writing, even though b1 is not convertible. The warning is already thrown at that point, and 3F is written into the outfile.
Or, in other words,
|