Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Won't Fix
-
10.2.40, 10.3.31, 10.4.21, 10.5.12, 10.6.4
Description
When having a BINARY or VARBINARY column and trying to export contents with SELECT INTO OUTFILE, then reading back with LOAD DATA INFILE, data can become mangled depending on character set / encoding.
E.g.:
DROP TABLE IF EXISTS t1;
|
DROP TABLE IF EXISTS t2;
|
|
CREATE TABLE t1(
|
b1 VARBINARY(16),
|
i1 INT
|
) DEFAULT CHARSET=utf8;
|
|
INSERT INTO t1 VALUES(UNHEX("00"), 23);
|
INSERT INTO t1 VALUES(UNHEX("E0"), 42);
|
|
SELECT * FROM t1 INTO OUTFILE 'data.txt' CHARACTER SET utf8;
|
|
CREATE TABLE t2 LIKE t1;
|
|
LOAD DATA INFILE 'data.txt'
|
INTO TABLE t2
|
CHARACTER SET utf8;
|
|
SHOW WARNINGS;
|
|
SELECT LENGTH(b1), HEX(b1), i1 FROM t1;
|
SELECT LENGTH(b1), HEX(b1), i1 FROM t2;
|
I would expect both SELECT statements to return the same result, but instead I'm getting:
MariaDB [test]> SELECT LENGTH(b1), HEX(b1), i1 FROM t1;
|
+------------+---------+------+
|
| LENGTH(b1) | HEX(b1) | i1 |
|
+------------+---------+------+
|
| 1 | 00 | 23 |
|
| 1 | E0 | 42 |
|
+------------+---------+------+
|
2 rows in set (0.000 sec)
|
On the 2nd table, the one filled via LOAD DATA, I would assume to see the same result, seeing a byte value of E0 in the HEX(b1) column. But instead I'm seeing an empty b1 column in MariaDB versions up to 10.0, and a value of 3F ('?') starting with MariaDB 10.1.
Result on MariaDB 5.5 and 10.0:
MariaDB [test]> SELECT LENGTH(b1), HEX(b1), i1 FROM t2;
|
+------------+---------+------+
|
| LENGTH(b1) | HEX(b1) | i1 |
|
+------------+---------+------+
|
| 1 | 00 | 23 |
|
| 0 | | 42 |
|
+------------+---------+------+
|
2 rows in set (0.000 sec)
|
Result on MariaDB 10.1 and beyond:
MariaDB [test]> SELECT LENGTH(b1), HEX(b1), i1 FROM t2;
|
+------------+---------+------+
|
| LENGTH(b1) | HEX(b1) | i1 |
|
+------------+---------+------+
|
| 1 | 00 | 23 |
|
| 1 | 3F | 42 |
|
+------------+---------+------+
|
2 rows in set (0.000 sec)
|
The problem here is that the E0 byte is interpreted as the beginning of an UTF8 encoded multi byte sequence, but with the b1 column being of type VARBINARY no UTF8 sequence handling should be applied, the E0 byte should be interpreted verbatim instead.
Attachments
Issue Links
- relates to
-
MDEV-23471 LOAD DATA: partial utf8 Sequence in binary data may "swallow" field separator
- Closed