Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL)
- None
Description
I have an empty table:
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET gbk);
and an external file with this GBK text data:
printf "\xB0\x40\x61\xB0\x41\x40\xB0\x42\x40" >/tmp/test.txt
Note, the file can be checked with:
SELECT HEX(LOAD_FILE('/tmp/test.txt'));
+---------------------------------+
| HEX(LOAD_FILE('/tmp/test.txt')) |
+---------------------------------+
| B04061B04140B04240              |
+---------------------------------+
The file consists of:
[B040] - a GBK double-byte character
[61] - ASCII 'a'
[B041] - a GBK double-byte character
[40] - ASCII '@'
[B042] - a GBK double-byte character
[40] - ASCII '@'
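The byte breakdown above can be double-checked outside the server. The following is a minimal sketch, assuming Python's built-in "gbk" codec maps these bytes the same way the server's gbk character set does; the variable names are illustrative and not part of the report.

```python
# Bytes written to /tmp/test.txt by the printf command in the report.
data = b"\xB0\x40\x61\xB0\x41\x40\xB0\x42\x40"

# Decode with Python's built-in "gbk" codec (assumed here to agree with
# the server's gbk character set for these particular bytes).
text = data.decode("gbk")

# Pair every decoded character with the hex of its GBK encoding.
chars = [(ch, ch.encode("gbk").hex().upper()) for ch in text]

for ch, hx in chars:
    kind = "GBK double-byte character" if len(hx) == 4 else "ASCII"
    print(f"[{hx}] - {kind} {ch!r}")
```

Nine bytes decode to six characters, and the first 0x40 byte never appears as a character of its own: it is the trailing byte of 0xB040.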
Now I want to treat the '@' characters as line separators and load the file into the table:
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET gbk LINES TERMINATED BY '@';
SELECT HEX(a),a FROM t1;
+------------+---------+
| HEX(a)     | a       |
+------------+---------+
| B04061B041 | 癅a癆   |
| B042       | 癇      |
+------------+---------+
It correctly recognized two lines.
Now I want to skip the first line and reload the data:
DELETE FROM t1;
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE t1 CHARACTER SET gbk LINES TERMINATED BY '@' IGNORE 1 LINES;
SELECT HEX(a),a FROM t1;
+--------+------+
| HEX(a) | a    |
+--------+------+
| 61B041 | a癆  |
| B042   | 癇   |
+--------+------+
The result is wrong: it still returns two rows.
The 0x40 byte, which is the second byte of the first double-byte character 0xB040, was erroneously interpreted as a line terminator, so only the leading 0xB0 byte was skipped as the "first line".
The expected result is to return one row:
+--------+------+
| HEX(a) | a    |
+--------+------+
| B042   | 癇   |
+--------+------+
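The difference between the observed and the expected result can be reproduced with a small model of the two scanning strategies. This is only a sketch under stated assumptions, not the server's implementation: the helper names (`gbk_char_len`, `split_bytewise`, `split_gbk_aware`) are hypothetical, and the lead/trail byte ranges are a simplified description of GBK double-byte characters.

```python
def gbk_char_len(data: bytes, pos: int) -> int:
    """Return 2 if a GBK double-byte character starts at pos, else 1.

    Simplified assumption: lead byte 0x81-0xFE, trail byte 0x40-0xFE
    excluding 0x7F.
    """
    lead = data[pos]
    if 0x81 <= lead <= 0xFE and pos + 1 < len(data):
        trail = data[pos + 1]
        if 0x40 <= trail <= 0xFE and trail != 0x7F:
            return 2
    return 1


def split_bytewise(data: bytes, term: bytes) -> list:
    """Naive split: every terminator byte ends a line, even when it is
    the trailing byte of a double-byte character."""
    lines = data.split(term)
    if lines and lines[-1] == b"":
        lines.pop()  # drop the empty piece after the final terminator
    return lines


def split_gbk_aware(data: bytes, term: bytes) -> list:
    """Character-aware split: a terminator only counts when it is a
    single-byte character of its own."""
    lines, start, pos = [], 0, 0
    while pos < len(data):
        n = gbk_char_len(data, pos)
        if n == 1 and data[pos:pos + 1] == term:
            lines.append(data[start:pos])
            start = pos + 1
        pos += n
    if start < len(data):
        lines.append(data[start:])  # unterminated last line
    return lines


data = b"\xB0\x40\x61\xB0\x41\x40\xB0\x42\x40"

# Model of IGNORE 1 LINES: drop the first line, keep the rest.
observed = [l.hex().upper() for l in split_bytewise(data, b"@")[1:]]
expected = [l.hex().upper() for l in split_gbk_aware(data, b"@")[1:]]

print(observed)
print(expected)
```

In this model the byte-wise scan skips only the lone 0xB0 byte and leaves 61B041 and B042, matching the wrong result above, while the character-aware scan skips all of B04061B041 and leaves the single row B042.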
Issue Links
- blocks MDEV-6353 my_ismbchar() and my_mbcharlen() refactoring (Closed)