Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Not a Bug
Affects Version/s: 10.3.10
None
Environment: Windows
Description
Execute the following statements. First, make sure a file exists at <file path> containing the text "Réunion" (including the double quotes).
- CREATE TABLE test.test_table (test_column VARCHAR(190) COLLATE utf8mb4_unicode_ci NOT NULL);
- INSERT INTO test.test_table (test_column) VALUES ('Réunion');
- LOAD DATA LOCAL INFILE '<file path>'
  INTO TABLE test.test_table
  FIELDS ESCAPED BY '\\'
  TERMINATED BY ','
  ENCLOSED BY '"'
  LINES TERMINATED BY '\n'
  (test_column);
You will find that only the first of the two insertions stores the value correctly. The LOAD DATA LOCAL INFILE statement does not respect the column's character set and produces a garbled value (something like RÃ©union) instead of Réunion. Even after executing SET NAMES utf8mb4 COLLATE utf8mb4_unicode_ci, the character set is still not respected, and adding a CHARACTER SET utf8mb4 clause to the LOAD DATA LOCAL INFILE statement does not help either.
This causes two problems:
- First, the inserted data is corrupted.
- Second, uniqueness constraint checks may fail, because whatever collation the LOAD DATA INFILE process uses may not recognize that two values are different according to the column's collation.
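A quick way to check what was actually stored is to look at the raw bytes. This is a sketch against the test.test_table from the statements above. In utf8mb4 the é in Réunion is stored as the two bytes C3A9, so the row from the INSERT should show 52C3A9756E696F6E and 7 characters. If the UTF-8 bytes from the file were read as latin1 and then converted to utf8mb4, é would become the two characters Ã©, stored as C383C2A9 instead of C3A9 (8 characters in total). That particular pattern is an assumption about how the value was garbled, not something confirmed above.

```sql
-- List each stored value next to its raw bytes and its length in characters.
-- A correct 'Réunion' contains C3A9 for the é; a double-encoded one contains C383C2A9.
SELECT test_column,
       HEX(test_column)         AS stored_bytes,
       CHAR_LENGTH(test_column) AS chars
FROM test.test_table;
```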
Update: The file was produced by a SELECT ... INTO OUTFILE statement that did not specify a CHARACTER SET, and that is what caused the failure. So this is probably not a bug.
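A minimal sketch of a round trip that names the character set on both sides, assuming the same test.test_table and the same <file path> placeholder as above (INTO OUTFILE refuses to overwrite an existing file, so the path must not exist yet). Note that INTO OUTFILE writes the file on the server host while LOAD DATA LOCAL reads it from the client, so the file has to be reachable from both.

```sql
-- Export: write the file explicitly as utf8mb4 instead of relying on the default.
SELECT test_column
FROM test.test_table
INTO OUTFILE '<file path>'
  CHARACTER SET utf8mb4
  FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
  LINES TERMINATED BY '\n';

-- Import: tell the server the file is utf8mb4 so no extra conversion is applied.
LOAD DATA LOCAL INFILE '<file path>'
INTO TABLE test.test_table
CHARACTER SET utf8mb4
FIELDS TERMINATED BY ',' ENCLOSED BY '"' ESCAPED BY '\\'
LINES TERMINATED BY '\n'
(test_column);
```

Using the same CHARACTER SET and field options on export and import keeps the two statements symmetric, so the bytes written are interpreted exactly as they were produced.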