[MDEV-11885] load data give a prompt when charset error ? Created: 2017-01-23  Updated: 2017-05-27  Resolved: 2017-05-27

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: sysdljr Assignee: Alexander Barkov
Resolution: Won't Fix Votes: 0
Labels: need_feedback


 Description   

Hi,
We use MariaDB 10.1.21 on production environment,
in load data have a little advice, please help confirm , thanks.

example:
there a windows OS's txt file test0123.txt, file contains Chinese characters:
1 data21Tuv 2017-01-09 12:39:41
2 tUe测试aX 2017-01-08 18:32:09
3 星期三test 2017-01-06 11:09:12

in linux , load data import data:

create table test01(id int primary key ,c1 varchar(100), c2 datetime);
load data infile '/tmp/test0123.txt' into table test01;
Query OK, 3 rows affected, 2 warnings (0.00 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 2
 
show warnings;
+---------+------+--------------------------------------------------------------------------------+
| Level   | Code | Message                                                                        |
+---------+------+--------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xB2\xE2\xCA\xD4aX' for column 'c1' at row 2          |
| Warning | 1366 | Incorrect string value: '\xD0\xC7\xC6\xDA\xC8\xFD...' for column 'c1' at row 3 |
+---------+------+--------------------------------------------------------------------------------+
 
select * from test01;
+----+------------+---------------------+
| id | c1         | c2                  |
+----+------------+---------------------+
|  1 | data21Tuv  | 2017-01-09 12:39:41 |
|  2 | tUe????aX  | 2017-01-08 18:32:09 |
|  3 | ??????test | 2017-01-06 11:09:12 |
+----+------------+---------------------+
 
truncate table test01;
 
load data infile '/tmp/test0123.txt' into table test01 charset gbk;
Query OK, 3 rows affected (0.02 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0

in MariaDB 10.2.3 is same too.
when test mysql 5.7.17, it seems can check Character set matching. when import file very big, this prompt is more friendly , it can reduce the workload of repeated import.
Hope MairaDB's new version can also add this feature, thanks.

load data infile '/tmp/test0123.txt' into table test01;
ERROR 1300 (HY000): Invalid utf8 character string: 'tUe'
 
select * from test01;                                  
Empty set (0.00 sec)
 
load data infile '/tmp/test0123.txt' into table test01 charset gbk;
Query OK, 3 rows affected (0.01 sec)
Records: 3  Deleted: 0  Skipped: 0  Warnings: 0



 Comments   
Comment by Alexander Barkov [ 2017-05-26 ]

Did I understand you correctly that you prefer to stop with an error on broken data
(rather than continue and report warnings at the end)?

I tried this script in 10.2:

create or replace table test01(id int primary key ,c1 varchar(100), c2 datetime) character set gbk;
load data infile '/tmp/test0123.txt' into table test01;

ERROR 1366 (22007): Incorrect string value: '\xB2\xE2\xCA\xD4aX' for column 'c1' at row 2

Notice, it stops reading and returns an error immediately on the first broken byte sequence.

It works differently than in 10.1 because 10.2 has STRICT_ALL_TABLES enabled by default.
In 10.1 you can also get the same behavior by setting sql_mode before loading:

SET sql_mode=STRICT_ALL_TABLES;
create or replace table test01(id int primary key ,c1 varchar(100), c2 datetime) character set gbk;
load data infile '/tmp/test0123.txt' into table test01;

ERROR 1366 (22007): Incorrect string value: '\xB2\xE2\xCA\xD4aX' for column 'c1'

Alternatively, you can enable STRICT_ALL_TABLES permanently in my.cnf.

Does the above solve the problem?

Comment by sysdljr [ 2017-05-26 ]

Thank you very much, Your understanding is correct.
In 10.1 , SET sql_mode=STRICT_ALL_TABLES; it could slove the problem

Comment by Alexander Barkov [ 2017-05-26 ]

Thanks for confirming! Can we close the MDEV?

Comment by sysdljr [ 2017-05-27 ]

ok, it can colsed.
thank you again.

Generated at Thu Feb 08 07:53:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.