Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Cannot Reproduce
Affects Version/s: 10.0.28, 10.0.29
Fix Version/s: None
Description
Whenever we try to insert an invalid utf8 character (with INSERT or, after the fix for https://jira.mariadb.org/browse/MDEV-9823, with LOAD DATA as well), we get ERROR 1300 with a message that claims to show the problematic string but shows only its valid part, like this:
MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
ERROR 1300 (HY000): Invalid utf8 character string: 'q'
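To make the observed behavior concrete, here is a minimal Python sketch (a model, not MariaDB's code) of what ERROR 1300 appears to print: the longest prefix of the input that is valid in the target character set. It assumes the target is MariaDB's 3-byte `utf8` (utf8mb3), so a 4-byte sequence such as `\xF0\xA9\x9C\x99` counts as invalid; the function names `valid_prefix` and `error_1300_message` are hypothetical.

```python
# Hypothetical model of ERROR 1300 as observed, not MariaDB's implementation.
# Assumes the target charset is MariaDB's 3-byte "utf8" (utf8mb3).

def valid_prefix(data: bytes, max_bytes_per_char: int = 3) -> bytes:
    """Return the longest prefix of `data` that is valid UTF-8 using at most
    `max_bytes_per_char` bytes per character (3 models utf8mb3)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first byte that could not be decoded;
        # everything before it is valid UTF-8.
        text = data[:exc.start].decode("utf-8")
    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > max_bytes_per_char:
            # A 4-byte character (above U+FFFF) cannot be stored in utf8mb3.
            break
        out += encoded
    return bytes(out)


def error_1300_message(data: bytes) -> str:
    """Build a message that, like ERROR 1300, shows only the valid prefix."""
    return "Invalid utf8 character string: '%s'" % valid_prefix(data).decode("utf-8")


print(error_1300_message(b"q\xf0\xa9\x9c\x99"))   # Invalid utf8 character string: 'q'
print(error_1300_message(b"\xf0\xa9\x9c\x99 x"))  # Invalid utf8 character string: ''
```

When the very first character is the invalid one, the valid prefix is empty, so the message carries no information about the offending bytes.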
I think Warning 1366 does a better job of showing the problematic string, because it prints it in hex, like this:
MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8mb4 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
Query OK, 1 row affected, 1 warning (0.03 sec)
Records: 1 Deleted: 0 Skipped: 0 Warnings: 1
MariaDB [test]> show warnings\G
*************************** 1. row ***************************
Level: Warning
Code: 1366
Message: Incorrect string value: '\xF0\xA9\x9C\x99 \xE6...' for column 'b' at row 1
1 row in set (0.00 sec)
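The hex format of the Warning 1366 message can be approximated as follows: printable ASCII bytes are copied as-is, every other byte is written as `\xHH`, and the excerpt is cut with `...` after a fixed number of bytes. This Python sketch reproduces the message above; the function name `printable_excerpt` and the limit of 6 are assumptions inferred from this single example, not taken from the MariaDB source. The input bytes after `\xE6` are also made up, since the warning does not show them.

```python
# Hypothetical approximation of the Warning 1366 excerpt format; the limit of 6
# is inferred from the single warning shown above, not from the MariaDB source.

def printable_excerpt(data: bytes, limit: int = 6) -> str:
    """Copy printable ASCII bytes as-is, write other bytes as \\xHH,
    and append "..." if `data` is longer than `limit` bytes."""
    out = []
    for i, byte in enumerate(data):
        if i == limit:
            out.append("...")
            break
        if 0x20 <= byte < 0x7F:
            out.append(chr(byte))
        else:
            out.append("\\x%02X" % byte)
    return "".join(out)


# The bytes after \xE6 are made up (U+65E5 encoded as E6 97 A5).
print(printable_excerpt(b"\xf0\xa9\x9c\x99 \xe6\x97\xa5"))  # \xF0\xA9\x9C\x99 \xE6...
```

Unlike the valid-prefix approach, this format still shows something useful when the very first byte is the problem.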
The current approach leads to meaningless empty strings in error messages on slaves when the very first character is the problem; see https://bugs.mysql.com/bug.php?id=82641, for example:
Last_Errno: 1300
Last_Error: Error 'Invalid utf8 character string: ''' on query. Default database: 'test'. Query: 'LOAD DATA INFILE '/tmp/SQL_LOAD-4bb3ce89-6567-11e6-a817-984be16e5ae4-2-25.data' REPLACE INTO TABLE `test1` FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`c1`, `c2`)'
To summarize: when a string is not valid for a character set, it makes sense to output it in hex, either instead of or along with the character representation of its valid part.