[MDEV-11216] Error 1300 outputs only valid part of string in the message
Created: 2016-11-02  Updated: 2023-01-22  Resolved: 2023-01-22

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.0.28, 10.0.29
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Valerii Kravchuk
Assignee: Alexander Barkov
Resolution: Cannot Reproduce
Votes: 1
Labels: None


 Description   

Whenever we try to insert an invalid utf8 character (in INSERT or, after the fix for https://jira.mariadb.org/browse/MDEV-9823, in LOAD DATA as well), we get ERROR 1300 with a message that purports to show the problematic string but shows only the valid part of it, like this:

MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
ERROR 1300 (HY000): Invalid utf8 character string: 'q'

I think Warning 1366 does a better job of showing the problematic string, in hex, like this:

MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8mb4 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
Query OK, 1 row affected, 1 warning (0.03 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 1
 
MariaDB [test]> show warnings\G
*************************** 1. row ***************************
  Level: Warning
   Code: 1366
Message: Incorrect string value: '\xF0\xA9\x9C\x99 \xE6...' for column 'b' at row 1
1 row in set (0.00 sec)

The current approach leads to meaningless empty strings in error messages on slaves when the very first character is the problem. See https://bugs.mysql.com/bug.php?id=82641 for an example:

                   Last_Errno: 1300
                   Last_Error: Error 'Invalid utf8 character string: ''' on query. Default database: 'test'. Query: 'LOAD DATA INFILE '/tmp/SQL_LOAD-4bb3ce89-6567-11e6-a817-984be16e5ae4-2-25.data' REPLACE INTO  TABLE `test1` FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`c1`, `c2`)'

To summarize: when a string is not valid for a character set, it makes sense to output it in hex, either instead of or along with the character representation of its valid part.
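
To make the suggestion concrete, here is a minimal Python sketch of the kind of formatting I have in mind. It is not server code: the function names are hypothetical, the 6-byte limit is only chosen to match the Warning 1366 sample above, and Python's UTF-8 decoder is used as an approximation of the server's utf8 (3-byte) validation.

```python
def valid_utf8mb3_prefix_len(data: bytes) -> int:
    """Length in bytes of the longest prefix that is valid UTF-8 using
    at most 3 bytes per character (an approximation of MariaDB's utf8)."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Everything before exc.start is well-formed UTF-8.
        text = data[:exc.start].decode("utf-8")
    pos = 0
    for ch in text:
        if ord(ch) > 0xFFFF:  # needs 4 bytes, so not valid in 3-byte utf8
            break
        pos += len(ch.encode("utf-8"))
    return pos


def escape_bytes(data: bytes, max_bytes: int = 6) -> str:
    """Render bytes the way the 1366 sample does: printable ASCII as-is,
    everything else as \\xHH, truncated with '...' (limit is an assumption)."""
    shown = data[:max_bytes]
    out = "".join(chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02X}" for b in shown)
    return out + ("..." if len(data) > max_bytes else "")


def describe_invalid_string(data: bytes) -> str:
    """Hypothetical message body: the valid part plus the offending bytes in hex."""
    good = valid_utf8mb3_prefix_len(data)
    valid_part = data[:good].decode("utf-8")
    return f"valid part: '{valid_part}', offending bytes: '{escape_bytes(data[good:])}'"


if __name__ == "__main__":
    # 'q' followed by a 4-byte sequence, similar to the LOAD DATA case above.
    print(describe_invalid_string(b"q\xF0\xA9\x9C\x99 \xE6\x97\xA5"))
    # The very first byte is invalid, as in the replication case above.
    print(describe_invalid_string(b"\xFFabc"))
```

With formatting along these lines, the replication case would no longer produce an empty string in the message, because the offending bytes are shown even when the valid part is empty.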



 Comments   
Comment by Elena Stepanova [ 2023-01-22 ]

valerii,

Can you provide a test case? I'm only getting 1366.
