  1. MariaDB Server
  2. MDEV-11216

Error 1300 outputs only valid part of string in the message

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0.28, 10.0.29
    • Fix Version/s: 10.0
    • Component/s: Character Sets
    • Labels:
      None

      Description

      Whenever we try to insert an invalid utf8 character (in INSERT or, after the fix for https://jira.mariadb.org/browse/MDEV-9823, in LOAD DATA as well), we get ERROR 1300. Its text purports to show the problematic string but shows only the valid part of it, like this:

      MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
      ERROR 1300 (HY000): Invalid utf8 character string: 'q'
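
A minimal sketch of why only the valid part appears, assuming the message is built from the longest well-formed prefix of the input. The function names and the sample bytes are hypothetical, and the validity check only approximates the server's 3-byte utf8 rules; this is not the server's actual code.

```python
def valid_utf8mb3_prefix_len(data: bytes) -> int:
    """Length of the longest prefix made of well-formed 1-3 byte UTF-8 sequences."""
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            size = 1
        elif 0xC2 <= lead <= 0xDF:
            size = 2
        elif 0xE0 <= lead <= 0xEF:
            size = 3
        else:
            break  # 4-byte lead byte or stray byte: not valid in 3-byte utf8
        try:
            data[i:i + size].decode("utf-8")  # rejects truncated or malformed sequences
        except UnicodeDecodeError:
            break
        i += size
    return i


def error_1300_style(data: bytes) -> str:
    """Build a message from the valid prefix only (assumed behaviour)."""
    prefix = data[:valid_utf8mb3_prefix_len(data)].decode("utf-8")
    return f"Invalid utf8 character string: '{prefix}'"


print(error_1300_style(b"q\xF0\xA9\x9C\x99"))     # Invalid utf8 character string: 'q'
print(error_1300_style(b"\xF0\xA9\x9C\x99 abc"))  # Invalid utf8 character string: ''
```

Under this assumption the offending bytes never reach the message, and an input whose first byte is already invalid produces an empty quoted string.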
      

      I think Warning 1366 does a better job of showing the problematic string, in hex, like this:

      MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8mb4 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
      Query OK, 1 row affected, 1 warning (0.03 sec)
      Records: 1  Deleted: 0  Skipped: 0  Warnings: 1
       
      MariaDB [test]> show warnings\G
      *************************** 1. row ***************************
        Level: Warning
         Code: 1366
      Message: Incorrect string value: '\xF0\xA9\x9C\x99 \xE6...' for column 'b' at row 1
      1 row in set (0.00 sec)
      

      The current approach leads to meaningless empty strings in error messages on slaves when the very first character is the problem. See https://bugs.mysql.com/bug.php?id=82641 for an example:

                         Last_Errno: 1300
                         Last_Error: Error 'Invalid utf8 character string: ''' on query. Default database: 'test'. Query: 'LOAD DATA INFILE '/tmp/SQL_LOAD-4bb3ce89-6567-11e6-a817-984be16e5ae4-2-25.data' REPLACE INTO  TABLE `test1` FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`c1`, `c2`)'
      

      To summarize, when a string is not valid for a character set, it makes sense to output it in hex, instead of or along with the character representation of the valid part.
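
The proposal could be sketched roughly as follows. This is an illustration, not a patch: `hex_preview` and `proposed_error_1300` are hypothetical names, and the 6-byte truncation limit is an assumption chosen to match the Warning 1366 sample in this report.

```python
def hex_preview(data: bytes, limit: int = 6) -> str:
    """Render printable ASCII as-is and every other byte as \\xHH.

    At most `limit` bytes are shown; '...' marks truncated input.
    The backslash itself is escaped so the output stays unambiguous.
    """
    parts = []
    for b in data[:limit]:
        if 0x20 <= b < 0x7F and b != 0x5C:
            parts.append(chr(b))
        else:
            parts.append(f"\\x{b:02X}")
    suffix = "..." if len(data) > limit else ""
    return "".join(parts) + suffix


def proposed_error_1300(data: bytes) -> str:
    """Error text that shows the offending bytes instead of only the valid prefix."""
    return f"Invalid utf8 character string: '{hex_preview(data)}'"


print(proposed_error_1300(b"\xF0\xA9\x9C\x99 \xE6\x9C\xAC"))
# Invalid utf8 character string: '\xF0\xA9\x9C\x99 \xE6...'
```

With this rendering, a string whose very first byte is invalid still yields a non-empty, diagnosable message.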

            People

            • Assignee:
              bar Alexander Barkov
              Reporter:
              valerii Valerii Kravchuk
            • Votes:
              1
              Watchers:
              3
