  1. MariaDB Server
  2. MDEV-11216

Error 1300 outputs only valid part of string in the message

Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.0.28, 10.0.29
    • Fix Version/s: N/A
    • Component/s: Character Sets
    • Labels: None

    Description

      Whenever we try to insert an invalid utf8 character (in INSERT or, after the fix for https://jira.mariadb.org/browse/MDEV-9823, in LOAD DATA as well), we get ERROR 1300 with a message that purports to show the problematic string but shows only the valid part of it, like this:

      MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
      ERROR 1300 (HY000): Invalid utf8 character string: 'q'
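
      A rough illustration of why the message can end up as just 'q': the text printed is only the prefix of the value that is valid for the target character set, and the offending bytes after it are dropped. The sketch below is not the server's code. The sample bytes are hypothetical (the contents of /tmp/test_jfg1 are not shown in this report), the function name is made up, and Python's UTF-8 decoder only approximates the server's utf8 check.

```python
def valid_prefix(data: bytes, max_bytes_per_char: int = 3) -> bytes:
    """Return the leading part of data that is valid for the charset.

    max_bytes_per_char=3 approximates MariaDB's 3-byte utf8,
    max_bytes_per_char=4 approximates utf8mb4. (Assumption: this mirrors
    the observed behaviour, not the actual server implementation.)
    """
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # exc.start is the offset of the first undecodable byte.
        text = data[:exc.start].decode("utf-8")

    out = bytearray()
    for ch in text:
        encoded = ch.encode("utf-8")
        if len(encoded) > max_bytes_per_char:
            break  # e.g. a 4-byte character is not valid in 3-byte utf8
        out += encoded
    return bytes(out)


# Hypothetical field value: 'q', a 4-byte character, a space, a truncated sequence.
sample = b"q\xf0\xa9\x9c\x99 \xe6"
print(valid_prefix(sample))     # b'q'
print(valid_prefix(sample, 4))  # b'q\xf0\xa9\x9c\x99 '
```

      With the 3-byte limit, everything from the first 4-byte character onward is lost, which is the information a reader of the error message actually needs.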
      

      I think Warning 1366 does a better job, because it shows the problematic string in hex, like this:

      MariaDB [test]> LOAD DATA INFILE '/tmp/test_jfg1' IGNORE INTO TABLE `test_jfg` CHARACTER SET utf8mb4 FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`id`, `a`, `b`);
      Query OK, 1 row affected, 1 warning (0.03 sec)
      Records: 1  Deleted: 0  Skipped: 0  Warnings: 1
       
      MariaDB [test]> show warnings\G
      *************************** 1. row ***************************
        Level: Warning
         Code: 1366
      Message: Incorrect string value: '\xF0\xA9\x9C\x99 \xE6...' for column 'b' at row 1
      1 row in set (0.00 sec)
      

      The current approach leads to meaningless empty strings in error messages on slaves when the very first character is the problem; see https://bugs.mysql.com/bug.php?id=82641 for an example:

                         Last_Errno: 1300
                         Last_Error: Error 'Invalid utf8 character string: ''' on query. Default database: 'test'. Query: 'LOAD DATA INFILE '/tmp/SQL_LOAD-4bb3ce89-6567-11e6-a817-984be16e5ae4-2-25.data' REPLACE INTO  TABLE `test1` FIELDS TERMINATED BY ',' ENCLOSED BY '' ESCAPED BY '\\' LINES TERMINATED BY '\n' (`c1`, `c2`)'
      

      To summarize, when a string is not valid for a character set, it makes sense to output it in hex instead of, or along with, the character representation of the valid part.
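
      As a sketch of what such a message could look like, the snippet below renders the offending bytes the way the Warning 1366 text above does: printable ASCII as-is, everything else as \xHH, truncated with "...". The function name and the 6-byte limit are assumptions chosen to match the sample warning, not a description of the server's code.

```python
def hex_escape(data: bytes, limit: int = 6) -> str:
    """Render bytes as printable ASCII plus \\xHH escapes, cut at `limit` bytes."""
    shown = data[:limit]
    parts = [chr(b) if 0x20 <= b < 0x7F else f"\\x{b:02X}" for b in shown]
    # Mark truncation the same way the Warning 1366 message does.
    return "".join(parts) + ("..." if len(data) > limit else "")


# Hypothetical value whose very first character is invalid for 3-byte utf8.
bad = b"\xf0\xa9\x9c\x99 \xe6\x97\xa5"
print(hex_escape(bad))
# \xF0\xA9\x9C\x99 \xE6...
print(f"Invalid utf8 character string: '{hex_escape(bad)}'")
# Invalid utf8 character string: '\xF0\xA9\x9C\x99 \xE6...'
```

      Even when the valid part is empty, as in the replication error above, the message then still identifies the bytes that were rejected.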

          People

            Assignee: Alexander Barkov (bar)
            Reporter: Valerii Kravchuk (valerii)