  MariaDB Server / MDEV-12240

LOAD DATA INFILE binary blobs failing for UTF8


    Details

    • Type: Bug
    • Status: Confirmed
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.0, 10.1, 10.0.30, 10.1.21, 10.2.4, 10.2
    • Fix Version/s: 10.2
    • Component/s: Character Sets
    • Labels:
      None
    • Environment:
      Windows and Linux. Latest 10.0, 10.1 and 10.2 versions.

      Description

      LOAD DATA INFILE corrupts binary BLOB data when the utf8 character set is in use: when the loaded data is read back, it contains an additional escape character.

      Test case (run against MariaDB 10.0, 10.1 and 10.2): it loads a binary blob that should receive only escape processing as binary content and should not be interpreted as utf8. However, MariaDB appears to add an extra escape to the binary contents; MySQL 5.6.35 handles the same input correctly. The database and the client connection are fully configured to use utf8.

      This might be related to MDEV-11079 - "Regression: LOAD DATA INFILE lost BLOB support using utf8 load files"

      $ echo -n -e '\xe2\x5c\x30\x0a' > input.bin
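The command above writes four bytes. Read against the default LOAD DATA INFILE settings (escape character `\`, newline as line terminator): 0xe2 is the lead byte of a three-byte UTF-8 sequence, 0x5c 0x30 is the `\0` escape for a NUL byte, and 0x0a ends the line, so a load that treats the data as binary should store the two bytes e2 00. As a minimal sketch (an addition, not part of the original test case), the same file can be created with POSIX `printf` octal escapes, since POSIX does not specify `-e` or `\x` escapes for `echo` and shells differ in how they handle them.

```shell
# Portable alternative to `echo -n -e`: POSIX printf understands \ddd octal
# escapes in its format string, so this writes the same four bytes.
#   \342 = 0xe2  lead byte of a 3-byte UTF-8 sequence (binary 1110xxxx)
#   \134 = 0x5c  backslash, the default LOAD DATA escape character
#   \060 = 0x30  '0'; together with the backslash, "\0" denotes a NUL byte
#   \012 = 0x0a  newline, the default LOAD DATA line terminator
printf '\342\134\060\012' > input.bin

# Print the bytes in hex without offsets (POSIX od instead of hexdump -C).
od -An -tx1 input.bin
```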
      

      > select @@version;
      +----------------+
      | @@version      |
      +----------------+
      | 10.2.4-MariaDB |
      +----------------+
      

      > show variables like '%char%';
      +--------------------------+-----------------------------------------------+
      | Variable_name            | Value                                         |
      +--------------------------+-----------------------------------------------+
      | character_set_client     | utf8                                          |
      | character_set_connection | utf8                                          |
      | character_set_database   | utf8                                          |
      | character_set_filesystem | binary                                        |
      | character_set_results    | utf8                                          |
      | character_set_server     | utf8                                          |
      | character_set_system     | utf8                                          |
      | character_sets_dir       | C:\Program Files\MariaDB 10.2\share\charsets\ |
      +--------------------------+-----------------------------------------------+
      8 rows in set (0.00 sec)
      

      > CREATE TABLE test ( binStuff mediumblob );
       
      > load data local infile 'input.bin' replace into table test;
      Query OK, 1 row affected (0.00 sec)
      Records: 1  Deleted: 0  Skipped: 0  Warnings: 0
       
      > select * into outfile 'output.bin' from test;
      

      ### Dump original input
      $ hexdump -C input.bin
      00000000  e2 5c 30 0a                                       |.\0.|
      00000004
       
      ### MariaDB has added an extra backslash (5c), probably as a result of treating the data as UTF8 rather than binary.
      $ hexdump -C output.bin
      00000000  e2 5c 5c 30 0a                                    |.\\0.|
      00000005
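Under the default escaping rules the two statements should be inverses for this input: LOAD DATA reads `\0` as a NUL byte, and SELECT ... INTO OUTFILE writes a NUL byte back as `\0`, so output.bin would be expected to match input.bin byte for byte. One reading of the MariaDB dump (an interpretation, not a confirmed root cause) is that the stored value is e2 5c 30, a literal backslash followed by `0`, which the export then escapes as `\\0`. The helper below is a hypothetical sketch (the name `check_roundtrip` is illustrative) that compares the two files using only POSIX `od` and `cmp`.

```shell
# Hypothetical helper (not from the original report): check whether the file
# written by SELECT ... INTO OUTFILE matches the file given to LOAD DATA INFILE.
# Usage: check_roundtrip INPUT_FILE OUTPUT_FILE
check_roundtrip() {
    in_file=$1
    out_file=$2
    # od -An -tx1 prints each byte as two hex digits without offsets;
    # tr -d '\n' joins od's 16-bytes-per-line output into one line.
    echo "input :$(od -An -tx1 "$in_file" | tr -d '\n')"
    echo "output:$(od -An -tx1 "$out_file" | tr -d '\n')"
    # cmp -s prints nothing and only sets the exit status (0 = identical).
    if cmp -s "$in_file" "$out_file"; then
        echo "round-trip OK"
        return 0
    fi
    echo "round-trip MISMATCH"
    return 1
}
```

With the dumps shown above, `check_roundtrip input.bin output.bin` prints both byte sequences and reports a mismatch with exit status 1.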
      

              People

              Assignee:
              rucha174 Rucha Deodhar
              Reporter:
              bnokes Bill Nokes
              Votes:
              1
              Watchers:
              7
