[MDEV-12240] LOAD DATA INFILE binary blobs failing for UTF8 Created: 2017-03-13  Updated: 2021-11-03

Status: Confirmed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.0, 10.1, 10.0.30, 10.1.21, 10.2.4, 10.2
Fix Version/s: 10.2

Type: Bug
Priority: Major
Reporter: Bill Nokes
Assignee: Rucha Deodhar
Resolution: Unresolved
Votes: 2
Labels: None
Environment:

Windows and Linux. Latest 10.0, 10.1 and 10.2 versions.


Issue Links:
Relates
relates to MDEV-11343 LOAD DATA INFILE fails to load data w... Closed
relates to MDEV-11079 Regression: LOAD DATA INFILE lost BLO... Closed
relates to MDEV-13361 Regression: LOAD DATA INFILE utf8 enc... Closed

 Description   

LOAD DATA INFILE corrupts binary blob data when UTF8 character sets are in use: the data read back contains an additional escape character.

Test case: run against MariaDB 10.0, 10.1 or 10.2. It loads a binary blob that should only undergo escape processing as binary content and should not be treated as utf8. However, MariaDB appears to add an extra escape character to the binary contents; MySQL 5.6.35 handles the same input correctly. The database and the client connection are both set up entirely for UTF8.

This might be related to MDEV-11079 - "Regression: LOAD DATA INFILE lost BLOB support using utf8 load files"

$ echo -n -e '\xe2\x5c\x30\x0a' > input.bin
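
The four bytes written above are deliberately not valid UTF-8: 0xe2 is the lead byte of a three-byte sequence and must be followed by two continuation bytes in the range 0x80-0xbf, but the next byte is a backslash (0x5c). A minimal Python sketch to confirm this (the helper name is_valid_utf8 is illustrative only and not part of the original report):

```python
# Bytes produced by: echo -n -e '\xe2\x5c\x30\x0a' > input.bin
data = bytes([0xE2, 0x5C, 0x30, 0x0A])


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw decodes as strict UTF-8 (hypothetical helper)."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# 0xE2 announces a 3-byte sequence, but 0x5C is not a continuation byte.
print(is_valid_utf8(data))
```

So any code path that interprets the file as utf8 before escape processing has to decide what to do with a broken multi-byte sequence that overlaps the escape character.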

> select @@version;
+----------------+
| @@version      |
+----------------+
| 10.2.4-MariaDB |
+----------------+

> show variables like '%char%';
+--------------------------+-----------------------------------------------+
| Variable_name            | Value                                         |
+--------------------------+-----------------------------------------------+
| character_set_client     | utf8                                          |
| character_set_connection | utf8                                          |
| character_set_database   | utf8                                          |
| character_set_filesystem | binary                                        |
| character_set_results    | utf8                                          |
| character_set_server     | utf8                                          |
| character_set_system     | utf8                                          |
| character_sets_dir       | C:\Program Files\MariaDB 10.2\share\charsets\ |
+--------------------------+-----------------------------------------------+
8 rows in set (0.00 sec)

> CREATE TABLE test ( binStuff mediumblob );
 
> load data local infile 'input.bin' replace into table test;
Query OK, 1 row affected (0.00 sec)
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0
 
> select * into outfile 'output.bin' from test;

### Dump original input
$ hexdump -C input.bin
00000000  e2 5c 30 0a                                       |.\0.|
00000004
 
### MariaDB has added an extra backslash (5c), probably as a result of treating the data as UTF8 rather than binary.
$ hexdump -C output.bin
00000000  e2 5c 5c 30 0a                                    |.\\0.|
00000005
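
One possible reading of the two dumps, offered as a sketch rather than a diagnosis: with the default escape character, a byte-wise load would turn the sequence 5c 30 into a NUL byte, and the dump would write that NUL back as 5c 30, reproducing the input. The observed output is what the dump would produce if the backslash had instead been stored literally, without escape processing. The Python model below is a simplified assumption (it handles only the NUL and backslash escapes, and the names load_field and dump_field are hypothetical); it is not MariaDB's implementation.

```python
ESC = 0x5C  # backslash, the default escape character

# Simplified subset of escape sequences (assumption): \0 -> NUL, \\ -> backslash
UNESCAPE = {ord("0"): 0x00, ESC: ESC}


def load_field(raw: bytes) -> bytes:
    """Byte-wise unescape of one field (line terminator already stripped)."""
    out = bytearray()
    i = 0
    while i < len(raw):
        if raw[i] == ESC and i + 1 < len(raw):
            nxt = raw[i + 1]
            out.append(UNESCAPE.get(nxt, nxt))
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return bytes(out)


def dump_field(value: bytes) -> bytes:
    """Escape one stored value the way the dump is assumed to write it."""
    out = bytearray()
    for b in value:
        if b == 0x00:
            out += b"\\0"   # NUL is written as backslash + '0'
        elif b == ESC:
            out += b"\\\\"  # backslash is doubled
        else:
            out.append(b)
    return bytes(out)


field = bytes.fromhex("e25c30")  # input.bin without the trailing 0a

# Binary-safe round trip: escape processed on load, re-escaped on dump.
expected = dump_field(load_field(field)) + b"\n"
# Hypothetical faulty path: backslash stored literally, then escaped on dump.
faulty = dump_field(field) + b"\n"

print(expected.hex(" "))  # e2 5c 30 0a
print(faulty.hex(" "))    # e2 5c 5c 30 0a
```

Running SELECT HEX(binStuff) FROM test; on an affected server would show which value is actually stored and so confirm or rule out this reading.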



 Comments   
Comment by Elena Stepanova [ 2017-03-18 ]

I think it's related to MDEV-11343.
