[MDEV-9893] mysqldump -T followed by LOAD DATA does not restore data well Created: 2016-04-10  Updated: 2016-04-10

Status: Open
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.2

Type: Bug Priority: Major
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None


 Description   

I create a database with gbk as a default character set,
create two tables of equal structure and populate table t1:

mysql -uroot --table <<END
DROP DATABASE IF EXISTS db1;
CREATE DATABASE db1 DEFAULT CHARACTER SET gbk;
USE db1;
CREATE TABLE t1 (a TEXT, b BLOB);
CREATE TABLE t2 (a TEXT, b BLOB);
INSERT INTO t1 VALUES (_gbk 0xEE5C, 0xEE5C8060);
SELECT HEX(a), HEX(b) FROM t1;
END

Then I want to dump the table t1 using "mysqldump -T" and then reload the dump into t2.
To test with different locale settings and mysqldump options, I create a helper shell script "dump.sh":

LANG=$LOCALE sudo sh -c '          
rm -rf /var/lib/mysql/backups
mkdir /var/lib/mysql/backups
chmod a+rwx /var/lib/mysql/backups' &&
cd /var/lib/mysql &&
mysqldump $DUMPPARAM -uroot -Tbackups db1 t1 &&                       
mysql -uroot --table db1 << END
TRUNCATE TABLE t2;
LOAD DATA INFILE '/var/lib/mysql/backups/t1.txt' INTO TABLE t2;
SELECT HEX(LOAD_FILE('/var/lib/mysql/backups/t1.txt'));
SELECT HEX(a), HEX(b) FROM t2;
END

Now If I invoke the script using either of these commands:

LOCALE=zh_CN.gbk      DUMPPARAM=--default-character-set=binary sh dump.sh
LOCALE=en_US.utf8     DUMPPARAM=--default-character-set=binary sh dump.sh 
LOCALE=en_US.iso88591 DUMPPARAM=--default-character-set=binary sh dump.sh 
LOCALE=C              DUMPPARAM=--default-character-set=binary sh dump.sh 
LOCALE=zh_CN.gbk      DUMPPARAM=  sh dump.sh
LOCALE=en_US.utf8     DUMPPARAM=  sh dump.sh 
LOCALE=en_US.iso88591 DUMPPARAM=  sh dump.sh 
LOCALE=C              DUMPPARAM=  sh dump.sh 

it works fine and returns this output:

+-------------------------------------------------+
| HEX(LOAD_FILE('/var/lib/mysql/backups/t1.txt')) |
+-------------------------------------------------+
| EE5C09EE5C5C80600A                              |
+-------------------------------------------------+
+--------+----------+
| HEX(a) | HEX(b)   |
+--------+----------+
| EE5C   | EE5C8060 |
+--------+----------+

But if then I invoke the script using these commands:

LOCALE=zh_CN.gbk      DUMPPARAM=--default-character-set=gbk sh dump.sh 
LOCALE=en_US.utf8     DUMPPARAM=--default-character-set=gbk sh dump.sh 
LOCALE=en_US.iso88591 DUMPPARAM=--default-character-set=gbk sh dump.sh 
LOCALE=C              DUMPPARAM=--default-character-set=gbk sh dump.sh 

it returns a wrong BLOB value:

+-------------------------------------------------+
| HEX(LOAD_FILE('/var/lib/mysql/backups/t1.txt')) |
+-------------------------------------------------+
| EE5C09EE5C0A                                    |
+-------------------------------------------------+
+--------+--------+
| HEX(a) | HEX(b) |
+--------+--------+
| EE5C   | EE5C   |
+--------+--------+

This looks wrong. According to the documentation, LOAD DATA INFILE treats the file as having the same character set with the default character set of the database, which is GBK. So dump with explicit --default-character-set=gbk followed by LOAD DATA which implicitly use gbk as the file character set do not work together.



 Comments   
Comment by Alexander Barkov [ 2016-04-10 ]

If I now create a new script dump-restore-gbk.sh with an explicit CHARACTER SET clause in LOAD:

LANG=$LOCALE sudo sh -c '
rm -rf /var/lib/mysql/backups
mkdir /var/lib/mysql/backups
chmod a+rwx /var/lib/mysql/backups' &&
cd /var/lib/mysql &&
mysqldump $DUMPPARAM -uroot -Tbackups db1 t1 &&
mysql -uroot --table db1 << END
TRUNCATE TABLE t2;
LOAD DATA INFILE '/var/lib/mysql/backups/t1.txt' INTO TABLE t2 CHARACTER SET gbk
SELECT HEX(LOAD_FILE('/var/lib/mysql/backups/t1.txt'));
SELECT HEX(a), HEX(b) FROM t2;
END

and invoke it using either of these commands::

LOCALE=zh_CN.gbk      DUMPPARAM=--default-character-set=gbk sh dump-restore-gbk.sh 
LOCALE=en_US.utf8     DUMPPARAM=--default-character-set=gbk sh dump-restore-gbk.sh 
LOCALE=en_US.iso88591 DUMPPARAM=--default-character-set=gbk sh dump-restore-gbk.sh 
LOCALE=C              DUMPPARAM=--default-character-set=gbk sh dump-restore-gbk.sh 

it also returns a wrong BLOB value:

+-------------------------------------------------+
| HEX(LOAD_FILE('/var/lib/mysql/backups/t1.txt')) |
+-------------------------------------------------+
| EE5C09EE5C0A                                    |
+-------------------------------------------------+
+--------+--------+
| HEX(a) | HEX(b) |
+--------+--------+
| EE5C   | EE5C   |
+--------+--------+

Generated at Thu Feb 08 07:38:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.