[MDEV-7421] corrupted text with CSV connect table Created: 2015-01-08  Updated: 2015-01-19  Resolved: 2015-01-19

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.0.13
Fix Version/s: 10.0.16

Type: Bug Priority: Major
Reporter: Takuya Aoki (Inactive) Assignee: Olivier Bertrand
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS release 6.5 (X86_64)


Attachments: File table_utf8.csv    

 Description   

I want to make a CONNECT table directed to a local CSV file saved in character set utf8.
The table data is correctly displayed but the header is corrupting.
Can I configure to fix this or is it a bug?

/var/lib/mysql/table_utf8.csv

[root@kc0022 ~]# cat /var/lib/mysql/table_utf8.csv
"COUPON_NUM","COUPON_NAME","START_DATE","END_DATE"
"A8772","CAMPAIGN1","2005-03-11","2005-04-01"
"A6703","ANNIVERSARY","2005-08-15","2005-09-05"
"A4313","NEW_ITEM","2005-09-13","2005-10-04"

CONNECT table on MariaDB

[root@kc0022 ~]# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 351
Server version: 10.0.13-MariaDB MariaDB Server
 
Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> use test;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test]> CREATE TABLE table_utf8
    -> engine=CONNECT table_type=CSV file_name='../table_utf8.csv'
    -> header=1 sep_char=',' quoted=1;
Query OK, 0 rows affected (0.05 sec)
 
MariaDB [test]> EXPLAIN table_utf8;
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
| Field                                                                | Type     | Null | Key | Default | Extra |
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
| i≫?"i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-"                                  | char(5)  | NO   |     | NULL    |       |
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                    | char(33) | NO   |     | NULL    |       |
| i?3i?´i?!i?2i?´i??i??i?!i?´i?\                                       | char(10) | NO   |     | NULL    |       |
| i?\i?Ri??i??i??i?!i?´i?\                                             | char(10) | NO   |     | NULL    |       |
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
4 rows in set (0.05 sec)
 
MariaDB [test]> SELECT * FROM table_utf8;
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i≫?"i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-"                                  | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i? ´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                                | CAM PAIGN1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                                | ANN IVERSARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                                | NEW _ITEM                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test]> show variables like "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)



 Comments   
Comment by Olivier Bertrand [ 2015-01-08 ]

I tried to reproduce this on Windows 7 but everything worked ok.

MariaDB [(none)]> use test
Database changed
 
MariaDB [(none)]> show variables like "%char%";
+--------------------------+---------------------------------------------------------+
| Variable_name            | Value                                                   |
+--------------------------+---------------------------------------------------------+
| character_set_client     | utf8                                                    |
| character_set_connection | utf8                                                    |
| character_set_database   | utf8                                                    |
| character_set_filesystem | binary                                                  |
| character_set_results    | utf8                                                    |
| character_set_server     | utf8                                                    |
| character_set_system     | utf8                                                    |
| character_sets_dir       | e:\MariaDevelopBlkIndx\10.0-connect\sql\share\charsets\ |
+--------------------------+---------------------------------------------------------+
8 rows in set (0.01 sec)
 
MariaDB [test]> CREATE TABLE table_utf8
    -> engine=CONNECT table_type=CSV file_name='E:\\Data\\table_utf8.csv'
    -> header=1 sep_char=',' quoted=1;
Query OK, 0 rows affected (5.75 sec)
 
MariaDB [test]> explain table_utf8;
+-------------+----------+------+-----+---------+-------+
| Field       | Type     | Null | Key | Default | Extra |
+-------------+----------+------+-----+---------+-------+
| COUPON_NUM  | char(5)  | NO   |     | NULL    |       |
| COUPON_NAME | char(11) | NO   |     | NULL    |       |
| START_DATE  | char(10) | NO   |     | NULL    |       |
| END_DATE    | char(10) | NO   |     | NULL    |       |
+-------------+----------+------+-----+---------+-------+
4 rows in set (0.02 sec)
 
MariaDB [test]> select * from table_utf8;
+------------+-------------+------------+------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END_DATE   |
+------------+-------------+------------+------------+
| A8772      | CAMPAIGN1   | 2005-03-11 | 2005-04-01 |
| A6703      | ANNIVERSARY | 2005-08-15 | 2005-09-05 |
| A4313      | NEW_ITEM    | 2005-09-13 | 2005-10-04 |
+------------+-------------+------------+------------+
3 rows in set (12.09 sec)

There is probably something special in your system.

Comment by Takuya Aoki (Inactive) [ 2015-01-09 ]

Can I see your my.cnf file so I can test with the same environment?
It might be because of OS difference since Windows and Linux are of course totally different.

Comment by Olivier Bertrand [ 2015-01-11 ]

It is very difficult to understand where is the configuration file used on Windows by MySQL or MariaDB. Looking in different places I found this one but not sure it is the good one. To reduce its size, I removed all comment lines:

[client]
port=3306
 
[mysql]
default-character-set=latin1
 
[mysqld]
port=3306
basedir="C:/Program Files/MySQL/MySQL Server 5.5/"
datadir="D:/MySQL Server 5.5/Data/"
character-set-server=latin1
default-storage-engine=MYISAM
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=100
query_cache_size=15M
table_cache=256
tmp_table_size=18M
thread_cache_size=8
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=35M
key_buffer_size=45M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=256K
 
skip-innodb
innodb_additional_mem_pool_size=2M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=1M
innodb_buffer_pool_size=8M
innodb_log_file_size=10M
innodb_thread_concurrency=8

By the way, CONNECT has a DATA_CHARSET option for all tables, did you try to play with it?

Comment by Takuya Aoki (Inactive) [ 2015-01-14 ]

I did some tests to see when texts corrupts.
Everything worked well only in the first case where charcter-set for the system, client, and database are set to latin1 and the table options are not specified.
All other cases failed in some way.
I usually work with configuration file like my.cnf2 so I will have problems.

/etc/my.cnf1

[client]
port=3306
 
[mysql]
default-character-set=latin1
 
[mysqld]
port=3306
character-set-server=latin1
default-storage-engine=MYISAM
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=100
query_cache_size=15M
table_cache=256
tmp_table_size=18M
thread_cache_size=8
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=35M
key_buffer_size=45M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=256K
 
skip-innodb
innodb_additional_mem_pool_size=2M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=1M
innodb_buffer_pool_size=8M
innodb_log_file_size=10M
innodb_thread_concurrency=8
 
[mariadb]
plugin-load-add=ha_connect.so

[root@kc1060 etc]# cp my.cnf1 my.cnf
cp: overwrite `my.cnf'? y
[root@kc1060 etc]# service mysql restart
Shutting down MySQL. SUCCESS!
Starting MySQL. SUCCESS!
[root@kc1060 etc]# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.15-MariaDB MariaDB Server
 
Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> SHOW VARIABLES LIKE "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
 
MariaDB [(none)]> USE test1;
Database changed
MariaDB [test1]> SHOW CREATE DATABASE test1;
+----------+------------------------------------------------------------------+
| Database | Create Database                                                  |
+----------+------------------------------------------------------------------+
| test1    | CREATE DATABASE `test1` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+------------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test1]> CREATE TABLE table_utf8_1 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1;
Query OK, 0 rows affected (0.07 sec)
 
MariaDB [test1]> CREATE TABLE table_utf8_2 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.06 sec)
 
MariaDB [test1]> CREATE TABLE table_utf8_3 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DATA_CHARSET=utf8;
Query OK, 0 rows affected (0.05 sec)
 
MariaDB [test1]> CREATE TABLE table_utf8_4 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8 DATA_CHARSET=utf8;
Query OK, 0 rows affected (0.05 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_1;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | CAMPAIGN1       | 2005-03-11                     | 2005-04-01               |
| A6703                          | ANNIVERSARY | 2005-08-15                     | 2005-09-05               |
| A4313                          | NEW_ITEM          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_2;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_3;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set, 6 warnings (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_4;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> USE test2;
Database changed
MariaDB [test2]> SHOW CREATE DATABASE test2;
+----------+----------------------------------------------------------------+
| Database | Create Database                                                |
+----------+----------------------------------------------------------------+
| test2    | CREATE DATABASE `test2` /*!40100 DEFAULT CHARACTER SET utf8 */ |
+----------+----------------------------------------------------------------+
1 row in set (0.00 sec)
 
MariaDB [test2]> CREATE TABLE table_utf8_1 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1;
MariaDB [test2]> CREATE TABLE table_utf8_2 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8;
MariaDB [test2]> CREATE TABLE table_utf8_3 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DATA_CHARSET=utf8;
MariaDB [test2]> CREATE TABLE table_utf8_4 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8 DATA_CHARSET=utf8;
 
MariaDB [test2]> CREATE TABLE table_utf8_1 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test2]> CREATE TABLE table_utf8_2 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test2]> CREATE TABLE table_utf8_3 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DATA_CHARSET=utf8;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test2]> CREATE TABLE table_utf8_4 engine=CONNECT table_type=CSV file_name='../table_utf8.csv' header=1 sep_char=',' quoted=1 DEFAULT CHARSET=utf8 DATA_CHARSET=utf8;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_1;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_2;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_3;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_4;
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END _DATE |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
| A8772                          | ?????????                         | 2005-03-11                     | 2005-04-01               |
| A6703                          | ???????????                       | 2005-08-15                     | 2005-09-05               |
| A4313                          | ????????                          | 2005-09-13                     | 2005-10-04               |
+--------------------------------+-----------------------------------+--------------------------------+--------------------------+
3 rows in set (0.00 sec)

/etc/my.cnf2

[client]
port=3306
 
[mysql]
default-character-set=utf8
[client]
port=3306
 
[mysql]
default-character-set=utf8
 
[mysqld]
port=3306
character-set-server=utf8
default-storage-engine=MYISAM
sql-mode="STRICT_TRANS_TABLES,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
max_connections=100
query_cache_size=15M
table_cache=256
tmp_table_size=18M
thread_cache_size=8
myisam_max_sort_file_size=100G
myisam_sort_buffer_size=35M
key_buffer_size=45M
read_buffer_size=64K
read_rnd_buffer_size=256K
sort_buffer_size=256K
 
skip-innodb
innodb_additional_mem_pool_size=2M
innodb_flush_log_at_trx_commit=1
innodb_log_buffer_size=1M
innodb_buffer_pool_size=8M
innodb_log_file_size=10M
innodb_thread_concurrency=8
 
[mariadb]
plugin-load-add=ha_connect.so

[root@kc1060 etc]# cp my.cnf2 my.cnf
cp: overwrite `my.cnf'? y
[root@kc1060 etc]# service mysql restart
Shutting down MySQL. SUCCESS!
Starting MySQL. SUCCESS!
[root@kc1060 etc]# mysql -u root
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 3
Server version: 10.0.15-MariaDB MariaDB Server
 
Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [(none)]> SHOW VARIABLES LIKE "%char%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
 
MariaDB [(none)]> use test1;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test1]> SELECT * FROM table_utf8_1;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | i?£i?!i?-i?°i?!i?ci?§i?Ri?‘                                        | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | i?!i?Ri?Ri?ci? ¶i?\i?2i?3i?!i?2i?1                                  | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | i?Ri?\i?・i??i?ci?´i?\i?-                                           | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_2;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_3;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | ?????????                                                          | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ???????????                                                        | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | ????????                                                           | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set, 6 warnings (0.00 sec)
 
MariaDB [test1]> SELECT * FROM table_utf8_4;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test1]> use test2;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
MariaDB [test2]> SELECT * FROM table_utf8_1;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_2;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_3;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)
 
MariaDB [test2]> SELECT * FROM table_utf8_4;
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-                               | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\                                  | i?3i?´i?!i?2i?´i??i??i?!i?´i?\                               | i?\i?Ri??i??i??i?!i?´i?\                         |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
| A8772                                                        | CAMPAIG N1                                                 | 2005-03-11                                                   | 2005-04-01                                       |
| A6703                                                        | ANNIVER SARY                                             | 2005-08-15                                                   | 2005-09-05                                       |
| A4313                                                        | NEW_ITE M                                                   | 2005-09-13                                                   | 2005-10-04                                       |
+--------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
3 rows in set (0.00 sec)

Comment by Olivier Bertrand [ 2015-01-14 ]

I am sorry. Last tests I did on this case I just copied the table from your comment instead of using the attached file. So indeed, it had no special characters, which seemed normal to me as ascii characters are not transformed when tranlated to UTF8.

Now I have used the attached table and got the same result.

The problem is that it is displayed on my editor as:

"COUPON_NUM","COUPON_NAME","START_DATE","END_DATE"
"A8772","CAMPAIGN1","2005-03-11","2005-04-01"
"A6703","ANNIVERSARY","2005-08-15","2005-09-05"
"A4313","NEW_ITEM","2005-09-13","2005-10-04"

I may be wrong but to me this is not an UTF8 encoding.

Besides, when trying to display the table by SELECT I get 6 warnings:

Level	Code	Message
Warning	1366	Incorrect string value: '\xEF\xBC\xA3\xEF\xBC\xA1...' for column 'COUPON_NAME' at row 1
Warning	1105	Out of range value CAMPAIGN1 for column 'COUPON_NAME' at row 1
Warning	1366	Incorrect string value: '\xEF\xBC\xA1\xEF\xBC\xAE...' for column 'COUPON_NAME' at row 2
Warning	1105	Out of range value ANNIVERSARY for column 'COUPON_NAME' at row 2
Warning	1366	Incorrect string value: '\xEF\xBC\xAE\xEF\xBC\xA5...' for column 'COUPON_NAME' at row 3
Warning	1105	Out of range value NEW_ITEM for column 'COUPON_NAME' at row 3

It probably shows that this is not recognized as UTF8.

Comment by Takuya Aoki (Inactive) [ 2015-01-16 ]

It's OK, I attached the file later.

It looks like you are opening the file in latin1 settings.
Please use a text editor like the one below to open utf8 files.

http://sakura-editor.sourceforge.net/
http://gedit.en.softonic.com/

Comment by Olivier Bertrand [ 2015-01-16 ]

Indeed, using UltraEdit it was correctly decoded and I could verify if is in UTF-8 (coded in three bytes).

I did not check yet what happen with column names. They are retrieved by CONNECT from the file header during the discovery process and perhaps not decoded properly. This can be temporarily avoided by manually defining the columns in the create table statement.

However, I traced what happen with the data part. Because it is in UTF-8 (even the ASCII columns can be regarded as UTF-8) I created the table as:

CREATE TABLE table_utf8 engine=CONNECT table_type=CSV default charset=UTF8
file_name='E:/Data/table_utf8.csv' header=1 sep_char=',' quoted=1 data_charset=utf8;

The data_charset option indicates the coding of the table data.

What CONNECT does when reading the table is retrieving the column data and I could verify that it get it correctly and call the matching field store function with the charset parameter set by the data_charset option as UTF-8:

  rc= fp->store(p, strlen(p), charset, CHECK_FIELD_WARN);

From there, all is done by MariaDB. If the default charset of the table is Latin1, MariaDB tries to convert the three bytes characters to Latin1 and fails, then they are displayed as '?' question marks.

If the table default charset is UTF-8, the data is not converted (as the from charset is equal to the to charset)
The way it is displayed depends on the client program and settings.

Therefore this seems to be a MariaDB issue, at least for the data part of the table.

Comment by Olivier Bertrand [ 2015-01-16 ]

Redefining the table as:

CREATE TABLE `table_utf8` (
  `COUPON_NUM` char(5) NOT NULL,
  `COUPON_NAME` char(33) NOT NULL,
  `START_DATE` char(10) NOT NULL,
  `END_DATE` char(10) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=UTF8 `TABLE_TYPE`='CSV' `FILE_NAME`='E:/Data/table_utf8.csv' `SEP_CHAR`=',' `DATA_CHARSET`='utf8' `HEADER`=1 `QUOTED`=1;

I get the following result when character_set_results='latin1':

+------------+-------------+------------+------------+
| COUPON_NUM | COUPON_NAME | START_DATE | END_DATE   |
+------------+-------------+------------+------------+
| A8772      | ?????????   | 2005-03-11 | 2005-04-01 |
| A6703      | ??????????? | 2005-08-15 | 2005-09-05 |
| A4313      | ????????    | 2005-09-13 | 2005-10-04 |
+------------+-------------+------------+------------+

This seems normal, those three bytes characters cannot be translated to latin1.
Now if character_set_results='utf8' the display is:

+------------+-----------------------------------+------------+------------+
| COUPON_NUM | COUPON_NAME                       | START_DATE | END_DATE   |
+------------+-----------------------------------+------------+------------+
| A8772      | ´╝ú´╝í´╝¡´╝░´╝í´╝®´╝º´╝«´╝æ       | 2005-03-11 | 2005-04-01 |
| A6703      | ´╝í´╝«´╝«´╝®´╝´╝Ñ´╝▓´╝│´╝í´╝▓´╝╣ | 2005-08-15 | 2005-09-05 |
| A4313      | ´╝«´╝Ñ´╝À´╝┐´╝®´╝┤´╝Ñ´╝¡          | 2005-09-13 | 2005-10-04 |
+------------+-----------------------------------+------------+------------+

To me it looks as if the client program were not decoding the result even the character_set_client is set to UTF-8. So it could also be a problem coming from the client program.

Comment by Olivier Bertrand [ 2015-01-17 ]

Concerning the column names, there was a problem when specified in UTF8 because, MariaDB internally handling column names in UTF8, CONNECT was translating them before creating the table. Now for CSV tables specifying DATA_CHARSET=UTF8 this translation is no more done, considering they already are coded in UTF8.

For instance, starting from the file:

Opération,Monnaie,Débit,Crédit
Chèque,£,100,0
Virement,€,0,5600

I can create the table:

create table tab_csv8 engine=connect table_type=csv file_name='E:/Data/csv8.csv' header=1 lrecl=64 data_charset=utf8;

Describe tab_csv8 replies:

Field Type Null Key Default Extra
Opération char(8) NO <null> <null> <null>
Monnaie char(3) NO <null> <null> <null>
Débit int(3) NO <null> <null> <null>
Crédit int(4) NO <null> <null> <null>

and

select * from tab_csv8;

Returns:

Opération Monnaie Débit Crédit
Chèque £ 100 0
Virement 0 5600

However, if instead of using my graphic client program I use the mysql client, the table is displayed as:

MariaDB [test]> select * from tab_csv8;
+-----------+---------+-------+--------+
| Opération | Monnaie | Débit | Crédit |
+-----------+---------+-------+--------+
| Chèque    | £       |   100 |      0 |
| Virement  | ?       |     0 |   5600 |
+-----------+---------+-------+--------+

See; the Euro character is not recognized and printed as a question mark. This shows that the problem is a problem of character recognition by the mysql client program.

Note that this is done with:

MariaDB [test]> show variables like '%char%';
+--------------------------+-----------------------------------------------------------------+
| Variable_name            | Value                                                           |
+--------------------------+-----------------------------------------------------------------+
| character_set_client     | cp850                                                           |
| character_set_connection | cp850                                                           |
| character_set_database   | latin1                                                          |
| character_set_filesystem | binary                                                          |
| character_set_results    | cp850                                                           |
| character_set_server     | latin1                                                          |
| character_set_system     | utf8                                                            |
| character_sets_dir       | d:\CommonSource\mariadb-10.0.5\10.0-connect\sql\share\charsets\ |
+--------------------------+-----------------------------------------------------------------+

If those character sets are set to UTF8, things are even worst and the table is displayed as:

MariaDB [test]> select * from tab_csv8;
+------------+---------+--------+---------+
| Op├®ration | Monnaie | D├®bit | Cr├®dit |
+------------+---------+--------+---------+
| Chèque    | £      |    100 |       0 |
| Virement   | Ôé¼     |      0 |    5600 |
+------------+---------+--------+---------+

Therefore, I think this issue is fixed for CONNECT (column names are now retrieved correctly) but of course there are pending problem for displaying some characters that are not CONNECT wise.

Comment by Takuya Aoki (Inactive) [ 2015-01-19 ]

Indeed, setting default charset=UTF8 and configuring the client solves the problem for the data part.
This is the same in my test I submitted in 2015-01-14.

But for the header part, I don't understand what is fixed at the moment.
Do you mean data is handled correctly inside, but displaying it errors?
Is it possible to fix the client so that all characters can be shown properly?

Comment by Olivier Bertrand [ 2015-01-19 ]

As I said, the header part was a bug that will be fixed in next version.

Generated at Thu Feb 08 07:19:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.