Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.0.13
-
None
-
CentOS release 6.5 (X86_64)
Description
I want to make a CONNECT table directed to a local CSV file saved in character set utf8.
The table data is correctly displayed but the header is corrupting.
Can I configure to fix this or is it a bug?
/var/lib/mysql/table_utf8.csv
[root@kc0022 ~]# cat /var/lib/mysql/table_utf8.csv
|
"COUPON_NUM","COUPON_NAME","START_DATE","END_DATE"
|
"A8772","CAMPAIGN1","2005-03-11","2005-04-01"
|
"A6703","ANNIVERSARY","2005-08-15","2005-09-05"
|
"A4313","NEW_ITEM","2005-09-13","2005-10-04"
|
CONNECT table on MariaDB
[root@kc0022 ~]# mysql -u root
|
Welcome to the MariaDB monitor. Commands end with ; or \g.
|
Your MariaDB connection id is 351
|
Server version: 10.0.13-MariaDB MariaDB Server
|
|
Copyright (c) 2000, 2014, Oracle, SkySQL Ab and others.
|
|
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
|
|
MariaDB [(none)]> use test;
|
Reading table information for completion of table and column names
|
You can turn off this feature to get a quicker startup with -A
|
|
Database changed
|
MariaDB [test]> CREATE TABLE table_utf8
|
-> engine=CONNECT table_type=CSV file_name='../table_utf8.csv'
|
-> header=1 sep_char=',' quoted=1;
|
Query OK, 0 rows affected (0.05 sec)
|
|
MariaDB [test]> EXPLAIN table_utf8;
|
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
|
| Field | Type | Null | Key | Default | Extra |
|
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
|
| i≫?"i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-" | char(5) | NO | | NULL | |
|
| i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\ | char(33) | NO | | NULL | |
|
| i?3i?´i?!i?2i?´i??i??i?!i?´i?\ | char(10) | NO | | NULL | |
|
| i?\i?Ri??i??i??i?!i?´i?\ | char(10) | NO | | NULL | |
|
+----------------------------------------------------------------------+----------+------+-----+---------+-------+
|
4 rows in set (0.05 sec)
|
|
MariaDB [test]> SELECT * FROM table_utf8;
|
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
|
| i≫?"i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?μi?-" | i?£i? ̄i?μi?°i? ̄i?Ri??i?Ri?!i?-i?\ | i?3i? ´i?!i?2i?´i??i??i?!i?´i?\ | i?\i?Ri??i??i??i?!i?´i?\ |
|
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
|
| A8772 | CAM PAIGN1 | 2005-03-11 | 2005-04-01 |
|
| A6703 | ANN IVERSARY | 2005-08-15 | 2005-09-05 |
|
| A4313 | NEW _ITEM | 2005-09-13 | 2005-10-04 |
|
+----------------------------------------------------------------------+--------------------------------------------------------------------+--------------------------------------------------------------+--------------------------------------------------+
|
3 rows in set (0.00 sec)
|
|
MariaDB [test]> show variables like "%char%";
|
+--------------------------+----------------------------+
|
| Variable_name | Value |
|
+--------------------------+----------------------------+
|
| character_set_client | utf8 |
|
| character_set_connection | utf8 |
|
| character_set_database | utf8 |
|
| character_set_filesystem | binary |
|
| character_set_results | utf8 |
|
| character_set_server | utf8 |
|
| character_set_system | utf8 |
|
| character_sets_dir | /usr/share/mysql/charsets/ |
|
+--------------------------+----------------------------+
|
8 rows in set (0.00 sec)
|
Indeed, using UltraEdit it was correctly decoded and I could verify if is in UTF-8 (coded in three bytes).
I did not check yet what happen with column names. They are retrieved by CONNECT from the file header during the discovery process and perhaps not decoded properly. This can be temporarily avoided by manually defining the columns in the create table statement.
However, I traced what happen with the data part. Because it is in UTF-8 (even the ASCII columns can be regarded as UTF-8) I created the table as:
CREATE TABLE table_utf8 engine=CONNECT table_type=CSV default charset=UTF8
file_name='E:/Data/table_utf8.csv' header=1 sep_char=',' quoted=1 data_charset=utf8;
The data_charset option indicates the coding of the table data.
What CONNECT does when reading the table is retrieving the column data and I could verify that it get it correctly and call the matching field store function with the charset parameter set by the data_charset option as UTF-8:
rc= fp->store(p, strlen(p), charset, CHECK_FIELD_WARN);
From there, all is done by MariaDB. If the default charset of the table is Latin1, MariaDB tries to convert the three bytes characters to Latin1 and fails, then they are displayed as '?' question marks.
If the table default charset is UTF-8, the data is not converted (as the from charset is equal to the to charset)
The way it is displayed depends on the client program and settings.
Therefore this seems to be a MariaDB issue, at least for the data part of the table.