[MDEV-7521] CONNECT Engine Column names are not retrieved properly when field values are not ANCII characters Created: 2015-01-28  Updated: 2015-06-10  Resolved: 2015-03-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.0.15, 10.0.16
Fix Version/s: 10.0.18

Type: Bug Priority: Major
Reporter: Evgeny Kosolapov Assignee: Olivier Bertrand
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS, MariaDB 10.0.16, Connect Engine 1.3


Attachments: File source.csv     File src.fix    

 Description   

I am trying to replicate example how to create pivot table like from MariaDB documentation: https://mariadb.com/kb/en/mariadb/documentation/storage-engines/connect/connect-table-types/connect-table-types-pivot-table-type/

Everything works as fine as explained.
However, if I replace in source table fields 'Beer' with '啤酒‘ and 'Car' with 'машина‘, I get the following CREATE TABLE statement:

CREATE TABLE `pivex_cn`
(
`who` varchar(135) NOT NULL,
`week` int(11) NOT NULL,
`啤酒` decimal(6,2) NOT NULL `FLAG`=1,
`Food` decimal(6,2) NOT NULL `FLAG`=1,
`машина` decimal(6,2) NOT NULL `FLAG`=1
)
ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='pivot' `TABNAME`='topivot_cn'

Which then in SELECT * FROM pivex_cn; gives the following error: "Error Code: 1296. Got error 122 'Cannot find matching column' from CONNECT"

All tables are using utf8 as default. I also tried uft8mb4, the same result

Bug was discovered first at 10.0.15 MariaDB version, but reproduced in 10.0.16 as well..



 Comments   
Comment by Olivier Bertrand [ 2015-01-28 ]

When you say that in the original data file you replaced 'Car' with 'машина‘ I suppose it is coded in UTF8 in the data file.

The problem is that MariaDB unconditionally converts column names into UTF8. Therefore the corresponding column should be named literally, not by its UTF8 representation to avoid a double conversion. Unfortunately trying to create the pivot table by:

 
CREATE TABLE `pivex_cn`
(
`who` varchar(135) NOT NULL,
`week` int(11) NOT NULL,
`Beer` decimal(6,2) NOT NULL `FLAG`=1,
`Food` decimal(6,2) NOT NULL `FLAG`=1,
`машина` decimal(6,2) NOT NULL `FLAG`=1
)
ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='pivot' `TABNAME`='topivot_cn';

fails with the MariaDB message (1166) Incorrect column name ''.

That's unfortunate because for instance, replacing Beer with UTF8 Bière in the data file I can create the table:

CREATE TABLE `pivex_fr`
(
`who` varchar(135) NOT NULL,
`week` int(11) NOT NULL,
`Bière` decimal(6,2) NOT NULL `FLAG`=1,
`Food` decimal(6,2) NOT NULL `FLAG`=1,
`Car` decimal(6,2) NOT NULL `FLAG`=1
)
ENGINE=CONNECT TABLE_TYPE='pivot' `TABNAME`='topivot_fr';

It is accepted and works perfectly.

The issue is why is MariaDB refusing the create table with a quoted Cyrillic column name.

Comment by Evgeny Kosolapov [ 2015-01-29 ]

Yes, everything is coded in UTF8, its default in my database settings. It dosn't matter if I replace data within the table, or create the table from scratch. Let's do exercise with creating table from scratch:

Step 1:

 CREATE TABLE `original_data` (
 `who` VARCHAR(45) NOT NULL,
 `what` VARCHAR(45) NOT NULL,
 `amount` DECIMAL(4,2) NOT NULL);
 
INSERT INTO `original_data` (who, what, amount)
VALUES ('Beth', 'Pizza', 12), ('Janet', 'Beer', 3), ('Ali', 'Car', 3), ('Tom', 'Bread', 8);

Result:

MariaDB > select * from original_data;
------------------

who what amount

------------------

Beth Pizza 12.00
Janet Beer 3.00
Ali Car 3.00
Tom Bread 8.00

------------------
4 rows in set (0.00 sec)

Step 2:

 create table pivex
engine=connect table_type=pivot tabname=original_data;

Result:

MariaDB > select * from pivex;
-------------------------

who Pizza Beer Car Bread

-------------------------

Ali 0.00 0.00 3.00 0.00
Beth 12.00 0.00 0.00 0.00
Janet 0.00 3.00 0.00 0.00
Tom 0.00 0.00 0.00 8.00

-------------------------
4 rows in set (0.00 sec)

Everything is nice. Perfect! Lets check SHOW CREATE TABLE for pivex:

 | pivex | CREATE TABLE `pivex` (
  `who` varchar(135) NOT NULL,
  `Pizza` decimal(6,2) NOT NULL `FLAG`=1,
  `Beer` decimal(6,2) NOT NULL `FLAG`=1,
  `Car` decimal(6,2) NOT NULL `FLAG`=1,
  `Bread` decimal(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='pivot' `TABNAME`='original_data' |

Now, lets repeat all of this with French, Russian and Chinese:

Step 1:

 drop table if exists original_data;
CREATE TABLE `original_data` (
 `who` VARCHAR(45) NOT NULL,
 `what` VARCHAR(45) NOT NULL,
 `amount` DECIMAL(4,2) NOT NULL);
 
INSERT INTO `original_data` (who, what, amount)
VALUES ('Beth', 'Pizza', 12), ('Janet', 'Bière', 3), ('Ali', 'Машина', 3), ('Tom', '面包', 8);

Result:

MariaDB > select * from original_data;
-------------------------

who what amount

-------------------------

Beth Pizza 12.00
Janet Bière 3.00
Ali Машина 3.00
Tom 面包 8.00

-------------------------
4 rows in set (0.00 sec)

Step 2:

 drop table if exists pivex;
create table pivex
engine=connect table_type=pivot tabname=original_data;

Result:
MariaDB > select * from pivex;
ERROR 1296 (HY000): Got error 122 'Cannot find matching column' from CONNECT

Lets check SHOW CREATE TABLE for pivex:

 | pivex | CREATE TABLE `pivex` (
  `who` varchar(135) NOT NULL,
  `Pizza` decimal(6,2) NOT NULL `FLAG`=1,
  `Bière` decimal(6,2) NOT NULL `FLAG`=1,
  `Машина` decimal(6,2) NOT NULL `FLAG`=1,
  `面包` decimal(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='pivot' `TABNAME`='original_data'                      |

Table 'original_data' was also in utf8 all the time by default:

 | original_data | CREATE TABLE `original_data` (
  `who` varchar(45) NOT NULL,
  `what` varchar(45) NOT NULL,
  `amount` decimal(4,2) NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |

So, basically, the point is that something wrong is happenning between Step 1 and Step 2, when the fields values are transmitted into column names. Of cource, we can re-create CREATE TABLE explicitely and manally (or with some programming around), but what the point then?

And, it doesn't matter if it is Russian, Chinese or only French. If I do

INSERT INTO `original_data` (who, what, amount)
VALUES ('Beth', 'Pizza', 12), ('Janet', 'Bière', 3), ('Ali', 'Car', 3), ('Tom', 'Bread', 8);

I have the same error.

Comment by Evgeny Kosolapov [ 2015-02-06 ]

any hope?

Comment by Olivier Bertrand [ 2015-02-09 ]

I shall work on it.

Comment by Olivier Bertrand [ 2015-02-09 ]

Well, I did try to reproduce your case but unsuccessfully. First of all, I could not execute the command:

INSERT INTO `original_data` (who, what, amount)
VALUES ('Beth', 'Pizza', 12), ('Janet', 'Bière', 3), ('Ali', 'Машина', 3), ('Tom', '面包', 8);

It was rejected, probably due to the russian and chinese values that are not recognised on my machine. So what I did, is to manually make a CONNECT table corresponding to the 'original_data' table using and editor
able to past the 'Машина' and '面包' strings and code the whole thing in UTF8.

I attached this file, "src.fix", to this case.

Then I could made a table on it:

CREATE OR REPLACE TABLE t1 (
 `who` VARCHAR(16) NOT NULL ,
 `what` VARCHAR(16) NOT NULL,
 `amount` DOUBLE(6,2) NOT NULL)
ENGINE=CONNECT TABLE_TYPE=DOS FILE_NAME='src.fix' LRECL=128 DATA_CHARSET=UTF8;

but trying to see it resulted in:

MariaDB [test]> select * from t1;
+-------+--------+--------+
| who   | what   | amount |
+-------+--------+--------+
| Beth  | Pizza  |  12.00 |
| Janet | Bière  |   3.00 |
| Ali   | ?????? |   3.00 |
| Tom   | ??     |   8.00 |
+-------+--------+--------+
4 rows in set, 4 warnings (23 min 9.74 sec)
 
MariaDB [test]> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level   | Code | Message                                                                          |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xD0\x9C\xD0\xB0\xD1\x88...' for column 'what' at row 3 |
| Warning | 1105 | Out of range value ?????? for column 'what' at row 3                             |
| Warning | 1366 | Incorrect string value: '\xE9\x9D\xA2\xE5\x8C\x85' for column 'what' at row 4    |
| Warning | 1105 | Out of range value ?? for column 'what' at row 4                                 |
+---------+------+----------------------------------------------------------------------------------+
4 rows in set (0.00 sec)

However, suppressing "DATA_CHARSET=UTF8" from the t1 definition, it was seen as:

MariaDB [test]> select * from t1;
+-------+--------------+--------+
| who   | what         | amount |
+-------+--------------+--------+
| Beth  | Pizza        |  12.00 |
| Janet | Bière       |   3.00 |
| Ali   | Машина |   3.00 |
| Tom   | 面包        |   8.00 |
+-------+--------------+--------+
4 rows in set (11.10 sec)

Now I made the PIVOT table:

CREATE TABLE t2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=t1;

And when looking at it I got:

MariaDB [test]> select * from t2;
+-------+-------+--------+--------------+--------+
| who   | Pizza | Bière | Ð?аÑ?ина | é?¢å?? |
+-------+-------+--------+--------------+--------+
| Ali   |  0.00 |   0.00 |         3.00 |   0.00 |
| Beth  | 12.00 |   0.00 |         0.00 |   0.00 |
| Janet |  0.00 |   3.00 |         0.00 |   0.00 |
| Tom   |  0.00 |   0.00 |         0.00 |   8.00 |
+-------+-------+--------+--------------+--------+
4 rows in set (0.39 sec)

Forgetting the fact that cyrillic and chinese characters are not recognised on my machine, it apparently "worked"
and I got the PIVOT table displayed. The only difference with your sample is that I set the DEFAULT CHARSET of the PIVOT table to UTF8. You should try it and tell me what it does.

Note that this is not a real fix of this problem, because I think that the column names should be the original ones, not their UTF8 translation, as well in the PIVOT create table as in the result headers.

This in fact can be achieved starting from the first flavor of the t1 table, the one specifying "DATA_CHARSET=UTF8". This is interresting. After making the t2 PIVOT table on it we get:

show create table t2;
CREATE TABLE `t2` (
  `who` varchar(16) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `??????` double(6,2) NOT NULL `FLAG`=1,
  `??` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='t1';

And it is displayed as:

MariaDB [test]> select * from t2;
+-------+-------+-------+--------+------+
| who   | Pizza | Bière | ?????? | ??   |
+-------+-------+-------+--------+------+
| Ali   |  0.00 |  0.00 |   3.00 | 0.00 |
| Beth  | 12.00 |  0.00 |   0.00 | 0.00 |
| Janet |  0.00 |  3.00 |   0.00 | 0.00 |
| Tom   |  0.00 |  0.00 |   0.00 | 8.00 |
+-------+-------+-------+--------+------+
4 rows in set (0.02 sec)

Wonderful isn't it?

Comment by Evgeny Kosolapov [ 2015-02-10 ]

Well, this is really confusing why your machine cannot handle Russian and Chinese, I hope there is nothing political
We use CentOS on cloud server.

Anyway, when I was trying to replicate your example, I've got following strange results:

MariaDB [test]> CREATE OR REPLACE TABLE t1 (
    ->  `who` VARCHAR(16) NOT NULL ,
    ->  `what` VARCHAR(16) NOT NULL,
    ->  `amount` DOUBLE(6,2) NOT NULL)
    -> ENGINE=CONNECT TABLE_TYPE=DOS FILE_NAME='/home/src.fix' LRECL=128 DATA_CHARSET=UTF8;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select * from t1;
+------------------+------+--------+
| who              | what | amount |
+------------------+------+--------+
| Beth             |      |   0.00 |
| Janet            |      |   0.00 |
| Ali              |      |   0.00 |
| Tom              |      |   0.00 |
+------------------+------+--------+
4 rows in set, 8 warnings (0.01 sec)

As we can see, there are more warnings here, but for all rows, even for English one:

MariaDB [test]> show warnings;
+---------+------+-------------------------------------------------------------------------------------+
| Level   | Code | Message                                                                             |
+---------+------+-------------------------------------------------------------------------------------+
| Warning | 1265 | Data truncated for column 'who' at row 1                                            |
| Warning | 1105 | Out of range value Beth            Pizza            12.00 for column 'who' at row 1 |
| Warning | 1265 | Data truncated for column 'who' at row 2                                            |
| Warning | 1105 | Out of range value Janet           Bière            3.00 for column 'who' at row 2  |
| Warning | 1265 | Data truncated for column 'who' at row 3                                            |
| Warning | 1105 | Out of range value Ali             Машина      3.00 for column 'who' at row 3       |
| Warning | 1265 | Data truncated for column 'who' at row 4                                            |
| Warning | 1105 | Out of range value Tom             面包            8.00 for column 'who' at row 4   |
+---------+------+-------------------------------------------------------------------------------------+
8 rows in set (0.00 sec)

Okay, maybe something wrong with actual src.fix file, created on your machine. So, I have created my own CSV file with similar format. Attached here - source.csv

It works much better at first step!

MariaDB [test]> CREATE OR REPLACE TABLE t1 (
    ->  `who` VARCHAR(16) NOT NULL ,
    ->  `what` VARCHAR(16) NOT NULL,
    ->  `amount` DOUBLE(6,2) NOT NULL)
    -> ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='/home/source.csv' LRECL=128 DATA_CHARSET=UTF8;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select * from t1;
+-------+--------------+--------+
| who   | what         | amount |
+-------+--------------+--------+
| Beth  | Pizza        |  12.00 |
| Janet | Bière        |   3.00 |
| Ali   | Машина       |   3.00 |
| Tom   | 面包         |   8.00 |
+-------+--------------+--------+
4 rows in set (0.00 sec)

However, when doing second step creating pivot table:

MariaDB [test]> CREATE TABLE t2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=t1;
Query OK, 0 rows affected (0.07 sec)

I get the following error:

MariaDB [test]> select * from t2;
ERROR 1296 (HY000): Got error 122 'Cannot find matching column' from CONNECT

So, basically it seems that issue is not how we create the table t1, but how we pivot it to table t2.

Comment by Olivier Bertrand [ 2015-02-10 ]

Excellent. Your example probably shows where is the issue... except that it is missing one important item: The SHOW CREATE TABLE of the PIVOT table.

Here is what I get starting from your source.csv file:

CREATE OR REPLACE TABLE tc1 (
 `who` VARCHAR(16) NOT NULL ,
 `what` VARCHAR(16) NOT NULL,
 `amount` DOUBLE(6,2) NOT NULL)
ENGINE=CONNECT DEFAULT CHARSET latin1 TABLE_TYPE=CSV FILE_NAME='source.csv' LRECL=100 DATA_CHARSET=utf8 ending=1;
 
select * from tc1;

who what amount
Beth Pizza 12.00
Janet Bière 3.00
Ali ?????? 3.00
Tom ?? 8.00

show warnings;

Level Code Message
Warning 1366 Incorrect string value: '\xD0\x9C\xD0\xB0\xD1\x88...' for column 'what' at row 3
Warning 1105 Out of range value ?????? for column 'what' at row 3
Warning 1366 Incorrect string value: '\xE9\x9D\xA2\xE5\x8C\x85' for column 'what' at row 4
Warning 1105 Out of range value ?? for column 'what' at row 4

These warnings you don't get because on your system, the russian and chinese values are accepted.
Now creating the PIVOT table:

CREATE TABLE tc2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=tc1;
 
show create table tc2;
Table Create Table
tc2 CREATE TABLE `tc2` (
  `who` varchar(16) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `??????` double(6,2) NOT NULL `FLAG`=1,
  `??` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='tc1';

The columns names for these values are

'??????' and '??'

.
CONNECT is able to retrieve them in the source table because when reading it, these values are also changed the same way, therefore:

select * from tc2;

returns:

who Pizza Bière ?????? ??
Ali 0.00 0.00 3.00 0.00
Beth 12.00 0.00 0.00 0.00
Janet 0.00 3.00 0.00 0.00
Tom 0.00 0.00 0.00 8.00

What happen in your case is that your PIVOT table is created with column names that are Машина and 面包 instead of

'??????' and '??'

.Then, for some reason, these names are not handled correctly and cannot be matched within the source table.

Unfortunately, I cannot reproduce this on my machine and this makes me unable to trace what happens and to find why it fails. I don't know either whether this is a CONNECT bug or a MariaDB issue.

Now you can also try this:

CREATE OR REPLACE TABLE tcx1 (
 `who` VARCHAR(16) NOT NULL ,
 `what` VARCHAR(16) NOT NULL,
 `amount` DOUBLE(6,2) NOT NULL)
ENGINE=CONNECT DEFAULT CHARSET latin1 TABLE_TYPE=CSV FILE_NAME='source.csv' LRECL=100;

That is the same thing but dropping the DATA_CHARSET=utf8 so the table text is no more converted.

select * from tcx1;

who what amount
Beth Pizza 12.00
Janet Bière 3.00
Ali Машина 3.00
Tom 面包 8.00

No more warnings, now let's make the PIVOT table:

CREATE TABLE tcx2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=tcx1;
 
show create table tcx2;
CREATE TABLE `tcx2` (
  `who` varchar(16) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `Машина` double(6,2) NOT NULL `FLAG`=1,
  `面包` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='tcx1';
 
select * from tcx2;

who Pizza Bière Машина 面包
Ali 0.00 0.00 3.00 0.00
Beth 12.00 0.00 0.00 0.00
Janet 0.00 3.00 0.00 0.00
Tom 0.00 0.00 0.00 8.00

This is rather ugly but perhaps can work on your machine.

Comment by Evgeny Kosolapov [ 2015-02-11 ]

hmm.. nope. Strangely, your latest example didn't work on my machine. The steps are the following:

MariaDB [test]> CREATE OR REPLACE TABLE tcx1 (
    ->  `who` VARCHAR(16) NOT NULL ,
    ->  `what` VARCHAR(16) NOT NULL,
    ->  `amount` DOUBLE(6,2) NOT NULL)
    -> ENGINE=CONNECT DEFAULT CHARSET latin1 TABLE_TYPE=CSV FILE_NAME='/home/source.csv' LRECL=100;
Query OK, 0 rows affected (0.08 sec)
 
MariaDB [test]> select * from tcx1;
+-------+--------------------------+--------+
| who   | what                     | amount |
+-------+--------------------------+--------+
| Beth  | Pizza                    |  12.00 |
| Janet | Bière                   |   3.00 |
| Ali   | Машина             |   3.00 |
| Tom   | 面包                   |   8.00 |
+-------+--------------------------+--------+
4 rows in set (0.01 sec)

MariaDB [test]> CREATE TABLE tcx2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=tcx1;
Query OK, 0 rows affected (0.02 sec)
 
MariaDB [test]> show create table tcx2;
| tcx2  | CREATE TABLE `tcx2` (
  `who` varchar(48) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `Машина` double(6,2) NOT NULL `FLAG`=1,
  `面包` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='tcx1'   

but when I try to do select:

MariaDB [test]> select * from tcx2;
ERROR 1296 (HY000): Got error 122 'Cannot find matching column' from CONNECT


in my example above (t1->t2) the SHOW CREATE TABLE t2 gives:

| t2    | CREATE TABLE `t2` (
  `who` varchar(48) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `Машина` double(6,2) NOT NULL `FLAG`=1,
  `面包` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='t1'  

Comment by Olivier Bertrand [ 2015-02-11 ]

I see. The difference is that in the process of making the pivot table column definitions (by Discovery) the columns names are converted to utf8 while they are not on my machine. This is probably due to the setting of system variables. On my machine this is:

Variable_name			Value
character_set_client		latin1
character_set_connection	latin1
character_set_database		latin1
character_set_filesystem	binary
character_set_results		latin1
character_set_server		latin1
character_set_system		utf8

What you can do is to play with these settings or alternatively to manually define the columns in your PIVOT CREATE TABLE statement to see what happens.

Comment by Evgeny Kosolapov [ 2015-02-11 ]

Variables on my machine:

MariaDB [(none)]> SHOW VARIABLES LIKE  'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

If I manually define the columns in PIVOT CREATE TABLE it perfectly works. But isn't it the whole point of convenience of having it automatic? )

Comment by Olivier Bertrand [ 2015-02-11 ]

You're right. But the issue is in the Discovery process when character_set_results is set to utf8. And this happen after returning from the CONNECT connect_assisted_discovery function.

You could try:

set character_set_results=latin1;
CREATE TABLE tcx2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT;
set character_set_results=utf8;

I'd be interrested to see if it works.

Comment by Olivier Bertrand [ 2015-02-12 ]

What I said previously is not true. I was fooled because when character_set_results is set to utf8, this applies to the display of the result of the SHOW CREATE TABLE statement. However, the table is normally created with original column names.
What happens in your case is that for the Russian and Chinese characters, something causes the conversion to be wrong, although it is correct for French ones. But I cannot see where it happens on my machines that does not behave like yours.
Maybe what you could do is to display the SHOW CREATE TABLE of the table created via Discovery after setting the character_set_results to another charset, one allowing to see all your specific characters.

What I want to check is whether there is a difference in the definition of the table via Discovery (without the columns) and the definition of the table you made manually.

If they are identical (same output from SHOW CREATE TABLE) they should behave identically. If not, there is something hidden somewhere that should be discovered.

Comment by Evgeny Kosolapov [ 2015-03-11 ]

Hi Olivier, I was away due to Chinese New Year, but I didn't forget about the issue
Meanwhile, I have launched free instance on Amazon, so that we can both try the issue on the same machine.

So, if we launch fresh new Ubuntu and install fresh new MariaDB (10.0.17 now), add CONNECT Engine, the default Charsets would be:

MariaDB [(none)]> show variables like "%character%"; show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.01 sec)
 
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

I have also placed into /home/ubuntu/ two source files: source_en.csv and source_cn.csv

If I try to make Pivot tables from English only source (source_en.csv) everything works as perfect as in manuals. But if I try to make Pivot table from multilingual file (source_cn.cvs) I can replicate something similar what happens on your machine:

MariaDB [test]> CREATE OR REPLACE TABLE t1 (who VARCHAR(16) NOT NULL, what VARCHAR(16) NOT NULL, amount DOUBLE(6,2) NOT NULL) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='/home/ubuntu/source_cn.csv' LRECL=128 DATA_CHARSET=UTF8;
Query OK, 0 rows affected (0.01 sec)
 
MariaDB [test]> select * from t1;
+-------+--------+--------+
| who   | what   | amount |
+-------+--------+--------+
| Beth  | Pizza  |  12.00 |
| Janet | Bière  |   3.00 |
| Ali   | ?????? |   3.00 |
| Tom   | ??     |   8.00 |
+-------+--------+--------+
4 rows in set, 4 warnings (0.00 sec)
 
MariaDB [test]> show warnings;
+---------+------+----------------------------------------------------------------------------------+
| Level   | Code | Message                                                                          |
+---------+------+----------------------------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xD0\x9C\xD0\xB0\xD1\x88...' for column 'what' at row 3 |
| Warning | 1105 | Out of range value Машина for column 'what' at row 3                             |
| Warning | 1366 | Incorrect string value: '\xE9\x9D\xA2\xE5\x8C\x85' for column 'what' at row 4    |
| Warning | 1105 | Out of range value 面包 for column 'what' at row 4                               |
+---------+------+----------------------------------------------------------------------------------+
4 rows in set (0.00 sec)
 
MariaDB [test]> CREATE OR REPLACE TABLE t2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=t1;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test]> select * from t2;
+-------+-------+--------+--------+------+
| who   | Pizza | Bière  | ?????? | ??   |
+-------+-------+--------+--------+------+
| Ali   |  0.00 |   0.00 |   3.00 | 0.00 |
| Beth  | 12.00 |   0.00 |   0.00 | 0.00 |
| Janet |  0.00 |   3.00 |   0.00 | 0.00 |
| Tom   |  0.00 |   0.00 |   0.00 | 8.00 |
+-------+-------+--------+--------+------+
4 rows in set (0.00 sec)

Well, lets try to change Charsets to all uft8. I am modifying /etc/mysql/my.cnf as following:

[mysqld]
init_connect=‘SET collation_connection = utf8_unicode_ci’
character-set-server = utf8
collation-server = utf8_unicode_ci
 
[client]
default-character-set = utf8

Now:

MariaDB [(none)]> show variables like "%character%"; show variables like "%collation%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)
 
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_general_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)

In order to make pure fresh test, I create DB 'test2' and now trying to make pivot table:

MariaDB [test2]> CREATE OR REPLACE TABLE t1 (who VARCHAR(16) NOT NULL, what VARCHAR(16) NOT NULL, amount DOUBLE(6,2) NOT NULL) ENGINE=CONNECT TABLE_TYPE=CSV FILE_NAME='/home/ubuntu/source_cn.csv' LRECL=128 DATA_CHARSET=UTF8;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test2]> select * from t1;
+-------+--------------+--------+
| who   | what         | amount |
+-------+--------------+--------+
| Beth  | Pizza        |  12.00 |
| Janet | Bière        |   3.00 |
| Ali   | Машина       |   3.00 |
| Tom   | 面包         |   8.00 |
+-------+--------------+--------+
4 rows in set (0.01 sec)
 
MariaDB [test2]> CREATE OR REPLACE TABLE t2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=t1;
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [test2]> select * from t2;
*ERROR 1296 (HY000): Got error 122 'Cannot find matching column' from CONNECT
*
MariaDB [test2]> show create table t2;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                                                                                                                                  |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `who` varchar(16) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `??????` double(6,2) NOT NULL `FLAG`=1,
  `??` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='t1'  |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

Where should I send you login information for this server? Should I use your email I see here at Jira on gmail?

Comment by Olivier Bertrand [ 2015-03-11 ]

Yes you can use my email.

Comment by Evgeny Kosolapov [ 2015-03-11 ]

sent!

Comment by Olivier Bertrand [ 2015-03-18 ]

With the help of Alexander Barkov we could spot the places where column names were converted to latin1. That was causing the bug.
However, note that it fixes the case of general utf8 use but it would not take in account cases where several columns would be using different charsets.

Columns names are shown as their utf8 code. It cannot be different because the column names belong to different codepages. For instance:

MariaDB [(none)]> use test
Database changed
MariaDB [test]> set names utf8;
Query OK, 0 rows affected (0.04 sec)
 
MariaDB [test]> CREATE TABLE `tc1` (
  `who` varchar(16) NOT NULL,
  `what` varchar(16) NOT NULL,
  `amount` double(6,2) NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`=CSV `FILE_NAME`='source.csv' `LRECL`=100 `DATA_CHARSET`=utf8 `ending`=1;
affected rows 0
 
MariaDB [test]> select * from tc1;
+-------+--------------+--------+
| who   | what         | amount |
+-------+--------------+--------+
| Beth  | Pizza        |  12.00 |
| Janet | Bière       |   3.00 |
| Ali   | ð£ð░Ðêð©ð¢ð░ |   3.00 |
| Tom   | ÚØóÕîà       |   8.00 |
+-------+--------------+--------+
4 rows in set (0.11 sec)
 
MariaDB [test]> CREATE TABLE tc2 ENGINE=CONNECT DEFAULT CHARSET utf8 TABLE_TYPE=PIVOT TABNAME=tc1;
affected rows 0
 
MariaDB [test]> show create table tc2;
 
CREATE TABLE `tc2` (
  `who` varchar(48) NOT NULL,
  `Pizza` double(6,2) NOT NULL `FLAG`=1,
  `Bière` double(6,2) NOT NULL `FLAG`=1,
  `ð£ð░Ðêð©ð¢ð░` double(6,2) NOT NULL `FLAG`=1,
  `ÚØóÕîà` double(6,2) NOT NULL `FLAG`=1
) ENGINE=CONNECT DEFAULT CHARSET=utf8 `TABLE_TYPE`='PIVOT' `TABNAME`='tc1';
 
MariaDB [test]> select * from tc2;
+-------+-------+--------+--------------+--------+
| who   | Pizza | Bi├¿re | ð£ð░Ðêð©ð¢ð░ | ÚØóÕîà |
+-------+-------+--------+--------------+--------+
| Ali   |  0.00 |   0.00 |         3.00 |   0.00 |
| Beth  | 12.00 |   0.00 |         0.00 |   0.00 |
| Janet |  0.00 |   3.00 |         0.00 |   0.00 |
| Tom   |  0.00 |   0.00 |         0.00 |   8.00 |
+-------+-------+--------+--------------+--------+
4 rows in set (0.54 sec)

Comment by Evgeny Kosolapov [ 2015-03-19 ]

Sorry, explanation is not clear. What should be done in order to fix the bug?

I was trying to replicate your example on test machine, still got the error.

Comment by Olivier Bertrand [ 2015-03-19 ]

Nothing you can do now. The bug will be fixed in the next version 10.0.18.

Comment by Evgeny Kosolapov [ 2015-06-10 ]

Thanks a lot! I see it fixed.

Generated at Thu Feb 08 07:20:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.