[MDEV-7375] FEDERATED + DISCOVERY can make UTF8 columns to be corrupted Created: 2014-12-25  Updated: 2023-11-28

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Federated
Affects Version/s: 10.0.15
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Minor
Reporter: Olivier Bertrand Assignee: Alexander Barkov
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Windows 7


Issue Links:
Relates
relates to MDEV-7343 corrupted text using connect Closed

 Description   

Having done on the remote server:

create table t1 (line char(32) character set UTF8 not null) engine=myisam;
insert into t1 values('Et si on était déjà à noël ?');
select * from t1;

returns:

line
Et si on était déjà à noël ?

On the local server, a FEDERATED table to access it can be created as:

create table t1 engine=FEDERATED
connection='mysql://root:tinono@localhost:3307/test/t1';
select * from t1;

returns:

line
Et si on

and issues one warning:

Level Code Message
Warning 1366 Incorrect string value: '\xE9tait ...' for column 'line' at row 1

Note: This can be corrected either by specifying the default charset of the local FEDERATED table as DEFAULT CHARSET=UTF8, or by explicitly defining the local table column without specifying its character set. But this is not clearly documented.
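
The two workarounds mentioned in the note could look roughly like the statements below. This is only a sketch reusing the connection string and column definition from the example above; combining DEFAULT CHARSET with discovery in a single statement is assumed to be accepted as the note describes, and neither variant has been re-tested here.

```sql
-- Workaround 1: keep discovery, but force the local table default charset
-- to UTF8 (assumed to also become the connection charset).
create table t1 engine=FEDERATED default charset=utf8
connection='mysql://root:tinono@localhost:3307/test/t1';

-- Workaround 2 (alternative to 1): define the column explicitly, without
-- a character set clause, so that it takes the local default charset.
create table t1 (line char(32) not null) engine=FEDERATED
connection='mysql://root:tinono@localhost:3307/test/t1';
```

Only one of the two statements is needed; drop the existing local t1 first if it was already created through discovery.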



 Comments   
Comment by Elena Stepanova [ 2014-12-26 ]

bar,

The behavior doesn't look right to me...
With the automatic discovery, the federated table's structure is exactly like the remote one, with charset specified for the column. And it doesn't work. On the other hand, if the column charset is not specified on the federated table and thus stays latin1, it works fine. At the very least it's counter-intuitive.
What do you think?

Comment by Olivier Bertrand [ 2014-12-26 ]

The explanation is that FEDERATED (and now CONNECT) sets the connection charset to the default charset of the local table before connecting. In this case the local table charset is latin1 by default, and so is the connection charset. Thus the UTF8 column is translated to latin1 on the connection, and this is why it should not be specified as UTF8 on the local table. If it must remain UTF8, the local default charset must be set to UTF8; the connection charset will then be UTF8 and the column contents will not be translated.

Note that this does not occur with CONNECT because in the discovery process, CONNECT ignores the column charset specification of the remote table, which also can be wrong in some cases. However, this shows that the whole process could be reconsidered or at least properly documented.
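
To check which charsets are involved in a given setup, statements along the following lines may help. This is a sketch only: it assumes the t1 tables from the description exist on both servers, and the result depends on each server's configuration.

```sql
-- On the local server: shows the column charset and the table default
-- charset, which per the explanation above drives the connection charset.
show create table t1;

-- On the local server: the default charset that new tables in the
-- current database inherit when none is specified.
select @@character_set_database;

-- On the remote server: the stored bytes as hex digits.
-- A UTF-8 encoded 'é' appears as C3A9.
select hex(line) from t1;
```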
