[MDEV-10805] CAST to CHAR should use one of the collation settings instead of the charset's default collation Created: 2016-09-13  Updated: 2017-06-13

Status: Confirmed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.1.16
Fix Version/s: 10.1

Type: Bug Priority: Major
Reporter: Hong Dai Thanh Assignee: Unassigned
Resolution: Unresolved Votes: 0
Labels: None
Environment:

RHEL 6



 Description   

Given the following server options:

character-set-client-handshake=FALSE
character-set-server=latin1
collation-server=latin1_general_ci
 
sql_mode=ANSI,TRADITIONAL

And the following code to create database and test tables:

drop database if exists latin;
create database latin;
use latin;
show session variables where variable_name like '%coll%' or variable_name like '%char%' or variable_name like 'sql_mode';
 
create table if not exists num (
	a int,
	b int
);
 
create table if not exists str (
	a char(4),
	b char(4)
);
 
create view combine as
select cast(a as CHAR) || '.' || cast(b as CHAR) as cat
from num
union
select a || '.' || b as cat
from str;
 
create view cond as
select num.a as na, num.b as nb, str.a as sa, str.b as sb
from num
join str
on str.a = cast(num.a as CHAR);
 
show table status;

Running the code with mysql client gives the following output:

Variable_name   Value
character_set_client    latin1
character_set_connection        latin1
character_set_database  latin1
character_set_filesystem        binary
character_set_results   latin1
character_set_server    latin1
character_set_system    utf8
character_sets_dir      /usr/share/mysql/charsets/
collation_connection    latin1_general_ci
collation_database      latin1_general_ci
collation_server        latin1_general_ci
sql_mode        REAL_AS_FLOAT,PIPES_AS_CONCAT,ANSI_QUOTES,IGNORE_SPACE,ANSI,STRICT_TRANS_TABLES,STRICT_ALL_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,TRADITIONAL,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
Name    Engine  Version Row_format      Rows    Avg_row_length  Data_length     Max_data_length Index_length    Data_free       Auto_increment  Create_time     Update_time     Check_time      Collation       Checksum        Create_options       Comment
combine NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    Illegal mix of collations for operation 'UNION'
cond    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    NULL    Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_swedish_ci,IMPLICIT) for operation '='
num     InnoDB  10      Compact 0       0       16384   0       0       0       NULL    2016-09-13 18:05:27     NULL    NULL    latin1_general_ci       NULL
str     InnoDB  10      Compact 0       0       16384   0       0       0       NULL    2016-09-13 18:05:27     NULL    NULL    latin1_general_ci       NULL

We can see the output of the show table status statement shows collation mix error with the 2 views, despite the fact that all collation settings are already set to latin1_general_ci for the session.

Looking at the error message, it seems that while CAST implicitly uses the character set in the setting, it doesn't use the collation settings but use the default collation of the character set instead. This increases the effort from the programmer to keep track of the collation for all the CAST operations. Is it possible to change it to use one of the collation settings (whichever makes the most sense)?



 Comments   
Comment by Elena Stepanova [ 2016-09-19 ]

I've only gotten 'Illegal mix' for the view cond. Assigning to the charset expert bar.

Comment by Alexander Barkov [ 2016-09-21 ]

A smaller script demonstrating the same problem:

SET NAMES latin1 COLLATE latin1_general_ci;
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (a VARCHAR(10) CHARACTER SET latin1 COLLATE latin1_general_ci, b INT);
INSERT INTO t1 VALUES (10,10);
SELECT * FROM t1 WHERE a=CAST(b AS CHAR);

The above SELECT correctly returns one row.

Now if I do the same with a VIEW:

DROP VIEW IF EXISTS v1;
CREATE VIEW v1 AS SELECT * FROM t1 WHERE a=CAST(b AS CHAR);
SELECT * FROM v1;

the SELECT query returns an error:

ERROR 1267 (HY000): Illegal mix of collations (latin1_general_ci,IMPLICIT) and (latin1_swedish_ci,IMPLICIT) for operation '='

This query returns how the view definition was stored:

SELECT  VIEW_DEFINITION FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_NAME='v1';

and it returns the following:

+----------------------------------------------------------------------------------------------------------------------------------------------+
| VIEW_DEFINITION                                                                                                                              |
+----------------------------------------------------------------------------------------------------------------------------------------------+
| select `test`.`t1`.`a` AS `a`,`test`.`t1`.`b` AS `b` from `test`.`t1` where (`test`.`t1`.`a` = cast(`test`.`t1`.`b` as char charset latin1)) |
+----------------------------------------------------------------------------------------------------------------------------------------------+

Notice, it adds an explicit charset clause inside CAST, which forces CAST use the default collation of the character set.

Comment by Alexander Barkov [ 2016-09-21 ]

A simple workaround is not to use CAST at all:

DROP VIEW IF EXISTS v1;
CREATE VIEW v1 AS SELECT * FROM t1 WHERE a=b;
SELECT * FROM v1;

This script correctly returns one row.

Generated at Thu Feb 08 07:45:03 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.