[MDEV-21815] Incorrect character set / collation for user variables Created: 2020-02-25  Updated: 2020-07-22  Resolved: 2020-06-11

Status: Closed
Project: MariaDB Server
Component/s: Character Sets, Variables
Affects Version/s: 10.5.1, 10.5
Fix Version/s: 10.5.4

Type: Bug Priority: Major
Reporter: Todd Michael Assignee: Alexander Barkov
Resolution: Not a Bug Votes: 0
Labels: Compatibility, regression
Environment:

Windows 10 Pro 64 bit


Issue Links:
Relates
relates to MDEV-23246 Illegal mix of collations Open

 Description   

The values of this query differs from MariaDB 10.3 and MariaDB 10.4.

MariaDB 10.5 initially casts these user variables as latin1 instead of the server character set or collation. This leads to errors like "Error Code: 1267. Illegal mix of collations (latin1_swedish_ci,IMPLICIT) and (utf8_general_ci,COERCIBLE) for operation 'if'" when attempting to use such user variables in subsequent comparisons and calculations, which otherwise work perfectly on MariaDB 10.3 and MariaDB 10.4.

SET character_set_server = 'utf8mb4';
SET collation_server = 'utf8mb4_general_ci';
SELECT 
   @literalNULL:=NULL, CHARSET(@literalNULL), /* Correct as binary. */
   @literalINTEGER:=123, CHARSET(@literalINTEGER), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @literalFLOAT:=1.23e40, CHARSET(@literalFLOAT), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @literalTRUE:=TRUE, CHARSET(@literalTRUE), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @literalFALSE:=FALSE, CHARSET(@literalFALSE), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @literalDATE:= DATE '2001-01-01', CHARSET(@literalDATE), /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */
   @literalTIME:= TIME '13:01:02', CHARSET(@literalTIME), /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */
   @literalDATETIME:= TIMESTAMP '2001-01-01 13:01:02', CHARSET(@literalDATETIME), /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */
   @literalCHAR:='abc', CHARSET(@literalCHAR), /* Correct as utf8. */
   
   @calcCHAR:=CONCAT(NOW(), ' and then'), CHARSET(@calcCHAR), /* Correct as utf8. */
   
   @castCHAR:=CAST(NOW() AS CHAR), CHARSET(@castCHAR), /* Correct as utf8. */
   @castINT:=CAST(123 AS INTEGER), CHARSET(@castINT), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @castDOUBLE:=CAST(1.23e40 AS DOUBLE), CHARSET(@castDOUBLE), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @castDECIMAL:=CAST(1.23e4 AS DECIMAL), CHARSET(@castDECIMAL), /* latin1 then binary: Should be utf8 then binary on subsequent query. */
   @castDATE:=CAST(NOW() AS DATE), CHARSET(@castDATE), /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */
   @castTIME:=CAST(NOW() AS TIME), CHARSET(@castTIME), /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */
   @castDATETIME:=CAST(NOW() AS DATETIME), CHARSET(@castDATETIME) /* latin1 then latin1: Should be utf8 then utf8 on subsequent query. */



 Comments   
Comment by Elena Stepanova [ 2020-03-08 ]

Thanks for the report.

The change happened with this commit:

commit 4b5a76741eced5b3c39e8438f45d979136d58a33
Author: Alexander Barkov <bar@mariadb.com>
Date:   Tue Oct 1 22:30:28 2019 +0400
 
    MDEV-20712 Wrong data type for CAST(@a AS BINARY) for a numeric variable

I'll leave it to bar to determine whether the described effect was intentional or not.

Comment by Alexander Barkov [ 2020-06-11 ]

This change was intentional.

Please report (as a separate issue) the exact SQL script which worked before 10.5.2 and started to return Illegal mix in 10.5.2.

Note, using functions charset() and collation() is not reliable in this context.

There are two types of charset/collation of an expression.
1. the character set and the collation which is visible in the client-server result metadata
2. the character set and the collation which is used for argument aggregation, e.g. for comparison

In case of numeric data, the character set for the protocol is binary, while the character set for aggregation is latin1.
latin1 is needed to make:

SELECT CONCAT('text ', 5)

return an expression of the session default character set and collation (rather than a binary expression).

Note, functions CHARSET() and COLLATION() return #1, i.e. the protocol attributes. So the above script is not much useful to find the reason of Illegal mix.

Also, CHARSET() and COLLATION() do not calculate its argument, so assignments unside:

SELECT
  CHARSET(@a:=123),
  COLLATION(@b:=123);

do not really happen as of 10.5.2
(which is probably not good).

Comment by Todd Michael [ 2020-07-21 ]

POSTSCRIPT: Followed up at MDEV-23246.

Generated at Thu Feb 08 09:10:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.