[MDEV-27782] Wrong columns when using table level `CHARACTER SET utf8mb4 COLLATE DEFAULT` Created: 2022-02-09  Updated: 2022-06-03  Resolved: 2022-05-24

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 10.2, 10.3, 10.4, 10.5, 10.6, 10.7, 10.8, 10.9
Fix Version/s: 10.9.2

Type: Bug Priority: Critical
Reporter: Alexander Barkov Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-27009 Add UCA-14.0.0 collations Closed
relates to MDEV-28067 Multiple conflicting column COLLATE c... Closed
relates to MDEV-27690 Crash on `CHARACTER SET csname COLLAT... Closed
relates to MDEV-27743 Remove Lex::charset Closed
relates to MDEV-27853 Wrong data type on column `COLLATE DE... Closed
relates to MDEV-28117 Multiple conflicting table COLLATE cl... Closed

 Description   

CREATE OR REPLACE TABLE t1 (a CHAR(10)) CHARACTER SET utf8mb4 COLLATE DEFAULT;
SHOW CREATE TABLE t1;

+-------+----------------------------------------------------------------------------------------+
| Table | Create Table                                                                           |
+-------+----------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------+

The column was created with a wrong character set. The expected character set is utf8mb4.



 Comments   
Comment by peterdd [ 2022-03-26 ]

I tested with a Debian Testing VM: Linux testing 5.15.0-2-amd64 #1 SMP Debian 5.15.5-2 (2021-12-18) x86_64 GNU/Linux

MariaDB [test]> CREATE OR REPLACE TABLE t1 (a CHAR(10)) CHARACTER SET utf8mb4 COLLATE DEFAULT;
Query OK, 0 rows affected (0.005 sec)
 
MariaDB [test]> 
MariaDB [test]> SHOW CREATE TABLE t1;
+-------+-----------------------------------------------------------------------------------------+
| Table | Create Table                                                                            |
+-------+-----------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 |
+-------+-----------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

The result seems ok to me.
Maybe you have different default settings (database default character set/defautl collation)?

What is the result of

select * from information_schema.schemata;

MariaDB [test]> status
--------------
mysql  Ver 15.1 Distrib 10.6.7-MariaDB, for debian-linux-gnu (x86_64) using  EditLine wrapper
 
Connection id:		31
Current database:	test
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server:			MariaDB
Server version:		10.6.7-MariaDB-3 Debian buildd-unstable
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	utf8mb4
Db     characterset:	utf8mb4
Client characterset:	utf8mb3
Conn.  characterset:	utf8mb3
UNIX socket:		/run/mysqld/mysqld.sock
Uptime:			1 min 11 sec

Update: I am able to reproduce using a database test2 with other charset and collation:

select * from information_schema.schemata;
+--------------+--------------------+----------------------------+------------------------+----------+----------------+
| CATALOG_NAME | SCHEMA_NAME        | DEFAULT_CHARACTER_SET_NAME | DEFAULT_COLLATION_NAME | SQL_PATH | SCHEMA_COMMENT |
+--------------+--------------------+----------------------------+------------------------+----------+----------------+
| def          | information_schema | utf8mb3                    | utf8mb3_general_ci     | NULL     |                |
| def          | test2              | latin1                     | latin1_swedish_ci      | NULL     |                |
| def          | sys                | utf8mb3                    | utf8mb3_general_ci     | NULL     |                |
| def          | performance_schema | utf8mb3                    | utf8mb3_general_ci     | NULL     |                |
| def          | mysql              | utf8mb4                    | utf8mb4_general_ci     | NULL     |                |
| def          | test               | utf8mb4                    | utf8mb4_general_ci     | NULL     |                |
+--------------+--------------------+----------------------------+------------------------+----------+----------------+
6 rows in set (0.001 sec)
 
MariaDB [test]> use test2
Database changed
MariaDB [test2]> CREATE OR REPLACE TABLE t1 (a CHAR(10)) CHARACTER SET utf8mb4 COLLATE DEFAULT;
Query OK, 0 rows affected (0.006 sec)
 
MariaDB [test2]> show create table t1;
+-------+----------------------------------------------------------------------------------------+
| Table | Create Table                                                                           |
+-------+----------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` char(10) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------+
1 row in set (0.001 sec)

So it depends on the database settings and the chosen charset/collation for the table creation.

So maybe if charset utf8mb4 and the default collation is incompatible to utf8mb4 there is no fallback to fix this?
Or should the query be denied with an error message?

How behaves Mysql in this cases? (I wish these behave the same on both.)

Comment by peterdd [ 2022-03-26 ]

CREATE OR REPLACE TABLE t2 (a CHAR(10)) CHARACTER SET utf8mb4 COLLATE latin1_swedish_ci;
ERROR 1253 (42000): COLLATION 'latin1_swedish_ci' is not valid for CHARACTER SET 'utf8mb4'

Explicitely setting an incompatible collation for the charset gives error. So either the 'DEFAULT COLLATE' fallbacks automatically to the 'default collation' of the wished CHARACTER SET or simply spits an error, right?

What says the SQL Standard? How does MySQL and other database types bahave?

Comment by Alexander Barkov [ 2022-03-30 ]

Correct, the result depends on the database settings. In my case, when I reported this MDEV, the database was created with latin1 as a default character set.

Before the fix "CHARACTER SET utf8mb4 COLLATE DEFAULT" forgot the "CHARACTER SET" part and used the database level defaults (both character set and collation). MySQL has the same problem.

After the fix it uses the default collation of the specified character set.

SQL Standard supports only explicit collation names in COLLATE clauses and does not have "COLLATE DEFAULT". This is a MySQL/MariaDB extension.

Comment by Alexander Barkov [ 2022-05-24 ]

Pushed in a single patch together with MDEV-27896

Generated at Thu Feb 08 09:55:32 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.