[MDEV-21449] Wrong collation for new table when all defaults set. Created: 2020-01-09  Updated: 2023-12-15

Status: Confirmed
Project: MariaDB Server
Component/s: Documentation
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Frank Forte Assignee: Joe Cotellese
Resolution: Unresolved Votes: 0
Labels: collation
Environment:

Linux and Windows


Issue Links:
PartOf

 Description   

If the database collation is 'utf8mb4_unicode_520_ci', I expect new tables with character set utf8mb4 will have that same collation. However, the new table created has collation 'utf8mb4_general_ci';

 
SET collation_connection = 'utf8mb4_unicode_520_ci';
SET NAMES 'utf8mb4' COLLATE 'utf8mb4_unicode_520_ci';
SET CHARACTER SET 'utf8mb4';
CREATE DATABASE test CHARACTER SET 'utf8mb4' COLLATE 'utf8mb4_unicode_520_ci';
 
CREATE TABLE `test`.`tester` (
  `id` mediumint(8) unsigned NOT NULL AUTO_INCREMENT,
  `name` varchar(191) NOT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
 
select TABLE_COLLATION from tables where TABLE_SCHEMA = 'test' AND TABLE_NAME = 'tester';
 
+--------------------+
| TABLE_COLLATION    |
+--------------------+
| utf8mb4_general_ci |
+--------------------+

why?



 Comments   
Comment by Frank Forte [ 2020-01-09 ]

Because of the unexpected collation, I get this error with some queries:

"Illegal mix of collations (utf8mb4_general_ci,IMPLICIT) and (utf8mb4_unicode_520_ci,IMPLICIT) for operation 'like'"

Comment by Elena Stepanova [ 2020-01-13 ]

This is actually documented behavior:

The CREATE TABLE and ALTER TABLE statements support optional character set and collation clauses, a MariaDB and MySQL extension to standard SQL.

If neither character set nor collation is provided, the database default will be used. If only the character set is provided, the default collation for that character set will be used. If only the collation is provided, the associated character set will be used. See Supported Character Sets and Collations.

It seems reasonable, since you could just as well try to create a table with CHARSET=latin1, for example, and it would be weird to use the default database collation for that.

However, documentation for this subject, while quite detailed, could use some revising and clarification, as it's contradictory in different parts.
For example, here it says:

[DEFAULT] COLLATE is used to set a default collation for the table. This is the collation used for all columns where an explicit character set is not specified. If this option is omitted or DEFAULT is specified, database's default option will be used.

And here it also says

Character sets and collations always cascade down, so a column without a specified collation will look for the table default, the table for the database, and the database for the server.

So I'm switching it to documentation.

Generated at Thu Feb 08 09:07:13 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.