[MDEV-17662] Default to UTF8 Created: 2018-11-10  Updated: 2021-11-04  Resolved: 2021-04-06

Status: Closed
Project: MariaDB Server
Component/s: Character Sets
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Olaf van der Spek Assignee: Unassigned
Resolution: Duplicate Votes: 5
Labels: None

Issue Links:
Blocks
is blocked by MDEV-17081 Make optimizer aware of Mdev-371 (lon... Stalled
is blocked by MDEV-20416 benchmark the effect of default_chara... Closed
Duplicate
duplicates MDEV-19123 Change default charset from latin1 to... Open
Relates
relates to MDEV-8334 Rename utf8 to utf8mb3 Closed
relates to MDEV-23530 ERROR 1071: Specified key was too lon... Open

 Description   

Apparently UTF8 still isn't the default everywhere. Could this be fixed?

https://mariadb.com/kb/en/library/differences-in-mariadb-in-debian-and-ubuntu/



 Comments   
Comment by Otto Kekäläinen [ 2019-04-15 ]

Reported downstream: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923526

Comment by Olaf van der Spek [ 2019-04-15 ]

Note the downstream bug is related but different.

I think both Debian and upstream should update the default to utf8mb4_unicode_ci.

Comment by Olaf van der Spek [ 2019-06-20 ]

Any feedback?
Good idea?
Bad idea?

Comment by Sergei Golubchik [ 2019-08-25 ]

It has both good and bad sides. The good: users who need non-latin1 characters will no longer have to change the default character set. The bad, for latin1-only users: it comes with an unknown slowdown (MDEV-20416), and it will truncate indexes. Compare:

MariaDB [test]> create table t1 (a varchar(2000), index(a)) character set utf8mb4;
Query OK, 0 rows affected, 1 warning (0.013 sec)
 
Warning (Code 1071): Specified key was too long; max key length is 1000 bytes
MariaDB [test]> show create table t1;
+-------+-----------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                    |
+-------+-----------------------------------------------------------------------------------------------------------------+
| t1    | CREATE TABLE `t1` (
  `a` varchar(2000) DEFAULT NULL,
  KEY `a` (`a`(250))
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 |
+-------+-----------------------------------------------------------------------------------------------------------------+
 
MariaDB [test]> create table t2 (a varchar(2000), index(a));
Query OK, 0 rows affected, 1 warning (0.017 sec)
 
Warning (Code 1071): Specified key was too long; max key length is 1000 bytes
MariaDB [test]> show create table t2;
+-------+--------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                       |
+-------+--------------------------------------------------------------------------------------------------------------------+
| t2    | CREATE TABLE `t2` (
  `a` varchar(2000) DEFAULT NULL,
  KEY `a` (`a`(1000))
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------------------+

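With latin1 the indexed prefix is 4 times longer (1000 characters instead of 250), because utf8mb4 reserves up to 4 bytes per character against the same 1000-byte key limit.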
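
The prefix lengths in the two outputs above follow from simple arithmetic, sketched here in Python. This is only an illustration, not server code: the helper name `max_prefix_chars` and the lookup table are hypothetical, the 1000-byte limit is taken from the warning shown above, and it is assumed that the server sizes the key by the worst-case bytes per character of the character set.

```python
# Worst-case bytes per character for the character sets discussed here.
# (Hypothetical lookup table for illustration; not read from the server.)
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4}


def max_prefix_chars(charset: str, max_key_bytes: int = 1000) -> int:
    """Longest index prefix, in characters, that fits in max_key_bytes.

    Assumes the key is sized by the worst-case character width,
    so the prefix is the byte limit divided by that width, rounded down.
    """
    return max_key_bytes // MAX_BYTES_PER_CHAR[charset]


if __name__ == "__main__":
    for cs in ("latin1", "utf8mb4"):
        # Compare with KEY `a` (`a`(1000)) and KEY `a` (`a`(250)) above.
        print(cs, max_prefix_chars(cs))
```

Under these assumptions the function gives 1000 characters for latin1 and 250 for utf8mb4, matching the `SHOW CREATE TABLE` output for t2 and t1.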

For UNIQUE keys it's even worse: a UNIQUE cannot be automatically truncated, so it'll be converted to a HASH-based constraint, and such a constraint isn't yet supported by the optimizer (MDEV-17081). So, while a UNIQUE will still work (it's a new 10.4 feature; in 10.3 it'd be an error), it will no longer be used as an index until MDEV-17081 is resolved.

Comment by Otto Kekäläinen [ 2020-11-22 ]

Status for 10.6:

https://github.com/MariaDB/server/blob/10.6/cmake/character_sets.cmake

  SET(DEFAULT_CHARSET "latin1")
  SET(DEFAULT_COLLATION "latin1_swedish_ci")

https://github.com/MariaDB/server/blob/10.6/debian/additions/mariadb.conf.d/50-server.cnf

character-set-server  = utf8mb4
collation-server      = utf8mb4_general_ci

According to MDEV-20416 there is no slow-down in read-only benchmarks.

Comment by Olaf van der Spek [ 2021-04-07 ]

Dear Ian,

Isn't this the original issue while MDEV-19123 is the duplicate?
