Details

    Description

      Apparently UTF8 still isn't the default everywhere. Could this be fixed?

      https://mariadb.com/kb/en/library/differences-in-mariadb-in-debian-and-ubuntu/

          Activity

            otto Otto Kekäläinen added a comment

            Reported downstream: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=923526

            XTF Olaf van der Spek added a comment

            Note that the downstream bug is related but different.

            I think both Debian and upstream should update the default to utf8mb4_unicode_ci.

            XTF Olaf van der Spek added a comment

            Any feedback?
            Good idea?
            Bad idea?

            serg Sergei Golubchik added a comment

            It has both good and bad sides. Good: users who need non-latin1 characters will no longer need to change the default character set. Bad for latin1-only users: it comes with an unknown slowdown (MDEV-20416), and it truncates indexes. Compare:

            MariaDB [test]> create table t1 (a varchar(2000), index(a)) character set utf8mb4;
            Query OK, 0 rows affected, 1 warning (0.013 sec)

            Warning (Code 1071): Specified key was too long; max key length is 1000 bytes
            MariaDB [test]> show create table t1;
            +-------+-----------------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                                    |
            +-------+-----------------------------------------------------------------------------------------------------------------+
            | t1    | CREATE TABLE `t1` (
              `a` varchar(2000) DEFAULT NULL,
              KEY `a` (`a`(250))
            ) ENGINE=MyISAM DEFAULT CHARSET=utf8mb4 |
            +-------+-----------------------------------------------------------------------------------------------------------------+

            MariaDB [test]> create table t2 (a varchar(2000), index(a));
            Query OK, 0 rows affected, 1 warning (0.017 sec)

            Warning (Code 1071): Specified key was too long; max key length is 1000 bytes
            MariaDB [test]> show create table t2;
            +-------+--------------------------------------------------------------------------------------------------------------------+
            | Table | Create Table                                                                                                       |
            +-------+--------------------------------------------------------------------------------------------------------------------+
            | t2    | CREATE TABLE `t2` (
              `a` varchar(2000) DEFAULT NULL,
              KEY `a` (`a`(1000))
            ) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
            +-------+--------------------------------------------------------------------------------------------------------------------+

            With latin1 the indexed prefix is four times longer (1000 characters instead of 250).

            For UNIQUE keys it's even worse. A UNIQUE key cannot be automatically truncated, so it is converted to a HASH-based constraint, and such a constraint isn't yet supported by the optimizer (MDEV-17081). So, while a UNIQUE key will still work (this is new in 10.4; in 10.3 it would be an error), it will no longer be used as an index until MDEV-17081 is closed.
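
The prefix lengths in the two SHOW CREATE TABLE outputs above are consistent with dividing the 1000-byte key limit from the warning by the worst-case number of bytes per character. A minimal sketch of that arithmetic, assuming 1 byte per character for latin1 and 4 for utf8mb4; the function and dictionary names are illustrative only and are not part of MariaDB:

```python
# Maximum index key length reported in the warnings above (MyISAM tables).
MAX_KEY_BYTES = 1000

# Assumed worst-case bytes reserved per character for each character set.
MAX_BYTES_PER_CHAR = {"latin1": 1, "utf8mb4": 4}


def max_prefix_chars(charset: str, max_key_bytes: int = MAX_KEY_BYTES) -> int:
    """Return the longest index prefix, in characters, that fits the byte limit."""
    return max_key_bytes // MAX_BYTES_PER_CHAR[charset]


for charset in ("utf8mb4", "latin1"):
    print(charset, max_prefix_chars(charset))
```

Under these assumptions the sketch gives 250 characters for utf8mb4 and 1000 for latin1, matching the `a`(250) and `a`(1000) prefixes shown in the output above.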
            otto Otto Kekäläinen added a comment (edited)

            Status for 10.6:

            https://github.com/MariaDB/server/blob/10.6/cmake/character_sets.cmake

              SET(DEFAULT_CHARSET "latin1")
              SET(DEFAULT_COLLATION "latin1_swedish_ci")
            

            https://github.com/MariaDB/server/blob/10.6/debian/additions/mariadb.conf.d/50-server.cnf

            character-set-server  = utf8mb4
            collation-server      = utf8mb4_general_ci
            

            According to MDEV-20416, there is no slowdown in read-only benchmarks.

            XTF Olaf van der Spek added a comment (edited)

            Dear Ian,

            Isn't this the original issue while MDEV-19123 is the duplicate?


            People

              Unassigned Unassigned
              XTF Olaf van der Spek
              Votes: 5
              Watchers: 7
