MariaDB Server / MDEV-25829

Change default collation to utf8mb4_1400_ai_ci

Details

    Description

      As of MySQL 8.0, the default collation is `utf8mb4_0900_ai_ci`:

      > MySQL includes character set support that enables you to store data using a variety of character sets and perform comparisons according to a variety of collations. The default MySQL server character set and collation are utf8mb4 and utf8mb4_0900_ai_ci, but you can specify character sets at the server, database, table, column, and string literal levels.

      https://dev.mysql.com/doc/refman/8.0/en/charset.html

      Frankly, if anything, this change came way too late. I've personally seen hundreds of development hours wasted on MySQL databases created with the wrong (default) collation, where the problem was only found and fixed through bug reports ('my characters get mangled'). Over. And over. Again. Because every dev who did 'create table whatever (...)' would forget to set the collation and first (back in the day) get a Swedish (why??) collation, then by mistake switch to the (crippled/broken) 3-byte utf8, thinking that solved it... And then finally get wise and change to utf8mb4, just to have the whole story start over with the next table someone created.
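To make the "crippled utf8" point concrete, here is a minimal sketch in Python (not MariaDB code). It relies on the legacy `utf8` character set (utf8mb3) storing at most 3 bytes per character. The helper name `fits_in_utf8mb3` is made up for this illustration.

```python
# Illustration only: how many UTF-8 bytes does each character need?
# The legacy MySQL/MariaDB `utf8` (utf8mb3) stores at most 3 bytes per character;
# utf8mb4 stores up to 4.
MAX_BYTES_UTF8MB3 = 3


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every character encodes to 3 UTF-8 bytes or fewer."""
    return all(len(ch.encode("utf-8")) <= MAX_BYTES_UTF8MB3 for ch in text)


if __name__ == "__main__":
    for sample in ["abc", "åäö", "日本語", "😀"]:
        widths = [len(ch.encode("utf-8")) for ch in sample]
        print(sample, widths, fits_in_utf8mb3(sample))
```

Everything in the Basic Multilingual Plane fits in 3 bytes, which is why the problem often goes unnoticed until someone stores an emoji or another supplementary-plane character.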

      I love MySQL, but man, this part was a royal mess-up and it cost the world hundreds of millions, if not billions, of wasted dev hours. I mean, this hit everyone running a MySQL server. And now with emoji, I don't even dare to guess at the number of $ wasted.

      So imagine my surprise when I read

      > In MariaDB, the default character set is latin1, and the default collation is latin1_swedish_ci

      https://mariadb.com/kb/en/setting-character-sets-and-collations/

      OMG, please say it isn't so!

      Please! Please change this. Like, NOW. This is costing sooooo many dev hours. Soooo many billions of dollars wasted! Devs are stupid, ok? They don't understand character sets and collations, ok? Never have, never will. This whole Unicode thing has been a personal interest of mine for over a decade and I still don't fully grasp it. But one thing I do know for sure: `latin1_swedish_ci` makes NO SENSE whatsoever, to anyone. This is not a good setting. For no one. Not even for Swedish people. Because latin1 is a single-byte character set that fits at most 256 characters! I mean, really? Do databases still get created for which this default actually makes any sense?
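A rough sketch of what the single-byte limit means in practice, again in Python rather than SQL. It assumes Python's `cp1252` codec is a close enough stand-in for the server's `latin1` (the two differ on a handful of byte values). The helper name `storable_as_latin1` is made up for this illustration.

```python
# Illustration only: which strings survive a single-byte, Western European encoding?
# Assumption: Python's "cp1252" codec approximates the server's latin1 character set.
LATIN1_STAND_IN = "cp1252"


def storable_as_latin1(text: str) -> bool:
    """Return True if every character has a single-byte representation."""
    try:
        text.encode(LATIN1_STAND_IN)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    for sample in ["Malmö", "€ 10", "Łódź", "日本語", "😀"]:
        print(sample, storable_as_latin1(sample))
```

Swedish and other Western European text fits, but Polish, Japanese, and emoji do not, so a table created with the default silently limits what it can hold.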

      Please fix it. Please! For the love of the world. For peace. To end poverty. Out of good citizenship. Because it's just a few lines in the default server config. For the lulz. For whatever reason, but just do it. Please!

            People

              bar Alexander Barkov
              StijnDeWitt Stijn de Witt
              Votes: 1
              Watchers: 7
