  1. MariaDB Server
  2. MDEV-22217

Make OS character sets "utf8" and "utf-8" map to MariaDB character set "utf8mb4"

Details

    Description

      The OS character sets "utf8" and "utf-8" currently map to the MariaDB character set "utf8". In MariaDB, the "utf8" character set refers to the incomplete 3-byte version of the UTF-8 standard (which has "utf8mb3" as an alias). It uses at most 3 bytes per character, so it cannot represent characters outside the Basic Multilingual Plane.
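
To make the 3-byte limit concrete, here is a small Python sketch that counts the UTF-8 bytes needed for a few sample characters. The helper names `utf8_length` and `fits_in_utf8mb3` are hypothetical and are not part of MariaDB; the check simply assumes that the 3-byte character set covers code points up to U+FFFF.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Hypothetical helper: True if every code point is at most U+FFFF,
    i.e. inside the Basic Multilingual Plane (3 bytes or fewer in UTF-8)."""
    return all(ord(ch) <= 0xFFFF for ch in text)


# Sample characters: ASCII letter, Latin-1 letter, euro sign, emoji.
samples = ["a", "\u00e9", "\u20ac", "\U0001F600"]
for ch in samples:
    print(
        f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), "
        f"fits in utf8mb3: {fits_in_utf8mb3(ch)}"
    )
```

Only the last sample needs 4 bytes, which is the case a connection using "utf8mb4" could handle and one using "utf8" could not.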

      It may be more appropriate if the OS character sets "utf8" and "utf-8" instead mapped to the MariaDB character set "utf8mb4". That way, UTF-8 clients would get access to the full UTF-8 standard by default in MariaDB.

      MySQL 8.0 has already made this change:

      The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example, utf8 and utf-8 map to utf8mb4, and ucs2 is not supported as a connection character set, so it maps to the compiled-in default.

      https://dev.mysql.com/doc/refman/8.0/en/charset-connection.html
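
The rule quoted above can be sketched roughly as follows. This is an illustration in Python, not MySQL's implementation: the table `OS_CHARSET_MAP`, the function `resolve_connection_charset`, and the `compiled_in_default` parameter are hypothetical names, the table holds only the entries mentioned in the quote, and falling back to the default for unknown names is an assumption.

```python
# Illustrative table only: OS name -> (server charset, usable for connections).
# Only the examples named in the quoted documentation are listed.
OS_CHARSET_MAP = {
    "utf8": ("utf8mb4", True),
    "utf-8": ("utf8mb4", True),
    "ucs2": ("ucs2", False),  # not supported as a connection character set
}


def resolve_connection_charset(os_name: str, compiled_in_default: str) -> str:
    """Pick a connection character set for an OS character set name.

    Falls back to the caller-supplied default when the name is unknown
    (an assumption of this sketch) or when the matching character set
    cannot be used for connections.
    """
    entry = OS_CHARSET_MAP.get(os_name.lower())
    if entry is None:
        return compiled_in_default
    server_name, usable = entry
    return server_name if usable else compiled_in_default


# "latin1" stands in for whatever default the client was compiled with.
print(resolve_connection_charset("UTF-8", "latin1"))
print(resolve_connection_charset("ucs2", "latin1"))
```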

      For example, see here for MariaDB's current behavior:

      geoff@geoff-Razer-Blade-Stealth-13:~$ printenv LANG
      en_US.UTF-8
      geoff@geoff-Razer-Blade-Stealth-13:~$ mariadb --host mydb.skysql.net --user myuser --password --ssl-ca ./skysql_chain.pem
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 3663
      Server version: 10.4.12-6-MariaDB-enterprise-log MariaDB Enterprise Server
       
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      MariaDB [(none)]> SHOW SESSION VARIABLES 
        WHERE Variable_name 
        IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
      +--------------------------+-----------------+
      | Variable_name            | Value           |
      +--------------------------+-----------------+
      | character_set_client     | utf8            |
      | character_set_connection | utf8            |
      | character_set_results    | utf8            |
      | collation_connection     | utf8_general_ci |
      +--------------------------+-----------------+
      4 rows in set (0.048 sec)
      

      We can see the relevant mapping in the code here:

      https://github.com/MariaDB/server/blob/mariadb-10.5.2/mysys/charset.c#L1384
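
As a rough illustration of what the requested change would mean for the session shown earlier, the Python sketch below extracts the codeset from a locale name such as `en_US.UTF-8` and looks it up in two tiny tables. Everything here is hypothetical: `codeset_from_locale`, `CURRENT_MAPPING`, and `PROPOSED_MAPPING` are made-up names that contain only the two entries this issue discusses, and the sketch does not mirror how the C client actually obtains the codeset or what the table in `charset.c` looks like.

```python
def codeset_from_locale(locale_name: str) -> str:
    """Return the codeset part of a POSIX locale name, or "" if absent.

    POSIX locale names have the form language[_territory][.codeset][@modifier].
    """
    without_modifier = locale_name.split("@", 1)[0]
    if "." not in without_modifier:
        return ""
    return without_modifier.split(".", 1)[1]


# Behavior described in this issue (current) versus the requested behavior.
CURRENT_MAPPING = {"utf8": "utf8", "utf-8": "utf8"}
PROPOSED_MAPPING = {"utf8": "utf8mb4", "utf-8": "utf8mb4"}

codeset = codeset_from_locale("en_US.UTF-8").lower()
print("codeset: ", codeset)
print("current: ", CURRENT_MAPPING[codeset])
print("proposed:", PROPOSED_MAPPING[codeset])
```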

            People

              bar Alexander Barkov
              GeoffMontee Geoff Montee (Inactive)