  1. MariaDB Server
  2. MDEV-22217

Make OS character sets "utf8" and "utf-8" map to MariaDB character set "utf8mb4"



    • Task
    • Status: Open
    • Major
    • Resolution: Unresolved
    • None
    • Character Sets
    • None


      The OS character sets "utf8" and "utf-8" currently map to the MariaDB character set "utf8". In MariaDB, the "utf8" character set (which has "utf8mb3" as an alias) is the incomplete version of the UTF-8 standard that stores at most 3 bytes per character.
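
      To illustrate what the 3-byte limit excludes, here is a minimal Python sketch. It does not talk to MariaDB; it only shows how many bytes UTF-8 needs per character. The helper names utf8_length and fits_in_utf8mb3 are hypothetical and exist only for this example.

```python
def utf8_length(ch: str) -> int:
    """Return the number of bytes needed to encode one character in UTF-8."""
    return len(ch.encode("utf-8"))


def fits_in_utf8mb3(text: str) -> bool:
    """Return True if every code point is at most U+FFFF.

    Code points up to U+FFFF need at most 3 bytes in UTF-8; anything
    above that range (for example most emoji) needs 4 bytes.
    """
    return all(ord(ch) <= 0xFFFF for ch in text)


# One sample character for each UTF-8 encoded length (1 to 4 bytes).
for ch in ["a", "\u00e9", "\u20ac", "\U0001F600"]:
    print(f"U+{ord(ch):04X}: {utf8_length(ch)} byte(s), "
          f"fits in 3 bytes: {fits_in_utf8mb3(ch)}")
```

      Only the last sample, U+1F600, lies outside the 3-byte range. Characters like it are the ones a client cannot represent when its connection character set is the 3-byte "utf8".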

      It may be more appropriate if the OS character sets "utf8" and "utf-8" instead mapped to the MariaDB character set "utf8mb4". That way, UTF-8 clients would get access to the full UTF-8 standard by default in MariaDB.

      MySQL 8.0 has already made this change:

      The OS character set is mapped to the closest MySQL character set if there is no exact match. If the client does not support the matching character set, it uses the compiled-in default. For example, utf8 and utf-8 map to utf8mb4, and ucs2 is not supported as a connection character set, so it maps to the compiled-in default.
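
      The lookup described in that passage could be sketched roughly as follows. This is an illustration only, not the actual table or code in either client library: the dictionary contents, the function name map_os_charset, and the "latin1" fallback value are all assumptions made for this example.

```python
# Hypothetical mapping from OS character set names to server character sets.
# The utf8 and utf-8 entries show the mapping proposed in this issue.
OS_TO_DB_CHARSET = {
    "utf8": "utf8mb4",
    "utf-8": "utf8mb4",
    "latin1": "latin1",
    "ucs2": "ucs2",
}

# Character sets that cannot be used as a connection character set.
UNSUPPORTED_FOR_CONNECTION = {"ucs2"}


def map_os_charset(os_charset: str, compiled_in_default: str = "latin1") -> str:
    """Map an OS character set name to a connection character set.

    Falls back to compiled_in_default (an assumed value here) when the
    name is unknown or the match is not usable for connections.
    """
    name = os_charset.strip().lower()
    db_charset = OS_TO_DB_CHARSET.get(name)
    if db_charset is None or db_charset in UNSUPPORTED_FOR_CONNECTION:
        return compiled_in_default
    return db_charset


print(map_os_charset("UTF-8"))  # mapped to the full 4-byte character set
print(map_os_charset("ucs2"))   # unsupported for connections, so the default
```

      With a table like this, changing the default behavior for UTF-8 clients comes down to changing the target of the two UTF-8 entries.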


      For example, see here for MariaDB's current behavior:

      geoff@geoff-Razer-Blade-Stealth-13:~$ printenv LANG
      geoff@geoff-Razer-Blade-Stealth-13:~$ mariadb --host mydb.skysql.net --user myuser --password --ssl-ca ./skysql_chain.pem
      Welcome to the MariaDB monitor.  Commands end with ; or \g.
      Your MariaDB connection id is 3663
      Server version: 10.4.12-6-MariaDB-enterprise-log MariaDB Enterprise Server
      Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
      MariaDB [(none)]> SHOW SESSION VARIABLES 
        WHERE Variable_name 
        IN('character_set_client', 'character_set_connection', 'character_set_results', 'collation_connection');
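      +--------------------------+-----------------+
      | Variable_name            | Value           |
      +--------------------------+-----------------+
      | character_set_client     | utf8            |
      | character_set_connection | utf8            |
      | character_set_results    | utf8            |
      | collation_connection     | utf8_general_ci |
      +--------------------------+-----------------+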
      4 rows in set (0.048 sec)

      We can see the relevant mapping in the code here:






              ralf.gebhardt Ralf Gebhardt
              GeoffMontee Geoff Montee (Inactive)