Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32975

Default charset doesn't work with PHP MySQLi extension

Details

    Description

      Some charset change in MariaDB 11.3.1 appears to break PHP clients, raising the error 'Server sent charset (0) unknown to the client'.

      I noticed /etc/mysql/mariadb.conf.d/50-server.cnf changed
      collation-server = utf8mb4_general_ci (11.2.2)
      to
      character-set-collations = utf8mb4=uca1400_ai_ci (11.3.1)

      If I change this back, connections from PHP work again.

      This Plesk user seems to have encountered the same issue: https://talk.plesk.com/threads/500-plesk-exception-database-db-query-failed-sqlstate-hy000-2054-server-sent-charset-0-unknown-to-the-client.372563/

      Steps to reproduce:

      • Install PHP 8.3, including MySQLi extension.
      • Install MariaDB 11.3.1, CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';.
      • Test a PHP page, e.g.

                                                                                            
        <?php
        error_reporting(E_ALL);
        ini_set('display_errors', 'on');
        $conn = new mysqli("localhost", "username", "password");
        if ($conn->connect_error) {
          die("Connection failed: " . $conn->connect_error);
        }
        echo "Connected successfully";
        ?>
        

      • Observe error:

        Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /var/www/html/index.php:4 Stack trace: #0 /var/www/html/index.php(4): mysqli->__construct() #1 {main} thrown in /var/www/html/index.php on line 4
        

      Attachments

        Issue Links

          Activity

            serg Sergei Golubchik added a comment - - edited

            it's coming from https://github.com/php/php-src/blob/642e11140c3a8461ab9a78d6ce3d3a79b0151987/ext/mysqlnd/mysqlnd_commands.c#L619

            and the reason is that the server sends 0 in the server default collation byte of the initial handshake packet for collations that are above 1..255 range.

            But actually the connector is mainly interested in the character set, not the collation. We could try to "gracefully downgrade" the collation and if it's above 255, return another collation of the same character set but with a smaller id.

            serg Sergei Golubchik added a comment - - edited it's coming from https://github.com/php/php-src/blob/642e11140c3a8461ab9a78d6ce3d3a79b0151987/ext/mysqlnd/mysqlnd_commands.c#L619 and the reason is that the server sends 0 in the server default collation byte of the initial handshake packet for collations that are above 1..255 range. But actually the connector is mainly interested in the character set, not the collation. We could try to "gracefully downgrade" the collation and if it's above 255, return another collation of the same character set but with a smaller id.

            There is a difference between how MySQL-4.0 worked comparing to MySQL-4.1 and all further MySQL and MariaDB releases:

            • In MySQL-4.0 the client received the collation ID in the initial handshake packet from the server and set its own localization specific variables accordingly, to make things like strcasemp() work like on the server side
            • Since MySQL-4.1 (which was released in ~2003) it works the other way around: the client sends its desired character set to the server. This collation ID in the handshake packet does not mean anything useful. It was preserved only for 4.0 client compatibility.

            We can gracefully downgrade the collation ID. But the real fix should also be done in the PHP connector - it should not read this collation ID at all.

            bar Alexander Barkov added a comment - There is a difference between how MySQL-4.0 worked comparing to MySQL-4.1 and all further MySQL and MariaDB releases: In MySQL-4.0 the client received the collation ID in the initial handshake packet from the server and set its own localization specific variables accordingly, to make things like strcasemp() work like on the server side Since MySQL-4.1 (which was released in ~2003) it works the other way around: the client sends its desired character set to the server. This collation ID in the handshake packet does not mean anything useful. It was preserved only for 4.0 client compatibility. We can gracefully downgrade the collation ID. But the real fix should also be done in the PHP connector - it should not read this collation ID at all.

            it seems that mysqlnd sets the desired character set if the client specified it, otherwise it sets the server charset from the initial handshake packet: https://github.com/php/php-src/blob/master/ext/mysqlnd/mysqlnd_auth.c#L285-L289

            And if we do a graceful downgrade, we might get it back as the requested character_set_client.
            Which looks like a reasonable behavior here.

            serg Sergei Golubchik added a comment - it seems that mysqlnd sets the desired character set if the client specified it, otherwise it sets the server charset from the initial handshake packet: https://github.com/php/php-src/blob/master/ext/mysqlnd/mysqlnd_auth.c#L285-L289 And if we do a graceful downgrade, we might get it back as the requested character_set_client. Which looks like a reasonable behavior here.

            instead of "graceful downgrade" we can implement MDEV-32966 instead. It has a value on its own, but can also be used here as "graceful downgrade" values.

            serg Sergei Golubchik added a comment - instead of "graceful downgrade" we can implement MDEV-32966 instead. It has a value on its own, but can also be used here as "graceful downgrade" values.
            bar Alexander Barkov added a comment - - edited

            Reproduces as described.

            1. Start the server as follows:

            mariadbd --character-set-server=utf8mb3 --collation-server=utf8mb3_uca1400_ai_ci
            

            2. Run php with this program:

            <?php
            error_reporting(E_ALL);
            ini_set('display_errors', 'on');
            $conn = new mysqli("localhost", "root", "");
            if ($conn->connect_error) {
              die("Connection failed: " . $conn->connect_error);
            }
            echo "Connected successfully; ";
            echo 'Character set: ' . $conn->character_set_name() . "\n";
            ?>
            

            It produces the following output:

            PHP Fatal error:  Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4
            Stack trace:
            #0 /home/bar/a.php(4): mysqli->__construct()
            #1 {main}
              thrown in /home/bar/a.php on line 4
             
            Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4
            Stack trace:
            #0 /home/bar/a.php(4): mysqli->__construct()
            #1 {main}
              thrown in /home/bar/a.php on line 4
            

            bar Alexander Barkov added a comment - - edited Reproduces as described. 1. Start the server as follows: mariadbd --character-set-server=utf8mb3 --collation-server=utf8mb3_uca1400_ai_ci 2. Run php with this program: <?php error_reporting (E_ALL); ini_set ( 'display_errors' , 'on' ); $conn = new mysqli( "localhost" , "root" , "" ); if ( $conn ->connect_error) { die ( "Connection failed: " . $conn ->connect_error); } echo "Connected successfully; " ; echo 'Character set: ' . $conn ->character_set_name() . "\n" ; ?> It produces the following output: PHP Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4 Stack trace: #0 /home/bar/a.php(4): mysqli->__construct() #1 {main} thrown in /home/bar/a.php on line 4   Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4 Stack trace: #0 /home/bar/a.php(4): mysqli->__construct() #1 {main} thrown in /home/bar/a.php on line 4
            bar Alexander Barkov added a comment - - edited

            The problem with the fact that the collation ID is sent from the server to the client using one byte in the handshake packet is also repeatable with all version.

            I run a 10.4 server as follows:

            r --character-set-server=utf8mb3 --collation-server=utf8mb3_croatian_ci
            

            then run the above PHP program. It produces the following output:

            Connected successfully, Character set: armscii8
            

            Notice, the connection character set was erroneously set to armscii8.

            The problem happens with all collations listed by this statement:

            SELECT
              t1.collation_name, t1.id, t1.id & 0xFF, t2.collation_name,
              t1.character_set_name, t2.character_set_name
            FROM information_schema.collations t1
            LEFT JOIN information_schema.collations t2
            ON (t1.id &0xFF = t2.id)
            WHERE NOT (t1.character_set_name<=>t2.character_set_name);
            

            +---------------------+-----+--------------+----------------------+--------------------+--------------------+
            | collation_name      | id  | t1.id & 0xFF | collation_name       | character_set_name | character_set_name |
            +---------------------+-----+--------------+----------------------+--------------------+--------------------+
            | utf8_myanmar_ci     | 577 |           65 | ascii_bin            | utf8               | ascii              |
            | utf8_thai_520_w2    | 578 |           66 | cp1250_bin           | utf8               | cp1250             |
            | utf8_croatian_ci    | 576 |           64 | armscii8_bin         | utf8               | armscii8           |
            | utf8mb4_croatian_ci | 608 |           96 | cp932_bin            | utf8mb4            | cp932              |
            | utf8mb4_myanmar_ci  | 609 |           97 | eucjpms_japanese_ci  | utf8mb4            | eucjpms            |
            | utf8mb4_thai_520_w2 | 610 |           98 | eucjpms_bin          | utf8mb4            | eucjpms            |
            | utf16_croatian_ci   | 672 |          160 | utf32_unicode_ci     | utf16              | utf32              |
            | utf16_myanmar_ci    | 673 |          161 | utf32_icelandic_ci   | utf16              | utf32              |
            | utf16_thai_520_w2   | 674 |          162 | utf32_latvian_ci     | utf16              | utf32              |
            | utf32_croatian_ci   | 736 |          224 | utf8mb4_unicode_ci   | utf32              | utf8mb4            |
            | utf32_myanmar_ci    | 737 |          225 | utf8mb4_icelandic_ci | utf32              | utf8mb4            |
            | utf32_thai_520_w2   | 738 |          226 | utf8mb4_latvian_ci   | utf32              | utf8mb4            |
            +---------------------+-----+--------------+----------------------+--------------------+--------------------+
            

            Instead of the true collation ID (displayed in the second column), the server only sends its lower 8 bits, which changes not only the collation, but also the character set for the listed collations.

            bar Alexander Barkov added a comment - - edited The problem with the fact that the collation ID is sent from the server to the client using one byte in the handshake packet is also repeatable with all version. I run a 10.4 server as follows: r --character-set-server=utf8mb3 --collation-server=utf8mb3_croatian_ci then run the above PHP program. It produces the following output: Connected successfully, Character set: armscii8 Notice, the connection character set was erroneously set to armscii8 . The problem happens with all collations listed by this statement: SELECT t1.collation_name, t1.id, t1.id & 0xFF, t2.collation_name, t1.character_set_name, t2.character_set_name FROM information_schema.collations t1 LEFT JOIN information_schema.collations t2 ON (t1.id &0xFF = t2.id) WHERE NOT (t1.character_set_name<=>t2.character_set_name); +---------------------+-----+--------------+----------------------+--------------------+--------------------+ | collation_name | id | t1.id & 0xFF | collation_name | character_set_name | character_set_name | +---------------------+-----+--------------+----------------------+--------------------+--------------------+ | utf8_myanmar_ci | 577 | 65 | ascii_bin | utf8 | ascii | | utf8_thai_520_w2 | 578 | 66 | cp1250_bin | utf8 | cp1250 | | utf8_croatian_ci | 576 | 64 | armscii8_bin | utf8 | armscii8 | | utf8mb4_croatian_ci | 608 | 96 | cp932_bin | utf8mb4 | cp932 | | utf8mb4_myanmar_ci | 609 | 97 | eucjpms_japanese_ci | utf8mb4 | eucjpms | | utf8mb4_thai_520_w2 | 610 | 98 | eucjpms_bin | utf8mb4 | eucjpms | | utf16_croatian_ci | 672 | 160 | utf32_unicode_ci | utf16 | utf32 | | utf16_myanmar_ci | 673 | 161 | utf32_icelandic_ci | utf16 | utf32 | | utf16_thai_520_w2 | 674 | 162 | utf32_latvian_ci | utf16 | utf32 | | utf32_croatian_ci | 736 | 224 | utf8mb4_unicode_ci | utf32 | utf8mb4 | | utf32_myanmar_ci | 737 | 225 | utf8mb4_icelandic_ci | utf32 | utf8mb4 | | utf32_thai_520_w2 | 738 | 226 | utf8mb4_latvian_ci | utf32 | utf8mb4 | +---------------------+-----+--------------+----------------------+--------------------+--------------------+ Instead of the true collation ID (displayed in the second column), the server only sends its lower 8 bits, which changes not only the collation, but also the character set for the listed collations.
            bar Alexander Barkov added a comment - Hello serg , Please review a patch: https://github.com/MariaDB/server/commit/7ff9e91a7d2872881526ff80366bf4857be61414 Thanks.
            Quppa David Warner added a comment -

            Thanks for looking into this.

            Is it feasible to add integration tests for popular connectors to catch issues like this in the future?

            Quppa David Warner added a comment - Thanks for looking into this. Is it feasible to add integration tests for popular connectors to catch issues like this in the future?
            serg Sergei Golubchik added a comment - - edited

            We have some, builders at buildbot.mariadb.org with the "eco" in the name:

            • amd64-debian-10-eco-mysqljs
            • amd64-debian-10-eco-pymysql
            • amd64-ubuntu-2004-eco-php

            but they aren't enforced yet, still work in progress.

            serg Sergei Golubchik added a comment - - edited We have some, builders at buildbot.mariadb.org with the "eco" in the name: amd64-debian-10-eco-mysqljs amd64-debian-10-eco-pymysql amd64-ubuntu-2004-eco-php but they aren't enforced yet, still work in progress.

            We should maybe considering fixing / reverting this patch as part of this: https://github.com/MariaDB/server/pull/2775

            TheLinuxJedi Andrew Hutchings (Inactive) added a comment - We should maybe considering fixing / reverting this patch as part of this: https://github.com/MariaDB/server/pull/2775
            danblack Daniel Black added a comment - - edited

            Backout of config default in the Docker Official Image as workaround:
            https://github.com/MariaDB/mariadb-docker/pull/565

            And re-released:

            https://github.com/docker-library/official-images/pull/16293
            (will be up once Docker Library CI finishes (few hrs max)) - there now. Just re-pull if you are depending on this.

            danblack Daniel Black added a comment - - edited Backout of config default in the Docker Official Image as workaround: https://github.com/MariaDB/mariadb-docker/pull/565 And re-released: https://github.com/docker-library/official-images/pull/16293 (will be up once Docker Library CI finishes (few hrs max)) - there now. Just re-pull if you are depending on this.
            danblack Daniel Black added a comment -

            FYI 10.11 compile fix:

            ql_acl.cc.o -MF sql/CMakeFiles/sql.dir/sql_acl.cc.o.d -o sql/CMakeFiles/sql.dir/sql_acl.cc.o -c /home/dan/repos/mariadb-server-10.11/sql/sql_acl.cc
            /home/dan/repos/mariadb-server-10.11/sql/sql_acl.cc:13365:59: error: no member named 'csname' in 'charset_info_st'
             13365 |     CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->csname,
                   |                                             ~~~~~~~~~~~~  ^
            1 error generated.
            

            diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc
            index e7cded79fd1..6885e406548 100644
            --- a/sql/sql_acl.cc
            +++ b/sql/sql_acl.cc
            @@ -13362,7 +13362,7 @@ static bool send_server_handshake_packet(MPVIO_EXT *mpvio,
                   A workaround for a 2-byte collation ID: translate it into
                   the ID of the primary collation of this character set.
                 */
            -    CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->csname,
            +    CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->cs_name.str,
                                                         MY_CS_PRIMARY, MYF(MY_WME));
                 /*
                   cs should not normally be NULL, however it may be possible
            
            

            danblack Daniel Black added a comment - FYI 10.11 compile fix: ql_acl.cc.o -MF sql/CMakeFiles/sql.dir/sql_acl.cc.o.d -o sql/CMakeFiles/sql.dir/sql_acl.cc.o -c /home/dan/repos/mariadb-server-10.11/sql/sql_acl.cc /home/dan/repos/mariadb-server-10.11/sql/sql_acl.cc:13365:59: error: no member named 'csname' in 'charset_info_st' 13365 | CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->csname, | ~~~~~~~~~~~~ ^ 1 error generated. diff --git a/sql/sql_acl.cc b/sql/sql_acl.cc index e7cded79fd1..6885e406548 100644 --- a/sql/sql_acl.cc +++ b/sql/sql_acl.cc @@ -13362,7 +13362,7 @@ static bool send_server_handshake_packet(MPVIO_EXT *mpvio, A workaround for a 2-byte collation ID: translate it into the ID of the primary collation of this character set. */ - CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->csname, + CHARSET_INFO *cs= get_charset_by_csname(handshake_cs->cs_name.str, MY_CS_PRIMARY, MYF(MY_WME)); /* cs should not normally be NULL, however it may be possible
            bar Alexander Barkov added a comment - - edited

            This is expected. A some point the CHARSET_INFO member "const char * csname" was changed to "LEX_CSTRING *cs_name".

            During the merge:

            handshake_cs->csname
            

            should be changed to:

            handshake_cs->cs_name.str
            

            bar Alexander Barkov added a comment - - edited This is expected. A some point the CHARSET_INFO member "const char * csname" was changed to "LEX_CSTRING *cs_name". During the merge: handshake_cs->csname should be changed to: handshake_cs->cs_name.str

            People

              bar Alexander Barkov
              Quppa David Warner
              Votes:
              3 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.