[MDEV-32975] Default charset doesn't work with PHP MySQLi extension Created: 2023-12-08  Updated: 2024-01-26

Status: In Review
Project: MariaDB Server
Component/s: Character Sets
Affects Version/s: 11.3.1
Fix Version/s: 10.4, 10.5, 10.6, 10.11, 11.0, 11.1, 11.2

Type: Bug Priority: Critical
Reporter: David Warner Assignee: Sergei Golubchik
Resolution: Unresolved Votes: 0
Labels: None
Environment:

Ubuntu 22.04 LTS
Kernel 5.15.0-91-generic
PHP 8.3.0


Issue Links:
Problem/Incident
is caused by MDEV-32336 deb: 50-server.cnf:collation-server ... Closed
Relates
relates to MDEV-32966 "default collation" ids for the protocol Closed
relates to MDEV-33182 Server assertion fails when trying to... Closed

 Description   

Some charset change in MariaDB 11.3.1 appears to break PHP clients, raising the error 'Server sent charset (0) unknown to the client'.

I noticed /etc/mysql/mariadb.conf.d/50-server.cnf changed
collation-server = utf8mb4_general_ci (11.2.2)
to
character-set-collations = utf8mb4=uca1400_ai_ci (11.3.1)

If I change this back, connections from PHP work again.

This Plesk user seems to have encountered the same issue: https://talk.plesk.com/threads/500-plesk-exception-database-db-query-failed-sqlstate-hy000-2054-server-sent-charset-0-unknown-to-the-client.372563/

Steps to reproduce:

  • Install PHP 8.3, including MySQLi extension.
  • Install MariaDB 11.3.1, CREATE USER 'username'@'localhost' IDENTIFIED BY 'password';.
  • Test a PHP page, e.g.

                                                                                        
    <?php
    error_reporting(E_ALL);
    ini_set('display_errors', 'on');
    $conn = new mysqli("localhost", "username", "password");
    if ($conn->connect_error) {
      die("Connection failed: " . $conn->connect_error);
    }
    echo "Connected successfully";
    ?>
    

  • Observe error:

    Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /var/www/html/index.php:4 Stack trace: #0 /var/www/html/index.php(4): mysqli->__construct() #1 {main} thrown in /var/www/html/index.php on line 4
    



 Comments   
Comment by Sergei Golubchik [ 2023-12-11 ]

it's coming from https://github.com/php/php-src/blob/642e11140c3a8461ab9a78d6ce3d3a79b0151987/ext/mysqlnd/mysqlnd_commands.c#L619

and the reason is that the server sends 0 in the server default collation byte of the initial handshake packet for collations that are above 1..255 range.

But actually the connector is mainly interested in the character set, not the collation. We could try to "gracefully downgrade" the collation and if it's above 255, return another collation of the same character set but with a smaller id.

Comment by Alexander Barkov [ 2023-12-12 ]

There is a difference between how MySQL-4.0 worked comparing to MySQL-4.1 and all further MySQL and MariaDB releases:

  • In MySQL-4.0 the client received the collation ID in the initial handshake packet from the server and set its own localization specific variables accordingly, to make things like strcasemp() work like on the server side
  • Since MySQL-4.1 (which was released in ~2003) it works the other way around: the client sends its desired character set to the server. This collation ID in the handshake packet does not mean anything useful. It was preserved only for 4.0 client compatibility.

We can gracefully downgrade the collation ID. But the real fix should also be done in the PHP connector - it should not read this collation ID at all.

Comment by Sergei Golubchik [ 2023-12-12 ]

it seems that mysqlnd sets the desired character set if the client specified it, otherwise it sets the server charset from the initial handshake packet: https://github.com/php/php-src/blob/master/ext/mysqlnd/mysqlnd_auth.c#L285-L289

And if we do a graceful downgrade, we might get it back as the requested character_set_client.
Which looks like a reasonable behavior here.

Comment by Sergei Golubchik [ 2023-12-12 ]

instead of "graceful downgrade" we can implement MDEV-32966 instead. It has a value on its own, but can also be used here as "graceful downgrade" values.

Comment by Alexander Barkov [ 2024-01-26 ]

Reproduces as described.

1. Start the server as follows:

mariadbd --character-set-server=utf8mb3 --collation-server=utf8mb3_uca1400_ai_ci

2. Run php with this program:

<?php
error_reporting(E_ALL);
ini_set('display_errors', 'on');
$conn = new mysqli("localhost", "root", "");
if ($conn->connect_error) {
  die("Connection failed: " . $conn->connect_error);
}
echo "Connected successfully; ";
echo 'Character set: ' . $conn->character_set_name() . "\n";
?>

It produces the following output:

PHP Fatal error:  Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4
Stack trace:
#0 /home/bar/a.php(4): mysqli->__construct()
#1 {main}
  thrown in /home/bar/a.php on line 4
 
Fatal error: Uncaught mysqli_sql_exception: Server sent charset (0) unknown to the client. Please, report to the developers in /home/bar/a.php:4
Stack trace:
#0 /home/bar/a.php(4): mysqli->__construct()
#1 {main}
  thrown in /home/bar/a.php on line 4

Comment by Alexander Barkov [ 2024-01-26 ]

The problem with the fact that the collation ID is sent from the server to the client using one byte in the handshake packet is also repeatable with all version.

I run a 10.4 server as follows:

r --character-set-server=utf8mb3 --collation-server=utf8mb3_croatian_ci

then run the above PHP program. It produces the following output:

Connected successfully, Character set: armscii8

Notice, the connection character set was erroneously set to armscii8.

The problem happens with all collations listed by this statement:

SELECT
  t1.collation_name, t1.id, t1.id & 0xFF, t2.collation_name,
  t1.character_set_name, t2.character_set_name
FROM information_schema.collations t1
LEFT JOIN information_schema.collations t2
ON (t1.id &0xFF = t2.id)
WHERE NOT (t1.character_set_name<=>t2.character_set_name);

+---------------------+-----+--------------+----------------------+--------------------+--------------------+
| collation_name      | id  | t1.id & 0xFF | collation_name       | character_set_name | character_set_name |
+---------------------+-----+--------------+----------------------+--------------------+--------------------+
| utf8_myanmar_ci     | 577 |           65 | ascii_bin            | utf8               | ascii              |
| utf8_thai_520_w2    | 578 |           66 | cp1250_bin           | utf8               | cp1250             |
| utf8_croatian_ci    | 576 |           64 | armscii8_bin         | utf8               | armscii8           |
| utf8mb4_croatian_ci | 608 |           96 | cp932_bin            | utf8mb4            | cp932              |
| utf8mb4_myanmar_ci  | 609 |           97 | eucjpms_japanese_ci  | utf8mb4            | eucjpms            |
| utf8mb4_thai_520_w2 | 610 |           98 | eucjpms_bin          | utf8mb4            | eucjpms            |
| utf16_croatian_ci   | 672 |          160 | utf32_unicode_ci     | utf16              | utf32              |
| utf16_myanmar_ci    | 673 |          161 | utf32_icelandic_ci   | utf16              | utf32              |
| utf16_thai_520_w2   | 674 |          162 | utf32_latvian_ci     | utf16              | utf32              |
| utf32_croatian_ci   | 736 |          224 | utf8mb4_unicode_ci   | utf32              | utf8mb4            |
| utf32_myanmar_ci    | 737 |          225 | utf8mb4_icelandic_ci | utf32              | utf8mb4            |
| utf32_thai_520_w2   | 738 |          226 | utf8mb4_latvian_ci   | utf32              | utf8mb4            |
+---------------------+-----+--------------+----------------------+--------------------+--------------------+

Instead of the true collation ID (displayed in the second column), the server only sends its lower 8 bits, which changes not only the collation, but also the character set for the listed collations.

Comment by Alexander Barkov [ 2024-01-26 ]

Hello serg,

Please review a patch:

https://github.com/MariaDB/server/commit/7ff9e91a7d2872881526ff80366bf4857be61414

Thanks.

Comment by David Warner [ 2024-01-26 ]

Thanks for looking into this.

Is it feasible to add integration tests for popular connectors to catch issues like this in the future?

Comment by Sergei Golubchik [ 2024-01-26 ]

We have some, builders at buildbot.mariadb.org with the "eco" in the name:

  • amd64-debian-10-eco-mysqljs
  • amd64-debian-10-eco-pymysql
  • amd64-ubuntu-2004-eco-php

but they aren't enforced yet, still work in progress.

Generated at Thu Feb 08 10:35:26 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.