[MDEV-32966] "default collation" ids for the protocol Created: 2023-12-07  Updated: 2024-01-26  Resolved: 2024-01-26

Status: Closed
Project: MariaDB Server
Component/s: Protocol
Fix Version/s: N/A

Type: New Feature Priority: Critical
Reporter: Sergei Golubchik Assignee: Alexander Barkov
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-33182 Server assertion fails when trying to... Closed
relates to MDEV-32975 Default charset doesn't work with PHP... In Review

 Description   

The connection protocol has only one byte for selecting a collation. This is no longer enough, so connectors are forced to run SET NAMES on every connection.

On the other hand, connectors normally don't care about the collation and only need to set the correct character set, which most of the time is utf8.

There are 41 unused values in the 0-255 range of collation ids.
They could be used for "default collation" ids, like, for example, "220 = default collation of utf8mb4", etc.

Then a connector will still be able to use the one byte in the protocol to set a character set of its choice and won't need SET NAMES.
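
To illustrate the one-byte limit described above, here is a minimal Python sketch. The helper names are hypothetical, and it does not build a real handshake packet. ID 33 (utf8mb3_general_ci) is taken from the discussion in this issue, 45 is utf8mb4_general_ci, and 2304 is only a stand-in for any collation ID above 255.

```python
import struct

UTF8MB3_GENERAL_CI = 33   # mentioned in this issue
UTF8MB4_GENERAL_CI = 45
LARGE_ID = 2304           # assumption: placeholder for any collation ID above 255


def fits_in_handshake(collation_id: int) -> bool:
    """Return True if the ID can be carried in a one-byte field."""
    return 0 <= collation_id <= 0xFF


def encode_collation_byte(collation_id: int) -> bytes:
    """Pack a collation ID into a single unsigned byte (hypothetical helper)."""
    if not fits_in_handshake(collation_id):
        # A connector in this situation has to fall back to SET NAMES.
        raise ValueError(f"collation id {collation_id} does not fit in one byte")
    return struct.pack("B", collation_id)
```

A connector could call `fits_in_handshake()` first and issue SET NAMES only when it returns False.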



 Comments   
Comment by Sergei Golubchik [ 2024-01-15 ]

See this comment in MDEV-32975.

I think that MDEV-32966 will solve MDEV-32975, but there's also another solution for MDEV-32975 which doesn't include MDEV-32966.

Comment by Alexander Barkov [ 2024-01-26 ]

I don't think we need special "default collation" IDs.

After MDEV-33182, it works as follows:

  • a. If the client sent a non-default collation ID, then the server uses this ID for @@collation_connection.
  • b. If the client sent a compiled default collation ID (like 33 == utf8mb3_general_ci), then the server uses @@character_set_collations to resolve @@collation_connection.

So the compiled default collation IDs already work as "default collation".
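
The two cases above can be sketched roughly as follows. This is an illustration, not the server's code: the lookup tables and the function name are hypothetical, only a tiny subset of collations is listed, and the fallback to the compiled default when @@character_set_collations has no entry for the character set is an assumption.

```python
# Hypothetical lookup tables (illustrative subset only).
NAME_OF = {33: "utf8mb3_general_ci", 192: "utf8mb3_unicode_ci"}
CHARSET_OF = {33: "utf8mb3", 192: "utf8mb3"}
COMPILED_DEFAULT = {"utf8mb3": 33}  # 33 == utf8mb3_general_ci, per the comment


def resolve_collation_connection(handshake_id, character_set_collations):
    """Pick the connection collation name from the handshake collation ID.

    character_set_collations maps a character set name to a collation name,
    modelling the @@character_set_collations variable.
    """
    charset = CHARSET_OF[handshake_id]
    if COMPILED_DEFAULT[charset] == handshake_id:
        # Case (b): compiled default ID, resolve through the mapping.
        # Assumption: with no override, the compiled default is kept.
        return character_set_collations.get(charset, NAME_OF[handshake_id])
    # Case (a): non-default ID, used as is.
    return NAME_OF[handshake_id]
```

For example, with an override of `{"utf8mb3": "utf8mb3_uca1400_ai_ci"}`, ID 33 resolves to the override, while ID 192 is still used as is.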

However, the comment added by MDEV-33182 suggests:

      Perhaps eventually we should change (b) also to resolve non-default
      collations according to @@character_set_collations. Clients that used to
      send a non-default collation ID in the handshake packet will have to set
      @@character_set_collations instead.

If we decide to go this way, we'll need something that lets the client indicate
whether it wants its collation ID to be processed by @@character_set_collations or to be used as is.

Anyway, adding special collation IDs looks like over-engineering to me. I'd prefer some bit flag in the handshake header.

Comment by Sergei Golubchik [ 2024-01-26 ]

This turned out to be the case already: the default collations of all character sets always have id < 256, so they can be set in the handshake and don't require SET NAMES.

Generated at Thu Feb 08 10:35:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.