[MDEV-30824] Binlog contains unsupported statement when using non-default character set Created: 2023-03-09  Updated: 2023-04-21  Resolved: 2023-03-21

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.9.2, 10.10.1, 10.11.1, 11.0.1
Fix Version/s: 10.11.3, 10.5.20, 10.6.13, 10.7.8, 10.8.8, 10.9.6, 10.10.4

Type: Bug
Priority: Critical
Reporter: Tingyao Nian
Assignee: Alexander Barkov
Resolution: Fixed
Votes: 1
Labels: None

Issue Links:
Problem/Incident
causes MDEV-31018 Replica of 10.3, 10.4, <10.5.19 and <... Closed
is caused by MDEV-28769 Assertion `(m_ci->state & 32) || m_wi... Closed

 Description   

Commit https://github.com/MariaDB/server/commit/a923d6f49c1ad6fd3f4d6ec02e444c26e4d1dfa8, merged in 10.9.2+, disabled assigning non-default numeric values to the character_set_* variables. However, the corresponding binlog functionality was not updated and also needs to be fixed.

During binlog generation, the server writes character_set_client information to the binlog based on the character set the client is using. It looks like this:

/*!\C utf8mb3 *//*!*/;
SET @@session.character_set_client=33,@@session.collation_connection=33,@@session.collation_server=8/*!*/;

https://github.com/MariaDB/server/blob/ce4a289f1c367987977f1a02bbb8d8b8e8e6bb53/sql/log_event_client.cc#L1973

The problem is that a client can use a non-default value, which is then written to the binlog as is:

/*!\C utf8mb4 *//*!*/;
SET @@session.character_set_client=224,@@session.collation_connection=224,@@session.collation_server=33/*!*/;

This is now an invalid statement because of commit https://github.com/MariaDB/server/commit/a923d6f49c1ad6fd3f4d6ec02e444c26e4d1dfa8, and it causes an error if the binlog is ever fed back to the server:

ERROR 1115 (42000) at line 38: Unknown character set: '224'
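
The difference between the rejected numeric form and a name-based form can be sketched in a few lines of Java. This is only an illustration, not the server code (which is C++ in sql/log_event_client.cc): the class BinlogCharsetSketch, its two methods, and the small ID table are hypothetical names introduced here. The name-based form assumes that only character_set_client needs to be written as a name, since that is the variable named in the error above; the exact text written by a fixed server is not reproduced here.

```java
import java.util.Map;

// Hypothetical illustration only; this is not MariaDB server code.
final class BinlogCharsetSketch {
    // Collation ID -> character set name, limited to IDs that appear in this report.
    private static final Map<Integer, String> CHARSET_BY_COLLATION_ID = Map.of(
        8, "latin1",    // latin1_swedish_ci
        33, "utf8mb3",  // utf8mb3_general_ci
        45, "utf8mb4",  // utf8mb4_general_ci
        224, "utf8mb4"  // utf8mb4_unicode_ci
    );

    // Numeric form, matching the binlog excerpts in the description.
    static String numericForm(int client, int connection, int server) {
        return "SET @@session.character_set_client=" + client
            + ",@@session.collation_connection=" + connection
            + ",@@session.collation_server=" + server + "/*!*/;";
    }

    // Name-based form: character_set_client is written as a character set name
    // (assumption: the collation_* variables can stay numeric).
    static String namedForm(int client, int connection, int server) {
        String name = CHARSET_BY_COLLATION_ID.get(client);
        if (name == null) {
            throw new IllegalArgumentException("collation ID not in sketch table: " + client);
        }
        return "SET @@session.character_set_client=" + name
            + ",@@session.collation_connection=" + connection
            + ",@@session.collation_server=" + server + "/*!*/;";
    }

    public static void main(String[] args) {
        System.out.println(numericForm(224, 224, 33)); // the form that now fails with ERROR 1115
        System.out.println(namedForm(224, 224, 33));   // character_set_client=utf8mb4
    }
}
```

A name does not depend on which collation ID the client happened to negotiate, which is why a name-based statement stays valid for non-default collations such as 224.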

In addition, MariaDB Connector/J tends to fall back to 224 (UTF8MB4_UNICODE_CI in the code below), which is a non-default value. This issue is therefore expected to affect many users who connect to the database via Connector/J:
https://github.com/mariadb-corporation/mariadb-connector-j/blob/master/src/main/java/org/mariadb/jdbc/client/impl/ConnectionHelper.java#L233

  public static byte decideLanguage(InitialHandshakePacket handshake) {
    short serverLanguage = handshake.getDefaultCollation();
    // return current server utf8mb4 collation
    return (byte)
        ((serverLanguage == 45 // utf8mb4_general_ci
                || serverLanguage == 46 // utf8mb4_bin
                || (serverLanguage >= 224 && serverLanguage <= 247))
            ? serverLanguage
            : 224); // UTF8MB4_UNICODE_CI;
  }
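
To see which servers end up with 224, the selection logic can be exercised on its own. The sketch below is an adaptation rather than the connector's code: it takes the server's default collation ID directly as a short instead of reading it from an InitialHandshakePacket, and the class name DecideLanguageSketch and the helper collationId are hypothetical. Because the result is a Java byte, 224 is stored as a negative value and has to be masked with 0xFF to read it back as an unsigned collation ID.

```java
// Hypothetical standalone adaptation of the selection logic quoted above.
final class DecideLanguageSketch {
    // Same condition as the quoted method, but fed the collation ID directly.
    static byte decideLanguage(short serverLanguage) {
        return (byte)
            ((serverLanguage == 45 // utf8mb4_general_ci
                    || serverLanguage == 46 // utf8mb4_bin
                    || (serverLanguage >= 224 && serverLanguage <= 247))
                ? serverLanguage
                : 224); // utf8mb4_unicode_ci
    }

    // Reads the signed byte back as an unsigned collation ID.
    static int collationId(byte language) {
        return language & 0xFF;
    }

    public static void main(String[] args) {
        short[] serverDefaults = {8, 33, 45, 224}; // latin1_swedish_ci, utf8mb3_general_ci, ...
        for (short id : serverDefaults) {
            System.out.println(id + " -> " + collationId(decideLanguage(id)));
        }
    }
}
```

With a server default of 8 or 33 the sketch yields 224, while 45 and 224 are passed through unchanged. In other words, any server whose default collation is not already a utf8mb4 collation from the accepted set leads the client to 224.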



 Comments   
Comment by Daniel Lenski [ 2023-03-10 ]

Having discussed this with tingynia, it appears that this bug means that replication using any non-default character set will be broken on 10.9.2+.

Comment by Tingyao Nian [ 2023-03-15 ]

I'm working on a PR to fix the binlog to always use 'string' type.

Comment by Otto Kekäläinen [ 2023-03-18 ]

PR was submitted at https://github.com/MariaDB/server/pull/2557 - no reviews so far.

Comment by Alexander Barkov [ 2023-03-21 ]

Pushed to 10.5 (instead of 10.9 originally proposed in the PR), so 10.9 can read mysqlbinlog dumps created by any version starting from 10.5.

Thanks for your contribution!

Comment by Otto Kekäläinen [ 2023-03-21 ]

Hi!

Why did you push this yourself to 10.5 instead of asking the contributor to
rebase on 10.5 so you could merge the PR?

Comment by Alexander Barkov [ 2023-03-21 ]

Hi otto,

I needed to check something, to decide which version is the best to backport to.
Tried 10.4 first, but it did not work well. With 10.5 it went smoothly.

Btw, I preserved the authorship in the patch:

commit dccbb5a6dba21b241e1796af82a0db85de28d195 (HEAD -> 10.5, origin/bb-10.5-bar-MDEV-30824, origin/HEAD, origin/10.5)
Author: Tingyao Nian <tingynia@amazon.com>
Date:   Wed Mar 15 19:14:01 2023 +0000

Generated at Thu Feb 08 10:19:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.