MariaDB MaxScale / MXS-953

Charset error when the server is configured with utf8mb4


    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.0.2
    • Component/s: Core
    • Labels:
      None

      Description

      MaxScale currently sends the latin1 charset to the client.
      Depending on the client (driver) implementation, this can cause encoding issues.

      Test case:

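      import java.sql.Connection;
      import java.sql.DriverManager;
      import java.sql.PreparedStatement;
      import java.sql.ResultSet;
      import java.sql.SQLException;
      import java.sql.Statement;

      public class UnicodeTestChar {
          public static void main(String[] args) throws SQLException {
              // The connection goes through MaxScale
              try (Connection connection = DriverManager.getConnection("jdbc:mariadb://192.168.1.154:4006/testj?user=diego&password=diego");
                   Statement stmt = connection.createStatement()) {
                  stmt.execute("drop table if exists unicodeTestChar");
                  stmt.execute("create table unicodeTestChar (id int unsigned, field1 varchar(4) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci) DEFAULT CHARSET=utf8mb4");

                  String emoji = "\uD83C\uDF1F"; // 4-byte UTF-8 character: glowing star (U+1F31F)

                  // Through MaxScale, this INSERT fails with "Incorrect string value"
                  try (PreparedStatement ps = connection.prepareStatement("INSERT INTO unicodeTestChar (id, field1) VALUES (1, ?)")) {
                      ps.setString(1, emoji);
                      ps.execute();
                  }

                  try (ResultSet rs = stmt.executeQuery("SELECT field1 FROM unicodeTestChar")) {
                      if (rs.next()) {
                          System.out.println("initial : " + emoji);
                          System.out.println("stored in DB : " + rs.getString(1));
                      }
                  }
              }
          }
      }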
      

      When connecting through MaxScale, the Java connector (MySQL or MariaDB Connector/J) throws an exception: "java.sql.SQLDataException: Incorrect string value: '\xF0\x9F\x8C\x9F' for column 'field1' at row 1
      Query is: INSERT INTO unicodeTestChar (id, field1) VALUES (1, ?), parameters ['-star-']"

      Issue description:

      The server has to know which charset the client is using:
      C/C (Connector/C) sends the charset defined by character_set_client.
      C/J (Connector/J) always uses utf8, which allows some optimizations.
      PHP PDO seems to work the same way as C/J.

      To do so, in the Initial Handshake Packet the server indicates its default
      charset.
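As an illustration of where that value lives, the sketch below reads the one-byte character_set field from a protocol v10 Initial Handshake payload (4-byte packet header already stripped). It follows the field order of the MySQL/MariaDB client/server protocol: protocol version, NUL-terminated server version, 4-byte connection id, 8 bytes of auth-plugin data, one filler byte, the lower 2 capability-flag bytes, then character_set. The class and method names and the server version string in the hand-built payload are hypothetical, and a real client would go on to parse the remaining fields.

```java
import java.nio.charset.StandardCharsets;

public class HandshakeCharset {

    // Returns the character_set byte of a protocol-v10 Initial Handshake payload.
    // Hypothetical helper: only the fields before character_set are walked over.
    public static int serverCharsetId(byte[] payload) {
        int pos = 0;
        int protocolVersion = payload[pos++] & 0xFF;
        if (protocolVersion != 10) {
            throw new IllegalArgumentException("unsupported protocol version " + protocolVersion);
        }
        // Server version: NUL-terminated string
        while (payload[pos] != 0) {
            pos++;
        }
        pos++;      // NUL terminator
        pos += 4;   // connection id
        pos += 8;   // auth-plugin-data-part-1
        pos += 1;   // filler
        pos += 2;   // capability flags, lower 2 bytes
        return payload[pos] & 0xFF; // character_set (collation id)
    }

    public static void main(String[] args) {
        // Hand-built payload with a made-up version string; unused fields stay zero.
        byte[] version = "10.1.16-MariaDB".getBytes(StandardCharsets.US_ASCII);
        byte[] payload = new byte[1 + version.length + 1 + 4 + 8 + 1 + 2 + 1];
        int pos = 0;
        payload[pos++] = 10;
        System.arraycopy(version, 0, payload, pos, version.length);
        pos += version.length;
        payload[pos++] = 0;           // version terminator
        pos += 4 + 8 + 1 + 2;         // connection id, auth data, filler, capabilities
        payload[pos] = 8;             // latin1_swedish_ci, the value MaxScale sends
        System.out.println(serverCharsetId(payload)); // prints 8
    }
}
```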

      The client then sends the encoding it uses in the Handshake Response Packet.

      C/J sends utf8 (33) or utf8mb4 (the value sent by the server), depending on what the server sent initially.
      The problem is that MaxScale always sends the value 8, which corresponds to the latin1
      charset, so C/J sends utf8 (33) even if the server is configured to use utf8mb4.
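The negotiation described above can be modeled roughly as follows. This is a simplified, hypothetical model of the client side, not Connector/J's actual code: the client echoes the server's default collation when it belongs to utf8mb4, and otherwise falls back to utf8_general_ci (33). The IDs are standard MySQL/MariaDB collation IDs (8 = latin1_swedish_ci, 33 = utf8_general_ci, 45 = utf8mb4_general_ci, 46 = utf8mb4_bin, 224 = utf8mb4_unicode_ci), but the utf8mb4 set checked here is deliberately not exhaustive.

```java
public class HandshakeResponseCharset {
    // Standard MySQL/MariaDB collation IDs relevant to this issue.
    public static final int LATIN1_SWEDISH_CI = 8;
    public static final int UTF8_GENERAL_CI = 33;
    public static final int UTF8MB4_GENERAL_CI = 45;
    public static final int UTF8MB4_BIN = 46;
    public static final int UTF8MB4_UNICODE_CI = 224;

    // Hypothetical helper: checks only a few well-known utf8mb4 collations.
    public static boolean isUtf8mb4(int collationId) {
        return collationId == UTF8MB4_GENERAL_CI
                || collationId == UTF8MB4_BIN
                || collationId == UTF8MB4_UNICODE_CI;
    }

    // Simplified model of the collation a client puts in its Handshake Response:
    // echo a utf8mb4 collation advertised by the server, otherwise use utf8 (3-byte max).
    public static int chooseClientCollation(int serverCollationId) {
        return isUtf8mb4(serverCollationId) ? serverCollationId : UTF8_GENERAL_CI;
    }

    public static void main(String[] args) {
        // Example: server reached directly, advertising utf8mb4_unicode_ci
        System.out.println(chooseClientCollation(UTF8MB4_UNICODE_CI)); // prints 224
        // Server reached through MaxScale, which advertises latin1 (8)
        System.out.println(chooseClientCollation(LATIN1_SWEDISH_CI)); // prints 33
    }
}
```

Under this model, the 33 in the second case is exactly the downgrade described in the issue: the client never learns that the backend supports utf8mb4.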

      This causes problems later, because the server checks that incoming data is valid for the client charset, and MySQL/MariaDB utf8 is limited to 3 bytes per character. When the client sends a 4-byte UTF-8 character, the server returns an error: "Incorrect string value".
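A quick way to see why the star from the test case is rejected: it lies outside the Basic Multilingual Plane, so its UTF-8 encoding takes 4 bytes (the same F0 9F 8C 9F sequence shown in the error message), which a 3-byte utf8 connection cannot carry. The class and helper names below are illustrative only.

```java
import java.nio.charset.StandardCharsets;

public class FourByteCheck {
    // True if any code point lies above U+FFFF; such characters need 4 bytes
    // in UTF-8 and therefore require utf8mb4 rather than utf8.
    public static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    // Formats bytes the way the server error message does, e.g. \xF0\x9F
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String emoji = "\uD83C\uDF1F"; // glowing star, U+1F31F
        System.out.println(needsUtf8mb4(emoji)); // prints true
        System.out.println(hex(emoji.getBytes(StandardCharsets.UTF_8))); // prints \xF0\x9F\x8C\x9F
        System.out.println(needsUtf8mb4("abc")); // prints false
    }
}
```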

            People

            Assignee:
            markus makela
            Reporter:
            Diego Dupin
            Votes:
            0
            Watchers:
            2