[CONJ-374] MaxScale issue handling utf8mb4 Created: 2016-10-20  Updated: 2016-10-20  Resolved: 2016-10-20

Status: Closed
Project: MariaDB Connector/J
Component/s: Other
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Diego Dupin Assignee: Diego Dupin
Resolution: Duplicate Votes: 0
Labels: None


 Description   

MaxScale currently sends the latin1 charset to the client.
Depending on the client (driver) implementation, this can cause encoding issues.

Test case:

// Needs java.sql.Connection, DriverManager, Statement, PreparedStatement and ResultSet.
try (Connection connection = DriverManager.getConnection("jdbc:mariadb://192.168.1.154:4006/testj?user=diego&password=diego");
     Statement stmt = connection.createStatement()) {
    stmt.execute("drop table if exists unicodeTestChar");
    stmt.execute("create table unicodeTestChar (id int unsigned, field1 varchar(4) CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci) DEFAULT CHARSET=utf8mb4");

    String emoji = "\uD83C\uDF1F"; // star character, 4 bytes in UTF-8

    // This insert is the statement that fails when connected through MaxScale.
    try (PreparedStatement ps = connection.prepareStatement("INSERT INTO unicodeTestChar (id, field1) VALUES (1, ?)")) {
        ps.setString(1, emoji);
        ps.execute();
    }

    try (ResultSet rs = stmt.executeQuery("SELECT field1 FROM unicodeTestChar")) {
        rs.next();
        System.out.println("initial : " + emoji);
        System.out.println("stored in DB : " + rs.getString(1));
    }
}

Using a Java connector (MySQL or MariaDB) through MaxScale throws an exception: "java.sql.SQLDataException: Incorrect string value: '\xF0\x9F\x8C\x9F' for column 'field1' at row 1
Query is: INSERT INTO unicodeTestChar (id, field1) VALUES (1, ?), parameters ['-star-']"

Issue description:

The server has to know which charset the client is using:
C/C sends the charset defined by character_set_client.
C/J always uses utf8 (which allows some optimizations).
PHP PDO seems to work the same way as C/J.

So, in the Initial Handshake Packet, the server indicates its default charset.

The client then sends the encoding it uses in the Handshake Response Packet.
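
To make the handshake step concrete, here is a minimal sketch of reading the default charset byte from an initial handshake payload. It assumes the protocol version 10 layout (protocol version, NUL-terminated server version, 4-byte connection id, 8-byte first part of the auth seed, 1 filler byte, 2 bytes of lower capability flags, then the 1-byte charset). The class `HandshakeCharset` and its methods are hypothetical and are not taken from any connector.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Connector/J or MaxScale.
class HandshakeCharset {

    // payload = handshake packet body, without the 4-byte packet header.
    static int serverCollationId(byte[] payload) {
        int pos = 1;                        // skip protocol version (1 byte)
        while (payload[pos] != 0) pos++;    // skip server version string
        pos++;                              // skip its NUL terminator
        pos += 4;                           // connection id
        pos += 8;                           // first part of the auth seed
        pos += 1;                           // filler byte
        pos += 2;                           // lower capability flags
        return payload[pos] & 0xFF;         // default charset / collation id
    }

    // Builds a truncated, synthetic payload that ends at the charset byte.
    // Only meant for trying out serverCollationId; a real packet continues.
    static byte[] samplePayload(String serverVersion, int collationId) {
        byte[] version = serverVersion.getBytes(StandardCharsets.US_ASCII);
        byte[] payload = new byte[1 + version.length + 1 + 4 + 8 + 1 + 2 + 1];
        payload[0] = 10;                                   // protocol version
        System.arraycopy(version, 0, payload, 1, version.length);
        payload[payload.length - 1] = (byte) collationId;  // charset byte
        return payload;
    }

    public static void main(String[] args) {
        // 8 is the latin1 value that MaxScale is reported to send.
        byte[] payload = samplePayload("10.1.18-MariaDB", 8);
        System.out.println(serverCollationId(payload));
    }
}
```

The version string used in `main` is only an illustration; any NUL-free ASCII string works for the sketch.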

C/J sends either utf8 (33) or utf8mb4 (echoing the value sent by the server), depending on what the server sent initially.
The problem is that MaxScale always sends the value 8, which corresponds to the latin1 charset.
So C/J sends utf8 (33), even if the server is configured to use utf8mb4.
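
The selection rule described above can be sketched as follows. This is an illustration of the described behavior, not the actual Connector/J source. The class and method names are hypothetical, and the set of ids treated as utf8mb4 collations (45, 46 and 224 to 247) is an assumption that may not match the driver's own list.

```java
// Hypothetical sketch of the collation choice; not Connector/J code.
class CollationChoice {

    static final int UTF8_GENERAL_CI = 33;

    // Assumption: these ids are utf8mb4 collations
    // (45 = utf8mb4_general_ci, 46 = utf8mb4_bin, 224..247 = utf8mb4_*_ci).
    static boolean isUtf8mb4(int collationId) {
        return collationId == 45
                || collationId == 46
                || (collationId >= 224 && collationId <= 247);
    }

    // Echo the server value when it is utf8mb4, otherwise fall back to utf8.
    static int clientCollation(int serverCollationId) {
        return isUtf8mb4(serverCollationId) ? serverCollationId : UTF8_GENERAL_CI;
    }

    public static void main(String[] args) {
        // Server advertises utf8mb4_unicode_ci: the client echoes 224.
        System.out.println(clientCollation(224));
        // Handshake advertises latin1 (8): the client falls back to 33.
        System.out.println(clientCollation(8));
    }
}
```

With a handshake value of 8, the fallback to 33 is taken regardless of how the backend server is configured, which is the situation reported here.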

This causes problems later, because the server checks that data is valid for the client charset. When the client sends a 4-byte UTF-8 character, the server raises the error "Incorrect string value".
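
A short sketch of why the test string is rejected: the star character is outside the Basic Multilingual Plane, so its UTF-8 encoding takes 4 bytes, while the server-side utf8 charset stores at most 3 bytes per character. The class `EmojiBytes` and its helpers are hypothetical names used only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting a string before sending it.
class EmojiBytes {

    // Renders the UTF-8 bytes of s in the \xNN style used by the server error.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Code points above U+FFFF take 4 bytes in UTF-8 and need utf8mb4.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    public static void main(String[] args) {
        String emoji = "\uD83C\uDF1F"; // same value as in the test case
        System.out.println(hex(emoji));          // prints \xF0\x9F\x8C\x9F
        System.out.println(needsUtf8mb4(emoji)); // prints true
    }
}
```

The printed bytes match the value quoted in the "Incorrect string value" exception.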



 Comments   
Comment by Diego Dupin [ 2016-10-20 ]

This issue was not created in the right project; see https://jira.mariadb.org/browse/MXS-953
