[CONJ-417] Support characterEncoding and collation in jdbc Created: 2017-01-26  Updated: 2020-10-27  Resolved: 2019-10-01

Status: Closed
Project: MariaDB Connector/J
Component/s: Other
Affects Version/s: 1.5.7
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: paladox Assignee: Diego Dupin
Resolution: Won't Do Votes: 1
Labels: None
Environment:

gerrit


Issue Links:
Relates
relates to CONJ-836 Single Update with nested functions w... Closed

 Description   

Hi, I'm wondering whether we could implement support for the characterEncoding and collation settings in JDBC. The MySQL driver has them, but MariaDB's doesn't. Could we also make this better than MySQL's by allowing utf8mb4 in characterEncoding? This would let enterprise users set a connection to use a specific character encoding and charset without needing to change the database.

For example, could we allow it in the JDBC URL like this:

jdbc:mariadb://10.68.23.211/reviewdb?useUnicode=true&characterSetResults=utf8&characterEncoding=utf8mb4&connectionCollation=utf8mb4_unicode_ci

Could we also support executeQuery there, so that we could run SET NAMES utf8mb4 from JDBC?



 Comments   
Comment by Diego Dupin [ 2017-01-27 ]

Hi paladox,

This needs more documentation, but it's actually already possible.
The "sessionVariables" option sets session variables when the connection is established,
for example:
"jdbc:mariadb://localhost/db?user=xx&password=yy&sessionVariables=character_set_client=utf8mb4,character_set_results=utf8mb4,character_set_connection=utf8mb4,collation_connection=utf8mb4_spanish_ci"

This does the equivalent of SET NAMES utf8mb4 for the connection.
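A minimal Java sketch of the sessionVariables approach follows. The class and method names are illustrative, not part of Connector/J; the only driver-specific piece is the URL shape from the comment above. It uses utf8mb4_spanish_ci rather than utf8_spanish_ci because setting collation_connection also sets character_set_connection to that collation's character set, so a utf8_* collation would switch the connection back to 3-byte utf8. Credentials are passed to DriverManager separately instead of in the URL.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class SessionVariablesExample {

    // Builds a MariaDB JDBC URL whose "sessionVariables" option lists name=value
    // pairs separated by commas, in the same shape as the example URL in the comment.
    // Values are not URL-encoded (assumption: plain identifiers only).
    static String buildUrl(String hostAndDb, Map<String, String> sessionVariables) {
        StringJoiner pairs = new StringJoiner(",");
        for (Map.Entry<String, String> e : sessionVariables.entrySet()) {
            pairs.add(e.getKey() + "=" + e.getValue());
        }
        return "jdbc:mariadb://" + hostAndDb + "?sessionVariables=" + pairs;
    }

    // The variables SET NAMES utf8mb4 changes, plus an explicit utf8mb4 collation.
    // LinkedHashMap keeps insertion order in the generated URL.
    static Map<String, String> utf8mb4Variables(String collation) {
        Map<String, String> vars = new LinkedHashMap<>();
        vars.put("character_set_client", "utf8mb4");
        vars.put("character_set_results", "utf8mb4");
        vars.put("character_set_connection", "utf8mb4");
        vars.put("collation_connection", collation);
        return vars;
    }

    // Opening the connection needs MariaDB Connector/J on the classpath and a running server.
    static Connection connect(String url, String user, String password) throws SQLException {
        return DriverManager.getConnection(url, user, password);
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("localhost/db", utf8mb4Variables("utf8mb4_spanish_ci")));
    }
}
```

Running main only prints the URL; calling connect additionally requires the driver and a reachable server.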

Comment by Vladislav Vaintroub [ 2017-01-27 ]

paladox, why ask for "characterEncoding" support if all you want is utf8mb4 plus language-specific collation rules?
Maybe you could ask only for collation support; it would then be up to the JDBC driver to check whether your collation name contains mb4 and use either the castrated 3-byte UTF-8 or the full one.

Diego's example is correct, but a little wordy. I think one could derive mb4 from the collation name.

And the useUnicode option in the MySQL driver is not an example of how it should be done, right?
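The idea of deriving the character set from the collation name can be sketched like this. charsetOf and isFourByteUtf8 are hypothetical helpers, not driver API, and they assume the usual charset-prefixed naming (utf8mb4_unicode_ci, latin1_swedish_ci); a name without an underscore, such as binary, is returned unchanged.

```java
import java.util.Locale;

final class CollationCharset {

    // Hypothetical helper: returns the character-set prefix of a collation name,
    // e.g. "utf8mb4_unicode_ci" -> "utf8mb4", "utf8_spanish_ci" -> "utf8".
    // Assumes <charset>_<rules> naming; names without '_' are returned as-is.
    static String charsetOf(String collation) {
        String name = collation.toLowerCase(Locale.ROOT);
        int underscore = name.indexOf('_');
        return underscore < 0 ? name : name.substring(0, underscore);
    }

    // True when the collation implies full 4-byte UTF-8 rather than 3-byte utf8.
    static boolean isFourByteUtf8(String collation) {
        return "utf8mb4".equals(charsetOf(collation));
    }

    public static void main(String[] args) {
        String[] collations = {"utf8mb4_unicode_ci", "utf8_spanish_ci", "latin1_swedish_ci"};
        for (String c : collations) {
            System.out.println(c + " -> " + charsetOf(c) + ", 4-byte UTF-8: " + isFourByteUtf8(c));
        }
    }
}
```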

Comment by Björn Raupach [ 2019-02-22 ]

Hi there,

Sorry for raising the dead, but I found this issue while browsing Stack Overflow. With emojis becoming so prevalent, people actually want to store their smiley faces and whatnot in their apps, which means in their databases. For MySQL Server (not sure about MariaDB), this requires utf8mb4 on both the server and the client.

Is it possible to either document this better or add a property to the JDBC connection URL?

Comment by Diego Dupin [ 2019-02-22 ]

The current implementation depends on the server's default character set (@@character_set_server):
if the default collation is utf8_* (like "utf8_general_ci"), the server will use the 3-byte utf8 collation;
if not, the connector uses utf8mb4 (4-byte UTF-8).
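The selection rule described in this comment can be restated as a small function. This is a hypothetical sketch of the described behavior, not the connector's actual source: a utf8_* server default collation leads to 3-byte utf8, anything else to utf8mb4.

```java
import java.util.Locale;

final class DefaultCollationRule {

    // Hypothetical restatement of the rule in the comment, not the connector's code:
    // a utf8_* server default collation means 3-byte utf8, any other means utf8mb4.
    static String connectionCharset(String serverDefaultCollation) {
        String name = serverDefaultCollation.toLowerCase(Locale.ROOT);
        return name.startsWith("utf8_") ? "utf8" : "utf8mb4";
    }

    public static void main(String[] args) {
        System.out.println(connectionCharset("utf8_general_ci"));    // utf8
        System.out.println(connectionCharset("utf8mb4_general_ci")); // utf8mb4
        System.out.println(connectionCharset("latin1_swedish_ci"));  // utf8mb4
    }
}
```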

The current implementation is misleading.
(In reality, the Java connector always sends "real" UTF-8 to the server, but the server truncates characters that need 4 bytes to encode.)
Always setting utf8mb4 for communication between client and server would really simplify this.
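Why 3-byte utf8 loses data can be checked in plain Java: code points above U+FFFF, which include most emoji, take 4 bytes in UTF-8, while everything in the Basic Multilingual Plane takes at most 3. utf8Length and needsUtf8mb4 are illustrative helper names, not library API.

```java
import java.nio.charset.StandardCharsets;

final class FourByteUtf8Check {

    // Bytes needed to encode the string as standard UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Code points above U+FFFF (outside the Basic Multilingual Plane, e.g. most emoji)
    // need 4 bytes in UTF-8, which the 3-byte utf8 character set cannot store.
    static boolean needsUtf8mb4(String s) {
        return s.codePoints().anyMatch(cp -> cp > 0xFFFF);
    }

    public static void main(String[] args) {
        String accented = "caf\u00e9";   // "cafe" with U+00E9, which takes 2 bytes
        String emoji = "\uD83D\uDE00";   // U+1F600 GRINNING FACE as a surrogate pair
        System.out.println("accented: " + utf8Length(accented) + " bytes, needs utf8mb4: "
                + needsUtf8mb4(accented)); // accented: 5 bytes, needs utf8mb4: false
        System.out.println("emoji: " + utf8Length(emoji) + " bytes, needs utf8mb4: "
                + needsUtf8mb4(emoji));    // emoji: 4 bytes, needs utf8mb4: true
    }
}
```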

Comment by Diego Dupin [ 2019-10-01 ]

Since 2.2.4, the connection encoding is always utf8mb4, except with pre-5.1 servers, which don't know utf8mb4. I think this removes any ambiguity.
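To see what a given driver and server actually negotiate, one can query the session variables after connecting. This is a sketch assuming MariaDB Connector/J is on the classpath; the class name and the allUtf8mb4 helper are hypothetical, and no particular result is claimed here.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class SessionCharsetCheck {

    // Session variables that SET NAMES affects, plus the resulting collation.
    static final String QUERY = "SELECT @@character_set_client, @@character_set_connection, "
            + "@@character_set_results, @@collation_connection";

    // True when client, connection and results character sets are all utf8mb4.
    static boolean allUtf8mb4(String client, String connection, String results) {
        return "utf8mb4".equalsIgnoreCase(client)
                && "utf8mb4".equalsIgnoreCase(connection)
                && "utf8mb4".equalsIgnoreCase(results);
    }

    // Reads the columns left to right, prints the collation and checks the charsets.
    static boolean sessionUsesUtf8mb4(Connection conn) throws SQLException {
        try (Statement st = conn.createStatement(); ResultSet rs = st.executeQuery(QUERY)) {
            if (!rs.next()) {
                throw new SQLException("No row returned for: " + QUERY);
            }
            String client = rs.getString(1);
            String connection = rs.getString(2);
            String results = rs.getString(3);
            System.out.println("collation_connection = " + rs.getString(4));
            return allUtf8mb4(client, connection, results);
        }
    }

    // Usage: java SessionCharsetCheck <jdbc-url> <user> <password>
    public static void main(String[] args) throws SQLException {
        if (args.length < 3) {
            System.err.println("usage: SessionCharsetCheck <jdbc-url> <user> <password>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0], args[1], args[2])) {
            System.out.println("utf8mb4 session: " + sessionUsesUtf8mb4(conn));
        }
    }
}
```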

Supporting other collations could still make sense, but only to avoid conversion for databases encoded in a character set other than UTF-8, and who would do that intentionally now?

Generated at Thu Feb 08 03:15:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.