[CONJ-117] MariaDB prepareStatement does not convert characterSet for where clause Created: 2014-10-17  Updated: 2014-11-10  Resolved: 2014-11-10

Status: Closed
Project: MariaDB Connector/J
Component/s: None
Affects Version/s: 1.1.7
Fix Version/s: 1.1.8

Type: Bug Priority: Minor
Reporter: Marcel Schneider Assignee: Massimo Siani (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Windows 7, 64bit



 Description   

I posted a description of this bug here:

http://stackoverflow.com/questions/26380888/mariadb-preparestatement-does-not-convert-characterset-for-where-clause

Additional Infos:
I feel this is a bug and not a feature because the insert works perfectly.
If I use the UF8 character "C2 92" ([-62, -110]) instead of the japanese character, I get the same error AND the posted workaround does not work.
Originally, this problem comes from a data replication between oracle (source) and mariaDB (destination), where it is possible that I get characters which are not assigned in latin1.



 Comments   
Comment by Massimo Siani (Inactive) [ 2014-11-06 ]

As a first fix, just for the latin1 character set, see https://code.launchpad.net/~massimo-siani/mariadb-java-client/CONJ-117

Comment by Vladislav Vaintroub [ 2014-11-07 ]

It is not a bug, thus I do not think it has to be "fixed". It is not possible to store japanese characters in latin1, without losing information, because latin1 does not include any japanese characters.
It is a user error, if you intend to store bytes, the columns should be BINARY (http://dev.mysql.com/doc/refman/5.0/en/binary-varbinary.html) , rather than CHAR, and then you can use PreparedStatement.setBytes and ResultSet.getBytes

But you probably want it to be some kind of string/text, then use can UTF8 or so on the database side for the column. UTF8 has Japanese. LATIN1 does not have it.

Currently, you have mislabeled character data in the database (it is not really latin1 , it is binary).
This driver cant handle mislabeled data well , this was not the intention

Comment by Marcel Schneider [ 2014-11-07 ]

> It is not possible to store japanese characters in latin1

That's not the intention here. The problem is that I expect the database to convert unknown characters in the where-clause as it does with the insert (a conversion to "?" would be fine). Sometimes you don't have control over where you get a certain string in Java. In my case I got it from a latin1 Oracle database. And it was not a japanese character, it was an undefined character according to latin1 (ASCII 149). To make things even worse: the posted workaround does not work for that case.

Comment by Vladislav Vaintroub [ 2014-11-07 ]

Driver does not convert anything anywhere. It does not know what charset was set for what column. It declares that it is UTF8 to the server and uses UTF8 to communicate with the server. Do you see any bug there?

Comment by Vladislav Vaintroub [ 2014-11-07 ]

While I agree that "unknown character" behavior should be consistent, it is outside of the driver's responsibility. You may want to file bug against MariaDB server, then.

Comment by Marcel Schneider [ 2014-11-07 ]

Ah ok, I was unaware which software layer is supposed to convert the characters. I will file the bug against MariaDB Server then. Just a last question: What did Mr. Siani fix (see first comment)?

Comment by Vladislav Vaintroub [ 2014-11-07 ]

MAssimo can answer. To me it looks like emulation of your (commented out as "works, but ugly") hack in your stackoverflow example.
It does not really work, because during connection Mariadb driver says it will use UTF8, so it needs to really use UTF8 when turning string to bytes,
otherwise even trivial prepared statements like "select ?" will break (i.e corrupt input string and return junk back for anything outside of plain ASCII)

Comment by Massimo Siani (Inactive) [ 2014-11-07 ]

Well, despite I agree that this is a user error, I think the driver can handle it differently because of two reasons. The first one is consistency with the insert case, which you may consider a server bug, too. The second one is a kind of consistency with MySQL connector, which does perform some conversion.
Let me elaborate more. I thought that the driver can retrieve the column collation and convert the string accordingly in the where clause, which would solve the two problems (consistency with the insert and with mysql connector) at once, but of course this will introduce some overhead. On the other hand, using the server language (the driver knows it after CONJ-118) would not be a good solution, because the client may modify the connection character set or the column may have a different collation with respect to the default one.

I need more tests to find a good solution, if any. In the meanwhile, issuing a bug to the MariaDB server may solve the issue.

Comment by Vladislav Vaintroub [ 2014-11-07 ]

Massimo, once driver announced its encoding to the server (i.e during connection), after that it is fixed , cannot be changed (for simplicity, lets ignore "set names", because you cannot do a "set names" for each column). Server is aware of driver encoding and cannot to handle a mix, where parts of the query are differently encoded. I would stick to UTF8 generally, because that allows to convert from String to bytes without losing any info. "Illegal mix of collations" is certainly a server deficiency.

Comment by Marcel Schneider [ 2014-11-07 ]

Thx a lot, I filed a bug here: https://mariadb.atlassian.net/browse/MDEV-7044

Comment by Elena Stepanova [ 2014-11-07 ]

Could any of the participants please describe in MDEV-7044 what exactly the server issue is? I've read the comment chain but didn't see any obvious indication of the server bug.

Comment by Vladislav Vaintroub [ 2014-11-07 ]

Elena, I think "Illegal mix of collations" is a problem.
I reproduce it from the command line client , and you should be able too.

create table h( c varchar(10) character set ascii);
select * from h2 where c='ä'; # any non-ascii character will work

gives
ERROR 1267 (HY000): Illegal mix of collations (ascii_general_ci,IMPLICIT) and (cp850_general_ci,COERCIBLE) for operation '='

The original problem was involved prepared statements (JDBC client-side), and UTF8 as the client side encoding, and latin1 column on the server side, but the essence of the problem is that if you have a character in the query outside of column encoding range, this ugly "Illegal mix" arises. Why would not the query just return an empty set.

There is another part of the bug, Marcel states he was able to insert non-latin1 into latin1 column, but I was not able to reproduce it .

Comment by Elena Stepanova [ 2014-11-07 ]

Thanks, I've quoted this in MDEV-7044.

Generated at Thu Feb 08 03:13:17 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.