[CONJ-117] MariaDB prepareStatement does not convert characterSet for where clause Created: 2014-10-17 Updated: 2014-11-10 Resolved: 2014-11-10 |
|
| Status: | Closed |
| Project: | MariaDB Connector/J |
| Component/s: | None |
| Affects Version/s: | 1.1.7 |
| Fix Version/s: | 1.1.8 |
| Type: | Bug | Priority: | Minor |
| Reporter: | Marcel Schneider | Assignee: | Massimo Siani (Inactive) |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows 7, 64bit |
||
| Description |
|
I posted a description of this bug here: Additional Infos: |
| Comments |
| Comment by Massimo Siani (Inactive) [ 2014-11-06 ] |
|
As a first fix, just for the latin1 character set, see https://code.launchpad.net/~massimo-siani/mariadb-java-client/CONJ-117 |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
It is not a bug, thus I do not think it has to be "fixed". It is not possible to store japanese characters in latin1, without losing information, because latin1 does not include any japanese characters. But you probably want it to be some kind of string/text, then use can UTF8 or so on the database side for the column. UTF8 has Japanese. LATIN1 does not have it. Currently, you have mislabeled character data in the database (it is not really latin1 , it is binary). |
| Comment by Marcel Schneider [ 2014-11-07 ] |
|
> It is not possible to store japanese characters in latin1 That's not the intention here. The problem is that I expect the database to convert unknown characters in the where-clause as it does with the insert (a conversion to "?" would be fine). Sometimes you don't have control over where you get a certain string in Java. In my case I got it from a latin1 Oracle database. And it was not a japanese character, it was an undefined character according to latin1 (ASCII 149). To make things even worse: the posted workaround does not work for that case. |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
Driver does not convert anything anywhere. It does not know what charset was set for what column. It declares that it is UTF8 to the server and uses UTF8 to communicate with the server. Do you see any bug there? |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
While I agree that "unknown character" behavior should be consistent, it is outside of the driver's responsibility. You may want to file bug against MariaDB server, then. |
| Comment by Marcel Schneider [ 2014-11-07 ] |
|
Ah ok, I was unaware which software layer is supposed to convert the characters. I will file the bug against MariaDB Server then. Just a last question: What did Mr. Siani fix (see first comment)? |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
MAssimo can answer. To me it looks like emulation of your (commented out as "works, but ugly") hack in your stackoverflow example. |
| Comment by Massimo Siani (Inactive) [ 2014-11-07 ] |
|
Well, despite I agree that this is a user error, I think the driver can handle it differently because of two reasons. The first one is consistency with the insert case, which you may consider a server bug, too. The second one is a kind of consistency with MySQL connector, which does perform some conversion. I need more tests to find a good solution, if any. In the meanwhile, issuing a bug to the MariaDB server may solve the issue. |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
Massimo, once driver announced its encoding to the server (i.e during connection), after that it is fixed , cannot be changed (for simplicity, lets ignore "set names", because you cannot do a "set names" for each column). Server is aware of driver encoding and cannot to handle a mix, where parts of the query are differently encoded. I would stick to UTF8 generally, because that allows to convert from String to bytes without losing any info. "Illegal mix of collations" is certainly a server deficiency. |
| Comment by Marcel Schneider [ 2014-11-07 ] |
|
Thx a lot, I filed a bug here: https://mariadb.atlassian.net/browse/MDEV-7044 |
| Comment by Elena Stepanova [ 2014-11-07 ] |
|
Could any of the participants please describe in |
| Comment by Vladislav Vaintroub [ 2014-11-07 ] |
|
Elena, I think "Illegal mix of collations" is a problem. create table h( c varchar(10) character set ascii); gives The original problem was involved prepared statements (JDBC client-side), and UTF8 as the client side encoding, and latin1 column on the server side, but the essence of the problem is that if you have a character in the query outside of column encoding range, this ugly "Illegal mix" arises. Why would not the query just return an empty set. There is another part of the bug, Marcel states he was able to insert non-latin1 into latin1 column, but I was not able to reproduce it . |
| Comment by Elena Stepanova [ 2014-11-07 ] |
|
Thanks, I've quoted this in |