[ODBC-200] Converts wide characters to current ACP on Windows Created: 2018-11-12 Updated: 2018-11-12 Resolved: 2018-11-12 |
|
| Status: | Closed |
| Project: | MariaDB Connector/ODBC |
| Component/s: | General |
| Affects Version/s: | 3.0.6 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Henrik Holst | Assignee: | Lawrin Novitsky |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Environment: |
Windows |
||
| Description |
|
In the MADB_ConvertFromWChar() function in ma_platform_win32.c, WideCharToMultiByte() is called with cc->CodePage instead of CP_UTF8. This makes WideCharToMultiByte() convert the wide Unicode data to whatever code page the ODBC driver is running under (I am unsure whether this is copied from the calling application or whether the driver keeps the system code page) instead of UTF-8, which is most likely the character set the MariaDB server expects. It looks like the developer thought that the CodePage argument to WideCharToMultiByte() was the code page of the source when it really is the destination (i.e. which code page the resulting multibyte data should be in). I noticed this recently when trying out ODBC with various SQL servers: MySQL 8.0, PostgreSQL 11 and MSSQL 2017 all worked as intended (i.e. using the Unicode functions of ODBC to send all possible Unicode characters), but MariaDB broke when it converted the wchar_t data to cp-1252 (in my case) and then sent it to the MariaDB server, which expected to receive UTF-8 data. |
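To make the symptom concrete, here is a minimal portable sketch (not the connector's code, and not using the Windows API) of what happens when the same character is encoded for cp-1252 versus UTF-8. All function names are hypothetical, and the cp-1252 helper only covers the range that cp-1252 shares with Latin-1, which is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode scalar value (assumed valid, not a surrogate) as UTF-8.
   Returns the number of bytes written to out (1 to 4). */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: encode a code point in the range that cp-1252 shares
   with Latin-1 (U+0000..U+007F and U+00A0..U+00FF) as a single byte.
   Returns 1 on success, 0 if the code point is outside that range. */
size_t cp1252_encode_latin_range(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80 || (cp >= 0xA0 && cp <= 0xFF)) {
        *out = (unsigned char)cp;
        return 1;
    }
    return 0;
}

/* Length of the UTF-8 sequence announced by a first byte,
   or 0 if the byte cannot start a sequence. */
size_t utf8_expected_length(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead >= 0xC2 && lead <= 0xDF) return 2;
    if (lead >= 0xE0 && lead <= 0xEF) return 3;
    if (lead >= 0xF0 && lead <= 0xF4) return 4;
    return 0;
}

/* Usage: U+00E9 (e with acute accent) in both encodings. */
void demo_e_acute(void)
{
    unsigned char utf8[4];
    unsigned char ansi;

    assert(utf8_encode(0x00E9, utf8) == 2);
    assert(utf8[0] == 0xC3 && utf8[1] == 0xA9);

    assert(cp1252_encode_latin_range(0x00E9, &ansi) == 1);
    assert(ansi == 0xE9);

    /* A UTF-8 reader takes 0xE9 as the start of a three-byte sequence. */
    assert(utf8_expected_length(ansi) == 3);
}
```

A receiver that expects UTF-8 therefore misreads the single cp-1252 byte, which matches the kind of breakage described in the report.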
| Comments |
| Comment by Lawrin Novitsky [ 2018-11-12 ] |
|
Thank you for your report! As far as I can see there is no bug here. Apparently you used the ANSI part of the ODBC API to connect, i.e. SQLDriverConnect and not SQLDriverConnectW. And thus cp-1252 is used: it must either be the default ANSI charset of your system, or you have set it as the connection charset in your DSN/connection string. Otherwise the connector would pick utf8 as the default charset, and that would be used in the WideCharToMultiByte call in the function you mentioned. I have just tested that this works fine (utf8 is the default charset if the Unicode API is used). If you disagree, or verify that your application uses the Unicode part of the API, please re-open this issue and we will look deeper into your problem. |
| Comment by Henrik Holst [ 2018-11-12 ] |
|
Interesting, calling SQLExecDirectW() after SQLDriverConnect() was what caused this, just as you thought. It worked fine with the other SQL servers, which somewhat hid the real issue I guess. I just switched to SQLDriverConnectW() as well and it worked just fine. I cannot really figure out how CodePage is nulled in SQLDriverConnectW() since it looks like it's always set to "cc->CodePage= GetACP();" regardless of entry point, but since it works I must just have missed some logic. |
| Comment by Lawrin Novitsky [ 2018-11-12 ] |
|
If you used SQLDriverConnect(), the Driver Manager sets the SQL_ATTR_ANSI_APP connection attribute. That is the only way for a Unicode driver to know which part of the API the application uses, since for a Unicode driver the DM translates all ANSI API calls to Unicode calls. |
| Comment by Henrik Holst [ 2018-11-12 ] |
|
Yes, I made the mistake of seeing the ANSI and Unicode functions as just separate functions and not as two independent sets of APIs. The reason I mix them is that I'm not a Unicode (in the ODBC sense) application at all but a native utf-8 application which processes data and input in utf-8. For MySQL/MariaDB/PostgreSQL we use the native connectors, and there utf-8 is of course no problem, but we got a customer request to add ODBC support so that they could connect to MSSQL from Linux. So instead of having to convert every possible string to wchar_t, I used the ANSI functions for everything other than executing the SQL queries, which worked for MSSQL on both Linux and Windows. Being thorough, though, I made sure to test it against all other SQL servers to catch any hidden bugs (and this is of course a golden example of why one should do just that), and that is when I stumbled upon this issue. So once again thanks for helping me create a more compliant application! |
| Comment by Lawrin Novitsky [ 2018-11-12 ] |
|
You can use the ANSI functions and set the connection charset to utf8. That should work. And on Linux that is absolutely OK to do, since unixODBC does not do the mapping that the DM on Windows does. On Windows you will have some overhead. First the DM converts the strings to utf16, and then the connector converts them to utf8. As a matter of fact, the connector will convert them to the system's default ANSI charset, since, as I said, the DM uses that as the source charset when it converts to utf16, while the application is actually passing utf8 strings. This way the connector ends up with the original utf8 bytes and sends them over to the server. A bit tricky. |
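The "a bit tricky" round trip on Windows can be sketched as follows. This is only an illustration under loud assumptions: Latin-1 stands in for the system ANSI code page (so byte N widens to U+00NN), the two functions are hypothetical stand-ins for the Driver Manager and connector steps, and the sketch does not show how a real code page such as cp-1252 treats every byte value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the Driver Manager step: widen each "ANSI" byte to one
   UTF-16 code unit. Assumption: Latin-1, so byte N becomes U+00NN. */
void ansi_to_wide(const unsigned char *in, size_t n, uint16_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
}

/* Stand-in for the connector step: narrow back to the same code page.
   Code units that the code page cannot represent become '?'. */
void wide_to_ansi(const uint16_t *in, size_t n, unsigned char *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i] <= 0xFF ? (unsigned char)in[i] : (unsigned char)'?';
}

/* Returns 1 if the bytes come back unchanged after widening and narrowing
   through the same code page, 0 otherwise (or if n exceeds the buffers). */
int survives_round_trip(const unsigned char *bytes, size_t n)
{
    uint16_t wide[64];
    unsigned char back[64];

    if (n > 64)
        return 0;
    ansi_to_wide(bytes, n, wide);
    wide_to_ansi(wide, n, back);
    return memcmp(bytes, back, n) == 0;
}

/* Usage: the UTF-8 bytes of U+00E9 passed through the ANSI API. */
void demo_round_trip(void)
{
    const unsigned char utf8_e_acute[] = { 0xC3, 0xA9 };
    uint16_t wide[2];

    /* In between, the data is two unrelated characters (U+00C3, U+00A9). */
    ansi_to_wide(utf8_e_acute, 2, wide);
    assert(wide[0] == 0x00C3 && wide[1] == 0x00A9);

    /* Narrowing with the same code page restores the original bytes. */
    assert(survives_round_trip(utf8_e_acute, 2));
}
```

The intermediate wide string is meaningless as text, but because both conversions use the same code page, the bytes that reach the server are the application's original utf8 bytes.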
| Comment by Henrik Holst [ 2018-11-12 ] |
|
Yes, but playing around with various SQL servers I noticed that they set the connection charset in different ways, and I didn't want it to be left up to the end user to figure out. On unixODBC and MSSQL I couldn't get this to work at all (the MSSQL driver for unixODBC only implements the Unicode functions, so I am not sure how unixODBC translated the data for the MSSQL driver considering that I'm on a utf-8 locale in Linux, but the inserted data in the nvarchar columns in MSSQL was still in utf-8 format and not the ucs-2 that it should be), but then there might have been a setting in odbcinst.ini for this that I missed (avoiding such settings as much as possible was also a goal). For MySQL/MariaDB and PostgreSQL, however, it was quite easy to set the charset to utf-8 and just use the ANSI version of ODBC, and the utf-8 data would be pumped directly to the server unaltered by the DM; there the only differences were in how to set the connection charset, and we had to support MSSQL anyway (since that was the main purpose of the ODBC driver on our end). However, using the Unicode connect and the Unicode execute functions solves it. The slight overhead of translating utf-8 to ucs-2 is way less than the time it takes for the SQL server to perform my inserts/updates anyway (we are write intensive and perform almost no reads), so I've seen no direct performance penalty, and for MariaDB we will always recommend the native driver anyway. |
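For readers in the same position (a utf-8 application calling the Unicode ODBC entry points), the conversion step mentioned above could look roughly like this. It is a sketch, not the reporter's code: the function name is hypothetical, it emits UTF-16 (including surrogate pairs, which go beyond strict UCS-2), and it assumes the ODBC wide character type is 16 bits wide on the target platform, which should be checked against the driver manager's headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: convert UTF-8 to UTF-16 code units.
   Returns the number of units written, or (size_t)-1 if the input is
   malformed or the output buffer is too small. */
size_t utf8_to_utf16(const unsigned char *in, size_t in_len,
                     uint16_t *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    while (i < in_len) {
        uint32_t cp;
        size_t extra;
        unsigned char b = in[i++];

        /* The first byte announces how many continuation bytes follow. */
        if (b < 0x80)                    { cp = b;        extra = 0; }
        else if (b >= 0xC2 && b <= 0xDF) { cp = b & 0x1F; extra = 1; }
        else if (b >= 0xE0 && b <= 0xEF) { cp = b & 0x0F; extra = 2; }
        else if (b >= 0xF0 && b <= 0xF4) { cp = b & 0x07; extra = 3; }
        else return (size_t)-1;

        if (in_len - i < extra)
            return (size_t)-1;                  /* truncated sequence */
        for (size_t k = 0; k < extra; k++) {
            unsigned char c = in[i++];
            if ((c & 0xC0) != 0x80)
                return (size_t)-1;              /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, surrogates and out-of-range values. */
        if ((extra == 2 && cp < 0x800) || (extra == 3 && cp < 0x10000) ||
            (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
            return (size_t)-1;

        if (cp < 0x10000) {
            if (o + 1 > out_cap) return (size_t)-1;
            out[o++] = (uint16_t)cp;
        } else {
            if (o + 2 > out_cap) return (size_t)-1;
            cp -= 0x10000;                      /* split into a surrogate pair */
            out[o++] = (uint16_t)(0xD800 | (cp >> 10));
            out[o++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return o;
}

/* Usage: one BMP character and one character that needs a surrogate pair. */
void demo_utf8_to_utf16(void)
{
    const unsigned char e_acute[] = { 0xC3, 0xA9 };             /* U+00E9  */
    const unsigned char grinning[] = { 0xF0, 0x9F, 0x98, 0x80 }; /* U+1F600 */
    uint16_t w[2];

    assert(utf8_to_utf16(e_acute, 2, w, 2) == 1 && w[0] == 0x00E9);
    assert(utf8_to_utf16(grinning, 4, w, 2) == 2);
    assert(w[0] == 0xD83D && w[1] == 0xDE00);
}
```

The resulting buffer, with its length in code units, is what would then be handed to the W variants such as SQLDriverConnectW() and SQLExecDirectW().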