[MDEV-8232] CONNECT fails for NVARCHAR Created: 2015-05-26  Updated: 2023-04-27

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.0.17, 10.3, 10.4
Fix Version/s: 10.4

Type: Bug Priority: Major
Reporter: Takuya Aoki (Inactive) Assignee: Andrew Hutchings
Resolution: Unresolved Votes: 1
Labels: connect-engine
Environment:

CentOS release 6.5 (X86_64), PureData Release 7.0 (P-3) , NetezzaSQL ODBCDriver



 Description   

I am trying to create a CONNECT table pointing to a table on an IBM database.
It seems that no testing has been done against this IBM server yet.

The original table includes NVARCHAR(N) columns holding UTF-8 characters.

http://www-01.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_odiagntz_016.htm

I have two problems.

1.
When I try creating a CONNECT table without specifying the column definition,
I get the below error.

Error (Code 1105): Unsupported SQL type -9

It doesn't happen when the table has no NVARCHAR(N) columns,
so I guess the CONNECT engine does not recognize this data type.

2.
When I specify the column definition myself, such as VARCHAR(32),
the CONNECT table is created with no error,
but all non-ASCII UTF-8 characters come back as '?' marks.

MariaDB [test]> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.06 sec)
 
MariaDB [test]> SELECT c1 FROM CONNECT_TABLE;
+------+
| c1   |
+------+
| ???  |
| JCB  |
| ???  |
| ???  |
| ???  |
| ???  |
| ???  |
| ???  |
| UFJ  |
| ???  |
+------+
10 rows in set (0.00 sec)

I found it is not a problem with the ODBC driver: queries run through isql return the UTF-8 values correctly.
Changing the table's DEFAULT CHARSET or DATA_CHARSET did not help.



 Comments   
Comment by Olivier Bertrand [ 2015-05-28 ]

Indeed, the NCHAR and NVARCHAR types correspond to SQL_WCHAR (-8) and SQL_WVARCHAR (-9). These types were not recognized by CONNECT. This will be fixed simply by handling them as CHAR and VARCHAR, with a warning (NOTE) saying that the column contains wide characters.
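The fix described above — treating the wide ODBC type codes as their narrow counterparts — can be sketched as follows. This is a minimal illustration using the standard ODBC type codes from sql.h/sqlext.h; the function and mapping names are hypothetical, not CONNECT's actual (C++) code:

```python
# Standard ODBC SQL data type codes (values from sql.h / sqlext.h)
SQL_CHAR, SQL_VARCHAR, SQL_LONGVARCHAR = 1, 12, -1
SQL_WCHAR, SQL_WVARCHAR, SQL_WLONGVARCHAR = -8, -9, -10

# Hypothetical mapping: handle each wide-character type as its narrow equivalent
WIDE_TO_NARROW = {
    SQL_WCHAR: SQL_CHAR,
    SQL_WVARCHAR: SQL_VARCHAR,
    SQL_WLONGVARCHAR: SQL_LONGVARCHAR,
}

def translate_sql_type(sql_type):
    """Return the narrow equivalent of a wide ODBC type, or the type unchanged."""
    if sql_type in WIDE_TO_NARROW:
        # Real code would also emit the NOTE warning about wide characters here
        return WIDE_TO_NARROW[sql_type]
    return sql_type

print(translate_sql_type(SQL_WVARCHAR))  # SQL_WVARCHAR (-9) -> SQL_VARCHAR (12)
```

With such a mapping in place, the "Unsupported SQL type -9" error from problem 1 would no longer be raised for NVARCHAR columns.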

The issue of UTF-8 characters displayed as '?' remains. It looks as if the column text is translated twice: the first translation replaces the UTF-8 characters with, say, latin1 characters that the second translation does not recognize, so it replaces them with '?'.

Perhaps this could be avoided by setting character_set_client to latin1. Indeed, the character_set_results variable applies to MYSQL tables but not to ODBC tables.
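The '?' symptom is consistent with text being forced through a character set that cannot represent it. A minimal Python illustration (not CONNECT's actual code path; the Japanese sample string is made up) of why pure-ASCII values like 'JCB' survive while other text does not:

```python
# A non-ASCII string: every character that latin-1 cannot represent
# is substituted with '?', matching the '???' rows in the bug report
mangled = "カード".encode("latin-1", errors="replace")
print(mangled)  # b'???'

# Pure-ASCII text survives the same conversion unchanged,
# matching the intact 'JCB' and 'UFJ' rows
intact = "JCB".encode("latin-1", errors="replace")
print(intact)  # b'JCB'
```

This is only a model of the lossy step; in the real pipeline the substitution happens somewhere between the ODBC fetch and the client, not in application code.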

Comment by Olivier Bertrand [ 2015-05-30 ]

I don't know about IBM, but according to the Oracle documentation, the NCHAR and NVARCHAR types (N for Native) are character strings supposedly containing text in the native character set. It seems that when writing and reading, the data source (at least Oracle) automatically does the translation between the internal encoding (Unicode as UTF-8 or UTF-16) and the native character set.
Thereby, the table charset and the character_set_client variable should not be set to UTF8 but to the native character set. Note that with Oracle there is no issue of mixed character sets, as the default charset is defined for the entire database.
I think there is probably nothing more CONNECT can do to address such problems.

Comment by Takuya Aoki (Inactive) [ 2015-06-02 ]

Hello Olivier,
I understand the translation is happening twice and not working correctly.
(The first time by the CONNECT engine and the second time by the client?)

Do you mean that the table charset is for the first translation and character_set_client for the second?
By the way, I don't understand what "native character set" means here.

Comment by Takuya Aoki (Inactive) [ 2015-06-02 ]

Netezza seems to handle all characters in UTF-8 and does not do any conversions.

http://www-01.ibm.com/support/knowledgecenter/SSBJG3_2.5.0/com.ibm.gen_busug.doc/c_fgl_odiagntz_016.htm

NCHAR/NVARCHAR data is always stored in UTF-8. The database character set defines the encoding for CHAR and VARCHAR columns and is defined when creating the database with the CREATE DATABASE command; the default is latin9. Note that, at the time of writing these lines, Netezza V6 does not yet support a different database character set than latin9.
 
No automatic character set conversion is done by the Netezza software, this means that the application/client character set must match the database character set.
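Since Netezza stores NVARCHAR data as raw UTF-8 and performs no conversion, the client side must decode those bytes as UTF-8 for the text to survive. A short sketch of the mismatch (the sample string is illustrative, not data from the actual table):

```python
# Bytes exactly as the data source stores them: raw UTF-8, no conversion
raw = "カード".encode("utf-8")

# Client charset matches the stored encoding: text comes back intact
assert raw.decode("utf-8") == "カード"

# Client charset mismatch: the very same bytes decode to mojibake instead —
# consistent with isql (configured for UTF-8) working while the CONNECT
# charset pipeline does not; the ODBC driver itself is not at fault
assert raw.decode("latin-1") != "カード"
```
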

Comment by Elena Stepanova [ 2023-01-22 ]

TheLinuxJedi,
As I understand from the comments, this wasn't fixed, so I'm updating the affected/fix versions based on that guess and leaving it to you to decide what to do about this issue.

Generated at Thu Feb 08 07:25:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.