[MDEV-17501] Connect-Storage engine: character set failure Created: 2018-10-19  Updated: 2018-12-08  Resolved: 2018-12-08

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.3.6, 10.3.7, 10.3.8, 10.3.9
Fix Version/s: 10.3.12

Type: Bug Priority: Major
Reporter: Karsten Budde Assignee: Olivier Bertrand
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

CentOS7, Oracle 11g, Character Set Oracle WE8ISO8859P1, Character Set MariaDB utf8mb4



 Description   

Hello,

in MariaDB 10.2.11 I create a connect-table from mariadb to orcacle with the following statement:

CREATE TABLE `hzEdatValues` (
  `EDAT_DEF_ID` int(10) unsigned NOT NULL,
  `VALUE` varchar(40) NOT NULL,
  `TEXT_K` varchar(512) NOT NULL,
  `TEXT_E` varchar(4000) DEFAULT NULL,
  `CODES` varchar(4000) DEFAULT NULL,
  `SORT` int(10) unsigned NOT NULL,
  `SP_KZ` varchar(1) NOT NULL,
  `SP_TEXT` varchar(255) DEFAULT NULL,
  `NEU_DATUM` datetime NOT NULL,
  `NEU_USR_ID` int(10) unsigned NOT NULL,
  `AEND_DATUM` datetime NOT NULL,
  `AEND_USR_ID` int(10) unsigned NOT NULL
) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HHZ1' `TABLE_TYPE`='ODBC' `DATA_CHARSET`=latin1 `tabname`='CHZ.HZ_A_EDAT_VALUES'

In oracle the character set is WE8ISO8859P1 and in mariadb utf8mb4.
Characters as 'ä', 'Ü' was shown correct in the connect-table.

Then we update mariadb to 10.3.6. Now the characters as 'ä' or 'Ü' are shown wrong in the connect-table: 'ä' -> 'a' and 'Ü' -> 'U'.

Now we use mariadb 10.3.9 and the characters are still wrong.

Can you help me?



 Comments   
Comment by Olivier Bertrand [ 2018-10-22 ]

Do you mean that these characters were shown correct with MariaDB 10.2 (or 1 or 0) and not with the 10.3 version?

Comment by Karsten Budde [ 2018-10-22 ]

Yes, with MariaDB 10.2 these characters were shown correct.

Comment by Alexander Barkov [ 2018-10-29 ]

Karsten, can you please try the following:

  • In Oracle:

    DROP TABLE t1;
    CREATE TABLE t1 (a VARCHAR(4000));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C4')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C5')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C6')));
    SELECT a, RAWTOHEX(a) FROM t1;
    

  • In MariaDB:

    DROP TABLE IF EXISTS t1;
    CREATE TABLE t1 (
      a VARCHAR(4000)
    ) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HHZ1' TABLE_TYPE='ODBC' DATA_CHARSET=latin1 tabname='CHZ.T1';
    SELECT a, HEX(a) FROM t1;
    

and paste output of the above two scripts.

Thanks.

Comment by Olivier Bertrand [ 2018-10-29 ]

I did that and the SELECT command replied:

1296: Got error 122 'Fetching: [Oracle][ODBC][ORA]ORA-29275: partial multibyte character' from CONNECT

Comment by Alexander Barkov [ 2018-10-29 ]

Olivier, I think your database uses UTF8 rather than latin1.

You can check using this query:

SELECT * FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER LIKE 'NLS_CHARACTERSET';

The above script (namely, the INSERT queries) assumes that the Oracle database is configured to use Latin1.

Comment by Karsten Budde [ 2018-10-29 ]

Hello,
here are the outputs:

  • in SQLPlus/Oracle:

    SQL> @[%NLS_LANG%]
    SP2-0310: Datei "[GERMAN_GERMANY.WE8ISO8859P1]" konnte nicht ge÷ffnet werden
    SQL> SELECT a, RAWTOHEX(a) FROM t1;

    A
    --------------------------------------------------------------------------------
    RAWTOHEX(A)
    --------------------------------------------------------------------------------

    C4


    C5

    ã
    C6

    SQL>

  • in DBeaver/Oracle:

    A |RAWTOHEX(A)
    ------|-----------
    Ä |C4
    Å |C5
    Æ |C6

  • in mysql/MariaDB:

    MariaDB [(none)]> SELECT a, HEX(a) FROM test.t1;
    ------------+

    a HEX(a)

    ------------+

    A 41
    ? 3F
    ? 3F

    ------------+
    3 rows in set (0.086 sec)

    MariaDB [(none)]>

  • in DBeaver/MariaDB:

    a |HEX(a)
    ------|------
    A |41
    ? |3F
    ? |3F

In DBeaver the client-charset is utf-8.

regards,
Karsten

Comment by Olivier Bertrand [ 2018-10-29 ]

It replies:

PARAMETER VALUE
NLS_CHARACTERSET AL32UTF8

What can I do to use latin1?

Comment by Alexander Barkov [ 2018-10-30 ]

Karsten,

This can be possible if you don't have the NLS_LANG environment variable set properly on the machine running mysqld.

Please make sure that NLS_LANG is set to, for example:

  • AMERICAN_AMERICA.WE8ISO8859P1
  • GERMAN_GERMANY.WE8ISO8859P1
  • AMERICAN_AMERICA.WE8MSWIN1252
  • GERMAN_GERMANY.WE8MSWIN1252

so mysqld sees this environment variable at start up time.

Please make sure to restart mysqld after changing this variable.

Note, using WE8MSWIN1252 (instead of WE8ISO8859P1) is preferrable, because MariaDB's latin1 is in fact cp1252 rather than iso-8859-1.

Comment by Alexander Barkov [ 2018-10-30 ]

Olivier, I'm afraid there is no a quick way to change the database character set in Oracle.

There are some articles on the Internet how to do so, e.g.:
https://easyoradba.com/2010/07/02/change-oracle-database-character-set-nls_characterset/

Comment by Karsten Budde [ 2018-10-30 ]

Alexander,

I will contact our server administrator for checking and changing the environment variable NLS_LANG. Then I will post the output of the selects here.

Comment by Alexander Barkov [ 2018-10-30 ]

Karsten, thanks! We'll be waiting for your feedback.

Btw, thanks for your previous feedback. I forgot to mention:
the fact that both "in mysql/MariaDB:" and "in DBeaver/MariaDB"
strip diaeresis accent (umlaut) from "A" made me think that something
might be wrong with the mysqld side NLS_LANG.

In my case I have NLS_LANG set to AMERICAN_AMERICA.AL32UTF8.
But when I unset NLS_LANG and restarted the server, I got exactly the same results with you.
It seems to fallback to pure ASCII when NLS_LANG is not set.

Oracle documentations says:
If the Oracle Installer does not populate NLS_LANG , and it is not otherwise set then its value by default is A MERICAN_AMERICA.US7ASCII .

Comment by Olivier Bertrand [ 2018-10-30 ]

On my machine NLS_LANG was not set.
In Oracle I cannot change the character set.

command number message
ALTER DATABASE CHARACTER SET WE8MSWIN1252 -1 Remote SQLExecDirect: [Oracle][ODBC][Ora]ORA-12712: new character set must be a superset of old character set
Comment by Karsten Budde [ 2018-11-07 ]

Alexander,

our server administrator has set the following environment variables
LANG=de_DE.iso885915@euro
NLS_LANG=GERMAN_GERMANY.iso885915@euro
and restart mariadb.

I am sorry, but the output of the select in mariadb does not change:

MariaDB [(none)]> select * from test.t1;
------

a

------

A
?
?

------
3 rows in set (0.087 sec)

Comment by Alexander Barkov [ 2018-11-07 ]

It should be something like this:

NLS_LANG=GERMAN_GERMANY.WE8MSWIN1252

Or some other character set known to Oracle:
https://docs.oracle.com/goldengate/1212/gg-winux/GWUAD/wu_charsets.htm#GWUAD733

iso885915@euro is a Posix locale character set name, it's not an Oracle character set name.

Comment by Karsten Budde [ 2018-11-07 ]

Now, the server administrator reset NLS_LANG, but the output does not change.

[budde@server ~]$ set | grep LANG
LANG=de_DE.iso885915@euro
NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1
[budde@server ~]$ mysql --user budde --password
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 66
Server version: 10.3.9-MariaDB MariaDB Server

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show create table test.t1;
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Table Create Table

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

t1 CREATE TABLE `t1` (
`a` varchar(4000) DEFAULT NULL
) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HZO1' `TABLE_TYPE`='ODBC' `DATA_CHARSET`=latin1 `tabname`='BUDDE.T1'

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.000 sec)

MariaDB [(none)]> select a, hex(a) from test.t1;
------------+

a hex(a)

------------+

A 41
? 3F
? 3F

------------+
3 rows in set (0.070 sec)

Comment by Alexander Barkov [ 2018-11-07 ]

Karsten, can you please try the following:

In Oracle:

DROP VIEW v100;
CREATE VIEW v100 AS SELECT * FROM V$SESSION_CONNECT_INFO WHERE SID=sys_context('userenv','sid');

In MariaDB

DROP TABLE IF EXISTS t1;
CREATE TABLE t1
(
  SID DECIMAL(36),
  AUTHENTICATION_TYPE VARCHAR(255),
  OSUSER VARCHAR(255),
  CLIENT_CHARSET VARCHAR(255),
  CLIENT_DRIVER VARCHAR(255),
  CLIENT_OCI_LIBRARY VARCHAR(255)
)
ENGINE=CONNECT TABLE_TYPE=ODBC
CONNECTION='DSN=dsn;UID=user;PWD=password' TABNAME='v100';
SELECT * FROM t1;

What does it return?

(please make sure to change the CONNECTION parameter according to your installation).

Comment by Karsten Budde [ 2018-11-07 ]

Alexander,

here the output:
SID |AUTHENTICATION_TYPE|OSUSER|CLIENT_CHARSET|CLIENT_DRIVER|CLIENT_OCI_LIBRARY
-----|-----------------|----|------------|-----------|------------------
105 |DATABASE |mysql |UTF16 |ODBCCLNT |Full Instant Clien
105 |DATABASE |mysql |UTF16 |ODBCCLNT |Full Instant Clien
105 |DATABASE |mysql |UTF16 |ODBCCLNT |Full Instant Clien

Comment by Alexander Barkov [ 2018-11-07 ]

On my Fedora box, I have about the same output, but NULL in the CLIENT_DRIVER.
It seems we're using different drivers.

Can you please paste fragments of /etc/odbc.ini and /etc/odbcinst.ini, relevant to the Oracle ODBC driver and the data source?

Also the output from "rpm -qa|grep odbc" would be helpful.
I have this on my box: libiodbc-3.52.7-9.fc23.x86_64

Comment by Alexander Barkov [ 2018-11-07 ]

Karsten, can you also please try one more thing:

  • In Oracle:

    DROP TABLE t1;
    CREATE TABLE t1 (a VARCHAR(4000));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C4')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C5')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C6')));
    SELECT a, RAWTOHEX(a) FROM t1;
    

  • In MariaDB:

    DROP TABLE IF EXISTS t1;
    CREATE TABLE t1 (
      a VARBINARY(4000)
    ) ENGINE=CONNECT DEFAULT CHARSET=binary CONNECTION='DSN=HHZ1' TABLE_TYPE='ODBC' DATA_CHARSET=binary tabname='CHZ.T1';
    SELECT a, HEX(a) FROM t1;
    

and paste output of the above two scripts.

Note, it's very similar to what I asked on 2018-10-29, but disables all ConnectSE/MariaDB character set conversion using VARBINARY as the data type and "binary" in both CHARSET and DATA_CHARSET.

Thanks.

Comment by Karsten Budde [ 2018-11-08 ]

Alexander, here the parts from the ini-files

odbc.ini:

[HZO1]
Driver = Oracle 12c ODBC driver
ServerName = HZO1.world
DSN = HZO1
client charset = UTF-8
server charset = WE8ISO8859P1

odbcinst.ini:

[Oracle 12c ODBC driver]
Description = Oracle ODBC driver for Oracle 12c
Driver = /usr/lib/oracle/12.2/client64/lib/libsqora.so.12.1
Driver Logging = 7

[MySQL]
Description = ODBC for MySQL
Driver = /usr/lib/libmyodbc5.so
Setup = /usr/lib/libodbcmyS.so
Driver64 = /usr/lib64/libmyodbc5.so
Setup64 = /usr/lib64/libodbcmyS.so
FileUsage = 1

and the output from terminal:

[budde@server ~]$ rpm -qa|grep odbc
oracle-instantclient12.2-odbc-12.2.0.1.0-1.x86_64

Comment by Karsten Budde [ 2018-11-08 ]

output from sqlplus on windows:

SQL> SELECT a, RAWTOHEX(a) FROM t1;

A
--------------------------------------------------------------------------------
RAWTOHEX(A)
--------------------------------------------------------------------------------

C4


C5

ã
C6

SQL>

When I perfom the create statement in mariadb, then I became following output:

MariaDB [test]> CREATE TABLE test.t1 ( a VARBINARY(4000) ) ENGINE=CONNECT DEFAULT CHARSET=binary CONNECTION='DSN=HZO1' TABLE_TYPE='ODBC' DATA_CHARSET=binary TABNAME='BUDDE.T1';
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near 'binary TABNAME='BUDDE.T1'' at line 1
MariaDB [test]>

I don't know, what is wrong.

Comment by Alexander Barkov [ 2018-11-08 ]

Sorry, my fault. The binary keyword must be quoted in back ticks in this context:

CREATE TABLE test.t1 ( a VARBINARY(4000) ) ENGINE=CONNECT DEFAULT
CHARSET=`binary` CONNECTION='DSN=HZO1' TABLE_TYPE='ODBC'
DATA_CHARSET=`binary` TABNAME='BUDDE.T1';

Comment by Karsten Budde [ 2018-11-08 ]

No Problem, here is the output:

MariaDB [test]> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> CREATE TABLE t1 (
-> a VARBINARY(4000)
-> ) ENGINE=CONNECT DEFAULT CHARSET='binary' CONNECTION='DSN=HZO1' TABLE_TYPE='ODBC' DATA_CHARSET='binary' TABNAME='BUDDE.T1';
Query OK, 0 rows affected (0.001 sec)

MariaDB [test]> SELECT a, HEX(a) FROM t1;
------------+

a HEX(a)

------------+

A 41
? 3F
? 3F

------------+
3 rows in set (0.062 sec)

Comment by Alexander Barkov [ 2018-11-08 ]

What are these lines in the DSN configurations:

client charset = UTF-8
server charset = WE8ISO8859P1

I could not find any proves that Oracle's ODBC driver understands these lines.

Comment by Alexander Barkov [ 2018-11-08 ]

I have installed the same ODBC driver: oracle-instantclient12.2-odbc-12.2.0.1.0-1.x86_64.
Now I'm getting this as an output from the view on V$SESSION_CONNECT_INFO:

+------+---------------------+--------+----------------+---------------+---------------------+
| SID  | AUTHENTICATION_TYPE | OSUSER | CLIENT_CHARSET | CLIENT_DRIVER | CLIENT_OCI_LIBRARY  |
+------+---------------------+--------+----------------+---------------+---------------------+
|  128 | DATABASE            | bar    | UTF16          | ODBCCLNT      | Full Instant Client |
|  128 | DATABASE            | bar    | UTF16          | ODBCCLNT      | Full Instant Client |
|  128 | DATABASE            | bar    | UTF16          | ODBCCLNT      | Full Instant Client |
|  128 | DATABASE            | bar    | UTF16          | ODBCCLNT      | Full Instant Client |
+------+---------------------+--------+----------------+---------------+---------------------+

Comment by Alexander Barkov [ 2018-11-08 ]

After experimenting with the 12.2.0.1.0-1 driver, I found that it works exactly the same with my old driver from Oracle11-XE:

  • If the MariaDB server is running with a proper NLS_LANG value, everything works as expected
  • With an empty NLS_LANG value, I'm getting bad result:

Ä -> A
Å -> ?
Æ -> ?

Karsten, please make sure again that mysqld (the MariaDB server binary) is running with NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1

It can be added to /etc/environment, just add this line:

NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1

After editing this file, please logout, and login again as root. The do the following.

Make sure NLS_LANG is set:
echo $NLS_LANG

Then restart MariaDB server:

service mariadb stop
service mariadb start

Comment by Alexander Barkov [ 2018-11-08 ]

After restarting the server with a proper NLS_LANG value, please try again the following:

  • In Oracle:

    DROP TABLE t1;
    CREATE TABLE t1 (a VARCHAR(4000));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C4')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C5')));
    INSERT INTO t1 VALUES (UTL_RAW.CAST_TO_VARCHAR2(HEXTORAW('C6')));
    SELECT a, RAWTOHEX(a) FROM t1;
    

  • In MariaDB:

    DROP TABLE IF EXISTS t1;
    CREATE TABLE t1 (
      a VARCHAR(4000)
    ) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HHZ1' TABLE_TYPE='ODBC' DATA_CHARSET=latin1 tabname='CHZ.T1';
    SELECT a, HEX(a) FROM t1;
    

Thanks.

Comment by Karsten Budde [ 2018-11-08 ]

The server administrator set etc/environment as

NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1

There are no more entries in this file.

Then he log off and on as root.
Then he stop and start MariaDB.

Afterwards I try it in MariaDB again:

MariaDB [test]> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.002 sec)
 
MariaDB [test]> CREATE TABLE t1 (
    ->   a VARCHAR(4000)
    -> ) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HZO1' TABLE_TYPE                                       ='ODBC' DATA_CHARSET=latin1 tabname='BUDDE.T1';
Query OK, 0 rows affected (0.002 sec)
 
MariaDB [test]> SELECT a, HEX(a) FROM t1;
+------+--------+
| a    | HEX(a) |
+------+--------+
| A    | 41     |
| ?    | 3F     |
| ?    | 3F     |
+------+--------+
3 rows in set (0.068 sec)

Comment by Alexander Barkov [ 2018-11-08 ]

It appears that a systemd service does not respect /etc/environment.
After some googling, I found this way to add an environment variable to a service:

1. Open /usr/lib/systemd/system/mariadb.service in a text editor

2. Find the section "[Service]" and add the environment variable as follows:

[Service]
Environment=NLS_LANG=GERMAN_GERMANY.WE8ISO8859P1

3. Reload systemd units:

systemctl daemon-reload

4. Restart MariaDB server:

sudo service mariadb restart

5. Run the test SQL script again:

DROP TABLE IF EXISTS t1;
CREATE TABLE t1 (
  a VARCHAR(4000)
) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HHZ1' TABLE_TYPE='ODBC' DATA_CHARSET=latin1 tabname='CHZ.T1';
SELECT a, HEX(a) FROM t1;

Comment by Karsten Budde [ 2018-11-09 ]

Alexander,
I am glad, your last tip led to success.
Here is the output on the console:

MariaDB [test]> DROP TABLE IF EXISTS t1;
Query OK, 0 rows affected (0.002 sec)
 
MariaDB [test]> CREATE TABLE t1 (   a VARCHAR(4000) ) ENGINE=CONNECT DEFAULT CHARSET=utf8mb4 CONNECTION='DSN=HZO1' TABLE_TYPE='ODBC' DATA_CHARSET=latin1 tabname='BUDDE.T1';
Query OK, 0 rows affected (0.002 sec)
 
MariaDB [test]> SELECT a, HEX(a) FROM t1;
+------+--------+
| a    | HEX(a) |
+------+--------+
| ▒    | C384   |
| ▒    | C385   |
| ▒    | C386   |
+------+--------+
3 rows in set (0.148 sec)

Here the output in DBeaver:

a     |HEX(a)
------|------
Ä     |C384  
Å     |C385  
Æ     |C386  

Thank you very much for your help.

Karsten

Comment by Alexander Barkov [ 2018-11-09 ]

Karsten, you're welcome. Thank you for cooperation and fast feedback.

I think this should be documented.

bertrandop, can you please make sure that this is documented in the ConnectSE related manual sections?

Thanks!

Comment by Olivier Bertrand [ 2018-11-09 ]

All this seems very specific to Oracle and perhaps some Linux operating systems. Describing character handling with ODBC in general might require a complete book.
How do you think this should be documented for CONNECT?
I suggest, for users having character recognition problems with Oracle, just to add a link to this web page in the documentation.

Comment by Alexander Barkov [ 2018-12-08 ]

Documented here:
https://mariadb.com/kb/en/library/connect-odbc-table-type-accessing-tables-from-another-dbms/#non-ascii-character-sets-with-oracle
Thanks for KennethDyer.

Generated at Thu Feb 08 08:36:56 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.