[MDEV-28489] The Cyrillic string is not truncated by the number of characters when using the Connect Engine table Created: 2022-05-06  Updated: 2022-11-11  Resolved: 2022-11-11

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - Connect
Affects Version/s: 10.5.9, 10.6.0
Fix Version/s: 10.11.2, 10.3.38, 10.4.28, 10.5.19, 10.6.12, 10.7.8, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Critical
Reporter: long.skinny.boy Assignee: Andrew Hutchings
Resolution: Fixed Votes: 0
Labels: Connect-Engine, utf8
Environment:

CentOS 8


Issue Links:
Duplicate
is duplicated by MDEV-26722 connect table truncates table column ... Closed

 Description   

Good afternoon. I previously used MariaDB 10.5.8 together with the Connect Engine to display data from third-party databases, and there were no problems.
That is, I created a table pointing to a table in an MSSQL database, and the data was displayed correctly.

After that, I installed MariaDB 10.6.7 separately, did the same thing, and noticed that strings containing Cyrillic characters no longer fit into the declared number of characters.

For example, I have the string ИВАНОВА in MSSQL.

When I create a table in MariaDB with a VARCHAR(1) column, I get the following:
10.5.8 = И
10.6.7 = ?

Change to VARCHAR(2)
10.5.8 = ИВ
10.6.7 = И

Change to VARCHAR(3)
10.5.8 = ИВА
10.6.7 = ИВ

Change to VARCHAR(4)
10.5.8 = ИВАН
10.6.7 = ИВ?

Change to VARCHAR(5)
10.5.8 = ИВАНО
10.6.7 = ИВА

If the value contains no Cyrillic, only Latin letters or digits (for example, IVANOVA), everything works correctly:

10.5.8 = I
10.6.7 = I

Change to VARCHAR(2)
10.5.8 = IV
10.6.7 = IV

Change to VARCHAR(3)
10.5.8 = IVA
10.6.7 = IVA

It seems that the engine has started splitting multibyte characters such as Cyrillic and other special characters.
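A minimal sketch of the suspected mechanism, assuming the column limit is being applied to UTF-8 bytes instead of characters (an assumption; the report does not show the Connect code). The helper names truncate_chars and truncate_bytes are hypothetical. Cyrillic letters take 2 bytes in UTF-8 while Latin letters and digits take 1, which would explain why only the Cyrillic values are affected. This simplified model does not reproduce every 10.6.7 row above exactly (for VARCHAR(3) it gives И? rather than ИВ), so treat it as an illustration of the symptom only.

```python
def truncate_chars(value: str, max_chars: int) -> str:
    """Character-based truncation: what VARCHAR(n) is expected to do."""
    return value[:max_chars]


def truncate_bytes(value: str, max_bytes: int) -> str:
    """Hypothetical byte-based truncation of the UTF-8 encoding.

    A multibyte character cut in half cannot be decoded; Python's
    'replace' error handler turns it into U+FFFD, which is shown as '?'
    here to mimic the client output in the report.
    """
    raw = value.encode("utf-8")[:max_bytes]
    return raw.decode("utf-8", errors="replace").replace("\ufffd", "?")


if __name__ == "__main__":
    # Compare both truncation rules for a Cyrillic and a Latin value.
    for name in ("ИВАНОВА", "IVANOVA"):
        for n in range(1, 6):
            print(f"VARCHAR({n}): chars={truncate_chars(name, n)!r} "
                  f"bytes={truncate_bytes(name, n)!r}")
```

Running it prints, for each limit n, the expected character-based result next to the byte-based one; for the Latin string the two columns are identical, because every character is a single byte.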



 Comments   
Comment by long.skinny.boy [ 2022-05-06 ]

A simpler example that is easy to reproduce (JSON file):

[
  {
    "name": "Иванова"
  },
  {
    "name": "Ivanova"
  }
]

CREATE OR REPLACE TABLE sample (
  name VARCHAR(1))
ENGINE=CONNECT TABLE_TYPE=JSON
FILE_NAME='/opt/sample.json';

The output is:

name
?
I

whereas I expect the following, which is what 10.5.8 actually returns:

name
И
I
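To recreate the sample file from the comment, a small helper such as the hypothetical write_sample below can be used (it is not part of the report). It writes the two records as raw UTF-8 with ensure_ascii=False, so the Cyrillic name is stored as actual characters rather than \uXXXX escapes, matching the file shown above; whether the bug also reproduces with escaped input is not covered by the report.

```python
import json
from pathlib import Path

# The same two records as in the sample.json shown in the comment.
RECORDS = [{"name": "Иванова"}, {"name": "Ivanova"}]


def write_sample(path: Path) -> None:
    """Write RECORDS as UTF-8 JSON without escaping non-ASCII characters."""
    text = json.dumps(RECORDS, ensure_ascii=False, indent=2)
    path.write_text(text + "\n", encoding="utf-8")
```

For example, write_sample(Path("/opt/sample.json")) produces the file referenced by FILE_NAME in the CREATE TABLE statement (writing to /opt usually requires elevated permissions).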
Comment by long.skinny.boy [ 2022-05-07 ]

It looks like this started exactly between versions 10.5.8 and 10.5.9, and also affects versions starting with 10.6.0.
That is, this string truncation problem began in 10.5.9.
Starting with version 10.5.9, MariaDB ships an updated Connect Engine, version 1.07.0002.

Comment by Sergei Golubchik [ 2022-05-11 ]

It is caused by these lines: https://github.com/MariaDB/server/blob/mariadb-10.5.9/storage/connect/ha_connect.cc#L1614-L1616

When they are removed, the bug goes away.

Comment by long.skinny.boy [ 2022-05-11 ]

It's great that there is a solution! How likely is it that this will be fixed in the next 10.6.x release?

As I understand it, this fix will also solve MDEV-26722.

Alternatively, could you perhaps recompile this source file and provide a hotfix?

Comment by long.skinny.boy [ 2022-10-18 ]

Good afternoon, Sergei. Is the fix described above planned for upcoming versions? Unfortunately, this bug prevents me from upgrading to the latest version of MariaDB. Alternatively, is it possible to upgrade to the latest version but keep ha_connect from the 10.5.8 release?

Comment by Sergei Golubchik [ 2022-10-26 ]

It's still a work in progress, so I'm afraid it likely won't make it into this release.

Generated at Thu Feb 08 10:01:08 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.