[MCOL-4434] LDI multiple-byte table without specifying character set loads garbage
Created: 2020-12-04
Updated: 2021-01-15

Status: Open
Project: MariaDB ColumnStore
Component/s: DDLProc
Affects Version/s: 5.5.1
Fix Version/s: Icebox

Type: Bug
Priority: Major
Reporter: Daniel Lee (Inactive)
Assignee: Unassigned
Resolution: Unresolved
Votes: 0
Labels: None

Attachments: t.txt
Issue Links:
Problem/Incident
causes MCOL-2000 varchar specified sizing is not in ch... Closed
is caused by MCOL-2000 varchar specified sizing is not in ch... Closed

 Description   

Build tested: 5.5.1 (Drone 1265)

This issue was identified during testing for MCOL-2000.

MariaDB [mytest]> load data infile '/tmp/t.txt' into table mcol2000utf8 columns terminated by "|";
Query OK, 3 rows affected, 18 warnings (0.406 sec)
Records: 3 Deleted: 0 Skipped: 0 Warnings: 18
MariaDB [mytest]> select * from mcol2000utf8;
------------------------------------
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 c10 c11
------------------------------------
1 ç¾ ç¾Ž 美国福æ ç¾ ç¾Ž 美国福æ 美国福æ-¯æ-°é—»æ——下 美国福æ-¯æ-°é—»æ——下位于纽约 美国福æ-¯æ-°é—»æ——下位于纽约的第5 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导, 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导,美国福æ-¯æ-°
2 ç¾ ç¾Ž 美国福æ ç¾ ç¾Ž 美国福æ 美国福æ-¯æ-°é—»æ——下 美国福æ-¯æ-°é—»æ——下位于纽约 美国福æ-¯æ-°é—»æ——下位于纽约的第5 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导, 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导,美国福æ-¯æ-°
3 ç¾ ç¾Ž 美国福æ ç¾ ç¾Ž 美国福æ 美国福æ-¯æ-°é—»æ——下 美国福æ-¯æ-°é—»æ——下位于纽约 美国福æ-¯æ-°é—»æ——下位于纽约的第5 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导, 美国福æ-¯æ-°é—»æ——下位于纽约的第5é¢'é"报导,美国福æ-¯æ-°
The workaround is to specify the character set explicitly:
load data infile '/tmp/t.txt' into table mcol2000utf8 character set utf8 columns terminated by "|";
The same LDI test worked on an InnoDB table without specifying a character set.
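
The garbage in the SELECT output above looks like UTF-8 bytes that were interpreted as a single-byte character set. The following Python sketch reproduces that pattern for one character, 美 (U+7F8E), which appears in the longer columns of the output. It is an illustration only: the helper names `misread_utf8` and `recover` are hypothetical, and using cp1252 as a stand-in for the character set the loader actually applied is an assumption that this ticket does not confirm.

```python
# Sketch: reproduce the kind of garbage shown in the SELECT output.
# Assumption (not confirmed by the ticket): the UTF-8 file contents were
# read as a single-byte character set; cp1252 is used here as a stand-in.

def misread_utf8(text: str, wrong_charset: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_charset)


def recover(garbled: str, wrong_charset: str = "cp1252") -> str:
    """Reverse the misreading: re-encode with the wrong charset, decode as UTF-8."""
    return garbled.encode(wrong_charset).decode("utf-8")


if __name__ == "__main__":
    sample = "美"                      # UTF-8 bytes: E7 BE 8E
    garbled = misread_utf8(sample)     # three single-byte characters
    print(garbled)
    print(recover(garbled) == sample)
```

Under that assumption, one three-byte UTF-8 character turns into three single-byte characters, which matches the `ç¾Ž` value shown in column c2. It would also fit the workaround, since `character set utf8` tells the server how to decode the file.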

The t.txt file is attached to this ticket.


Generated at Thu Feb 08 02:50:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.