[MCOL-4005] mishandling multi-byte chars in DML export to cpimport Created: 2020-05-15 Updated: 2021-04-19 Resolved: 2020-05-26 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | MDB Plugin |
| Affects Version/s: | 1.4.3 |
| Fix Version/s: | 1.4.4 |
| Type: | Bug | Priority: | Blocker |
| Reporter: | Patrick LeBlanc (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Sprint: | 2020-7 |
| Description |
|
While investigating a crash in Sky, I found a significant problem in our code that exports data to cpimport for LOAD DATA INFILE (LDI). In many places we assume each character is 1 byte and mix up byte counts and character counts. The affected function is ha_mcs_impl_write_batch_row_() in ha_mcs_dml.cpp; the problem is obvious once you read it. |
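To illustrate the byte-count versus character-count distinction described above, here is a minimal sketch. It is not the code from ha_mcs_dml.cpp; the helper names `utf8CharCount` and `utf8PrefixBytes` are hypothetical, and the sketch assumes the input is valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of UTF-8 characters (code points) in s.
// Continuation bytes have the bit pattern 10xxxxxx and are skipped.
// Assumes s holds valid UTF-8.
std::size_t utf8CharCount(const std::string& s)
{
    std::size_t count = 0;
    for (unsigned char c : s)
    {
        if ((c & 0xC0) != 0x80)
            ++count;
    }
    return count;
}

// Hypothetical helper: number of bytes occupied by the first maxChars
// characters of s. The result never falls inside a multi-byte sequence,
// so s.substr(0, result) is a safe truncation to maxChars characters.
std::size_t utf8PrefixBytes(const std::string& s, std::size_t maxChars)
{
    std::size_t chars = 0;
    std::size_t i = 0;
    while (i < s.size())
    {
        bool isLeadByte = (static_cast<unsigned char>(s[i]) & 0xC0) != 0x80;
        if (isLeadByte)
        {
            if (chars == maxChars)
                break;  // the next character would exceed the limit
            ++chars;
        }
        ++i;
    }
    return i;
}
```

For the string "h\xC3\xA9llo" (five characters, one of them encoded in two bytes), `size()` reports 6 while `utf8CharCount` reports 5. Treating a column width declared in characters as a byte limit, or the reverse, is the kind of mix-up the description refers to.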
| Comments |
| Comment by Gagan Goel (Inactive) [ 2020-05-22 ] |
|
For QA: With this issue, we are fixing a crash that occurs when performing an LDI into a table whose character set is set to utf8 at the table level. Here are steps to reproduce:
Here, mcol4005.txt contains the following
In addition, if we remove the character set property from the table creation, the LDI succeeds, but the "\" in the data file is not inserted correctly. We are also fixing this. Here are steps to reproduce:
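The ticket does not show how the "\" handling was fixed. As a rough sketch only, the following shows one common way to keep a backslash intact when writing a field to a delimited stream. The function name `escapeField`, the '|' delimiter, and the '\' escape character are assumptions made for illustration, not the actual ColumnStore code.

```cpp
#include <string>

// Hypothetical helper: prefix every escape or delimiter byte in a field
// with the escape character so the reader can recover the original value.
// The default delimiter and escape character are assumptions.
std::string escapeField(const std::string& field,
                        char delimiter = '|',
                        char escape = '\\')
{
    std::string out;
    out.reserve(field.size());
    for (char c : field)
    {
        if (c == escape || c == delimiter)
            out.push_back(escape);
        out.push_back(c);
    }
    return out;
}
```

Scanning byte by byte is safe for UTF-8 input here because ASCII byte values such as '\' and '|' never occur inside a multi-byte sequence.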
| Comment by Daniel Lee (Inactive) [ 2020-05-26 ] |
|
Build verified: 1.4.4-1 (Jenkins 20200522). Tested with all 3 modes of columnstore_use_import_for_batchinsert. |