[MCOL-5480] LDI loads values incorrectly for MEDIUMINT, TIME and TIMESTAMP when cpimport is used for batch insert Created: 2023-04-20  Updated: 2023-09-20  Resolved: 2023-05-30

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc
Affects Version/s: None
Fix Version/s: 23.02.4

Type: Bug Priority: Blocker
Reporter: Daniel Lee (Inactive) Assignee: Gagan Goel (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MCOL-5579 Loading into Columnstore from SELECT ... Closed
Assigned for Review: Roman Roman
Assigned for Testing: Daniel Lee Daniel Lee (Inactive)

 Description   

Build tested: 23.02.02 and drone build #7534 for develop branch

The issue happens only when running "load data infile" and with "set columnstore_use_import_for_batchinsert = on;". INSERT, UPDATE, as well as stand alone cpimport do not have this issue.

[rocky8:root@rocky8~]# cat /data/qa/shares/mcol271/t.dat
1000|2020-08-13 03:14:07|
 
MariaDB [mytest]> create table t (cmediumint mediumint, ctimestamp timestamp) engine=columnstore;
Query OK, 0 rows affected (0.160 sec)
 
MariaDB [mytest]> set columnstore_use_import_for_batchinsert = off;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [mytest]> load data infile "/data/qa/shares/mcol271/t.dat" into table t fields terminated by "|";
Query OK, 1 row affected (0.189 sec)                 
Records: 1  Deleted: 0  Skipped: 0  Warnings: 0
 
MariaDB [mytest]> select * from t;
+------------+---------------------+
| cmediumint | ctimestamp          |
+------------+---------------------+
|       1000 | 2020-08-13 03:14:07 |
+------------+---------------------+
1 row in set (0.062 sec)
 
MariaDB [mytest]> set columnstore_use_import_for_batchinsert = on;
Query OK, 0 rows affected (0.001 sec)
 
MariaDB [mytest]> load data infile "/data/qa/shares/mcol271/t.dat" into table t fields terminated by "|";
ERROR 9999 (HY000): Values saturated 0% of stage done
MariaDB [mytest]> select * from t;
+------------+---------------------+
| cmediumint | ctimestamp          |
+------------+---------------------+
|       1000 | 2020-08-13 03:14:07 |
|    8388607 | 1998-01-04 21:28:32 |
+------------+---------------------+
2 rows in set (0.039 sec)
 
MariaDB [mytest]> exit
Bye
[rocky8:root@rocky8~]# cpimport mytest t /data/qa/shares/mcol271/t.dat
Locale = en_US.UTF-8
Using table OID 3102 as the default JOB ID
Input file(s) will be read from : /root
Job description file : /var/log/mariadb/columnstore/data/bulk/tmpjob/3102_D20230420_T201905_S392114_Job_3102.xml
Log file for this job: /var/log/mariadb/columnstore/cpimport/Job_3102.log
2023-04-20 20:19:05 (14371) INFO : successfully loaded job file /var/log/mariadb/columnstore/data/bulk/tmpjob/3102_D20230420_T201905_S392114_Job_3102.xml
2023-04-20 20:19:05 (14371) INFO : Job file loaded, run time for this step : 0.0217612 seconds
2023-04-20 20:19:05 (14371) INFO : PreProcessing check starts
2023-04-20 20:19:05 (14371) INFO : input data file /data/qa/shares/mcol271/t.dat
2023-04-20 20:19:05 (14371) INFO : PreProcessing check completed
2023-04-20 20:19:05 (14371) INFO : preProcess completed, run time for this step : 0.00960803 seconds
2023-04-20 20:19:05 (14371) INFO : No of Read Threads Spawned = 1
2023-04-20 20:19:05 (14371) INFO : No of Parse Threads Spawned = 3
2023-04-20 20:19:05 (14371) INFO : For table mytest.t: 1 rows processed and 1 rows inserted.
2023-04-20 20:19:06 (14371) INFO : Bulk load completed, total run time : 1.03288 seconds
 
[rocky8:root@rocky8~]# mariadb mytest
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 32
Server version: 10.6.12-7-MariaDB-enterprise MariaDB Enterprise Server
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MariaDB [mytest]> select * from t;
+------------+---------------------+
| cmediumint | ctimestamp          |
+------------+---------------------+
|       1000 | 2020-08-13 03:14:07 |
|    8388607 | 1998-01-04 21:28:32 |
|       1000 | 2020-08-13 03:14:07 |
+------------+---------------------+
3 rows in set (0.045 sec)



 Comments   
Comment by Daniel Lee (Inactive) [ 2023-04-20 ]

Data type TIME also has this issue

Value '11:58:28" would be loaded as "838:59:59".

Comment by Gagan Goel (Inactive) [ 2023-05-23 ]

For QA: This issue is caused by a bug in the processing of MEDIUMINT datatype in the plugin code. So if the LDI is performed on a signed/unsigned MEDIUMINT datatype along with other datatypes in the LDI, the other columns also get impacted. In testing, if you just use TIMESTAMP or TIME datatypes or both, the LDI works fine. I have added test cases to MTR that test various combinations of LDI that include load of signed/unsigned MEDIUMINT datatypes in isolation as well as with other datatypes.

Comment by Daniel Lee (Inactive) [ 2023-05-30 ]

Build verified: develop branch

engine: d083650f759152db639cab52cab530f06d7e36d9
server: 805750b3a90ed4aecbf475025e63674aaab7f7f7
buildNo: 7906

Verified the test case above as well as MTR test case "mysql-test/columnstore/bugfixes/mcol-5480.test".

Generated at Thu Feb 08 02:58:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.