[MCOL-1298] Bulk import of large CSV file causes restarting of the WriteEngineServer process on the PM Created: 2018-03-23  Updated: 2023-10-26  Resolved: 2018-04-12

Status: Closed
Project: MariaDB ColumnStore
Component/s: None
Affects Version/s: 1.1.4
Fix Version/s: 1.1.4

Type: Bug Priority: Major
Reporter: Elena Kotsinova (Inactive) Assignee: Elena Kotsinova (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

CS 1.1.2 on CentOS 7


Issue Links:
Relates
relates to MCOL-1299 WriteEngineServ failure triggers stuc... Closed
Sprint: 2018-07, 2018-08

 Description   

1. Start a bulk load of a large CSV file with the Pentaho bulk load adapter. The file contains 121 million rows (9 GB in size).
2. The Pentaho CSV input module writes data in chunks and passes them to the Bulk Loader. This process is visible in the following log excerpt:

2018/03/23 14:51:12 - CSV file input.0 - Line number : 99400000
2018/03/23 14:51:16 - MariaDB ColumnStore Bulk Loader.0 - Linenr 99400000
2018/03/23 14:51:18 - CSV file input.0 - Line number : 99450000
2018/03/23 14:51:18 - MariaDB ColumnStore Bulk Loader.0 - Linenr 99450000
2018/03/23 14:51:20 - CSV file input.0 - Line number : 99500000
2018/03/23 14:51:24 - MariaDB ColumnStore Bulk Loader.0 - Linenr 99500000
2018/03/23 14:51:26 - CSV file input.0 - Line number : 99550000

3. After around 2 hours of work the import process freezes on the client side (the server where the Pentaho job runs).
4. At the same time the following messages are recorded on PM1:

Mar 23 12:53:07 mariadb-59f24c1f-215-1 writeengine[3596]: 07.588935 |0|147|0| I 19 CAL0080: Compression Handling: Compressed data does not fit, caused a chunk shifting @line:928 filename:/usr/local/mariadb/columnstore/data1/000.dir/000.dir/016.dir/135.dir/001.dir/FILE000.cdf, chunkId:0 data size:2030689/available:1859584 -- shifting SUCCESS
Mar 23 12:53:07 mariadb-59f24c1f-215-1 kernel: WriteEngineServ[14997]: segfault at 7f4cb80000c8 ip 00007f4cb80000c8 sp 00007f4cc0d22c78 error 15
Mar 23 12:53:08 mariadb-59f24c1f-215-1 ProcessMonitor[984]: 08.300933 |0|0|0| C 18 CAL0000: *****Calpont Process Restarting: WriteEngineServer, old PID = 3596
Mar 23 12:53:09 mariadb-59f24c1f-215-1 ProcessMonitor[984]: 09.557196 |0|0|0| I 18 CAL0000: Calpont Process WriteEngineServer restarted successfully!!
Mar 23 12:53:09 mariadb-59f24c1f-215-1 ProcessManager[1089]: 09.565387 |0|0|0| I 17 CAL0000: MSG RECEIVED: Process Restarted on pm1/WriteEngineServer
Mar 23 12:53:09 mariadb-59f24c1f-215-1 ProcessMonitor[984]: 09.644656 |0|0|0| I 18 CAL0000: MSG RECEIVED: Re-Init process request on: cpimport
Mar 23 12:53:09 mariadb-59f24c1f-215-1 ProcessMonitor[984]: 09.673504 |0|0|0| I 18 CAL0000: PROCREINITPROCESS: completed, no ack to ProcMgr

Result:
1. Pentaho job never finishes.
2. The target table in CS remains empty and is locked by the mcsapi process. The lock is not released after the Pentaho process terminates; it appears the lock is never released. A restart of CS is needed, or cleartablelock can be used instead.
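As a workaround, the stale lock can be inspected and cleared with the ColumnStore command-line utilities. A sketch, assuming a default install path; the lock ID shown is a placeholder, use the one actually reported:

```shell
# List current table locks and note the LockID of the stuck table.
/usr/local/mariadb/columnstore/bin/viewtablelock

# Clear the lock by its ID (placeholder ID shown).
/usr/local/mariadb/columnstore/bin/cleartablelock 1
```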

Expected:
1. The error on the PM should be propagated to all involved parties (in this case mcsapi and the adapter). The final behavior should be that the Pentaho job finishes with an error message.
2. The table lock should be released whenever the import operation does not complete, for whatever reason.

Note: The CSV file has been successfully imported into CS using

LOAD DATA INFILE

and the data in it is homogeneous.
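For reference, the server-side load that succeeded can be sketched as follows; the file path and field terminator are assumptions, only the table name is taken from this report:

```sql
-- Assumed path and CSV delimiter; adjust to the actual file.
LOAD DATA INFILE '/tmp/f_trans_delta.csv'
INTO TABLE edk_stg.f_trans_delta
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
```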



 Comments   
Comment by Andrew Hutchings (Inactive) [ 2018-03-23 ]

Can you please look in information_schema.columnstore_files, look what OBJECT ID refers to data1/000.dir/000.dir/016.dir/135.dir/001.dir/FILE000.cdf and use information_schema.columnstore_columns to find out what that column is? Knowing the data type for that column will go a long way towards figuring out what happened.
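The lookup described above can be sketched in one query. This is only a sketch: the join condition assumes the segment file may belong to either the column's own object or its dictionary object, and the LIKE pattern assumes FILENAME holds the full path from the log:

```sql
-- Map the segment file from the log back to its column (sketch).
SELECT c.table_schema, c.table_name, c.column_name, c.data_type
FROM information_schema.columnstore_files f
JOIN information_schema.columnstore_columns c
  ON f.object_id IN (c.object_id, c.dictionary_object_id)
WHERE f.filename LIKE '%016.dir/135.dir/001.dir/FILE000.cdf';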

Comment by Elena Kotsinova (Inactive) [ 2018-03-23 ]

All fields in the table are varchar(256).
The field under question is also varchar(256).

TABLE_SCHEMA TABLE_NAME COLUMN_NAME OBJECT_ID DICTIONARY_OBJECT_ID LIST_OBJECT_ID TREE_OBJECT_ID DATA_TYPE COLUMN_LENGTH COLUMN_POSITION COLUMN_DEFAULT IS_NULLABLE NUMERIC_PRECISION NUMERIC_SCALE IS_AUTOINCREMENT COMPRESSION_TYPE
edk_stg f_trans_delta trans_id 4231 4241 NULL NULL varchar 256 0 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta customer_id 4232 4242 NULL NULL varchar 256 1 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta merchant_id 4233 4243 NULL NULL varchar 256 2 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta card_id 4234 4244 NULL NULL varchar 256 3 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta trans_datetime 4235 4245 NULL NULL varchar 256 4 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta trans_type 4236 4246 NULL NULL varchar 256 5 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta amount 4237 4247 NULL NULL varchar 256 6 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta reversed 4238 4248 NULL NULL varchar 256 7 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta response_code 4239 4249 NULL NULL varchar 256 8 NULL 0 10 0 1 Snappy
edk_stg f_trans_delta is_first_topup 4240 4250 NULL NULL varchar 256 9 NULL 0 10 0 1 Snappy
Comment by Andrew Hutchings (Inactive) [ 2018-03-23 ]

Figured it would be a dict column. They had the biggest complexity.

jens.rowekamp How many rows does the Pentaho adapter pass to the API before a commit is triggered?

Comment by Andrew Hutchings (Inactive) [ 2018-03-23 ]

I've filed MCOL-1299 for part 2 of Elena's description.

Comment by Jens Röwekamp (Inactive) [ 2018-03-23 ]

All

Comment by Andrew Hutchings (Inactive) [ 2018-03-23 ]

elena.kotsinova can you please confirm you are using 1.1.2 as the ColumnStore engine here? If so can you please use 1.1.3 instead as there are known API data corruption issues with engine 1.1.2.

Comment by Elena Kotsinova (Inactive) [ 2018-03-26 ]

Yes, the ColumnStore version is 1.1.2.
Retested with 1.1.3; the load failed on the next step. A new issue will be reported.

Comment by David Thompson (Inactive) [ 2018-04-02 ]

What was the issue with 1.1.3?

Comment by Elena Kotsinova (Inactive) [ 2018-04-12 ]

Verified with
PDI adapter version 1.1.4 (revision 493914b) and CS 1.1.4 from 2 April 2018.
