[MCOL-3499] S3 with localStorage cpimport returned a 'bad length field' error message Created: 2019-09-10  Updated: 2019-10-29  Resolved: 2019-10-29

Status: Closed
Project: MariaDB ColumnStore
Component/s: cpimport
Affects Version/s: 1.4.0
Fix Version/s: 1.4.1

Type: Bug Priority: Major
Reporter: Daniel Lee (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MCOL-3566 cpimport fails on PM2 with S3 storage... Closed

 Description   

Build tested: 1.4.0-1

server commit:
67452bc
engine commit:
64ceb86

With S3 localStorage on a single-server configuration, I created a dbt3 database and tried to cpimport a 1 GB dataset. While it was loading the lineitem table, the following message was shown, and cpimport still had not finished after almost one hour. I had to kill the cpimport process. The same test with S3 cloud storage (AWS) completed successfully.

Using table OID 3092 as the default JOB ID
Input file(s) will be read from : /root/tests
Job description file : /usr/local/mariadb/columnstore/data/bulk/tmpjob/3092_D20190910_T205837_S825658_Job_3092.xml
Log file for this job: /usr/local/mariadb/columnstore/data/bulk/log/Job_3092.log
2019-09-10 20:58:37 (17949) INFO : successfully loaded job file /usr/local/mariadb/columnstore/data/bulk/tmpjob/3092_D20190910_T205837_S825658_Job_3092.xml
2019-09-10 20:58:37 (17949) INFO : Job file loaded, run time for this step : 0.096211 seconds
2019-09-10 20:58:37 (17949) INFO : PreProcessing check starts
2019-09-10 20:58:37 (17949) INFO : input data file /data/qa/autopilot/data/source/dbt3/1g/lineitem.tbl
2019-09-10 20:58:37 (17949) INFO : PreProcessing check completed
2019-09-10 20:58:37 (17949) INFO : preProcess completed, run time for this step : 0.120688 seconds
2019-09-10 20:58:37 (17949) INFO : No of Read Threads Spawned = 1
2019-09-10 20:58:37 (17949) INFO : No of Parse Threads Spawned = 3
SocketPool: warning! Probably got a bad length field! payload length = 12 endOfData = 42 startOfPayload = 9



 Comments   
Comment by Daniel Lee (Inactive) [ 2019-09-10 ]

Although this test is for a single-server configuration, it passes if the -d parameter is used with postConfigure.

Comment by Ben Thompson (Inactive) [ 2019-10-03 ]

Have not been able to reproduce this with a script running hundreds of cpimports overnight.

Comment by Ben Thompson (Inactive) [ 2019-10-04 ]

Reviewed the changes since the commit shown in the ticket description. None of them should have fixed this, but I am unable to reproduce it with current 1.4.0 develop builds. Will continue to monitor the issue.

Comment by Daniel Lee (Inactive) [ 2019-10-10 ]

While testing other tickets on the 1.2.5-1 build with S3, I ran into the same issue.

Single node installation
Create a dbt3 database
Load tables one at a time. This time it occurred on the customer table, and cpimport still had not finished after more than 16 hours.

Comment by Patrick LeBlanc (Inactive) [ 2019-10-10 ]

We'll need more details. We haven't seen this problem in testing since milestone 1, when we wrote that code (around March). How big was the database you were loading? Also, which machine were you using? We may need to get on that same machine to confirm the problem.

Comment by Ben Thompson (Inactive) [ 2019-10-14 ]

Issue has been reproduced and fix is in progress.

Comment by Ben Thompson (Inactive) [ 2019-10-17 ]

This appears to be the same issue as MCOL-3566; the fix should solve both.

Comment by Ben Thompson (Inactive) [ 2019-10-17 ]

The issue is a subtle race at startup. While PrefixCache populates the cache with the files present on restart, it can also find the first file written to the cache directory by a write call at startup. This causes the Cache LRU list to contain a duplicate entry for that file. Later, when _makespace tries to flush from the LRU, it attempts to flush the duplicate entry and the file cannot be found, likely because it was removed or renamed within the metadata.

The fix for now is to not run the prefixCache populate call in the background. A better solution, which requires more detailed thought, would be to make prefixCache::populate synchronous with write calls.

Opening a new issue for this future improvement and linking it here.
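To make the startup race described in the comment above concrete, here is a minimal Python sketch of the sequence. It is a toy model under stated assumptions, not ColumnStore's PrefixCache: the `ToyCache` class, its `write`, `populate` and `make_space` methods, and the `restart` helper are all hypothetical. The cache directory is modeled as a dict, the LRU as a plain list that does not reject duplicates, and the background thread is replaced by a fixed interleaving in which the first write lands before the scan.

```python
class ToyCache:
    """Hypothetical, simplified model of a cache directory plus its LRU list.

    Not ColumnStore's PrefixCache: names and behaviour are illustrative only.
    """

    def __init__(self, files_from_previous_run):
        # Files already in the cache directory when the process restarts.
        self.on_disk = dict(files_from_previous_run)  # name -> size in bytes
        # Eviction order, oldest first. As in this bug, nothing here
        # rejects duplicate entries.
        self.lru = []

    def write(self, name, size):
        # A write call stores the file and appends it to the LRU list.
        self.on_disk[name] = size
        self.lru.append(name)

    def populate(self):
        # Startup scan: register every file currently in the directory.
        for name in self.on_disk:
            self.lru.append(name)

    def make_space(self, needed):
        # Evict the oldest entries until `needed` bytes have been freed.
        freed = 0
        while freed < needed and self.lru:
            name = self.lru.pop(0)
            if name not in self.on_disk:
                # A duplicate LRU entry points at a file that is already gone.
                raise LookupError(f"cannot flush {name}: not found in cache")
            freed += self.on_disk.pop(name)
        return freed


def restart(background_populate):
    """Replay one startup ordering deterministically instead of using threads."""
    cache = ToyCache({"old_file": 100})
    if background_populate:
        # Race: the first write lands before the background scan lists the
        # directory, so the scan finds the new file and adds it a second time.
        cache.write("new_file", 100)
        cache.populate()
    else:
        # Interim fix: populate synchronously before any write is served.
        cache.populate()
        cache.write("new_file", 100)
    return cache
```

With `background_populate=True`, `new_file` ends up in the LRU list twice, and `make_space(300)` raises `LookupError` when it reaches the stale second entry. With the interim fix, each file appears once. The longer-term option mentioned above, making populate synchronous with write calls, would instead need coordination between the scan and concurrent writes, which this sketch does not model.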

Comment by Daniel Lee (Inactive) [ 2019-10-29 ]

Build verified: 1.4.1-1

engine commit:
3e7a964

Generated at Thu Feb 08 02:43:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.