[MCOL-4900] LOAD DATA LOCAL INFILE is slow when there is log for synchronizeWithJournal Created: 2021-10-19  Updated: 2022-07-14  Resolved: 2022-04-07

Status: Closed
Project: MariaDB ColumnStore
Component/s: Documentation
Affects Version/s: 6.1.1
Fix Version/s: 6.4.1

Type: Bug Priority: Major
Reporter: Allen Lee (Inactive) Assignee: Ben Thompson (Inactive)
Resolution: Not a Bug Votes: 0
Labels: None
Environment:

Amazon EC2


Attachments: Text File CS0347939_pm1-debug.log    
Sprint: 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

 Description   

customer reported that LDLI(LOAD DATA LOCAL INFILE) got slower than before when using along with S3. From the debug.log, the followings are the time taken per each import process.

#1 Oct 18 03:25:53 - Oct 18 03:42:05 : rea_bigsumplus_20200705000000.txt - For table reax6.bigsumplus: 24680000 rows processed and 24680000 rows inserted / 18 min
#2 Oct 18 03:44:24 - Oct 18 05:47:40: rea_bigsumplus_20200706000000.txt - For table reax6.bigsumplus: 29040000 rows processed and 29040000 rows inserted / 2 hour and 3 min
#3 Oct 18 13:16:01 - Oct 18 13:33:16: rea_bigsumplus_20200707000000.txt - For table reax6.bigsumplus: 27280000 rows processed and 27280000 rows inserted / 17 min
#4 Oct 18 13:35:25 - Oct 18 16:18:02: rea_bigsumplus_20200708000000.txt - For table reax6.bigsumplus: 26857196 rows processed and 26857196 rows inserted / 2 hour 43 min
#5 Oct 18 17:47:22 - Oct 18 20:28:53: rea_bigsumplus_20200709000000.txt - For table reax6.bigsumplus: 27560000 rows processed and 27560000 rows inserted / 2 hour 41 min

Inserted rows aren't much different. Looking at #2,4,5, it spent the most of time on synchronizeWithJournal().

[From #2] - took 1 hour 31 minutes
======================================================================================
Oct 18 04:14:32 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): no journal file found for data1/a295d85e-ca19-4633-876d-712f0fbfe075_10485760_3606487_data1~systemFiles~dbrm~BRM_saves_journal
Oct 18 04:14:32 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): data1/a295d85e-ca19-4633-876d-712f0fbfe075_10485760_3606487_data1~systemFiles~dbrm~BRM_saves_journal has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
....
Oct 18 05:45:53 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): data1/4ccd1fe4-7c13-431c-b41f-2140be7e2673_0_2891776_data1~000.dir~000.dir~050.dir~004.dir~146.dir~FILE002.cdf has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
======================================================================================
 
[From #4] - took 2 hour 20minutes
======================================================================================
Oct 18 13:35:39 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): no metadata found for data1/bulkRollback/12772_data/12816.p147.s0.tmp. It must have been deleted.
Oct 18 13:47:09 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): no journal file found for data1/81bfaa41-a078-492c-b836-527099f7f0dd_0_4325376_data1~000.dir~000.dir~049.dir~229.dir~148.dir~FILE001.cdf
...
Oct 18 15:54:48 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): data1/8e71ebc1-37db-430e-a8b7-a6c4ae4606b1_0_4874240_data1~000.dir~000.dir~049.dir~229.dir~149.dir~FILE001.cdf has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
======================================================================================
 
[From #5] - took 1 hour 50 minutes
======================================================================================
Oct 18 18:27:30 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): no journal file found for data1/559a05c5-d34e-4d08-9552-d71179904486_0_1105920_data1~000.dir~000.dir~049.dir~229.dir~150.dir~FILE002.cdf
Oct 18 18:27:30 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): data1/559a05c5-d34e-4d08-9552-d71179904486_0_1105920_data1~000.dir~000.dir~049.dir~229.dir~150.dir~FILE002.cdf has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
...
Oct 18 20:17:16 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): no journal file found for data1/335dd7cb-4b49-4f87-b4fd-7e06e4312e14_0_3809280_data1~000.dir~000.dir~050.dir~005.dir~150.dir~FILE002.cdf
Oct 18 20:17:16 mariadb-x6-pm1 StorageManager[29081]: synchronizeWithJournal(): data1/335dd7cb-4b49-4f87-b4fd-7e06e4312e14_0_3809280_data1~000.dir~000.dir~050.dir~005.dir~150.dir~FILE002.cdf has no journal, but it does exist in the cloud. This indicates that an overlapping syncWithJournal() call handled it properly.
======================================================================================

What does synchronizeWithJournal do and when it happens?



 Comments   
Comment by alexey vorovich (Inactive) [ 2022-03-16 ]

ben.thompson denis0x0D toddstoffel Could you please outline the status here? what is this blocked on ?

Also the original complaint is not clear to me

-does it hang ? if so , is there a reproduction of the hang
-if not and it is slower, then what is it being compared with ?

Comment by alexey vorovich (Inactive) [ 2022-03-16 ]

rom slack.. please keep essential conversation here

3:39 PM
ben.thompson I had allen setup the test cluster we were using before to retry reproducing it this afternoon as well as using some tooling todd has suggested for created extra wide large tables. i'll update if i find anything
white_check_mark
eyes
raised_hands

3:40 PM
Todd Stoffel did you set up a mockaroo schema?
3:40 PM
@ben.thompson
3:44 PM
I just renewed the account

Comment by Todd Stoffel (Inactive) [ 2022-04-07 ]

https://mariadb.com/docs/multi-node/columnstore/operations/load-data/cpimport/#efs-storage

Generated at Thu Feb 08 02:53:51 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.