[MCOL-353] with 6 PMs, cpimport has a long lag time Created: 2016-10-10 Updated: 2017-03-07 Resolved: 2017-03-01 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | cpimport |
| Affects Version/s: | 1.0.1 |
| Fix Version/s: | 1.1.0 |
| Type: | Bug | Priority: | Major |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Environment: |
CentOS |
||
| Sprint: | 2017-01, 2017-2, 2017-3, 2017-4, 2017-5 | ||||||||||||||||
| Description |
|
Reported by a customer of InfiniDB. When cpimport is run, it usually reports completion quickly, within 1-2 seconds. There is then a 19-30 second delay before the process actually finishes. This happens with both LDI (LOAD DATA INFILE) and direct cpimport. During the wait, we see the following, which may or may not be related or significant: This doesn't appear on a single-server system. |
| Comments |
| Comment by Dipti Joshi (Inactive) [ 2016-11-11 ] | |||||||
|
In which cpimport mode is this happening? | |||||||
| Comment by way [ 2016-11-15 ] | |||||||
|
Hi, I am the customer who reported this issue. cpimport is used in its default mode: mode 1, in which rows are loaded in a distributed manner across PMs.
So to load 1 line I have to wait nearly 20 seconds, as you can see (I cut the irrelevant parts): | |||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-01-06 ] | |||||||
|
Can someone please provide the full debug log at the time this occurs so we can look into a way of reproducing it? | |||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-01-16 ] | |||||||
|
Reproduced the problem. Attached PMP output captured during the delay. It appears to be the message queue waiting for data and timing out. Runtime data:
| |||||||
| Comment by Andrew Hutchings (Inactive) [ 2017-01-17 ] | |||||||
|
At the end of a cpimport run, an extra read timeout of 1 sec * PM count is set on each connection. There is then a race in which we can (and very likely do) enter a socket poll with that timeout before we check whether the connection has actually ended. These waits accumulate at the end of the run because the threads are joined serially, which is itself a race between thread joining and polling. The maximum delay is therefore in the region of 'number of PMs' squared seconds (about 36 seconds for 6 PMs). There is no obvious reason for the extra timeout to be there: the loop re-enters recv() on the default 20 ms timeout anyway, so the extra timeout only appears to save a little CPU time on the loops at the end. This fix removes the extra timeout. It has been tested on a full-speed VirtualBox network and with an artificial 500 ms delay added using 'tc qdisc add dev enp0s3 root netem delay 500ms' to make sure there are no obvious side effects. | |||||||
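As a rough illustration of the arithmetic in the comment above, the worst case can be modelled in a few lines. This is only a sketch, not the cpimport source: the function name, its parameters, and the assumption of one connection (and one serially joined thread) per PM are hypothetical, chosen to match the description.

```python
# Illustrative model only; names and structure are hypothetical, not cpimport code.
DEFAULT_POLL_S = 0.020  # the default 20 ms receive timeout mentioned in the comment


def worst_case_exit_delay(pm_count, extra_timeout=True, per_pm_timeout_s=1.0):
    """Worst-case seconds spent joining one reader thread per PM, serially.

    Assumes the unlucky case in which each thread has just entered a poll
    with the full timeout at the moment it is joined.
    """
    if extra_timeout:
        poll_timeout = per_pm_timeout_s * pm_count  # 1 sec * PM count
    else:
        poll_timeout = DEFAULT_POLL_S
    # Serial joins: the waits add up, one per connection.
    return poll_timeout * pm_count


if __name__ == "__main__":
    for pms in (1, 4, 6):
        before = worst_case_exit_delay(pms, extra_timeout=True)
        after = worst_case_exit_delay(pms, extra_timeout=False)
        print(f"{pms} PMs: up to {before:.0f}s with the extra timeout, "
              f"{after:.2f}s without")
```

Under these assumptions, 6 PMs give an upper bound of 36 seconds, which is consistent with the reported 19-30 second lag, and a single server gives at most 1 second, which fits the observation that the problem does not show up there.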
| Comment by David Hall (Inactive) [ 2017-01-17 ] | |||||||
|
A one-time test with multiple PMs to see whether the delay is gone should suffice. The regression tests should catch the unlikely scenario that a problem was introduced. | |||||||
| Comment by Daniel Lee (Inactive) [ 2017-03-01 ] | |||||||
|
Build tested: GitHub source

[root@localhost columnstore]# cd mariadb-columnstore-server/
Merge pull request #31 from jbfavre/fix_deb_package_dependency

[root@localhost mariadb-columnstore-server]# cd mariadb-columnstore-engine/
change the check for prompt back to the previous code

Verified on a 1UM4PM stack.

[root@localhost ~]# date
/usr/local/mariadb/columnstore/bin/cpimport
real 0m12.622s |