[MCOL-1408] Columnstore table unable to accept writes after thousands of commits via Bulk SDK Created: 2018-05-12 Updated: 2023-10-26 Resolved: 2018-06-07 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | None |
| Affects Version/s: | 1.1.4 |
| Fix Version/s: | 1.1.5 |
| Type: | Bug | Priority: | Critical |
| Reporter: | Austin Rutherford (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Sprint: | 2018-10, 2018-11, 2018-12 | ||||||||
| Description |
|
When running the attached data generator and writing via the Bulk Write SDK, ColumnStore locks up at around 9,500 commits and leaves the table in a state where no additional writes can occur, even if you clear table locks and/or restart ColumnStore. The only way identified to allow writes again is to run "truncate table iot" and then restart ColumnStore. This is very reproducible: the attached Python program just needs to run for about 3-4 hours before this occurs. The Python program is a bit obnoxious and does single-row commits, but the actual use case would be a higher velocity of rows (10,000-100,000/second) committed at the same frequency. If this can't do single-row commits, it would not be able to do higher row counts either. CREATE TABLE `iot` ( |
| Comments |
| Comment by David Thompson (Inactive) [ 2018-05-13 ] | ||
|
Can reproduce; for me it took a lot more iterations. debug.log has: ExtentMap::setLocalHWM(): new HWM is past the end of the file for OID 11880; partition 0; segment 0; HWM 16384 editem: I can clear the table lock, but re-running then hits the HWM error straight away. | ||
| Comment by David Thompson (Inactive) [ 2018-05-13 ] | ||
|
In my case it was while writing the 16384th value, which is obviously a much more 'special' number; 9500ish is less special. Is that repeatable for you, Austin? | ||
| Comment by David Thompson (Inactive) [ 2018-05-14 ] | ||
|
Attached what I had in my terminal after adding driver.setDebug(true) to enable debug output. | ||
| Comment by Andrew Hutchings (Inactive) [ 2018-05-14 ] | ||
|
For my notes: I think this might be related to | ||
| Comment by Andrew Hutchings (Inactive) [ 2018-05-14 ] | ||
|
I've reproduced this using a C API based test. Looks like the HWM on the second extent in a segment is miscalculated when an insert rolls over an extent. I can probably produce a much faster test case. | ||
| Comment by Andrew Hutchings (Inactive) [ 2018-05-15 ] | ||
|
Status so far:
Other related errors I'm hitting:
Best guess is the segment file isn't growing as required when it needs to use the second extent of a segment. In particular this happens with dictionary columns. With the dictionary column removed and compression disabled the tests pass, but tests using compressed columns also fail. | ||
| Comment by Andrew Hutchings (Inactive) [ 2018-05-29 ] | ||
|
OK, problem is allocRowId() is calculating the start HWM for a new extent based on the first column, not the smallest column. So the start HWM should be 4096 for the second extent in a segment (since we should always use the smallest column for HWM calculations) but instead it ends up being 8192 for that column. This is then doubled-up for the 8-byte columns so it ends up being a start HWM of 16384 for those columns. | ||
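The arithmetic in this comment can be reproduced with a small sketch. It is not the engine's allocRowId() code. It assumes 8192-byte blocks and 8 × 1024 × 1024 rows per extent, and it assumes a hypothetical table whose first column is 8 bytes wide and whose smallest column is 4 bytes wide; those widths are inferred from the 4096/8192/16384 figures above rather than stated in the ticket. The function names are invented for illustration.

```python
# Illustrative sketch only; not the ColumnStore engine implementation.
BLOCK_SIZE_BYTES = 8192             # assumed block size
ROWS_PER_EXTENT = 8 * 1024 * 1024   # assumed rows per extent


def blocks_per_extent(column_width):
    """Number of blocks one full extent occupies for a column of this width."""
    return ROWS_PER_EXTENT * column_width // BLOCK_SIZE_BYTES


def second_extent_start_hwms(column_widths, base_width):
    """Start HWM of the second extent in a segment, per column.

    The base HWM is derived from base_width, assigned to the smallest
    column, and then scaled up for wider columns.
    """
    smallest = min(column_widths)
    base = blocks_per_extent(base_width)
    return [base * width // smallest for width in column_widths]


# Hypothetical column widths in bytes: first column is 8 bytes, smallest is 4.
widths = [8, 4, 8]

# Bug described above: the base comes from the first column.
buggy = second_extent_start_hwms(widths, base_width=widths[0])
# Intended behaviour: the base comes from the smallest column.
fixed = second_extent_start_hwms(widths, base_width=min(widths))

print("buggy:", buggy)
print("fixed:", fixed)
```

Under these assumptions the buggy variant gives 8192 for the 4-byte column and 16384 for the 8-byte columns, while the corrected variant gives 4096 and 8192, matching the numbers in the comment.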
| Comment by Andrew Hutchings (Inactive) [ 2018-05-29 ] | ||
|
Pull request for engine and API (the API pull request is to add a test). For QA: A new test is added to the API regression suite, it requires the pull request from engine to pass. (warning: this test can take 10-15 minutes to execute) | ||
| Comment by Daniel Lee (Inactive) [ 2018-06-05 ] | ||
|
Build tested: 1.1.5-1 /root/columnstore/mariadb-columnstore-server Merge pull request #118 from mariadb-corporation/ Merge MariaDB 10.2.15 into develop-1.1 /root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine Merge pull request #488 from mariadb-corporation/ Mcol 1370 I tried the API test on 1.1.5-1 on CentOS 7 and got: I executed the api test for [ FAILED ] mcol1408.BRM (351 ms) [----------] Global test environment tear-down 2 FAILED TESTS I tried to reproduce the reporter's test case on 1.1.4-1. The script stopped/hung after 4095 commits: . The WriteEngineServ process was cranking at 100% CPU. PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND The same test on 1.1.5-1 did get past 8250 commits. | ||
| Comment by Daniel Lee (Inactive) [ 2018-06-05 ] | ||
|
Reopen per my last comment. | ||
| Comment by Andrew Hutchings (Inactive) [ 2018-06-05 ] | ||
|
With the CentOS 7 test, the LBID/HWM for the second extent in the segment doesn't appear to have been committed, causing the header read error (extent stuck in "Updating" state). My guess is some kind of de-duplication that does not take the offset into account. | ||
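To illustrate the guess above, here is a minimal sketch of how de-duplicating HWM reports without the extent offset could drop the second extent's report. It is purely hypothetical: the tuple layout, function names, and HWM values are invented for illustration (only OID 11880 comes from the debug.log line earlier in this ticket), and it does not reflect the actual API or engine code.

```python
# Hypothetical HWM reports: (oid, partition, segment, extent_offset, hwm).
# The layout and the HWM values are made up for illustration.
reports = [
    (11880, 0, 0, 0, 4095),  # first extent in the segment
    (11880, 0, 0, 1, 4100),  # second extent in the same segment
]


def dedupe_ignoring_offset(reports):
    """Keep one report per (oid, partition, segment); the first seen wins."""
    seen = {}
    for oid, partition, segment, _offset, hwm in reports:
        seen.setdefault((oid, partition, segment), hwm)
    return seen


def dedupe_with_offset(reports):
    """Keep one report per (oid, partition, segment, extent_offset)."""
    seen = {}
    for oid, partition, segment, offset, hwm in reports:
        seen.setdefault((oid, partition, segment, offset), hwm)
    return seen


lossy = dedupe_ignoring_offset(reports)
kept = dedupe_with_offset(reports)

print(len(lossy), "report(s) survive when the offset is ignored")
print(len(kept), "report(s) survive when the offset is part of the key")
```

In this sketch the offset-blind version keeps only the first extent's HWM, so the second extent's value would never be committed, which is the kind of symptom described above.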
| Comment by Andrew Hutchings (Inactive) [ 2018-06-06 ] | ||
|
Pull request in engine and API to fix more HWM issues found in CentOS 7. Engine pull request fixes:
API pull request fixes a duplicate extent issue when extents from the same segment report HWMs (as a result, the wrong HWM can be committed). For QA: the API test should pass with these pull requests. | ||
| Comment by Daniel Lee (Inactive) [ 2018-06-07 ] | ||
|
Build verified: 1.1.5-1 source /root/columnstore/mariadb-columnstore-server Merge pull request #118 from mariadb-corporation/ Merge MariaDB 10.2.15 into develop-1.1 /root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine Merge pull request #491 from mariadb-corporation/ Executed the following tests: 1. Reporter's test case, ran beyond 16383 commits Got boundary test working again today. |