[MCOL-3296] ctrl+c sometimes leaves DMLProc in bad state Created: 2019-05-07 Updated: 2020-08-25 Resolved: 2019-05-20 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | DMLProc |
| Affects Version/s: | 1.1.7 |
| Fix Version/s: | 1.1.0, 1.2.4 |
| Type: | Bug | Priority: | Critical |
| Reporter: | David Hall (Inactive) | Assignee: | Daniel Lee (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None |
| Sprint: | 2019-05 |
| Description |
|
PackageHandler::synchTableAccess() is a kludge that serializes DML against a single table while allowing parallel processing against different tables. The kludge is required because the VSS can't accept transaction IDs out of numerical order on a single table, and asserts if they're not in order. With multiple transactions running on multiple threads, there's no guarantee of order. Occasionally something breaks if CTRL+C is hit while DML is processing: the internal tables and synch conditions of synchTableAccess() can get out of sync. This has caused two kinds of catastrophic failure. In one case, DMLProc segfaulted while accessing the synchro map (27656); in other cases, DML statements block indefinitely. |
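For readers unfamiliar with the pattern, the per-table serialization described above can be sketched roughly as follows. This is a minimal illustration, not the ColumnStore source: `TableAccessGate`, `acquire`, `release`, and `queued` are hypothetical names, and the sketch assumes one FIFO queue of transaction IDs per table, guarded by a single mutex and condition variable.

```cpp
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>

// Illustrative only: TableAccessGate and its members are hypothetical names,
// not the actual PackageHandler::synchTableAccess() implementation.
class TableAccessGate
{
public:
    // Queue txnId behind earlier DML on the same table and block until it
    // reaches the front. DML on other tables uses a different queue, so it
    // is never waited on.
    void acquire(std::uint32_t tableOid, std::uint64_t txnId)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        queues_[tableOid].push_back(txnId);
        cond_.wait(lock, [&] { return queues_[tableOid].front() == txnId; });
    }

    // Called by the transaction at the front once its DML has finished.
    // Note the assumption: the caller really is the front entry.
    void release(std::uint32_t tableOid)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = queues_.find(tableOid);
        if (it == queues_.end())
            return;
        it->second.pop_front();
        if (it->second.empty())
            queues_.erase(it);
        cond_.notify_all();  // wake waiters so the new front can proceed
    }

    // Number of transactions queued (running or blocked) on a table.
    std::size_t queued(std::uint32_t tableOid)
    {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = queues_.find(tableOid);
        return it == queues_.end() ? 0 : it->second.size();
    }

private:
    std::mutex mutex_;
    std::condition_variable cond_;
    std::map<std::uint32_t, std::deque<std::uint64_t>> queues_;
};
```

In this sketch `release()` trusts that its caller is the front entry. Any code path that changes the queue behind the running transaction's back breaks that assumption.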
| Comments |
| Comment by David Hall (Inactive) [ 2019-05-10 ] |
|
When CTRL+C is hit, the query is removed from the queue that tracks the queries for this table, and the query is marked as cancelled. The query, running in a different thread, then begins cleaning up and removes the top item (which should be the running item, i.e. itself) from the queue. Since it is no longer in the queue, whatever it removes is not its own entry, and the queue gets corrupted. A different, but similar, breakage occurs when the query is waiting. A query that blocks doesn't log anything yet, because it blocks before the normal logging occurs; this caused significant confusion during analysis. Added logging for when a query blocks here. |
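The double removal described in this comment can be reproduced on a plain queue. The sketch below is illustrative only: `TxnQueue`, `cancelTxn`, `cleanupPopFront`, and `cleanupById` are hypothetical names, and `cleanupById` shows one possible defensive approach (remove only your own entry), not necessarily the change that was committed for this ticket.

```cpp
#include <algorithm>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the per-table queue of transaction IDs.
using TxnQueue = std::deque<std::uint64_t>;

// Cancel path (CTRL+C): drop the cancelled transaction wherever it sits.
inline void cancelTxn(TxnQueue& q, std::uint64_t txnId)
{
    q.erase(std::remove(q.begin(), q.end(), txnId), q.end());
}

// Fragile cleanup: assumes the caller is still the front entry. After a
// cancel has already removed the caller, this pops someone else's entry.
inline void cleanupPopFront(TxnQueue& q)
{
    if (!q.empty())
        q.pop_front();
}

// Defensive cleanup (an assumption, for illustration): remove only the
// caller's own entry if it is still present. Returns true if one was removed.
inline bool cleanupById(TxnQueue& q, std::uint64_t txnId)
{
    auto it = std::find(q.begin(), q.end(), txnId);
    if (it == q.end())
        return false;
    q.erase(it);
    return true;
}
```

With a queue holding transactions 10, 11, 12 where 10 is running: calling `cancelTxn(q, 10)` followed by `cleanupPopFront(q)` leaves only 12, so transaction 11 has silently lost its place. Calling `cancelTxn(q, 10)` followed by `cleanupById(q, 10)` leaves 11 and 12 intact.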
| Comment by David Hall (Inactive) [ 2019-05-10 ] |
|
For QA: After the fix, all of the above should work as expected: whichever DML statement is cancelled via CTRL+C is cancelled, and all other queries continue (in order). Check the debug log for lines showing when a query is blocked. |
| Comment by Daniel Lee (Inactive) [ 2019-05-16 ] |
|
Build tested: 1.1.7-1, 1.2.3-1 Finally reproduced the issue in the above releases. At first, I failed many times to reproduce it using a smaller table. I eventually reproduced it when updating a 10 GB dbt3 lineitem table. Waiting for a nightly build with the fix. |
| Comment by Daniel Lee (Inactive) [ 2019-05-17 ] |
|
Build verified: 1.1.8-1 nightly server commit: Still waiting for 1.2.4-1 |
| Comment by Daniel Lee (Inactive) [ 2019-05-20 ] |
|
Build tested: 1.2.4-1 GitHub source Made a build with the latest source and verified the fix. /root/columnstore/mariadb-columnstore-server Disable Travis triggering on pull requests /root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine Merge pull request #767 from mariadb-corporation/develop-1.2-merge-up-20190517 Merge develop-1.1 into develop-1.2 |