[MCOL-3296] ctrl+c sometimes leaves DMLProc in bad state Created: 2019-05-07  Updated: 2020-08-25  Resolved: 2019-05-20

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc
Affects Version/s: 1.1.7
Fix Version/s: 1.1.0, 1.2.4

Type: Bug Priority: Critical
Reporter: David Hall (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None

Sprint: 2019-05

 Description   

PackageHandler::synchTableAccess() is a kludge that serializes DML against a single table while still allowing DML against different tables to run in parallel. It is required because the VSS cannot accept transaction IDs out of numerical order on a single table, and asserts if they arrive out of order. When multiple transactions run on multiple threads, there is no guarantee of order.
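For illustration, the per-table serialization described above can be modelled as one FIFO queue of transaction IDs per table, guarded by a single mutex and condition variable. This is a hypothetical sketch (the names TableGate, acquire, release and queued are invented here), not the actual synchTableAccess() code. It assumes each thread enqueues its transaction in the order in which the IDs must reach the VSS.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>
#include <mutex>
#include <string>

// Hypothetical per-table FIFO gate; NOT the real PackageHandler code.
using TxnId = std::uint32_t;

class TableGate {
 public:
  // Enqueue txn on `table` and block until it reaches the front of that
  // table's queue. Each table has its own queue, so DML against different
  // tables never waits on each other.
  void acquire(const std::string& table, TxnId txn) {
    std::unique_lock<std::mutex> lock(mu_);
    auto& q = queues_[table];  // std::map references stay valid on insert
    q.push_back(txn);
    cv_.wait(lock, [&] { return q.front() == txn; });
  }

  // Called by the transaction that currently holds `table`.
  void release(const std::string& table, TxnId txn) {
    std::lock_guard<std::mutex> lock(mu_);
    auto& q = queues_[table];
    assert(!q.empty() && q.front() == txn);  // only the holder may release
    q.pop_front();
    cv_.notify_all();  // wake waiters so the next txn can check its turn
  }

  // Number of transactions (running + waiting) queued on `table`.
  std::size_t queued(const std::string& table) {
    std::lock_guard<std::mutex> lock(mu_);
    return queues_[table].size();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  std::map<std::string, std::deque<TxnId>> queues_;
};
```

Because each table has its own queue, a transaction on one table is never held up by a transaction on another. Only transactions on the same table wait for each other, and they are admitted in arrival order.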

Hitting CTRL+C while DML is being processed occasionally breaks something: the internal tables and synchronization conditions of synchTableAccess() can become inconsistent. This has caused two kinds of catastrophic failure. In one case, DMLProc segfaulted while accessing the synchro map (27656); in other cases, DML statements blocked indefinitely.



 Comments   
Comment by David Hall (Inactive) [ 2019-05-10 ]

When CTRL+C is hit, the query is removed from the queue that tracks the queries for this table, and the query is marked as cancelled. The query, running in a different thread, then begins cleaning up and removes the top item (which should be the running query) from the queue. But the query is no longer in the queue, so another query's entry is removed instead and the queue becomes corrupted. A different but similar breakage occurs when the cancelled query is still waiting.
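The failure sequence above can be reproduced on a plain queue. In this hypothetical model (the function names are invented for illustration), the cancel path removes the query's entry, and the cleanup path then pops the front blindly, assuming it is removing its own entry. Removing the entry by transaction ID instead is one way to make cleanup tolerate an earlier cancel; it is a sketch only and not necessarily how the actual fix was implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <deque>

// Hypothetical model of one table's queue: front() is the running query,
// the remaining entries are waiting queries in order.
using TxnId = std::uint32_t;

// Cancel path (CTRL+C): drop the query's entry wherever it is.
void cancel(std::deque<TxnId>& q, TxnId txn) {
  q.erase(std::remove(q.begin(), q.end(), txn), q.end());
}

// Buggy cleanup: assumes the finishing query is still at the front.
void cleanupPopFront(std::deque<TxnId>& q) {
  if (!q.empty()) q.pop_front();
}

// Safer cleanup: remove only this query's own entry, if still present.
// Returns true if an entry was removed.
bool cleanupById(std::deque<TxnId>& q, TxnId txn) {
  auto it = std::find(q.begin(), q.end(), txn);
  if (it == q.end()) return false;  // already removed by the cancel path
  q.erase(it);
  return true;
}
```

With a queue of {10, 11, 12}, cancelling 10 leaves {11, 12} and lets 11 start. The blind pop_front() in 10's cleanup then removes 11's entry, leaving {12}, so 12 may start while 11 is still running. cleanupById(q, 10) finds nothing to remove and leaves {11, 12} intact.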

Previously, a query that blocked logged nothing, because it blocks before the normal logging occurs. This caused significant confusion while analyzing the problem. I added logging for when a query blocks.
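Logging at the point where a query blocks could look like the following sketch. The function waitLogged and its log callback are hypothetical. The idea is to check the wake-up condition first and emit a log line only when the caller actually has to wait, so a stuck query leaves a trace even though it never reaches the normal logging.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>

// Hypothetical helper: waits on `cv` until `ready()` holds, logging only
// when the condition is not already met on entry. Returns true if the
// caller had to wait, false if it could proceed immediately.
bool waitLogged(std::unique_lock<std::mutex>& lock,
                std::condition_variable& cv,
                const std::function<bool()>& ready,
                const std::function<void(const std::string&)>& log,
                std::uint32_t txn, const std::string& table) {
  assert(lock.owns_lock());   // caller must hold the mutex guarding `ready`
  if (ready()) return false;  // already our turn: nothing to log
  log("txn " + std::to_string(txn) + " blocked waiting for table " + table);
  cv.wait(lock, ready);       // re-checks `ready` on every wake-up
  log("txn " + std::to_string(txn) + " resumed on table " + table);
  return true;
}
```

Logging once before the wait, rather than on every wake-up, keeps the log readable when many waiters are woken by notify_all().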

Comment by David Hall (Inactive) [ 2019-05-10 ]

For QA:
Before fix: Start any UPDATE or INSERT (not using cpimport) and hit CTRL+C during execution. The cancel itself should appear to work fine. However, after this, DML on this table is likely never to complete. Hitting CTRL+C on that statement may or may not get things working again. Also start several DML statements against the same table and use CTRL+C to cancel one other than the first; this may eventually cause problems as well.

After fix: All of the above should work as expected: whichever DML statement is cancelled via CTRL+C is cancelled, and all other queries continue in order. Check the debug log for lines showing when a query is blocked.

Comment by Daniel Lee (Inactive) [ 2019-05-16 ]

Build tested: 1.1.7-1, 1.2.3-1

Finally reproduced the issue in the above releases.

At first, I failed many times to reproduce it using smaller tables. I eventually reproduced it while updating a 10 GB DBT-3 lineitem table.

Waiting for a nightly build with the fix.

Comment by Daniel Lee (Inactive) [ 2019-05-17 ]

Build verified: 1.1.8-1 nightly

server commit:
01cc1ef
engine commit:
0af6994

Still waiting for 1.2.4-1

Comment by Daniel Lee (Inactive) [ 2019-05-20 ]

Build tested: 1.2.4-1 GitHub source

Made a build with the latest source and verified the fix.

/root/columnstore/mariadb-columnstore-server
commit e3d99393916f0231db02564dd5e316e803bdbbe9
Author: Andrew Hutchings <andrew@linuxjedi.co.uk>
Date: Mon Jan 14 16:20:01 2019 +0000

Disable Travis triggering on pull requests

/root/columnstore/mariadb-columnstore-server/mariadb-columnstore-engine
commit 122038e36a8a4d4bd632eb137bde31a09a88d2b9
Merge: 8afc3f8 ea2ff9c
Author: Roman Nozdrin <drrtuy@gmail.com>
Date: Mon May 20 13:50:36 2019 +0300

Merge pull request #767 from mariadb-corporation/develop-1.2-merge-up-20190517

Merge develop-1.1 into develop-1.2

Generated at Thu Feb 08 02:41:40 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.