[MCOL-4775] DMLproc is unable to complete a rollback and goes into a crash loop Created: 2021-06-24  Updated: 2023-11-17  Resolved: 2022-06-24

Status: Closed
Project: MariaDB ColumnStore
Component/s: DMLProc
Affects Version/s: 5.5.2
Fix Version/s: 6.4.1

Type: Bug Priority: Major
Reporter: Rick Pizzi Assignee: Daniel Lee (Inactive)
Resolution: Duplicate Votes: 4
Labels: None

Issue Links:
Duplicate
duplicates MCOL-5105 Reduced systemd timeouts results in c... Closed
Problem/Incident
is caused by MCOL-5021 Implement an auxiliary (hidden) colum... Closed
Relates
relates to MCOL-4867 DMLProc failed to rollback a txn Closed
relates to MCOL-4785 ROLLBACK of a long lasting DML left c... Closed
Sprint: 2021-9, 2021-10, 2021-11, 2021-12, 2021-13, 2021-14, 2021-15, 2021-16, 2021-17

 Description   

A customer experienced an issue where, on cluster restart, DMLProc was unable to complete the rollback of 3 transactions.
It would try for a while and then die; a new DMLProc would then be spawned, over and over, in a wash-rinse-repeat fashion:

Jun 23 11:08:27 cluster-csx2 DMLProc[1123]: 27.889048 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:08:28 cluster-csx2 DMLProc[1123]: 28.280692 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.
Jun 23 11:08:43 cluster-csx2 DMLProc[1123]: 43.466989 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 24608
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.340083 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.386158 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.770217 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.
Jun 23 11:10:02 cluster-csx2 DMLProc[1247]: 02.689347 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 24608
Jun 23 11:11:28 cluster-csx2 DMLProc[1452]: 28.837884 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jun 23 11:11:28 cluster-csx2 DMLProc[1452]: 28.891178 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:11:29 cluster-csx2 DMLProc[1452]: 29.273671 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.

You can see in the log that the DMLProc process ID changes, but there is no trace in the log of it dying: no trace files, nothing.
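The restart cadence can be read directly off the excerpt. The Python sketch below embeds the log lines above verbatim, groups them by DMLProc PID, and measures the time between the first visible line of each instance and the next. Two assumptions: syslog omits the year, so 2021 is taken from the issue's Created date, and the first interval is measured from PID 1123's first *visible* line because the excerpt starts mid-instance. A near-constant gap points to a fixed timeout rather than a crash at a random point in the rollback.

```python
import re
from datetime import datetime

# Log excerpt copied verbatim from the description.
LOG = """\
Jun 23 11:08:27 cluster-csx2 DMLProc[1123]: 27.889048 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:08:28 cluster-csx2 DMLProc[1123]: 28.280692 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.
Jun 23 11:08:43 cluster-csx2 DMLProc[1123]: 43.466989 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 24608
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.340083 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.386158 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:09:58 cluster-csx2 DMLProc[1247]: 58.770217 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.
Jun 23 11:10:02 cluster-csx2 DMLProc[1247]: 02.689347 |0|0|0| I 20 CAL0002: DMLProc will roll back transaction 24608
Jun 23 11:11:28 cluster-csx2 DMLProc[1452]: 28.837884 |0|0|0| I 20 CAL0002: DMLProc starts rollbackAll.
Jun 23 11:11:28 cluster-csx2 DMLProc[1452]: 28.891178 |0|0|0| I 20 CAL0002: DMLProc will rollback 0 tables.
Jun 23 11:11:29 cluster-csx2 DMLProc[1452]: 29.273671 |0|0|0| I 20 CAL0002: DMLProc will rollback 3 transactions.
"""

# Syslog prefix: "Mon DD HH:MM:SS host DMLProc[PID]:"
LINE_RE = re.compile(r"^(\w{3} +\d+ \d{2}:\d{2}:\d{2}) \S+ DMLProc\[(\d+)\]:")


def first_seen_per_pid(log_text, year=2021):
    """Return [(pid, first timestamp)] in order of first appearance."""
    seen = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m and m.group(2) not in seen:
            # Syslog has no year field; the year is an assumption (issue dates).
            seen[m.group(2)] = datetime.strptime(
                f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return list(seen.items())


def restart_gaps(log_text):
    """Seconds between the first line of each DMLProc instance and the previous one."""
    pairs = first_seen_per_pid(log_text)
    return [(pid, (t - prev_t).total_seconds())
            for (_, prev_t), (pid, t) in zip(pairs, pairs[1:])]


for pid, secs in restart_gaps(LOG):
    print(f"DMLProc[{pid}] appeared {secs:.0f} s after the previous instance")
```

At the one-second resolution of the syslog prefix this reports 91 s for PID 1247 and 90 s for PID 1452, which fits the systemd start-timeout explanation given later in the comments. systemd's default `DefaultTimeoutStartSec` is also 90 s, but whether the ColumnStore unit relies on that default is not stated in this issue.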



 Comments   
Comment by Roman [ 2021-06-26 ]

allen.herrera You describe an unrelated but important case that should be reproduced and filed if possible.

Comment by Roman [ 2021-09-17 ]

Agree.

Comment by David Hall (Inactive) [ 2022-03-04 ]

Long-running rollbacks can take longer than the systemd startup timeout.
When DMLProc starts, it doesn't report to systemd until the rollback is complete. If that takes too long, then systemd times out and restarts the process.
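The comment above describes a notify-style startup: systemd only considers the service started once the process reports readiness. The Python sketch below models that ordering under stated assumptions. DMLProc itself is C++ and its startup code is not shown in this issue; `start_dmlproc` and `rollback_all` are hypothetical stand-ins; and `sd_notify` here is a minimal re-implementation of the datagram protocol behind `$NOTIFY_SOCKET`, not the libsystemd function. Because `READY=1` is sent only after `rollback_all()` returns, a rollback that outlasts the unit's start timeout never gets to report readiness.

```python
import os
import socket


def sd_notify(state: str) -> bool:
    """Minimal sender for systemd's notification protocol: one datagram such as
    "READY=1" sent to the AF_UNIX socket named in $NOTIFY_SOCKET.
    Returns False when the variable is unset (not started by a Type=notify unit)."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return False
    if addr.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        addr = "\0" + addr[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(state.encode())
    return True


def start_dmlproc(rollback_all, notify=sd_notify):
    """Hypothetical startup order matching the comment: roll back first, report
    readiness afterwards. While rollback_all() runs, systemd is still waiting
    for READY=1; if the start timeout expires first, systemd treats the start
    as failed and stops the process before it ever notifies."""
    rollback_all()       # may run longer than the unit's start timeout
    notify("READY=1")    # reached only after every rollback has finished
```

Injecting `notify` keeps the ordering testable without systemd. If the unit has a restart policy, each new instance begins `rollback_all()` again, which would reproduce the loop in the description.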

Comment by David Hall (Inactive) [ 2022-05-13 ]

We'll wait for the Pulumi project, which sets up a VM cluster for testing, and then run a test that causes a failure and a long rollback on startup.

Comment by David Hall (Inactive) [ 2022-06-24 ]

Duplicate of MCOL-5105.

Generated at Thu Feb 08 02:52:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.