[MCOL-5163] Increase the stability of writing processes: WriteEngineServer, DMLProc, DDLProc
Created: 2022-07-19  Updated: 2022-10-25  Resolved: 2022-08-19

Status: Closed
Project: MariaDB ColumnStore
Component/s: DDLProc, DMLProc, writeengine
Affects Version/s: 6.4.1
Fix Version/s: 22.08.1

Type: New Feature
Priority: Major
Reporter: Roman
Assignee: Daniel Lee (Inactive)
Resolution: Fixed
Votes: 0
Labels: None

Issue Links:
Duplicate
is duplicated by MCOL-4785 ROLLBACK of a long lasting DML left c... Closed
is duplicated by MCOL-4798 ExeMgr hit cpu , cluster in read only... Closed
Sprint: 2021-17

 Description   

As of 6.4.1, MCS uses systemd to handle the process start and stop routines. systemd also handles restarts across the process hierarchy: for example, if mcs-primproc (which contains both EM and PP) crashes, systemd restarts the mcs-writeengineserver unit. If WE is in the middle of a write operation, this can leave the system in an unusable state. Likewise, if mcs-writeengineserver is restarted, systemd restarts both mcs-dmlproc and mcs-ddlproc, so if DMLProc was making changes, it leaves failed transactions behind. Cleaning these stuck txns from the cluster is tedious.
The suggested approach is to decouple the following unit pairs at the systemd level: mcs-primproc and mcs-writeengineserver; mcs-writeengineserver and mcs-dmlproc; mcs-writeengineserver and mcs-ddlproc.
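
The restart cascade described above can be illustrated with a small toy model. This is only a sketch: the edge lists encode the behavior as the ticket describes it, not systemd's actual dependency semantics, and the names COUPLED, DECOUPLED and restarted_units are hypothetical helpers introduced for illustration.

```python
from collections import deque

# Toy model: an edge A -> B means "a restart of A forces a restart of B".
# The edges follow the ticket's description; this is not systemd's real logic.
COUPLED = {
    "mcs-primproc": ["mcs-writeengineserver"],
    "mcs-writeengineserver": ["mcs-dmlproc", "mcs-ddlproc"],
}

# Assumed end state: all three pairs decoupled, as the ticket proposes.
DECOUPLED = {}


def restarted_units(crashed, edges):
    """Return the sorted units forced to restart after `crashed` restarts."""
    seen = set()
    queue = deque([crashed])
    while queue:
        unit = queue.popleft()
        for dependent in edges.get(unit, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen)


if __name__ == "__main__":
    # Coupled: a PP crash cascades down to the writing processes.
    print(restarted_units("mcs-primproc", COUPLED))
    # Decoupled: a PP crash leaves the writing processes alone.
    print(restarted_units("mcs-primproc", DECOUPLED))
```

In the coupled model, a crash of mcs-primproc reaches all three writing processes; in the decoupled model it reaches none, so an in-flight write or txn is not interrupted by the restart itself.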



 Comments   
Comment by Roman [ 2022-08-16 ]

4QA: Crashing ExeMgr or PP used to force WE and DMLProc to restart, even if they were in the middle of a txn. The current version doesn't restart WE or DMLProc if PP has crashed.
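
Alongside the crash test, one possible static check is to inspect the unit dependencies that systemd reports. The sketch below parses the output of `systemctl show` for the PartOf, BindsTo and Requires properties, which are dependency types that can propagate stops or restarts. The ticket does not say which directive the MCS units used, so checking these three is an assumption, and the function names are hypothetical.

```python
import subprocess

# Dependency properties that can propagate a stop or restart between units.
# Which of these the MCS units actually used is an assumption, not from the ticket.
RESTART_PROPS = ("PartOf", "BindsTo", "Requires")


def parse_show_output(text):
    """Parse `systemctl show` KEY=VALUE lines into {property: [unit, ...]}."""
    deps = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key in RESTART_PROPS:
            deps[key] = value.split()
    return deps


def coupled_to(text, other_unit):
    """Return the sorted properties in `text` that reference `other_unit`."""
    deps = parse_show_output(text)
    return sorted(prop for prop, units in deps.items() if other_unit in units)


def show_unit(unit):
    """Run `systemctl show` for one unit, limited to RESTART_PROPS."""
    cmd = ["systemctl", "show", unit]
    cmd += [f"--property={prop}" for prop in RESTART_PROPS]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout
```

On a host with ColumnStore installed, `coupled_to(show_unit("mcs-writeengineserver.service"), "mcs-primproc.service")` would be expected to return an empty list once the units are decoupled. This complements the crash test and does not replace it.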

Comment by Daniel Lee (Inactive) [ 2022-08-19 ]

Build verified: 22.08-1 (#5324)

Also verified 6.x (#5262) since it was also checked into develop-6.

Generated at Thu Feb 08 02:55:49 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.