[MCOL-3917] DDLProc/DMLProc must return a meaningful error if a local/remote workernode/controllernode was restarted Created: 2020-04-02 Updated: 2023-10-25 Resolved: 2023-10-25 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | DDLProc, DMLProc |
| Affects Version/s: | None |
| Fix Version/s: | 23.10 |
| Type: | Task | Priority: | Major |
| Reporter: | Roman | Assignee: | Roman |
| Resolution: | Won't Fix | Votes: | 0 |
| Labels: | None | ||
| Epic Link: | Columnstore OAM replacement |
| Sprint: | 2021-2, 2021-3, 2021-4 |
| Description |
|
DMLProc and DDLProc don't reestablish their connections to WriteEngine when those connections fail, so they don't survive a WriteEngine restart. DMLProc/DDLProc must survive such an outage by failing their current operations and then reestablishing their connections. The user must be notified that the operations failed. Update: Make sure ColumnStore services survive restarts of other services as well, including controllernode, workernode, PrimProc, and ExeMgr restarts. |
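The behavior requested above (fail the in-flight operation with a meaningful error, then reconnect on the next request) can be illustrated with a minimal Python sketch. It is not ColumnStore code: `ReconnectingClient`, `WriteEngineUnavailable`, and the `connect` factory are hypothetical names introduced only for illustration, and the real DMLProc/DDLProc logic is written in C++ and may look quite different.

```python
class WriteEngineUnavailable(Exception):
    """Hypothetical error surfaced to the user when an operation is lost."""


class ReconnectingClient:
    """Sketch of a client that fails the current operation on a dropped
    connection and reconnects lazily on the next call."""

    def __init__(self, connect):
        # `connect` is an assumed factory returning an object with .send(msg);
        # it stands in for whatever opens a WriteEngine connection.
        self._connect = connect
        self._conn = None

    def execute(self, statement):
        if self._conn is None:
            try:
                self._conn = self._connect()
            except ConnectionError as exc:
                raise WriteEngineUnavailable(
                    f"cannot reach WriteEngine: {exc}"
                ) from exc
        try:
            return self._conn.send(statement)
        except ConnectionError as exc:
            # Drop the dead connection so the next call reconnects, and
            # report the failure instead of silently retrying the statement.
            self._conn = None
            raise WriteEngineUnavailable(
                f"WriteEngine connection lost while running {statement!r}; "
                "the operation failed, please resubmit it"
            ) from exc
```

In this sketch the failed statement is deliberately not retried automatically, since a write may have been partially applied before the connection dropped; the caller sees the error and decides whether to resubmit.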
| Comments |
| Comment by Roman [ 2020-04-24 ] | ||||
|
We should extend the scope of this task to test all dependencies across services. | ||||
| Comment by Jose Rojas (Inactive) [ 2020-04-24 ] | ||||
|
You will find all changes in | ||||
| Comment by Roman [ 2020-09-02 ] | ||||
|
The changes don't allow services to really survive WriteEngine restarts. They are just systemd workarounds that don't work in non-systemd environments. | ||||
| Comment by David Hall (Inactive) [ 2022-03-04 ] | ||||
|
We believe this works correctly. It needs to be properly tested. | ||||
| Comment by David Hall (Inactive) [ 2022-05-25 ] | ||||
|
QA: Please force a crash of PrimProc and the other processes, then bring them back up. Check whether DDLProc and DMLProc still function after the other processes come back up. | ||||
| Comment by Daniel Lee (Inactive) [ 2022-05-31 ] | ||||
|
Build tested: 6.4.1-1 (drone #4524)
1. Restart DDLProc, DMLProc, writeengine, StorageManager
2. Restart PrimProc
   The 2nd try of DML statement worked
3. Restart ExeMgr
4. Restart workernode
5. Restart controllernode
| ||||
| Comment by Roman [ 2022-07-27 ] | ||||
|
The goals are quite feasible, except that a crashed workernode/controllernode might result in a split-brain in how the cluster nodes see the extent map. |