[MCOL-4015] ExeMgr must re-establish its PrimProc connections. Created: 2020-05-22 Updated: 2021-01-25 Resolved: 2020-07-14 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | ExeMgr |
| Affects Version/s: | None |
| Fix Version/s: | 1.5.3 |
| Type: | Task | Priority: | Major |
| Reporter: | Roman | Assignee: | Gregory Dorman (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | None | ||
| Attachments: |
|
||||||||
| Issue Links: |
|
||||||||
| Description |
|
ExeMgr now re-establishes its connections to PrimProcs when it receives a HUP signal. |
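For readers unfamiliar with the pattern, reconnect-on-HUP can be sketched roughly as below. This is not the actual ExeMgr patch: the names `g_reconnectRequested`, `onHup`, `installHupHandler` and `reconnectIfRequested` are hypothetical, and the reconnect step is a caller-supplied callback standing in for whatever ExeMgr really does to reopen its PrimProc connections. The handler only sets a flag; the real work happens later on a normal thread.

```cpp
#include <cassert>
#include <functional>
#include <signal.h>

// Hypothetical flag: set by the signal handler, polled by the main loop.
// sig_atomic_t is the type that is safe to write from a signal handler.
volatile sig_atomic_t g_reconnectRequested = 0;

// The handler does nothing but record that a HUP arrived.
extern "C" void onHup(int)
{
    g_reconnectRequested = 1;
}

// Install the SIGHUP handler. Returns true on success.
bool installHupHandler()
{
    struct sigaction sa{};
    sa.sa_handler = onHup;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;  // restart interrupted syscalls where possible
    return sigaction(SIGHUP, &sa, nullptr) == 0;
}

// Meant to be called from the main loop: if a HUP arrived since the last
// call, clear the flag and run the reconnect callback once.
// Returns true if the callback was run.
bool reconnectIfRequested(const std::function<void()>& reconnect)
{
    assert(reconnect);  // an empty std::function would throw on call
    if (!g_reconnectRequested)
        return false;
    g_reconnectRequested = 0;
    reconnect();
    return true;
}
```

With something like this in place, an operator would trigger the reconnect with `kill -HUP <pid>` against the process. Keeping the handler to a single flag write avoids calling non-async-signal-safe code from signal context.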
| Comments |
| Comment by Roman [ 2020-05-22 ] |
|
Plz review. |
| Comment by Roman [ 2020-05-27 ] |
|
Whilst looking into the DEC code I found out that our whole ExeMgr-to-PrimProc state machine is fragile. For example, I was tinkering with iptables blocks to emulate custom network outages, and there was a case where ExeMgr got stuck in an indefinitely blocking read from the socket, waiting for the magic, whilst the network traffic was blackholed. I didn't come up with an appropriate solution. |
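One common way to keep a read from blocking forever on a blackholed connection is to wait with `poll()` and a timeout before each `read()`. The sketch below illustrates that general technique only; it is not taken from the ColumnStore code and is not proposed in this ticket as the fix. `ReadStatus` and `readFully` are hypothetical names, and the timeout applies to each wait, not to the whole message.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <poll.h>
#include <unistd.h>

enum class ReadStatus { Ok, Timeout, Closed, Error };

// Read exactly len bytes from fd into buf.
// Each wait for data is bounded by timeoutMs (a per-chunk timeout, not a
// total deadline), so a silent peer yields Timeout instead of a hang.
ReadStatus readFully(int fd, void* buf, std::size_t len, int timeoutMs)
{
    assert(buf != nullptr || len == 0);
    char* out = static_cast<char*>(buf);
    std::size_t got = 0;

    while (got < len)
    {
        pollfd pfd{fd, POLLIN, 0};
        int rc = poll(&pfd, 1, timeoutMs);
        if (rc == 0)
            return ReadStatus::Timeout;  // nothing arrived in time
        if (rc < 0)
        {
            if (errno == EINTR)
                continue;  // interrupted by a signal, wait again
            return ReadStatus::Error;
        }

        ssize_t n = read(fd, out + got, len - got);
        if (n == 0)
            return ReadStatus::Closed;  // peer closed the connection
        if (n < 0)
        {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return ReadStatus::Error;
        }
        got += static_cast<std::size_t>(n);
    }
    return ReadStatus::Ok;
}
```

On `Timeout` the caller can decide whether to retry, drop the connection, or reconnect. Choosing that policy is the hard part and is left open here.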
| Comment by Patrick LeBlanc (Inactive) [ 2020-05-27 ] |
|
I'm not entirely sure what the patch was for or if it corrects a problem QA can replicate & test. Roman, could you advise Daniel on how to test this? |
| Comment by Daniel Lee (Inactive) [ 2020-06-02 ] |
|
Yes, instructions or any info would be great. Thx |
| Comment by Roman [ 2020-06-03 ] |
|
Ehhm. It doesn't look easily testable now. Here is the recipe though.
|
| Comment by Gregory Dorman (Inactive) [ 2020-06-19 ] |
|
drrtuy, I am afraid Daniel cannot do it in that manner. We will first get Jose's procs. I will attempt to do it myself, though. Are the packages in build 153 ready, and do they include this thing? |
| Comment by Gregory Dorman (Inactive) [ 2020-06-21 ] |
|
OK, I had limited success with this - not exactly as written, but it seems close (SELECTs work after PM2 shutdown, but CRUDs don't, and they start working upon restart). To make it work I had to inject poor man's synchronization (PM1 and PM2 waiting on each other's exemgrs, and on PM1's controllernode (8616) to open). Columnstore.xml was hand-crafted (comparing one node to two nodes). I did the end-to-end: docker build, start containers, push two-node .xml to both, push appropriate modules to both; called start-columnstore on both. Lowlights: I still cannot make it start up for local storage for some reason (this is S3). Also, something is going funky on initialization: I had to restart both in order to make them talk to each other. But good so far. |
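The "poor man's synchronization" above, waiting for a port such as controllernode's 8616 to open, can be approximated by retrying a TCP connect until it succeeds. The comment does not say how it was actually done, so this is only a sketch under assumptions: `portIsOpen` and `waitForPort` are hypothetical helpers, IPv4 only, and the attempt count and delay are arbitrary.

```cpp
#include <arpa/inet.h>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <thread>
#include <unistd.h>

// Try a single TCP connect to ip:port. True if something accepted it.
// Note: this is a blocking connect, so on a network that silently drops
// packets one attempt may take as long as the OS connect timeout.
bool portIsOpen(const char* ip, std::uint16_t port)
{
    assert(ip != nullptr);
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1)
        return false;  // not a valid IPv4 address

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return false;
    bool ok = connect(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) == 0;
    close(fd);
    return ok;
}

// Poll the port up to `attempts` times, sleeping `delay` between tries.
// Returns true as soon as one connect succeeds, false if all fail.
bool waitForPort(const char* ip, std::uint16_t port, int attempts,
                 std::chrono::milliseconds delay)
{
    for (int i = 0; i < attempts; ++i)
    {
        if (portIsOpen(ip, port))
            return true;
        if (i + 1 < attempts)
            std::this_thread::sleep_for(delay);
    }
    return false;
}
```

A startup script on PM2 could, for instance, call `waitForPort(pm1Address, 8616, 60, std::chrono::milliseconds(1000))` before starting its own services, where `pm1Address` is a placeholder for PM1's IPv4 address.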