[MCOL-4337] Controllernode must establish connections with its workernodes on startup Created: 2020-10-07 Updated: 2020-11-20 Resolved: 2020-11-20 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | Build |
| Affects Version/s: | 1.5.3 |
| Fix Version/s: | 5.5.1 |
| Type: | New Feature | Priority: | Minor |
| Reporter: | Roman | Assignee: | Jose Rojas (Inactive) |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | stability | ||
| Issue Links: |
|
| Sprint: | 2020-8 |
| Description |
|
Controllernode now establishes DBRM_Worker connections lazily, so it does not wait for them to come up. This calls for additional startup checks in cluster setups: the controllernode (CN) must wait for all workernodes (WNs) to come up before it is ready to process requests. |
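The required startup behavior can be sketched as a poll-until-connected loop with a configurable timeout. This is only an illustrative sketch, not the actual ColumnStore code: `waitForWorkers` and the injected `tryConnect` callback are hypothetical names standing in for the real per-worker DBRM connection calls.

```cpp
#include <chrono>
#include <functional>
#include <thread>
#include <vector>

// Hedged sketch: block controllernode startup until every DBRM_Worker
// connection succeeds, or give up once the timeout elapses.
// tryConnect(i) is a stand-in for the real connect attempt to worker i.
bool waitForWorkers(const std::function<bool(int)>& tryConnect,
                    int numWorkers,
                    std::chrono::seconds timeout)
{
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    std::vector<bool> up(numWorkers, false);
    int remaining = numWorkers;

    while (remaining > 0 && std::chrono::steady_clock::now() < deadline)
    {
        for (int i = 0; i < numWorkers; ++i)
        {
            if (!up[i] && tryConnect(i))
            {
                up[i] = true;
                --remaining;
            }
        }
        if (remaining > 0)  // not all workers are up yet; retry shortly
            std::this_thread::sleep_for(std::chrono::milliseconds(100));
    }
    return remaining == 0;  // true -> CN may start serving requests
}
```

The timeout would presumably come from configuration (the comments below mention a `DBRM_Controller:WorkerConnectionTimeout` entry in Columnstore.xml); whether that is the exact mechanism in the shipped fix is an assumption here.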
| Comments |
| Comment by Roman [ 2020-10-09 ] |
|
Please review. |
| Comment by Roman [ 2020-10-15 ] |
|
For QA: this should be tested with the 5.5 release. |
| Comment by Daniel Lee (Inactive) [ 2020-10-30 ] |
|
Builds tested: 5.5.1-1. Tested on both CentOS 8 and Ubuntu 18.04. 1. I stopped the mcs-controllernode and mcs-workernode@1 services. Instead of using systemctl to start the services, I did another round of tests by running load_brm and controllernode from /usr/bin directly. controllernode started immediately. 2. I noticed there is no DBRM_Controller:WorkerConnectionTimeout entry in the Columnstore.xml file. I added the entry with a value of 30 and did another round of tests; controllernode still did not wait for workernode to start. |
| Comment by Roman [ 2020-11-10 ] |
|
toddstoffel I would like to suggest changing the label to stability or something similar. |
| Comment by Roman [ 2020-11-10 ] |
|
dleeyh I have to confess that my test scenario was totally misleading. JFYI, the test you tried will work after |
| Comment by Daniel Lee (Inactive) [ 2020-11-10 ] |
|
Build tested: 5.5.1-1 (Drone), engine: 1ffca618dfba15d1edda4b21a2d9f9713d4f7262. A few issues: 1. The error message is in the warning.log file, not the err.log file: Nov 10 22:00:15 centos-8 controllernode[4810]: 15.905223 |0|0|0| D 29 CAL0000: DBRM Controller: Connected to DBRM_Worker1 2. After both controller and worker started, the cluster is in a system-not-ready state. I logged into the MySQL client after both services were started: MariaDB [mytest]> select count [centos8:root~]# systemctl status mariadb Nov 10 21:53:40 centos-8 mariadbd[3920]: 2020-11-10 21:53:40 0 [Note] Added new Master_info '' to hash table. Restarting the mariadb service did not resolve the issue, but restarting mariadb-columnstore did. |
| Comment by Roman [ 2020-11-11 ] |
|
dleeyh I need the exact steps you took, because my list of actions didn't include starting or testing the whole cluster, so I'm not sure exactly what you are doing. |
| Comment by Daniel Lee (Inactive) [ 2020-11-11 ] |
|
I did the steps you specified and tried to run a query. The cluster was in a "not ready" state. Your change may be doing what you want it to do specifically, but you need to ensure the cluster is not broken. |
| Comment by Roman [ 2020-11-11 ] |
|
How did you start the cluster? |
| Comment by Daniel Lee (Inactive) [ 2020-11-11 ] |
|
The "not ready" issue occurred right after I performed the steps you specified. No start or restart was performed. I did restart the cluster later, attempting to recover it: systemctl restart mariadb