[MCOL-4337] Controllernode must establish connection with its workernodes on its startup - Jira

Details

Type: New Feature
Status: Closed
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.5.3
Fix Version/s: 5.5.1
Component/s: Build
Labels: stability
Sprint: 2020-8

Description

Controllernode now establishes DBRM_Worker connections lazily, so it doesn't wait for the workers to come up. This calls for additional startup checks in cluster setups: Controllernode (CN) must wait for all workernodes (WNs) to come up before it is ready to process requests.
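
The startup condition described above can be illustrated with a minimal sketch. This is not the ColumnStore implementation; the names `wait_for_workers` and `tcp_probe`, the TCP-based probe, and the timeout values are all assumptions made for illustration. The idea is only that the controller reports itself ready once every worker has answered, and gives up after a deadline.

```python
import socket
import time
from typing import Callable, Iterable, Tuple

Address = Tuple[str, int]


def tcp_probe(addr: Address, timeout: float = 1.0) -> bool:
    """Hypothetical probe: True if a TCP connection to addr can be opened."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_workers(
    workers: Iterable[Address],
    probe: Callable[[Address], bool] = tcp_probe,
    deadline_s: float = 30.0,
    interval_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Block until every worker answers the probe or the deadline passes.

    Returns True when all workers are reachable (the controller may then
    declare itself ready), False if some worker is still down at the deadline.
    """
    pending = set(workers)
    give_up_at = time.monotonic() + deadline_s
    while pending:
        # Keep only the workers that still fail the probe.
        pending = {w for w in pending if not probe(w)}
        if not pending:
            return True
        if time.monotonic() >= give_up_at:
            return False
        sleep(interval_s)
    return True  # No workers configured: nothing to wait for.
```

Passing `probe` and `sleep` as parameters keeps the loop testable without real network connections; a real controller would replace the probe with its own worker handshake.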

Issue Links

is caused by

MCOL-3836 Columnstore OAM replacement

Closed

relates to

MCOL-3836 Columnstore OAM replacement

Closed

Activity

Daniel Lee (Inactive) added a comment - 2020-11-10 22:12

Build tested: 5.5.1-1 (Drone)

engine: 1ffca618dfba15d1edda4b21a2d9f9713d4f7262
server: 10b2d5726fa21675362596ff4f52f2eca748bdc9
buildNo: 1097

A few issues:

1. The error message is written to the warning.log file, not the err.log file:

Nov 10 22:00:15 centos-8 controllernode[4810]: 15.905223 |0|0|0| D 29 CAL0000: DBRM Controller: Connected to DBRM_Worker1

2. After both the controller and the worker started, the cluster was in a system-not-ready state.

I logged into the MySQL client after both services were started:

MariaDB [mytest]> select count from lineitem;
ERROR 1815 (HY000): Internal error: The system is not yet ready to accept queries

[centos8:root~]# systemctl status mariadb
● mariadb.service - MariaDB 10.5.8 database server
Loaded: loaded (/usr/lib/systemd/system/mariadb.service; disabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/mariadb.service.d
└─migrated-from-my.cnf-settings.conf
Active: active (running) since Tue 2020-11-10 21:53:40 UTC; 7min ago
Docs: man:mariadbd(8)
https://mariadb.com/kb/en/library/systemd/
Main PID: 3920 (mariadbd)
Status: "Taking your SQL requests now..."
Tasks: 12 (limit: 50823)
Memory: 799.2M
CGroup: /system.slice/mariadb.service
└─3920 /usr/sbin/mariadbd

Nov 10 21:53:40 centos-8 mariadbd[3920]: 2020-11-10 21:53:40 0 [Note] Added new Master_info '' to hash table
Nov 10 21:53:40 centos-8 mariadbd[3920]: 2020-11-10 21:53:40 0 [Note] /usr/sbin/mariadbd: ready for connections.
Nov 10 21:53:40 centos-8 mariadbd[3920]: Version: '10.5.8-MariaDB' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
Nov 10 21:53:40 centos-8 systemd[1]: Started MariaDB 10.5.8 database server.
Nov 10 21:54:40 centos-8 dbcon[3920]: 40.145388 |4|0|0| D 24 CAL0001: Start SQL statement: load data infile "/data/qa/source/dbt3/1g/lineitem.tbl" in>
Nov 10 21:55:58 centos-8 writeenginesplit[4425]: 58.648139 |0|0|0| I 33 CAL0000: Send EOD message to All PMs
Nov 10 21:55:59 centos-8 writeenginesplit[4425]: 59.023209 |0|0|0| I 33 CAL0098: Received a Cpimport Pass from PM1.
Nov 10 21:55:59 centos-8 writeenginesplit[4425]: 59.024888 |0|0|0| I 33 CAL0000: Released Table Lock
Nov 10 21:55:59 centos-8 dbcon[3920]: 59.935069 |4|0|0| D 24 CAL0001: End SQL statement
Nov 10 22:01:16 centos-8 mariadbd[3920]: DBRM::send_recv: controller node closed the connection

Restarting the mariadb service did not resolve the issue, but restarting mariadb-columnstore did.

Roman added a comment - 2020-11-11 15:42

dleeyh, I need the exact steps you took, because my list of actions didn't include starting or testing the whole cluster, so I don't know exactly what you are doing.

Daniel Lee (Inactive) added a comment - 2020-11-11 15:47

I did the steps you specified and tried to run a query. The cluster was in "not ready" state.

Your change may be doing what you specifically intended, but you need to ensure the cluster is not broken.

Roman added a comment - 2020-11-11 15:55

How did you start the cluster?

Daniel Lee (Inactive) added a comment - 2020-11-11 16:00

The "not ready" issue occurred right after I performed the steps you specified. No start or restart was performed.

I did restart the cluster later in an attempt to recover it. It was a single node:

systemctl restart mariadb
systemctl restart mariadb-columnstore
