[MCOL-1624] mcsadmin getSystemStatus and mcsSystemReady() should not return active until system is fully up
Created: 2018-08-03 Updated: 2020-03-30 Resolved: 2020-03-30 |
|
| Status: | Closed |
| Project: | MariaDB ColumnStore |
| Component/s: | MDB Plugin |
| Affects Version/s: | None |
| Fix Version/s: | Icebox |
| Type: | Bug | Priority: | Critical |
| Reporter: | David Thompson (Inactive) | Assignee: | Andrew Hutchings (Inactive) |
| Resolution: | Won't Do | Votes: | 1 |
| Labels: | None | ||
| Sprint: | 2018-20, 2018-21 |
| Description |
|
If a script checks for the cluster to become active while postConfigure is still running, mcsadmin getSystemStatus will show the system as active before the system catalog has been created. If you try to create a table in this window, you'll get the error: My proposal is not to update the system status to active until all postConfigure steps (including replication setup) are complete. I think it would be a good idea to add a new mcsadmin status, such as 'PostConfig', for the interim state between mysqld being up and the cluster being fully configured. My workaround is to create a ColumnStore table and loop until that succeeds. |
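The "loop until it works" workaround can be written as a generic poll-with-timeout helper. This is a minimal sketch, not part of ColumnStore: `wait_until_ready` and the `probe` callable are hypothetical names. In practice the probe would be something like a `CREATE TABLE ... ENGINE=columnstore` attempt (or `SELECT mcsSystemReady()`) issued through whatever MySQL client the script already uses; that client is an assumption and is left out so the sketch stays self-contained.

```python
import time
from typing import Callable


def wait_until_ready(probe: Callable[[], bool],
                     timeout_s: float = 300.0,
                     interval_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Call probe() until it returns True or timeout_s elapses.

    probe is a hypothetical callable, e.g. one that tries to create a
    ColumnStore table and returns False (or raises) while the system
    catalog does not exist yet. Returns True on success, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except Exception:
            # Treat any error (e.g. the catalog not being created yet) as "not ready".
            pass
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    attempts = {"n": 0}

    def fake_probe() -> bool:
        # Simulates a cluster that only becomes usable on the third attempt.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RuntimeError("system catalog not ready")
        return True

    print(wait_until_ready(fake_probe, timeout_s=10, interval_s=0))
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a bounded timeout avoids hanging forever if postConfigure fails outright.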
| Comments |
| Comment by markus makela [ 2018-11-15 ] |
|
mcsSystemReady() also returns 1 even if ExeMgr is dead. To reproduce this, run 'watch pkill -9 ExeMgr' in one window and then run |
| Comment by Andrew Hutchings (Inactive) [ 2018-11-22 ] |
|
Hopefully this patch does what you require. It only applies to mcsSystemReady(); getSystemStatus doesn't appear to do anything like what is required. The patch now tests whether the system catalogue is installed and whether a quick system catalogue query can run through ExeMgr and PrimProc. I only have a limited test bed, so let me know how this goes. |
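The check described above can be pictured as a short-circuiting conjunction: report ready only if the catalogue exists and a real query through the query path succeeds, so a dead ExeMgr or PrimProc yields 0 rather than 1. This is an illustrative sketch only, not the patch itself; `ReadinessChecks`, `catalog_installed`, `catalog_query_ok` and `system_ready` are hypothetical names standing in for whatever the plugin actually does.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ReadinessChecks:
    # Hypothetical stand-ins for the real checks inside the plugin.
    catalog_installed: Callable[[], bool]  # is the system catalogue present?
    catalog_query_ok: Callable[[], bool]   # quick syscat query via ExeMgr/PrimProc


def system_ready(checks: ReadinessChecks) -> int:
    """Return 1 only if every check passes, else 0 (same 0/1 shape as mcsSystemReady())."""
    if not checks.catalog_installed():
        # No catalogue yet: skip the query entirely.
        return 0
    try:
        return 1 if checks.catalog_query_ok() else 0
    except Exception:
        # A dead ExeMgr or PrimProc surfaces as a failed query, so report "not ready".
        return 0
```

The design point is that process liveness alone is not treated as readiness; only an end-to-end query through the same path a CREATE TABLE would use counts.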
| Comment by David Thompson (Inactive) [ 2018-11-24 ] |
|
This improves things: you can now rely on mcsSystemReady() before creating tables etc. in Docker containers. The additional scope of verifying that postConfig is complete (and hence things like replication setup) is still not done, but I'll move that out to a separate JIRA. |
| Comment by David Thompson (Inactive) [ 2018-11-24 ] |
|
I should have done a more thorough testing review before merging, but this only seems to correctly return 1 when set up on:
Running on a separate deployment, it will only ever return 0. I had a mistake in my shell script logic earlier. |
| Comment by Ralf Gebhardt [ 2019-12-27 ] |
|
LinuxJedi, we have also seen "ERROR 1815 (HY000) at line 1: Internal error: CAL0009: Error while calling getSysCatDBRoot" in the simple MTR test for ColumnStore on Azure when creating the table (on SLES 12, but I do not think that is relevant). |