[MCOL-1624] mcsadmin getSystemStatus and mcsSystemReady() should not return active until system is fully up Created: 2018-08-03  Updated: 2020-03-30  Resolved: 2020-03-30

Status: Closed
Project: MariaDB ColumnStore
Component/s: MDB Plugin
Affects Version/s: None
Fix Version/s: Icebox

Type: Bug Priority: Critical
Reporter: David Thompson (Inactive) Assignee: Andrew Hutchings (Inactive)
Resolution: Won't Do Votes: 1
Labels: None

Sprint: 2018-20, 2018-21

 Description   

If a script checks for the cluster to become active while postConfigure is still running, mcsadmin getSystemStatus reports the system as active before the system catalog has been created. Creating a table in this window fails with:
ERROR 1815 (HY000) at line 1: Internal error: CAL0009: Error while calling getSysCatDBRoot

My proposal is to not update the system status to active until all postConfigure steps (including replication setup) are complete. I think it would be a good idea to add a new mcsadmin status such as 'PostConfig' for the interim state between mysqld up and the cluster fully configured.

My workaround is to try creating a columnstore table in a loop until it succeeds.
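
A minimal sketch of that retry loop, in Python for illustration only. The name create_table_with_retry, the try_create callable, and the timeout and interval defaults are all assumptions, not part of ColumnStore; try_create stands in for whatever client code issues the CREATE TABLE statement and raises on failure.

```python
import time


def create_table_with_retry(try_create, timeout_s=300.0, interval_s=5.0,
                            sleep=time.sleep, clock=time.monotonic):
    """Call try_create() until it succeeds or timeout_s seconds have passed.

    try_create is a hypothetical, caller-supplied callable that issues the
    CREATE TABLE ... ENGINE=columnstore statement and raises on failure
    (for example on ERROR 1815 / CAL0009). Returns the number of attempts.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        try:
            try_create()
            return attempts
        except Exception as exc:
            # Any failure is treated as "not ready yet"; a real script might
            # retry only on error 1815 and re-raise everything else.
            if clock() >= deadline:
                raise TimeoutError(
                    f"table creation still failing after {attempts} attempts"
                ) from exc
            sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real waiting; in a deployment script the defaults would normally be left alone.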



 Comments   
Comment by markus makela [ 2018-11-15 ]

mcsSystemReady() also returns 1 even if ExeMgr is dead. To reproduce this, run watch pkill -9 ExeMgr in one window and then run
SELECT mcsSystemReady() in another.
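
To make the reproduction easier to read off, the query can be sampled repeatedly and the return values tallied while the pkill loop runs. This is only a sketch: sample_system_ready and the run_query callable are hypothetical names, and run_query is assumed to execute one SQL statement through whatever MariaDB client is at hand and return the single scalar result.

```python
import time
from collections import Counter


def sample_system_ready(run_query, samples=10, interval_s=1.0,
                        sleep=time.sleep):
    """Run SELECT mcsSystemReady() `samples` times and tally the results.

    run_query is a hypothetical callable taking an SQL string and returning
    the scalar result. Queries that raise are tallied under "error".
    """
    tally = Counter()
    for i in range(samples):
        try:
            tally[run_query("SELECT mcsSystemReady()")] += 1
        except Exception:
            tally["error"] += 1
        # Pause between samples, but not after the last one.
        if i < samples - 1:
            sleep(interval_s)
    return tally
```

A tally consisting only of 1s while ExeMgr is being killed would match the behaviour described here.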

Comment by Andrew Hutchings (Inactive) [ 2018-11-22 ]

Hopefully this patch does what you require. It only applies to mcsSystemReady(). getSystemStatus doesn't appear to do anything like what is required.

The patch now tests whether the system catalogue is installed and whether a quick system catalogue query can run through ExeMgr and PrimProc.

I only have a limited test bed so let me know how this goes.

Comment by David Thompson (Inactive) [ 2018-11-24 ]

This improves things in that you can now rely on mcsSystemReady before creating tables etc. for docker containers. What is still not done is the additional scope of verifying that postConfig is complete (and hence things like replication setup), but I'll move that out to a separate Jira.

Comment by David Thompson (Inactive) [ 2018-11-24 ]

I should have tested and reviewed this more thoroughly before merging: it only seems to return 1 correctly when set up on:

  • single node
  • pm1 on combined deployment (pm2 returns 0)

Running on a separate deployment, it only ever returns 0.

I had a mistake in my shell script logic earlier.

Comment by Ralf Gebhardt [ 2019-12-27 ]

LinuxJedi, we have seen "ERROR 1815 (HY000) at line 1: Internal error: CAL0009: Error while calling getSysCatDBRoot" also in the simple MTR test for ColumnStore on Azure when creating the table (for SLES 12 but I do not think that this is relevant).
The way the pipeline is set up also fits the description.

Generated at Thu Feb 08 02:30:09 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.