[MXS-4259] warning: [xpandmon] 'late' is an unknown sub-state for a Xpand node Created: 2022-08-25  Updated: 2022-08-29  Resolved: 2022-08-29

Status: Closed
Project: MariaDB MaxScale
Component/s: xpandmon
Affects Version/s: 6.4.1
Fix Version/s: 6.4.3

Type: Bug Priority: Major
Reporter: Manjinder Nijjar Assignee: Johan Wikman
Resolution: Fixed Votes: 0
Labels: None

Sprint: MXS-SPRINT-165

 Description   

In the upcoming Xpand release 6.1 (code name: Isolation Peak), we are making a change to node state. The objective of this feature is to reduce the number of group changes when a node is added to the cluster or when an existing node rejoins the cluster after recovering from downtime.

When a new node is added to the cluster (or when a node rejoins), in most cases, though not all, there will be no group change. The newly added node joins the cluster and is marked as 'late' until the next group change (i.e. a hard group change). On the next group change, the node moves from the 'late' to the 'normal' sub-state. While in the 'late' sub-state, it is available for all incoming transactions. Late nodes do not participate in some aspects of the system (e.g. lock manager, sequence manager); however, that should not affect MaxScale.

This new sub-state is unknown to MaxScale, which therefore cannot interpret it and logs a warning.

I see the following in the logs when I add a new node to an already existing cluster. The new node appears fine and accepts workload via MaxScale without any issues.

2022-08-25 19:50:01   warning: [xpandmon] 'late' is an unknown sub-state for a Xpand node.
2022-08-25 19:50:01   notice : Created server '@@Backend-Monitor:node-4' at 10.2.13.99:3306
2022-08-25 19:50:01   info   : [xpandmon] Created volatile server found after 2 lookup attempts and a total sleep time of 2 milliseconds.
2022-08-25 19:50:01   info   : [xpandmon] Updated Xpand node in bookkeeping: 4, '10.2.13.99', 3306, 3581.
2022-08-25 19:50:11   warning: [xpandmon] 'late' is an unknown sub-state for a Xpand node.
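
To check how often this warning occurs, and for which sub-state, the log can be scanned with a small script. The following is only a sketch: the function name `unknown_substates` is hypothetical, and the regular expression assumes the message format shown in the log lines above.

```python
import re

# Matches the xpandmon warning shown above and captures the sub-state name.
PATTERN = re.compile(r"warning: \[xpandmon\] '([^']+)' is an unknown sub-state")

def unknown_substates(log_text):
    """Count unknown sub-state warnings per sub-state name (hypothetical helper)."""
    counts = {}
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            name = match.group(1)
            counts[name] = counts.get(name, 0) + 1
    return counts

LOG_SAMPLE = """\
2022-08-25 19:50:01   warning: [xpandmon] 'late' is an unknown sub-state for a Xpand node.
2022-08-25 19:50:01   notice : Created server '@@Backend-Monitor:node-4' at 10.2.13.99:3306
2022-08-25 19:50:11   warning: [xpandmon] 'late' is an unknown sub-state for a Xpand node.
"""

print(unknown_substates(LOG_SAMPLE))
```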

On Xpand:

[root@karma060 ~]# /opt/clustrix/bin/clx stat
Cluster Name:    clca72aceead5c1f26
Cluster Version: Xpand-mainline1-17821
Cluster Status:   OK
Cluster Size:    4 nodes - 16 CPUs per Node
Current Node:    karma060 - nid 1
 
nid |  Hostname | Status | Substate |  IP Address  | TPS |      Used     |  Total
----+-----------+--------+----------+--------------+-----+---------------+--------
  1 |  karma060 |    OK  |   normal |  10.2.15.149 |  35 |  1.1G (0.14%) |  761.9G
  2 |  karma090 |    OK  |   normal |  10.2.14.208 |   0 |  1.2G (0.16%) |  761.9G
  3 |  karma065 |    OK  |   normal |  10.2.14.119 |   0 |  1.2G (0.15%) |  761.9G
  4 |  karma199 |    OK  |     late |   10.2.13.99 |   0 |  1.3G (0.17%) |  761.9G
----+-----------+--------+----------+--------------+-----+---------------+--------
                                                      35 |  4.7G (0.15%) |    3.0T
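
To pick out the nodes that are currently in the 'late' sub-state from output like the above, something along these lines could be used. This is a sketch under assumptions: `parse_clx_stat` and `late_nodes` are hypothetical names, and the parser assumes the eight-column, pipe-separated layout shown in this ticket.

```python
def parse_clx_stat(text):
    """Return (nid, hostname, status, substate) tuples from `clx stat` output."""
    nodes = []
    for line in text.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Node rows have eight columns and a numeric nid in the first one;
        # the header, separator and totals lines fail one of these checks.
        if len(cells) == 8 and cells[0].isdigit():
            nodes.append((int(cells[0]), cells[1], cells[2], cells[3]))
    return nodes

def late_nodes(text):
    """Return only the nodes whose sub-state is 'late'."""
    return [node for node in parse_clx_stat(text) if node[3].lower() == "late"]

STAT_SAMPLE = """\
nid |  Hostname | Status | Substate |  IP Address  | TPS |      Used     |  Total
----+-----------+--------+----------+--------------+-----+---------------+--------
  1 |  karma060 |    OK  |   normal |  10.2.15.149 |  35 |  1.1G (0.14%) |  761.9G
  2 |  karma090 |    OK  |   normal |  10.2.14.208 |   0 |  1.2G (0.16%) |  761.9G
  3 |  karma065 |    OK  |   normal |  10.2.14.119 |   0 |  1.2G (0.15%) |  761.9G
  4 |  karma199 |    OK  |     late |   10.2.13.99 |   0 |  1.3G (0.17%) |  761.9G
----+-----------+--------+----------+--------------+-----+---------------+--------
                                                      35 |  4.7G (0.15%) |    3.0T
"""

print(late_nodes(STAT_SAMPLE))
```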

On MaxScale:

[root@karma075 ~]# maxctrl list servers
┌──────────────────────────┬─────────────────────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server                   │ Address                     │ Port │ Connections │ State           │ GTID │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ xpand1                   │ karma060.colo.sproutsys.com │ 3306 │ 0           │ Master, Running │      │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ xpand2                   │ karma065.colo.sproutsys.com │ 3306 │ 0           │ Master, Running │      │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Backend-Monitor:node-1 │ 10.2.15.149                 │ 3306 │ 4           │ Master, Running │      │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Backend-Monitor:node-2 │ 10.2.14.208                 │ 3306 │ 7           │ Master, Running │      │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Backend-Monitor:node-3 │ 10.2.14.119                 │ 3306 │ 3           │ Master, Running │      │
├──────────────────────────┼─────────────────────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Backend-Monitor:node-4 │ 10.2.13.99                  │ 3306 │ 3           │ Master, Running │      │
└──────────────────────────┴─────────────────────────────┴──────┴─────────────┴─────────────────┴──────┘

I am not sure how MaxScale uses the sub-state from Xpand, but I did not see any impact of this on MaxScale. In my opinion, we can safely ignore this sub-state, and instead of a warning we should log it at info level or simply not show it.



 Comments   
Comment by Johan Wikman [ 2022-08-26 ]

Currently it's actually not used for anything, but merely collected. I'll add knowledge about that new state and for the future turn that warning into an info-level message.
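
A change of that kind could look roughly like the sketch below. It is written in Python for brevity and is not MaxScale's actual code: the `SubState` enum, the `substate_from_string` function and the logger name are all hypothetical, and only the two sub-states named in this ticket are listed.

```python
import logging
from enum import Enum

log = logging.getLogger("xpandmon_sketch")  # hypothetical logger name

class SubState(Enum):
    UNKNOWN = "unknown"
    NORMAL = "normal"
    LATE = "late"  # the newly recognised sub-state

def substate_from_string(value):
    """Map a sub-state string reported by a node to a SubState member."""
    key = value.strip().lower()
    for state in SubState:
        if state is not SubState.UNKNOWN and state.value == key:
            return state
    # Unrecognised values are reported at info level rather than as a warning.
    log.info("'%s' is an unknown sub-state for a Xpand node.", value)
    return SubState.UNKNOWN

print(substate_from_string("late"))
```

Because the sub-state is only collected and not acted upon, mapping an unrecognised value to `UNKNOWN` and logging it at info level keeps the information available without alarming operators.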
