[MXS-3374] MaxScale fails to update IP for an existing node that reappears with an IP change Created: 2021-01-14  Updated: 2021-01-26  Resolved: 2021-01-25

Status: Closed
Project: MariaDB MaxScale
Component/s: xpandmon
Affects Version/s: 2.5.6
Fix Version/s: 2.5.7

Type: Bug Priority: Major
Reporter: Manjinder Nijjar Assignee: Johan Wikman
Resolution: Fixed Votes: 0
Labels: None
Environment:

Sky-GCP
MaxScale version: MariaDB MaxScale 2.5.6 (Commit: fddc0526ee79ac9a87f7a7170f3204263240ab57)


Attachments: File maxscale-after.log     File maxscale-before.log     File maxscale.11a874d96b69829d967676047af18badc9ba884f.log     File maxscale.cnf.rtf    
Sprint: MXS-SPRINT-123

 Description   

In SkySQL, when a node misbehaves for any reason, K8s kills it and starts a new instance with the same hostname but a different IP. When Xpandmon is configured so that MaxScale sends traffic directly to the Xpand nodes, the replacement node does not seem to be identified correctly, because it reappears with a different IP. As a result, MaxScale stops connecting new sessions and returns errors on existing sessions.

Here is an example that demonstrates this behavior. We have a 3-node cluster running in Sky-GCP. In this configuration, Xpand runs together with MariaDB server in the same pod (1:1 config). MaxScale is configured for both the frontend (MariaDB nodes) and the backend (Xpand nodes).

dev-jump:~/xpand-new $ kubectl exec -it t1-mdb-mxs-78cf79765-28fvz -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/t1-mdb-mxs-78cf79765-28fvz -n db00007507' to see all of the containers in this pod.
[root@t1-mdb-mxs-78cf79765-28fvz /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.3.10      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.2.232     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.2.232            │ 10.32.2.232     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.10             │ 10.32.3.10      │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.2.197     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.2.197            │ 10.32.2.197     │ 3306 │ 0           │ Master, Running              │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ t1-mxp-0.t1-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@t1-mdb-mxs-78cf79765-28fvz /]# exit
exit

We then kill one of the pods to mimic what K8s does when a node misbehaves:

dev-jump:~/xpand-new $ kubectl delete pod t1-mxp-2 && sleep 120
pod "t1-mxp-2" deleted

When the new pod appears, its IP has changed to 10.32.3.11; however, the Xpand server entry in MaxScale (@@Xpand-Monitor:node-3) still points to the old IP, 10.32.3.10.

dev-jump:~/xpand-new $ kubectl exec -it t1-mdb-mxs-78cf79765-28fvz -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/t1-mdb-mxs-78cf79765-28fvz -n db00007507' to see all of the containers in this pod.
[root@t1-mdb-mxs-78cf79765-28fvz /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.11             │ 10.32.3.11      │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.3.10      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.2.232     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.2.232            │ 10.32.2.232     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.2.197     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.2.197            │ 10.32.2.197     │ 3306 │ 0           │ Master, Running              │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ t1-mxp-0.t1-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘

However, the Xpand cluster itself picks up the new IP correctly and comes up fine:

dev-jump:~ $ kubectl exec -it t1-mxp-0 -- bash
Defaulting container name to clustrix.
Use 'kubectl describe pod/t1-mxp-0 -n db00007507' to see all of the containers in this pod.
[root@t1-mxp-0 /]# /opt/clustrix/bin/clx status
Cluster Name:    cl6299ac0c0fc1da53
Cluster Version: 5.3.13
Cluster Status:   OK 
Cluster Size:    3 nodes - 8 CPUs per Node
Current Node:    t1-mxp-0 - nid 1
nid |  Hostname | Status |  IP Address  | TPS |      Used      |  Total 
----+-----------+--------+--------------+-----+----------------+--------
  1 |  t1-mxp-0 |    OK  |  10.32.2.197 |   0 |   9.5M (0.00%) |  223.9G
  2 |  t1-mxp-1 |    OK  |  10.32.2.232 |   0 |   9.3M (0.00%) |  223.9G
  3 |  t1-mxp-2 |    OK  |   10.32.3.11 |   0 |   9.7M (0.00%) |  223.9G
----+-----------+--------+--------------+-----+----------------+--------
                                            0 |  28.6M (0.00%) |  671.6G

List of pods in this setup:

dev-jump:~/xpand-new $ kubectl get pods -o wide
NAME                           READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
t1-mdb-mxs-78cf79765-28fvz     3/3     Running   0          11m     10.32.3.37    gke-xpand-proj-user-n1s4-0145145e-1cp3   <none>           <none>
t1-mdb-state-f4d489fd5-27rxz   1/1     Running   0          11m     10.32.1.15    gke-xpand-proj-user-n1s1-f50de098-mqqj   <none>           <none>
t1-mxp-0                       4/4     Running   0          11m     10.32.2.197   gke-xpand-proj-user-n1s8-a767dc3a-3d0f   <none>           <none>
t1-mxp-1                       4/4     Running   0          8m58s   10.32.2.232   gke-xpand-proj-user-n1s8-30eafa84-lqwh   <none>           <none>
t1-mxp-2                       4/4     Running   0          2m26s   10.32.3.11    gke-xpand-proj-user-n1s8-767aa0a1-nx6x   <none>           <none>
dev-jump:~/xpand-new $ kubectl logs t1-mdb-mxs-78cf79765-28fvz maxscale | grep "MariaDB MaxScale"
2021-01-14 16:27:19   notice : MariaDB MaxScale 2.5.6 started (Commit: fddc0526ee79ac9a87f7a7170f3204263240ab57)
dev-jump:~/xpand-new $



 Comments   
Comment by Manjinder Nijjar [ 2021-01-14 ]

This is a blocker, since it will prevent some of our customers from doing a POC in Sky where they access Xpand nodes directly via MaxScale.

Comment by Johan Wikman [ 2021-01-21 ]

msnijjar I am confused by this output

┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.3.10      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.2.232     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
...
│ @@Xpand-Monitor:node-1 │ 10.32.2.197     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
...
│ Xpand-Bootstrap        │ t1-mxp-0.t1-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘

because presumably Xpand-Bootstrap should correspond to one of those @@Xpand... nodes and hence have the same state?

Comment by Johan Wikman [ 2021-01-21 ]

msnijjar Could you provide the maxscale.log?

Comment by Jens Röwekamp (Inactive) [ 2021-01-21 ]

Hi johan.wikman.

Attached you'll find the two MaxScale logs, maxscale-before.log and maxscale-after.log.

Steps taken before getting the log files:

dev-jump:~/mxs-3374 $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
j1-mdb-mxs-9f46fd795-qzprx      3/3     Running   0          47s     10.32.2.164   gke-xpand-proj-user-n1s4-dd53f623-xd3f   <none>           <none>
j1-mdb-state-5f95dcdb65-zsfrt   1/1     Running   0          9m7s    10.32.3.6     gke-xpand-proj-user-n1s1-f50de098-1hvu   <none>           <none>
j1-mxp-0                        4/4     Running   0          9m7s    10.32.3.99    gke-xpand-proj-user-n1s8-a767dc3a-n5t1   <none>           <none>
j1-mxp-1                        4/4     Running   0          5m20s   10.32.0.132   gke-xpand-proj-user-n1s8-30eafa84-0w3z   <none>           <none>
j1-mxp-2                        4/4     Running   0          5m7s    10.32.0.164   gke-xpand-proj-user-n1s8-767aa0a1-8p5v   <none>           <none>
dev-jump:~/mxs-3374 $ kubectl exec -it j1-mdb-mxs-9f46fd795-qzprx -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/j1-mdb-mxs-9f46fd795-qzprx -n db00007562' to see all of the containers in this pod.
[root@j1-mdb-mxs-9f46fd795-qzprx /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.132            │ 10.32.0.132     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.99             │ 10.32.3.99      │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.164            │ 10.32.0.164     │ 3306 │ 0           │ Master, Running              │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.0.132     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.0.164     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.3.99      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ j1-mxp-0.j1-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@j1-mdb-mxs-9f46fd795-qzprx /]# ping -c2 j1-mxp-0.j1-mxp
PING j1-mxp-0.j1-mxp.db00007562.svc.cluster.local (10.32.3.99) 56(84) bytes of data.
64 bytes from j1-mxp-0.j1-mxp.db00007562.svc.cluster.local (10.32.3.99): icmp_seq=1 ttl=62 time=1.45 ms
64 bytes from j1-mxp-0.j1-mxp.db00007562.svc.cluster.local (10.32.3.99): icmp_seq=2 ttl=62 time=0.275 ms
 
--- j1-mxp-0.j1-mxp.db00007562.svc.cluster.local ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 2ms
rtt min/avg/max/mdev = 0.275/0.862/1.449/0.587 ms
[root@j1-mdb-mxs-9f46fd795-qzprx /]# exit
exit
dev-jump:~/mxs-3374 $ kubectl logs j1-mdb-mxs-9f46fd795-qzprx maxscale > maxscale-before.log
dev-jump:~/mxs-3374 $ kubectl delete pod j1-mxp-1 && sleep 120
pod "j1-mxp-1" deleted
dev-jump:~/mxs-3374 $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE    IP            NODE                                     NOMINATED NODE   READINESS GATES
j1-mdb-mxs-9f46fd795-qzprx      3/3     Running   0          6m4s   10.32.2.164   gke-xpand-proj-user-n1s4-dd53f623-xd3f   <none>           <none>
j1-mdb-state-5f95dcdb65-zsfrt   1/1     Running   0          14m    10.32.3.6     gke-xpand-proj-user-n1s1-f50de098-1hvu   <none>           <none>
j1-mxp-0                        4/4     Running   0          14m    10.32.3.99    gke-xpand-proj-user-n1s8-a767dc3a-n5t1   <none>           <none>
j1-mxp-1                        4/4     Running   0          2m7s   10.32.0.133   gke-xpand-proj-user-n1s8-30eafa84-0w3z   <none>           <none>
j1-mxp-2                        4/4     Running   0          10m    10.32.0.164   gke-xpand-proj-user-n1s8-767aa0a1-8p5v   <none>           <none>
dev-jump:~/mxs-3374 $ kubectl logs j1-mdb-mxs-9f46fd795-qzprx maxscale > maxscale-after.log
dev-jump:~/mxs-3374 $ kubectl exec -it j1-mdb-mxs-9f46fd795-qzprx -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/j1-mdb-mxs-9f46fd795-qzprx -n db00007562' to see all of the containers in this pod.
[root@j1-mdb-mxs-9f46fd795-qzprx /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.133            │ 10.32.0.133     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.99             │ 10.32.3.99      │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.164            │ 10.32.0.164     │ 3306 │ 0           │ Master, Running              │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.0.132     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.0.164     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.3.99      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ j1-mxp-0.j1-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@j1-mdb-mxs-9f46fd795-qzprx /]# exit
exit
dev-jump:~/mxs-3374 $ 

Comment by Johan Wikman [ 2021-01-21 ]

I'm also confused by

┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.11             │ 10.32.3.11      │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7126,51-51-1,52-52-1 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.3.10      │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤

If 10.32.3.10 is the wrong IP, I wonder how it is possible that the state is Master, Running. If no one is answering the health-check ping, the state should be Down.

Comment by Johan Wikman [ 2021-01-21 ]

I think I know why the state of @@Xpand-Monitor:node-3 is correctly shown as Master, Running, although the wrong address 10.32.3.10 is seemingly used. Based on code review, it would appear that internally xpandmon is aware of and is using the correct address, but the address in the corresponding internal Server object that routers use is not updated.
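
The hypothesis can be illustrated with a small Python toy model. It is only a sketch and not MaxScale code: xpandmon itself is written in C++, and the names ServerObject, MonitorNode, refresh_buggy and refresh_fixed are hypothetical. It shows how a monitor can keep health-checking the right address while the object that routers use still carries the stale one. The addresses are the ones from the description.

```python
from dataclasses import dataclass


@dataclass
class ServerObject:
    """Hypothetical stand-in for the server object that routers connect through."""
    name: str
    address: str


@dataclass
class MonitorNode:
    """Hypothetical stand-in for the monitor's own record of a cluster node."""
    node_id: int
    address: str
    server: ServerObject

    def refresh_buggy(self, new_address: str) -> None:
        # Only the monitor's copy is updated: health checks use the new
        # address, while routers keep using the stale one.
        self.address = new_address

    def refresh_fixed(self, new_address: str) -> None:
        # Both copies are updated, so monitor and routers agree.
        self.address = new_address
        self.server.address = new_address


server = ServerObject("@@Xpand-Monitor:node-3", "10.32.3.10")
node = MonitorNode(3, "10.32.3.10", server)

node.refresh_buggy("10.32.3.11")
buggy_view = (node.address, server.address)

node.refresh_fixed("10.32.3.11")
fixed_view = (node.address, server.address)

print(buggy_view)
print(fixed_view)
```

In the buggy case the two addresses diverge, which would match a state of Master, Running being shown next to an address that no longer exists.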

@msnijjar Could you repeat the test and, as a final step, stop and restart MaxScale? If that causes the correct address to be shown in the output, it would confirm my hypothesis.

Comment by Johan Wikman [ 2021-01-21 ]

And could you also run the following, both before and after,

# cd /var/lib/maxscale/Xpand-Monitor
# sqlite3 xpand_nodes-v1.db
...
sqlite> select * from dynamic_nodes;

and paste the output here.

Comment by Jens Röwekamp (Inactive) [ 2021-01-21 ]

Restarting MaxScale will be a bit hard, since the whole container in which the MaxScale process runs is restarted once the MaxScale process is killed. Since no persistent storage is attached to the MaxScale container, it falls back to its initial configuration, which contains only the Xpand-Bootstrap node. From this Xpand-Bootstrap node, MaxScale is able to acquire the correct IP addresses of the Xpand cluster.

But I was able to query the xpand_nodes-v1.db database as you asked; it contains the updated IP address for the restarted Xpand pod.

dev-jump:~ $ kubectl exec -it j2-mdb-mxs-5bdb889477-xnqsd -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/j2-mdb-mxs-5bdb889477-xnqsd -n db00007563' to see all of the containers in this pod.
[root@j2-mdb-mxs-5bdb889477-xnqsd /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.163            │ 10.32.0.163     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.100            │ 10.32.3.100     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.133            │ 10.32.0.133     │ 3306 │ 0           │ Master, Running              │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.0.133     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.3.100     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.0.163     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ j2-mxp-0.j2-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@j2-mdb-mxs-5bdb889477-xnqsd /]# cd /var/lib/maxscale/Xpand-Monitor/
[root@j2-mdb-mxs-5bdb889477-xnqsd Xpand-Monitor]# sqlite3 xpand_nodes-v1.db 
SQLite version 3.26.0 2018-12-01 12:34:55
Enter ".help" for usage hints.
sqlite> select * FROM dynamic_nodes;
1|10.32.0.163|3308|3581
2|10.32.3.100|3308|3581
3|10.32.0.133|3308|3581
sqlite> 
[root@j2-mdb-mxs-5bdb889477-xnqsd Xpand-Monitor]# exit
exit
dev-jump:~ $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
j2-mdb-mxs-5bdb889477-xnqsd     3/3     Running   0          4m19s   10.32.2.164   gke-xpand-proj-user-n1s4-dd53f623-6glc   <none>           <none>
j2-mdb-state-7dcc4bd798-q28g8   1/1     Running   0          46m     10.32.3.8     gke-xpand-proj-user-n1s1-f50de098-1hvu   <none>           <none>
j2-mxp-0                        4/4     Running   0          46m     10.32.0.163   gke-xpand-proj-user-n1s8-a767dc3a-x5tn   <none>           <none>
j2-mxp-1                        4/4     Running   0          31m     10.32.0.133   gke-xpand-proj-user-n1s8-30eafa84-78qp   <none>           <none>
j2-mxp-2                        4/4     Running   0          42m     10.32.3.100   gke-xpand-proj-user-n1s8-767aa0a1-jmqp   <none>           <none>
dev-jump:~ $ kubectl delete pod j2-mxp-1 && sleep 120
pod "j2-mxp-1" deleted
dev-jump:~ $ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                                     NOMINATED NODE   READINESS GATES
j2-mdb-mxs-5bdb889477-xnqsd     3/3     Running   0          6m43s   10.32.2.164   gke-xpand-proj-user-n1s4-dd53f623-6glc   <none>           <none>
j2-mdb-state-7dcc4bd798-q28g8   1/1     Running   0          49m     10.32.3.8     gke-xpand-proj-user-n1s1-f50de098-1hvu   <none>           <none>
j2-mxp-0                        4/4     Running   0          49m     10.32.0.163   gke-xpand-proj-user-n1s8-a767dc3a-x5tn   <none>           <none>
j2-mxp-1                        4/4     Running   0          2m4s    10.32.0.134   gke-xpand-proj-user-n1s8-30eafa84-78qp   <none>           <none>
j2-mxp-2                        4/4     Running   0          45m     10.32.3.100   gke-xpand-proj-user-n1s8-767aa0a1-jmqp   <none>           <none>
dev-jump:~ $ kubectl exec -it j2-mdb-mxs-5bdb889477-xnqsd -- bash
Defaulting container name to maxscale.
Use 'kubectl describe pod/j2-mdb-mxs-5bdb889477-xnqsd -n db00007563' to see all of the containers in this pod.
[root@j2-mdb-mxs-5bdb889477-xnqsd /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬─────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address         │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.134            │ 10.32.0.134     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.0.163            │ 10.32.0.163     │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.32.3.100            │ 10.32.3.100     │ 3306 │ 0           │ Master, Running              │ 50-50-7138,51-51-4,52-52-4 │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.32.0.133     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.32.3.100     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.32.0.163     │ 3308 │ 0           │ Master, Running              │                            │
├────────────────────────┼─────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ j2-mxp-0.j2-mxp │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴─────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@j2-mdb-mxs-5bdb889477-xnqsd /]# cd /var/lib/maxscale/Xpand-Monitor/
[root@j2-mdb-mxs-5bdb889477-xnqsd Xpand-Monitor]# sqlite3 xpand_nodes-v1.db 
SQLite version 3.26.0 2018-12-01 12:34:55
Enter ".help" for usage hints.
sqlite> select * FROM dynamic_nodes;
1|10.32.0.163|3308|3581
2|10.32.3.100|3308|3581
3|10.32.0.134|3308|3581
sqlite> 
[root@j2-mdb-mxs-5bdb889477-xnqsd Xpand-Monitor]# 

Comment by Johan Wikman [ 2021-01-21 ]

Thanks, this confirms it. The list servers output shows the incorrect 10.32.0.133, but the sqlite database contains the correct 10.32.0.134. So, internally xpandmon uses the correct IP, which is why the state is correctly shown, but the address in the server object used by routers is not updated, which then causes the routing to fail.
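
For illustration only, the comparison made above can be sketched as a small script. It is not part of MaxScale. The rows are copied from the sqlite and list servers output in the previous comment; the meaning of the columns (node id first, IP second) and the helper name find_stale_addresses are assumptions.

```python
# Rows as printed by "select * FROM dynamic_nodes".
# Assumption: column 0 is the node id and column 1 is the node IP.
DYNAMIC_NODES = [
    (1, "10.32.0.163", 3308, 3581),
    (2, "10.32.3.100", 3308, 3581),
    (3, "10.32.0.134", 3308, 3581),
]

# Server name -> address, as shown by "maxctrl list servers".
LISTED_SERVERS = {
    "@@Xpand-Monitor:node-3": "10.32.0.133",
    "@@Xpand-Monitor:node-2": "10.32.3.100",
    "@@Xpand-Monitor:node-1": "10.32.0.163",
}


def find_stale_addresses(nodes, servers, monitor="Xpand-Monitor"):
    """Return (server name, listed address, monitor address) for every mismatch."""
    stale = []
    for row in nodes:
        node_id, ip = row[0], row[1]
        name = f"@@{monitor}:node-{node_id}"
        listed = servers.get(name)
        # Only report servers that are listed but carry a different address.
        if listed is not None and listed != ip:
            stale.append((name, listed, ip))
    return stale


if __name__ == "__main__":
    for name, listed, actual in find_stale_addresses(DYNAMIC_NODES, LISTED_SERVERS):
        print(f"{name}: list servers shows {listed}, monitor database has {actual}")
```

With the data above, only node-3 is reported as stale.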

Comment by Johan Wikman [ 2021-01-21 ]

What is the IP address of j2-mxp-0.j2-mxp?

Comment by Jens Röwekamp (Inactive) [ 2021-01-21 ]

That would be: 10.32.0.163

If we use the IP address directly instead of the DNS name for the bootstrap node it shows itself as running.
I could try to use the longer DNS name and see if that makes any difference. We have a similar DNS issue with ColumnStore, which can't handle the short DNS names, but the longer ones work fine.

Comment by Johan Wikman [ 2021-01-21 ]

In all list servers output, Xpand-Bootstrap is listed as being Down. So how does the Xpand monitor get going?

At first startup, the Xpand monitor uses the bootstrap server defined in the configuration in order to get in contact with the Xpand cluster. It then figures out the cluster configuration and stores that information in the sqlite database. On subsequent startups, the monitor uses the data in the sqlite database and effectively ignores the information in the configuration file.
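
As a rough sketch of the startup behaviour described above (not the actual xpandmon code): if the sqlite database already holds nodes, those are used, otherwise the bootstrap servers from the configuration are used. The table name dynamic_nodes is taken from the sqlite session earlier in this ticket; the column names and all function names are assumptions made for the example.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open the node store. The column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS dynamic_nodes "
        "(id INTEGER PRIMARY KEY, ip TEXT, mysql_port INTEGER, health_port INTEGER)"
    )
    return conn


def remember_nodes(conn, nodes):
    """Replace the stored cluster layout with the nodes just discovered."""
    with conn:  # commits on success, rolls back on error
        conn.execute("DELETE FROM dynamic_nodes")
        conn.executemany("INSERT INTO dynamic_nodes VALUES (?, ?, ?, ?)", nodes)


def startup_candidates(conn, bootstrap_servers):
    """Prefer nodes stored by an earlier run; fall back to the configured bootstrap."""
    rows = conn.execute(
        "SELECT ip, mysql_port FROM dynamic_nodes ORDER BY id"
    ).fetchall()
    if rows:
        return rows
    return list(bootstrap_servers)
```

On a first start the store is empty, so startup_candidates returns the bootstrap list; once remember_nodes has been called, later starts get the stored nodes and the bootstrap entry is no longer consulted.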

Anyway, to get going, the Xpand monitor must at some point be able to connect to an Xpand node using the bootstrap server information from the configuration file. So how does that happen if you use an address that the Xpand monitor cannot connect to?

Comment by Jens Röwekamp (Inactive) [ 2021-01-22 ]

Thanks a lot Johan.
I tried to create a CentOS 8 package from your branch but am currently struggling to install all required dependencies.
Do you have some build pipeline in place that could be used to create a CentOS 8 package of your branch for us?
With a provided CentOS 8 package we would be able to create a custom Docker image that we can test in our setup.

Comment by Johan Wikman [ 2021-01-22 ]

First run install_build_deps.sh in BUILD; that should install all dependencies.
Johan

Comment by Johan Wikman [ 2021-01-22 ]

I'm not at my laptop so this comes from memory...

  • Run BUILD/install_build_deps.sh
  • mkdir build; cd build
  • cmake -DCMAKE_INSTALL_PREFIX=/usr ..
  • make
  • make package

I rarely build packages, but I have a vague recollection that you at some point had to run it twice in a row on some OS.

Hope this works...
Johan

Comment by Jens Röwekamp (Inactive) [ 2021-01-22 ]

Thanks. Unfortunately install_build_deps.sh didn't resolve all dependencies.
After manually installing some additional packages, I was able to build MaxScale.
I'll continue testing in SkySQL and report back once I have some results.

Here is the Dockerfile for reference.

FROM centos:8
 
USER root
WORKDIR /root
 
# Build prerequisites, including packages that install_build_deps.sh did not install on CentOS 8
RUN dnf -y update && \
    dnf group install -y "Development Tools" && \
    dnf install -y git sudo wget tcl tcl-devel curl-devel openssl-devel \
      sqlite-devel gnutls-devel libxml2-devel cyrus-sasl-devel glibc-all-langpacks \
      libuuid-devel pam-devel libatomic && \
    dnf config-manager --set-enabled powertools && \
    dnf install -y doxygen
 
# Check out the fix branch, then build, test and package MaxScale out of tree
RUN git clone https://github.com/mariadb-corporation/MaxScale.git && \
    cd MaxScale && \
    git checkout johan-MXS-3374 && \
    cd .. && \
    mkdir build && \
    cd build && \
    ../MaxScale/BUILD/install_build_deps.sh && \
    cmake ../MaxScale -DCMAKE_INSTALL_PREFIX=/usr -DPACKAGE=Y -DBUILD_TESTS=Y && \
    make && \
    make test && \
    make package

Comment by Jens Röwekamp (Inactive) [ 2021-01-22 ]

Hi Johan. I think it works as expected.
Even though `maxctrl list servers` still shows the old IP address (10.113.129.38), my clients are routed to the restarted Xpand node (10.113.129.39).

The MaxScale logfile is attached.
maxscale.11a874d96b69829d967676047af18badc9ba884f.log

[root@pff-mdb-mxs-7db7f77665-rdhpn /]# maxctrl -u $(cat /etc/maxscale-cfg/maxscale-api-username) -p\'$(cat /etc/maxscale-cfg/maxscale-api-password)\' list servers
┌────────────────────────┬────────────────┬──────┬─────────────┬──────────────────────────────┬────────────────────────────┐
│ Server                 │ Address        │ Port │ Connections │ State                        │ GTID                       │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.113.129.39          │ 10.113.129.39  │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7129,51-51-4,52-52-4 │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-2 │ 10.113.128.230 │ 3308 │ 1           │ Master, Running              │                            │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-3 │ 10.113.129.38  │ 3308 │ 1           │ Master, Running              │                            │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.113.128.230         │ 10.113.128.230 │ 3306 │ 0           │ Relay Master, Slave, Running │ 50-50-7129,51-51-4,52-52-4 │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ @@Xpand-Monitor:node-1 │ 10.113.129.68  │ 3308 │ 2           │ Master, Running              │                            │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ 10.113.129.68          │ 10.113.129.68  │ 3306 │ 0           │ Master, Running              │ 50-50-7129,51-51-4,52-52-4 │
├────────────────────────┼────────────────┼──────┼─────────────┼──────────────────────────────┼────────────────────────────┤
│ Xpand-Bootstrap        │ 10.113.201.148 │ 3308 │ 0           │ Down                         │                            │
└────────────────────────┴────────────────┴──────┴─────────────┴──────────────────────────────┴────────────────────────────┘
[root@pff-mdb-mxs-7db7f77665-rdhpn /]# ping 10.113.129.38 -c2
PING 10.113.129.38 (10.113.129.38) 56(84) bytes of data.
From 10.166.15.203 icmp_seq=1 Destination Host Unreachable
From 10.166.15.203 icmp_seq=2 Destination Host Unreachable
 
--- 10.113.129.38 ping statistics ---
2 packets transmitted, 0 received, +2 errors, 100% packet loss, time 59ms

jens@drop-table:~$ mariadb --host pff.mdb0003035.dev.skysql.net --port 5001 --user skysql_admin -p'xxx' --ssl
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.0.45-Xpand-5.3.13 
 
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
MySQL [(none)]> select @@hostname;
+---------------+
| @@hostname    |
+---------------+
| 10.113.129.39 |
+---------------+
1 row in set (0.043 sec)
 
MySQL [(none)]> 

The MaxScale image that was used is: mariadb/skysql-maxscale-dev:2.5.6-1-quizlet-dev-6189-61d2bf

Comment by Jens Röwekamp (Inactive) [ 2021-01-22 ]

The Xpand-Bootstrap node in the example above is a load balancer for all Xpand nodes, and it doesn't return pings.
(In case you were wondering why it is in state Down.)

Comment by Johan Wikman [ 2021-01-25 ]

jens.rowekamp So it is not part of the Xpand cluster? If that's the case, then I just don't understand how the Xpand-monitor gets going.

Comment by Johan Wikman [ 2021-01-25 ]

jens.rowekamp I found the reason why the traffic is routed correctly, even though list servers shows the old address. For performance reasons the address is stored in two places: one is used for routing and the other is used when the list servers output is generated. In that fix of mine, only the former was updated when the address of the Xpand node had changed.
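
To illustrate the kind of mistake described above, here is a minimal sketch. It is not MaxScale code, and the class and attribute names are made up for the example; it only shows how one copy of an address can be updated while the other stays stale.

```python
class ServerRecord:
    """Hypothetical server object that keeps its address in two places."""

    def __init__(self, name, address):
        self.name = name
        self.routing_address = address  # copy consulted when routing
        self.listed_address = address   # copy shown in listings

    def set_address_partial(self, address):
        # Mirrors the incomplete fix: only the routing copy is updated.
        self.routing_address = address

    def set_address(self, address):
        # Updates both copies, so routing and listings agree.
        self.routing_address = address
        self.listed_address = address


if __name__ == "__main__":
    node = ServerRecord("@@Xpand-Monitor:node-3", "10.113.129.38")
    node.set_address_partial("10.113.129.39")
    print(node.routing_address, node.listed_address)
```

After set_address_partial the two copies differ, which matches the symptom seen in the test: traffic reaches the new address while the listing still shows the old one.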

What packages did install_build_deps.sh not install? We'll update the script.

Comment by Jens Röwekamp (Inactive) [ 2021-01-25 ]

Hi Johan. I needed to install the following packages prior to executing install_build_deps.sh to be able to compile MaxScale in a CentOS 8 container:

tcl wget openssl-devel gnutls-devel libxml2-devel curl-devel libuuid-devel libatomic pam-devel sqlite-devel cyrus-sasl-devel glibc-all-langpacks

Regarding the Xpand-Bootstrap node that is used:
I created a k8s load balancer that detects all available Xpand nodes dynamically and only routes traffic to a respective Xpand node's 3308 port if it is available.
This k8s load balancer was set as Xpand-Bootstrap node in Xpand-monitor. This way the Xpand-monitor will not be limited to a specific bootstrap node but will use any Xpand node that is available for bootstrapping.
Since the load balancer only forwards traffic on port 3308 and doesn't return any ICMP packets, I simply assumed that it is listed as Down because it doesn't return any pings.
Nevertheless the Xpand-monitor is able to connect to the load balancer during its bootstrap phase and detects the Xpand topology.
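
As a side note, an endpoint like this can be checked without ICMP by attempting a TCP connection to the forwarded port. The sketch below is an illustration under that assumption and says nothing about how MaxScale itself decides that a server is Down; the function name tcp_port_open is made up for the example.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For the setup in this ticket one would call it as tcp_port_open("10.113.201.148", 3308), using the Xpand-Bootstrap address and port from the list servers output above.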

Comment by Johan Wikman [ 2021-01-26 ]

jens.rowekamp Now I finally understand your bootstrap setup. That's brilliant.

Generated at Thu Feb 08 04:20:57 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.