MariaDB MaxScale / MXS-2626

Clustrix node goes directly to 'Drained' state instead of 'Draining'.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: None
    • Fix Version/s: N/A
    • Component/s: xpandmon
    • Labels:
      None
    • Environment:
      MaxScale server karma197:
      OS: CentOS 7
      Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
      Clustrix nodes:
      OS: CentOS 7
      Version: clustrix-fredonyer-16045
    • Sprint:
      MXS-SPRINT-88

      Description

      [root@karma108 ~]# clx s
      Cluster Name:    cld8d568c4e1f3e0db
      Cluster Version: clustrix-fredonyer-16045
      Cluster Status:   OK
      Cluster Size:    4 nodes - 16 CPUs per Node
      Current Node:    karma108 - nid 1
       
      nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
      ----+-----------+--------+--------------+------+-----+---------------+--------
        1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
        2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
        3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
        4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
      ----+-----------+--------+--------------+------+-----+---------------+--------
                                                         0 |  5.5G (0.18%) |    3.0T
      MaxScale run as:
      [root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
       
      [root@karma197 log]# maxctrl list servers
      ┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State           │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      Now, let's softfail node 4.

      MySQL [system]> SELECT * FROM system.nodeinfo ORDER BY nodeid;
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      | nodeid | started                    | ntptime                    | up_since                   | hostname                    | iface_name | iface_mac_addr    | iface_ip    | iface_connected | mysql_port | be_ip       | be_port | control_port | healthmon_port | pnid              | cores | first_pass | be_iface | zone |
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      |      1 | 2019-08-07 22:32:47.808417 | 2019-08-08 00:08:38.300635 | 2019-08-05 05:48:39.300635 | karma108.colo.sproutsys.com | enp5s0f0   | 00:25:90:20:f5:a2 | 10.2.13.91  |               1 |       3306 | 10.2.13.91  |   24378 |         2048 |           3581 | pfec37993505e4d91 |    16 |          0 | enp5s0f0 |    1 |
      |      2 | 2019-08-07 22:32:47.571999 | 2019-08-08 00:08:38.299715 | 2019-08-05 05:48:16.299715 | karma180.colo.sproutsys.com | enp5s0f0   | 00:25:90:20:f4:ce | 10.2.15.180 |               1 |       3306 | 10.2.15.180 |   24378 |         2048 |           3581 | pcb504659bf22ce02 |    16 |          0 | enp5s0f0 |    2 |
      |      3 | 2019-08-07 22:32:47.519503 | 2019-08-08 00:08:38.305634 | 2019-08-05 05:47:56.305634 | karma123.colo.sproutsys.com | enp5s0f0   | 00:25:90:78:f7:3c | 10.2.15.89  |               1 |       3306 | 10.2.15.89  |   24378 |         2048 |           3581 | pe268896eb1da201a |    16 |          0 | enp5s0f0 |    3 |
      |      4 | 2019-08-07 22:33:01.935622 | 2019-08-08 00:08:38.297207 | 2019-07-29 00:37:02.297207 | karma065.colo.sproutsys.com | enp5s0f0   | 00:25:90:7b:5c:f4 | 10.2.14.119 |               1 |       3306 | 10.2.14.119 |   24378 |         2048 |           3581 | p67082d6315829f66 |    16 |          0 | enp5s0f0 |    0 |
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      4 rows in set (0.00 sec)
       
      MySQL [system]> ALTER CLUSTER SOFTFAIL 4;
      Query OK, 0 rows affected (0.04 sec)
      
      

      Running `maxctrl list servers` immediately afterwards shows that the node is marked `Drained` instead of `Draining`.

      [root@karma197 log]# maxctrl list servers;date
      ┌─────────────────────┬─────────────┬──────┬─────────────┬──────────────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State                    │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Drained, Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running          │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴──────────────────────────┴──────┘
      Thu Aug  8 00:11:50 UTC 2019
      

      However, Clustrix is not yet done softfailing the node:

      MySQL [system]>  SELECT count(*) FROM system.softfailing_containers; select now();
      +----------+
      | count(*) |
      +----------+
      |        2 |
      +----------+
      1 row in set (0.03 sec)
       
      +---------------------+
      | now()               |
      +---------------------+
      | 2019-08-08 00:12:01 |
      +---------------------+
      1 row in set (0.00 sec)
       
      MySQL [system]> SELECT * FROM system.softfailed_nodes       WHERE nodeid NOT IN          (SELECT DISTINCT nodeid           FROM system.softfailing_containers);
      Empty set (0.04 sec)
      
      

      The empty result shows that the node is not yet ready to be removed: the Clustrix rebalancer is still moving data off it.
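
      The `softfailed_nodes` and `softfailing_containers` queries above suggest the state logic the monitor should apply. A minimal sketch, assuming a softfailed node counts as `Draining` while it still has rows in `system.softfailing_containers` and as `Drained` only once none remain (a hypothetical illustration, not xpandmon's actual code):

```python
# Hypothetical sketch of the Draining/Drained decision; not xpandmon's
# actual implementation. Table contents are modelled as sets of nodeids.

def node_drain_state(nodeid, softfailed_nodes, softfailing_containers):
    """softfailed_nodes: nodeids present in system.softfailed_nodes;
    softfailing_containers: nodeids that still have rows in
    system.softfailing_containers (rebalancer still moving data)."""
    if nodeid not in softfailed_nodes:
        return None                # node is not being softfailed
    if nodeid in softfailing_containers:
        return "Draining"          # SOFTFAIL issued, data still moving
    return "Drained"               # no containers left, ready to remove
```

      At 00:11:50 node 4 still had 2 softfailing containers, so by this logic it should have been reported as `Draining`, not `Drained`.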

      MaxScale log also says:

      2019-08-08 00:11:43   notice : [clustrixmon] Clustrix: Node 4 (10.2.14.119) has been SOFTFAILed. Turning ON 'Being Drained'.
      2019-08-08 00:11:50   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:11:50   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:12:07   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:12:07   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:12:19   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:12:19   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:13:12   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:13:12   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:14:21   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:14:21   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      

      It is only at 2019-08-08 00:13:14.502252 UTC that Clustrix reports the node ready for removal, so only then should it have been marked `Drained`:

      2019-08-08 00:13:14.502252 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/softfail.ct:26 softfail_node_ready_signal(): softfailing nodes are ready to be removed: 4
      

      Config file and logs are attached.

        Attachments

          Activity

            People

            Assignee:
            johan.wikman Johan Wikman
            Reporter:
            rahul.joshi@mariadb.com Rahul Joshi
            Votes:
            0
            Watchers:
            2

