MariaDB MaxScale / MXS-2626

Clustrix node goes directly to 'Drained' state instead of 'Draining'.

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Not a Bug
    • Affects Version/s: None
    • Fix Version/s: N/A
    • Component/s: xpandmon
    • Labels:
      None
    • Environment:
      MaxScale server karma197:
      OS: CentOS 7
      Version: built from 2.4 branch, MariaDB MaxScale 2.4.1 started (Commit: 658aae6b6c77774c34d0f52c03f63edc2d44969e)
      Clustrix nodes:
      OS: CentOS 7
      Version: clustrix-fredonyer-16045
    • Sprint:
      MXS-SPRINT-88

      Description

      [root@karma108 ~]# clx s
      Cluster Name:    cld8d568c4e1f3e0db
      Cluster Version: clustrix-fredonyer-16045
      Cluster Status:   OK
      Cluster Size:    4 nodes - 16 CPUs per Node
      Current Node:    karma108 - nid 1
       
      nid |  Hostname | Status |  IP Address  | Zone | TPS |      Used     |  Total
      ----+-----------+--------+--------------+------+-----+---------------+--------
        1 |  karma108 |    OK  |   10.2.13.91 |    1 |   0 |  1.8G (0.24%) |  762.9G
        2 |  karma180 |    OK  |  10.2.15.180 |    2 |   0 |  1.8G (0.24%) |  762.9G
        3 |  karma123 |    OK  |   10.2.15.89 |    3 |   0 |  1.8G (0.24%) |  762.9G
        4 |  karma065 |    OK  |  10.2.14.119 |    0 |   0 |  9.3M (0.00%) |  762.9G
      ----+-----------+--------+--------------+------+-----+---------------+--------
                                                         0 |  5.5G (0.18%) |    3.0T
      MaxScale run as:
      [root@karma197 etc]# maxscale -d -f clustrix_karma108_trxreplay.cnf --user=root
       
      [root@karma197 log]# maxctrl list servers
      ┌─────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State           │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      Now, let's softfail node 4.

      MySQL [system]> SELECT * FROM system.nodeinfo ORDER BY nodeid;
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      | nodeid | started                    | ntptime                    | up_since                   | hostname                    | iface_name | iface_mac_addr    | iface_ip    | iface_connected | mysql_port | be_ip       | be_port | control_port | healthmon_port | pnid              | cores | first_pass | be_iface | zone |
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      |      1 | 2019-08-07 22:32:47.808417 | 2019-08-08 00:08:38.300635 | 2019-08-05 05:48:39.300635 | karma108.colo.sproutsys.com | enp5s0f0   | 00:25:90:20:f5:a2 | 10.2.13.91  |               1 |       3306 | 10.2.13.91  |   24378 |         2048 |           3581 | pfec37993505e4d91 |    16 |          0 | enp5s0f0 |    1 |
      |      2 | 2019-08-07 22:32:47.571999 | 2019-08-08 00:08:38.299715 | 2019-08-05 05:48:16.299715 | karma180.colo.sproutsys.com | enp5s0f0   | 00:25:90:20:f4:ce | 10.2.15.180 |               1 |       3306 | 10.2.15.180 |   24378 |         2048 |           3581 | pcb504659bf22ce02 |    16 |          0 | enp5s0f0 |    2 |
      |      3 | 2019-08-07 22:32:47.519503 | 2019-08-08 00:08:38.305634 | 2019-08-05 05:47:56.305634 | karma123.colo.sproutsys.com | enp5s0f0   | 00:25:90:78:f7:3c | 10.2.15.89  |               1 |       3306 | 10.2.15.89  |   24378 |         2048 |           3581 | pe268896eb1da201a |    16 |          0 | enp5s0f0 |    3 |
      |      4 | 2019-08-07 22:33:01.935622 | 2019-08-08 00:08:38.297207 | 2019-07-29 00:37:02.297207 | karma065.colo.sproutsys.com | enp5s0f0   | 00:25:90:7b:5c:f4 | 10.2.14.119 |               1 |       3306 | 10.2.14.119 |   24378 |         2048 |           3581 | p67082d6315829f66 |    16 |          0 | enp5s0f0 |    0 |
      +--------+----------------------------+----------------------------+----------------------------+-----------------------------+------------+-------------------+-------------+-----------------+------------+-------------+---------+--------------+----------------+-------------------+-------+------------+----------+------+
      4 rows in set (0.00 sec)
       
      MySQL [system]> ALTER CLUSTER SOFTFAIL 4;
      Query OK, 0 rows affected (0.04 sec)
      
      

      Running `maxctrl list servers` immediately afterwards shows that the node is marked `Drained` instead of `Draining`.

      [root@karma197 log]# maxctrl list servers;date
      ┌─────────────────────┬─────────────┬──────┬─────────────┬──────────────────────────┬──────┐
      │ Server              │ Address     │ Port │ Connections │ State                    │ GTID │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-4   │ 10.2.14.119 │ 3306 │ 0           │ Drained, Master, Running │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-2   │ 10.2.15.180 │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-3   │ 10.2.15.89  │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ @@Clustrix:node-1   │ 10.2.13.91  │ 3306 │ 0           │ Master, Running          │      │
      ├─────────────────────┼─────────────┼──────┼─────────────┼──────────────────────────┼──────┤
      │ Bootstrap1-karma108 │ 10.2.13.91  │ 3306 │ 0           │ Master, Running          │      │
      └─────────────────────┴─────────────┴──────┴─────────────┴──────────────────────────┴──────┘
      Thu Aug  8 00:11:50 UTC 2019
      

      However, Clustrix is not yet done softfailing the node:

      MySQL [system]>  SELECT count(*) FROM system.softfailing_containers; select now();
      +----------+
      | count(*) |
      +----------+
      |        2 |
      +----------+
      1 row in set (0.03 sec)
       
      +---------------------+
      | now()               |
      +---------------------+
      | 2019-08-08 00:12:01 |
      +---------------------+
      1 row in set (0.00 sec)
       
      MySQL [system]> SELECT * FROM system.softfailed_nodes       WHERE nodeid NOT IN          (SELECT DISTINCT nodeid           FROM system.softfailing_containers);
      Empty set (0.04 sec)
      
      

      The empty result shows that the node is not yet ready to be removed: the Clustrix rebalancer is still moving data off it.
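
      The `softfailed_nodes` and `softfailing_containers` queries above suggest the state logic the monitor should apply. A minimal sketch, assuming a softfailed node counts as `Draining` while it still has rows in `system.softfailing_containers` and as `Drained` only once none remain (a hypothetical illustration, not xpandmon's actual code):

```python
# Hypothetical sketch of the Draining/Drained decision; not xpandmon's
# actual implementation. Table contents are modelled as sets of nodeids.

def node_drain_state(nodeid, softfailed_nodes, softfailing_containers):
    """softfailed_nodes: nodeids present in system.softfailed_nodes;
    softfailing_containers: nodeids that still have rows in
    system.softfailing_containers (rebalancer still moving data)."""
    if nodeid not in softfailed_nodes:
        return None                # node is not being softfailed
    if nodeid in softfailing_containers:
        return "Draining"          # SOFTFAIL issued, data still moving
    return "Drained"               # no containers left, ready to remove
```

      At 00:11:50 node 4 still had 2 softfailing containers, so by this logic it should have been reported as `Draining`, not `Drained`.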

      MaxScale log also says:

      2019-08-08 00:11:43   notice : [clustrixmon] Clustrix: Node 4 (10.2.14.119) has been SOFTFAILed. Turning ON 'Being Drained'.
      2019-08-08 00:11:50   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:11:50   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:12:07   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:12:07   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:12:19   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:12:19   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:13:12   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:13:12   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      2019-08-08 00:14:21   info   : Accept authentication from 'admin', using password. Request: /v1/servers
      2019-08-08 00:14:21   info   : Accept authentication from 'admin', using password. Request: /v1/monitors/Clustrix
      

      It is only at 2019-08-08 00:13:14.502252 UTC that Clustrix reports the node ready for removal, so only then should it have been marked `Drained`:

      2019-08-08 00:13:14.502252 UTC nid 1 karma108.colo.sproutsys.com clxnode: INFO dbcore/softfail.ct:26 softfail_node_ready_signal(): softfailing nodes are ready to be removed: 4
      

      Config file and logs are attached.

        Attachments

          Activity

            People

            Assignee:
            johan.wikman Johan Wikman
            Reporter:
            rahul.joshi@mariadb.com Rahul Joshi
            Votes:
            0
            Watchers:
            2

