Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-2641

New connection cannot execute anything when Clustrix slave cluster used in rwsplit is down

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: None
    • Fix Version/s: N/A
    • Component/s: xpandmon
    • Labels:
    • Environment:

      Description

      [root@karma182 ~]# maxctrl list servers
      ┌───────────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server                    │ Address     │ Port │ Connections │ State           │ GTID │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-2  │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-3  │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-1  │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-2 │ 10.2.15.75  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-3 │ 10.2.12.218 │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-1 │ 10.2.15.40  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ BS1                       │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ BS4                       │ 10.2.15.40  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ MBS                       │ 127.0.0.1   │ 4006 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ MBS2                      │ 127.0.0.1   │ 4007 │ 0           │ Slave, Running  │      │
      └───────────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      I am using `master_accept_reads=true` in all listener sections in the config file.
      But that does not help.

      So, everything is fine in the beginning. Reads going to slave and writes going to master.
      Then I run `clx dbstop` on the slave Clustrix cluster.

      [root@karma007 ~]# /opt/clustrix/bin/clx dbstop
      Success: stop_job krobix run on all nodes.
      [root@karma007 ~]# /opt/clustrix/bin/clx s
      Error connecting to database via /data/clustrix/mysql.sock: (2002, "Can't connect to local MySQL server through socket '/data/clustrix/mysql.sock' (2)")
      == Using cached node details ==
      Cluster Name:    cled76691fc35f9f61
      Cluster Version: clustrix-fredonyer-16045
      Cluster Status:   UNKW
      Cluster Size:    3 nodes - 16 CPUs per Node
       
      nid |  Hostname |  Status |  IP Address  | Zone | TPS |    Used    | Total
      ----+-----------+---------+--------------+------+-----+------------+------
        1 |  karma007 |   UNKW  |   10.2.15.40 |    0 |   0 |  0 (0.00%) |     0
        2 |  karma022 |   UNKW  |   10.2.15.75 |    0 |   0 |  0 (0.00%) |     0
        3 |  karma078 |   UNKW  |  10.2.12.218 |    0 |   0 |  0 (0.00%) |     0
      ----+-----------+---------+--------------+------+-----+------------+------
                                                          0 |  0 (0.00%) |     0
      [root@karma007 ~]#
      

      Now all the reads and writes should go to master. This works in the already open session. I stopped Clustrix service before `insert into test.t1 values (7);` below and all reads and writes went to master as expected.

      [root@vqc008c vqc008c]# mysql -h karma182 -umaxscale -pmaxscale_pw -P 4008
      mysql: [Warning] Using a password on the command line interface can be insecure.
      Welcome to the MySQL monitor.  Commands end with ; or \g.
      Your MySQL connection id is 1
      Server version: 5.5.5-10.2.12 2.4.1-maxscale
       
      Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
       
      Oracle is a registered trademark of Oracle Corporation and/or its
      affiliates. Other names may be trademarks of their respective
      owners.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      mysql> show databases;
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | test               |
      +--------------------+
      2 rows in set (0.01 sec)
       
      mysql> insert into test.t1 values (6);
      Query OK, 1 row affected (0.00 sec)
       
      mysql> select * from test.t1;
      +------+
      | col1 |
      +------+
      |    2 |
      |    5 |
      |    1 |
      |    3 |
      |    6 |
      +------+
      5 rows in set (0.00 sec)
       
      mysql> show databases;
      +--------------------+
      | Database           |
      +--------------------+
      | information_schema |
      | test               |
      +--------------------+
      2 rows in set (0.00 sec)
       
      mysql> select * from test.t1;
      +------+
      | col1 |
      +------+
      |    2 |
      |    5 |
      |    1 |
      |    3 |
      |    6 |
      +------+
      5 rows in set (0.00 sec)
       
      mysql> insert into test.t1 values (7);
      Query OK, 1 row affected (0.00 sec)
       
      mysql> select * from test.t1 where col1=7;
      +------+
      | col1 |
      +------+
      |    7 |
      +------+
      1 row in set (0.01 sec)
       
      mysql> \q
      Bye
      

      [root@karma182 maxscale]# maxctrl list servers
      ┌───────────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
      │ Server                    │ Address     │ Port │ Connections │ State           │ GTID │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-3  │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-2  │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor:node-1  │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-3 │ 10.2.12.218 │ 3306 │ 0           │ Down            │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-2 │ 10.2.15.75  │ 3306 │ 0           │ Down            │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ @@Cluster-Monitor1:node-1 │ 10.2.15.40  │ 3306 │ 0           │ Down            │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ BS1                       │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ BS4                       │ 10.2.15.40  │ 3306 │ 0           │ Down            │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ MBS                       │ 127.0.0.1   │ 4006 │ 0           │ Master, Running │      │
      ├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
      │ MBS2                      │ 127.0.0.1   │ 4007 │ 0           │ Slave, Running  │      │
      └───────────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘
      

      Problem starts when you try to create new connection. You can login but cannot do any reads or writes.

      [root@vqc008c vqc008c]# mysql -h karma182 -umaxscale -pmaxscale_pw -P 4008
      mysql: [Warning] Using a password on the command line interface can be insecure.
      Welcome to the MySQL monitor.  Commands end with ; or \g.
      Your MySQL connection id is 14410
      Server version: 5.5.5-10.2.12 2.4.1-maxscale
       
      Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
       
      Oracle is a registered trademark of Oracle Corporation and/or its
      affiliates. Other names may be trademarks of their respective
      owners.
       
      Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
       
      mysql> show databases;
      ERROR 1927 (08S01): Connection killed by MaxScale: Router could not recover from connection errors
      mysql> use test
      No connection. Trying to reconnect...
      Connection id:    14414
      Current database: *** NONE ***
       
      ERROR 2003 (HY000): Authentication with backend failed. Session will be closed.
      mysql>
      mysql> insert into test.t1 values (8);
      ERROR 1927 (08S01): Connection killed by MaxScale: Router could not recover from connection errors
      mysql> select * from test.t1;
      ERROR 2006 (HY000): MySQL server has gone away
      No connection. Trying to reconnect...
      Connection id:    14418
      Current database: *** NONE ***
       
      ERROR 2003 (HY000): Authentication with backend failed. Session will be closed.
      mysql>
      

      Relevant logs:

      2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.15.40:3306: Can't connect to MySQL server on '10.2.15.40' (115)
      2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.15.75:3306: Can't connect to MySQL server on '10.2.15.75' (115)
      2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.12.218:3306: Can't connect to MySQL server on '10.2.12 for authentication until database users have been successfully loaded.
      2019-08-15 22:24:10   info   : (14419) [readwritesplit] Route query to master: @@Cluster-Monitor:node-1         [10.2.13.91]:3306
      2019-08-15 22:24:10   info   : Stopped Read-Write-Service client session [14419]
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Servers and router connection counts:
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] current operations : 0 in       [127.0.0.1]:4006 Master, Running
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] current operations : 0 in       [127.0.0.1]:4007 Slave, Running
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Connected to 'MBS'
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Selected Master: MBS
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Connected to 'MBS2'
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Selected Slave: MBS2
      2019-08-15 22:24:10   info   : (14418) Started Read-Write-Service_rwsrvc client session [14418] for 'maxscale' from ::ffff:10.2.14.197
      2019-08-15 22:24:10   info   : (14418) Connected to 'MBS' with thread id 14420
      2019-08-15 22:24:10   info   : (14418) Connected to 'MBS2' with thread id 14421
      2019-08-15 22:24:10   info   : (14418) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 26, type: QUERY_TYPE_READ, stmt: select * from test.t1
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] Servers and router connection counts:
      2019-08-15 22:24:10   info   : (14418) [readwritesplit] Route query to master: MBS      [127.0.0.1]:4006 <
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.13.91]:3306 Master, Running
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.15.89]:3306 Master, Running
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.15.180]:3306 Master, Running
      2019-08-15 22:24:10   error  : (14421) [readwritesplit] Couldn't find suitable Master from 3 candidates.
      2019-08-15 22:24:10   error  : (14421) Failed to create new router session for service 'Read-Write-Service1'. See previous errors for more details.
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] Connected to '@@Cluster-Monitor:node-1'
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] Selected Master: @@Cluster-Monitor:node-1
      2019-08-15 22:24:10   info   : (14420) Started Read-Write-Service client session [14420] for 'maxscale' from ::ffff:127.0.0.1
      2019-08-15 22:24:10   info   : Stopped Read-Write-Service1 client session [14421]
      2019-08-15 22:24:10   error  : (14418) [mariadbbackend] Invalid authentication message from backend 'MBS2'. Error code: 1045, Msg : #28000failed to create new session
      2019-08-15 22:24:10   warning: (14418) [Read-Write-Service_rwsrvc] Refresh rate limit (once every 30 seconds) exceeded for load of users' table.
      2019-08-15 22:24:10   error  : [mariadbbackend] Unable to write to backend 'MBS2' due to authentication failure. Server in state Slave, Running.
      2019-08-15 22:24:10   info   : Stopped Read-Write-Service_rwsrvc client session [14418]
      2019-08-15 22:24:10   info   : (14420) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 26, type: QUERY_TYPE_READ, stmt: select * from test.t1
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] Route query to master: @@Cluster-Monitor:node-1         [10.2.13.91]:3306 <
      2019-08-15 22:24:10   info   : (14420) [readwritesplit] Storing query (len: 5 cmd: 1), expecting 1 replies to current command:
      2019-08-15 22:24:10   info   : (14420) Connected to '@@Cluster-Monitor:node-1' with thread id 130049
      2019-08-15 22:24:10   info   : Stopped Read-Write-Service client session [14420]
      

      Conf file attached. Maxscale log is too big for attachment, will add to google drive.

        Attachments

          Activity

            People

            Assignee:
            Unassigned Unassigned
            Reporter:
            rahul.joshi@mariadb.com Rahul Joshi
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved:

                Git Integration