[MXS-2641] New connection cannot execute anything when Clustrix slave cluster used in rwsplit is down Created: 2019-08-15  Updated: 2020-08-27  Resolved: 2020-08-27

Status: Closed
Project: MariaDB MaxScale
Component/s: xpandmon
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Rahul Joshi (Inactive) Assignee: Unassigned
Resolution: Cannot Reproduce Votes: 0
Labels: need_feedback
Environment:

MaxScale server karma182:
OS: CentOS 7
Version: built from 2.4 branch,
[root@karma182 ~]# maxscale -V
MaxScale 2.4.1 - 8ce6a679bef8c73b156d3687af7d6eac3c8f46bc
Clustrix nodes:
OS: CentOS 7
Version: clustrix-fredonyer-16045


Attachments: File clustrix_karma182_rwsplit.cnf    

 Description   

[root@karma182 ~]# maxctrl list servers
┌───────────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server                    │ Address     │ Port │ Connections │ State           │ GTID │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-2  │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-3  │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-1  │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-2 │ 10.2.15.75  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-3 │ 10.2.12.218 │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-1 │ 10.2.15.40  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ BS1                       │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ BS4                       │ 10.2.15.40  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ MBS                       │ 127.0.0.1   │ 4006 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ MBS2                      │ 127.0.0.1   │ 4007 │ 0           │ Slave, Running  │      │
└───────────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘

I am using `master_accept_reads=true` in all listener sections in the config file.
But that does not help.

So, everything is fine in the beginning. Reads going to slave and writes going to master.
Then I run `clx dbstop` on the slave Clustrix cluster.

[root@karma007 ~]# /opt/clustrix/bin/clx dbstop
Success: stop_job krobix run on all nodes.
[root@karma007 ~]# /opt/clustrix/bin/clx s
Error connecting to database via /data/clustrix/mysql.sock: (2002, "Can't connect to local MySQL server through socket '/data/clustrix/mysql.sock' (2)")
== Using cached node details ==
Cluster Name:    cled76691fc35f9f61
Cluster Version: clustrix-fredonyer-16045
Cluster Status:   UNKW
Cluster Size:    3 nodes - 16 CPUs per Node
 
nid |  Hostname |  Status |  IP Address  | Zone | TPS |    Used    | Total
----+-----------+---------+--------------+------+-----+------------+------
  1 |  karma007 |   UNKW  |   10.2.15.40 |    0 |   0 |  0 (0.00%) |     0
  2 |  karma022 |   UNKW  |   10.2.15.75 |    0 |   0 |  0 (0.00%) |     0
  3 |  karma078 |   UNKW  |  10.2.12.218 |    0 |   0 |  0 (0.00%) |     0
----+-----------+---------+--------------+------+-----+------------+------
                                                    0 |  0 (0.00%) |     0
[root@karma007 ~]#

Now all the reads and writes should go to master. This works in the already open session. I stopped Clustrix service before `insert into test.t1 values (7);` below and all reads and writes went to master as expected.

[root@vqc008c vqc008c]# mysql -h karma182 -umaxscale -pmaxscale_pw -P 4008
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 5.5.5-10.2.12 2.4.1-maxscale
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
+--------------------+
2 rows in set (0.01 sec)
 
mysql> insert into test.t1 values (6);
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from test.t1;
+------+
| col1 |
+------+
|    2 |
|    5 |
|    1 |
|    3 |
|    6 |
+------+
5 rows in set (0.00 sec)
 
mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| test               |
+--------------------+
2 rows in set (0.00 sec)
 
mysql> select * from test.t1;
+------+
| col1 |
+------+
|    2 |
|    5 |
|    1 |
|    3 |
|    6 |
+------+
5 rows in set (0.00 sec)
 
mysql> insert into test.t1 values (7);
Query OK, 1 row affected (0.00 sec)
 
mysql> select * from test.t1 where col1=7;
+------+
| col1 |
+------+
|    7 |
+------+
1 row in set (0.01 sec)
 
mysql> \q
Bye

[root@karma182 maxscale]# maxctrl list servers
┌───────────────────────────┬─────────────┬──────┬─────────────┬─────────────────┬──────┐
│ Server                    │ Address     │ Port │ Connections │ State           │ GTID │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-3  │ 10.2.15.180 │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-2  │ 10.2.15.89  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor:node-1  │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-3 │ 10.2.12.218 │ 3306 │ 0           │ Down            │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-2 │ 10.2.15.75  │ 3306 │ 0           │ Down            │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ @@Cluster-Monitor1:node-1 │ 10.2.15.40  │ 3306 │ 0           │ Down            │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ BS1                       │ 10.2.13.91  │ 3306 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ BS4                       │ 10.2.15.40  │ 3306 │ 0           │ Down            │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ MBS                       │ 127.0.0.1   │ 4006 │ 0           │ Master, Running │      │
├───────────────────────────┼─────────────┼──────┼─────────────┼─────────────────┼──────┤
│ MBS2                      │ 127.0.0.1   │ 4007 │ 0           │ Slave, Running  │      │
└───────────────────────────┴─────────────┴──────┴─────────────┴─────────────────┴──────┘

Problem starts when you try to create new connection. You can login but cannot do any reads or writes.

[root@vqc008c vqc008c]# mysql -h karma182 -umaxscale -pmaxscale_pw -P 4008
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 14410
Server version: 5.5.5-10.2.12 2.4.1-maxscale
 
Copyright (c) 2000, 2016, Oracle and/or its affiliates. All rights reserved.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> show databases;
ERROR 1927 (08S01): Connection killed by MaxScale: Router could not recover from connection errors
mysql> use test
No connection. Trying to reconnect...
Connection id:    14414
Current database: *** NONE ***
 
ERROR 2003 (HY000): Authentication with backend failed. Session will be closed.
mysql>
mysql> insert into test.t1 values (8);
ERROR 1927 (08S01): Connection killed by MaxScale: Router could not recover from connection errors
mysql> select * from test.t1;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id:    14418
Current database: *** NONE ***
 
ERROR 2003 (HY000): Authentication with backend failed. Session will be closed.
mysql>

Relevant logs:

2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.15.40:3306: Can't connect to MySQL server on '10.2.15.40' (115)
2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.15.75:3306: Can't connect to MySQL server on '10.2.15.75' (115)
2019-08-15 22:25:10   error  : [clustrixmon] Cluster-Monitor1: Could either not ping or create connection to 10.2.12.218:3306: Can't connect to MySQL server on '10.2.12 for authentication until database users have been successfully loaded.
2019-08-15 22:24:10   info   : (14419) [readwritesplit] Route query to master: @@Cluster-Monitor:node-1         [10.2.13.91]:3306
2019-08-15 22:24:10   info   : Stopped Read-Write-Service client session [14419]
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Servers and router connection counts:
2019-08-15 22:24:10   info   : (14418) [readwritesplit] current operations : 0 in       [127.0.0.1]:4006 Master, Running
2019-08-15 22:24:10   info   : (14418) [readwritesplit] current operations : 0 in       [127.0.0.1]:4007 Slave, Running
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Connected to 'MBS'
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Selected Master: MBS
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Connected to 'MBS2'
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Selected Slave: MBS2
2019-08-15 22:24:10   info   : (14418) Started Read-Write-Service_rwsrvc client session [14418] for 'maxscale' from ::ffff:10.2.14.197
2019-08-15 22:24:10   info   : (14418) Connected to 'MBS' with thread id 14420
2019-08-15 22:24:10   info   : (14418) Connected to 'MBS2' with thread id 14421
2019-08-15 22:24:10   info   : (14418) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 26, type: QUERY_TYPE_READ, stmt: select * from test.t1
2019-08-15 22:24:10   info   : (14420) [readwritesplit] Servers and router connection counts:
2019-08-15 22:24:10   info   : (14418) [readwritesplit] Route query to master: MBS      [127.0.0.1]:4006 <
2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.13.91]:3306 Master, Running
2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.15.89]:3306 Master, Running
2019-08-15 22:24:10   info   : (14420) [readwritesplit] current operations : 0 in       [10.2.15.180]:3306 Master, Running
2019-08-15 22:24:10   error  : (14421) [readwritesplit] Couldn't find suitable Master from 3 candidates.
2019-08-15 22:24:10   error  : (14421) Failed to create new router session for service 'Read-Write-Service1'. See previous errors for more details.
2019-08-15 22:24:10   info   : (14420) [readwritesplit] Connected to '@@Cluster-Monitor:node-1'
2019-08-15 22:24:10   info   : (14420) [readwritesplit] Selected Master: @@Cluster-Monitor:node-1
2019-08-15 22:24:10   info   : (14420) Started Read-Write-Service client session [14420] for 'maxscale' from ::ffff:127.0.0.1
2019-08-15 22:24:10   info   : Stopped Read-Write-Service1 client session [14421]
2019-08-15 22:24:10   error  : (14418) [mariadbbackend] Invalid authentication message from backend 'MBS2'. Error code: 1045, Msg : #28000failed to create new session
2019-08-15 22:24:10   warning: (14418) [Read-Write-Service_rwsrvc] Refresh rate limit (once every 30 seconds) exceeded for load of users' table.
2019-08-15 22:24:10   error  : [mariadbbackend] Unable to write to backend 'MBS2' due to authentication failure. Server in state Slave, Running.
2019-08-15 22:24:10   info   : Stopped Read-Write-Service_rwsrvc client session [14418]
2019-08-15 22:24:10   info   : (14420) > Autocommit: [enabled], trx is [not open], cmd: (0x03) COM_QUERY, plen: 26, type: QUERY_TYPE_READ, stmt: select * from test.t1
2019-08-15 22:24:10   info   : (14420) [readwritesplit] Route query to master: @@Cluster-Monitor:node-1         [10.2.13.91]:3306 <
2019-08-15 22:24:10   info   : (14420) [readwritesplit] Storing query (len: 5 cmd: 1), expecting 1 replies to current command:
2019-08-15 22:24:10   info   : (14420) Connected to '@@Cluster-Monitor:node-1' with thread id 130049
2019-08-15 22:24:10   info   : Stopped Read-Write-Service client session [14420]

Conf file attached. Maxscale log is too big for attachment, will add to google drive.



 Comments   
Comment by Rahul Joshi (Inactive) [ 2019-08-15 ]

MaxScale log under:
https://drive.google.com/open?id=1V9xkHydAkaaHuC89oVTJaETfu8eTg_Rn

Comment by markus makela [ 2020-03-02 ]

rahul.joshi@mariadb.com is this still relevant?

Comment by markus makela [ 2020-08-27 ]

Closing as Cannot Reproduce since this hasn't been confirmed by the reporter.

Generated at Thu Feb 08 04:15:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.