[MXS-4609] MaxScale Connects to MySQL replica but cannot execute queries Created: 2023-05-09  Updated: 2023-05-17  Resolved: 2023-05-10

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Kalana Wijethunga Assignee: markus makela
Resolution: Not a Bug Votes: 0
Labels: None


 Description   

I am setting up a Master-slave MySQL replication setup using 3 MySQL 8 nodes(Fresh installations). I have setup a master and configured another nodes to replicate from the master which is working fine.

I have setup MaxScale free version as a load balancer and I am intending to use it as a load balancer with read write splitting. Following is my setup(domain names are anonymized for privacy reasons).

node1.com - master
node2.com - slave1

I can succesfully connect both nodes with `mysql -uroot -proot -h <domain-name>. Following is my MaxScale configuration.

[maxscale]
threads=auto
admin_secure_gui=false
admin_host=0.0.0.0
 
[node1]
type=server
address=node1.com
port=3306
protocol=MySQLBackend
 
[node2]
type=server
address=node2.com
port=3306
protocol=MySQLBackend
 
[node3]
type=server
address=node3.com
port=3306
protocol=MySQLBackend
 
 
[MySQL-Monitor]
type=monitor
module=mariadbmon
servers=node1,node2,node3
user=maxscale
password=password
monitor_interval=2s
 
[Read-Write-Service]
type=service
router=readwritesplit
servers=node1,node2
user=maxscale
password=password
enable_root_user=true
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=3307

When I try `mysql -uroot -proot -h <maxscale-domain>` Maxscale can succesfully connect to the server, but the connection hangs as follows.

➜  ~ mysql -uroot -proot -h <maxscale-domain> -P 3307
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.

How can i resolve this?

I have tried the followings to try to debug this.

1. Using a new read only service with following configuration

[Read-Service]
type=service
router=readconnroute
servers=node1,node2
router_options=master
user=maxscale
password=password
enable_root_user=true
 
[Read-Listener]
type=listener
service=Read-Service
protocol=MariaDBClient
port=3306

This has led me somewhere with `mysql -uroot -proot` as it allows me to connect to both servers succesfully. When `router_options=master` I can successfully connect to master and query the database. However when `router_options=slave` I can connect to the slave but cannot query it.

➜  ~ mysql -uroot -proot -h <maxscale-host> -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 7
Server version: 8.0.32-0ubuntu0.22.04.2
 
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> select @@hostname;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    8
Current database: *** NONE ***
 
ERROR 1927 (HY000): Lost connection to backend server: network error (node2, Connection refused)

This led me to believe the issue could be with the node2, but I cannot figure out the issue as it works fine if i connect the mysql client directly to node.

2. To ensure that this is not a issue related to any slave, i removed node1 from the network and added a new node3. Then node2 is setup as the master of node3 with replication enabled. Results for the Read only service is as follows.

With `router_options=master` :

➜  ~ mysql -uroot -proot -h <maxscale-host> -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 8.0.32-0ubuntu0.22.04.2
 
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> select @@hostname;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    2
Current database: *** NONE ***
 
ERROR 1927 (HY000): Lost connection to backend server: network error (node2, Connection refused)

It provides the same issue for node2 and 3 both now.

With `router_options=slave` :

➜  ~ mysql -uroot -proot -h <maxscale-host> -P 3306
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 1
Server version: 8.0.32-0ubuntu0.22.04.2
 
Copyright (c) 2000, 2023, Oracle and/or its affiliates.
 
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
 
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
 
mysql> select @@hostname;
ERROR 2013 (HY000): Lost connection to MySQL server during query
No connection. Trying to reconnect...
Connection id:    2
Current database: *** NONE ***
 
ERROR 1927 (HY000): Lost connection to backend server: network error (node3, Connection refused)

Note that all 3 are plain MySQL installations with few empty databases added + binlog enabled. Replication works fine and MaxScale GUI reports as such. It looks like the issue comes from MySQL, but actually it should be related to the MaxScale.

Any thoughts on the matter are highly appreciated.

Edit : Attaching maxscale.log

MariaDB MaxScale  /var/log/maxscale/maxscale.log  Fri May  5 13:27:43 2023
----------------------------------------------------------------------------
2023-05-05 13:27:43   notice : Module 'mariadbmon' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libmariadbmon.so'.
2023-05-05 13:27:43   notice : Module 'readconnroute' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libreadconnroute.so'.
2023-05-05 13:27:43   notice : Module 'readwritesplit' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libreadwritesplit.so'.
2023-05-05 13:27:43   notice : Using up to 75.53GiB of memory for query classifier cache
2023-05-05 13:27:43   notice : syslog logging is disabled.
2023-05-05 13:27:43   notice : maxlog logging is enabled.
2023-05-05 13:27:43   notice : Host: '<maxscale-host>' OS: Linux@5.4.0-135-generic, #152-Ubuntu SMP Wed Nov 23 20:19:22 UTC 2022, x86_64 with 64 processor cores (64.00 available).
2023-05-05 13:27:43   notice : Total main memory: 503.56GiB (503.56GiB usable).
2023-05-05 13:27:43   notice : MariaDB MaxScale 23.02.1 started (Commit: 5de96ecc39da80a992fc8f362d22a030a844bef5)
2023-05-05 13:27:43   notice : MaxScale is running in process 3097068
2023-05-05 13:27:43   notice : Configuration file: /etc/maxscale.cnf
2023-05-05 13:27:43   notice : Log directory: /var/log/maxscale
2023-05-05 13:27:43   notice : Data directory: /var/lib/maxscale
2023-05-05 13:27:43   notice : Module directory: /usr/lib/x86_64-linux-gnu/maxscale
2023-05-05 13:27:43   notice : Service cache: /var/cache/maxscale
2023-05-05 13:27:43   notice : Working directory: /var/log/maxscale
2023-05-05 13:27:43   notice : Module 'qc_sqlite' loaded from '/usr/lib/x86_64-linux-gnu/maxscale/libqc_sqlite.so'.
2023-05-05 13:27:43   notice : Query classification results are cached and reused. Memory used per thread: 1.18GiB
2023-05-05 13:27:43   notice : Password encryption key file '/var/lib/maxscale/.secrets' not found, using configured passwords as plaintext.
2023-05-05 13:27:43   notice : The systemd watchdog is Enabled. Internal timeout = 30s
2023-05-05 13:27:43   notice : Using HS256 for JWT signatures
2023-05-05 13:27:43   notice : Started REST API on [0.0.0.0]:8989

Edit 2:

Attaching mysql general log for reference. Nothing specific to logging or execution of the query gets logged in the mysql general log either. However when i restart the maxscale server it succesfull y connects it seems which is proven by the statement `2023-05-05T11:45:44.693829Z 19 Connect maxscale@<maxscale-host> on using TCP/IP`

2023-05-05T11:45:44.226596Z	   18 Query	SHOW STATUS LIKE 'Uptime'
2023-05-05T11:45:44.228199Z	   18 Query	SELECT @@global.server_id, @@read_only
2023-05-05T11:45:44.228584Z	   18 Query	SHOW SLAVE STATUS
2023-05-05T11:45:44.532309Z	   18 Quit	
2023-05-05T11:45:44.693829Z	   19 Connect	maxscale@<maxscale-host> on  using TCP/IP
2023-05-05T11:45:44.694243Z	   19 Quit	
2023-05-05T11:45:44.695671Z	   20 Connect	maxscale@<maxscale-host> on  using TCP/IP
2023-05-05T11:45:44.696020Z	   20 Query	SET SQL_MODE=''
2023-05-05T11:45:44.696292Z	   20 Query	SET @@session.autocommit=1
2023-05-05T11:45:44.696512Z	   20 Query	SET NAMES latin1
2023-05-05T11:45:44.696768Z	   20 Query	SELECT id, @@global.collation_server FROM information_schema.collations WHERE collation_name=@@global.collation_server

Edit 3: Attaching MaxScale info log as suggested by https://stackoverflow.com/users/7633996/markusjm

2023-05-08 11:05:34   info   : Routed [COM_QUERY] to 'node-a' select @@version_comment limit 1
2023-05-08 11:05:34   info   : Storing COM_QUERY while in state 'Handshaking': select @@version_comment limit 1
2023-05-08 11:05:34   info   : Server 'node-a' failed
2023-05-08 11:05:34   info   : Stopped Read-Service client session [1]
2023-05-08 11:05:38   info   : Accept authentication from 'admin', using password. Request: /auth
2023-05-08 11:05:44   info   : Found matching user 'maxscale'@'%' for client 'maxscale'@'2001:1458:202:56::111:1b' with sufficient privileges.
2023-05-08 11:05:44   info   : New session for server node-a. Connections : 1
2023-05-08 11:05:44   info   : Started Read-Service client session [2] for 'maxscale' from 2001:1458:202:56::100:1b
2023-05-08 11:05:44   info   : Server 'node-a' failed
2023-05-08 11:05:44   info   : Stopped Read-Service client session [2]
2023-05-08 11:07:03   info   : Found matching user 'maxscale'@'%' for client 'maxscale'@'2001:1458:202:56::111:1b' with sufficient privileges.
2023-05-08 11:07:03   info   : New session for server node-a. Connections : 1
2023-05-08 11:07:03   info   : Started Read-Service client session [3] for 'maxscale' from 2001:1458:202:56::111:1b
2023-05-08 11:07:03   info   : Server 'node-a' failed
2023-05-08 11:07:03   info   : Stopped Read-Service client session [3]
2023-05-08 11:10:39   info   : Found matching user 'maxscale'@'%' for client 'maxscale'@'2001:1458:202:56::111:1b' with sufficient privileges.

This is after setting `log_info=true` for maxscale and recreating maxscale MySQL user with `mysql_native_password`. Following is the user info.

mysql> select * from user where user='maxscale'\G;
*************************** 1. row ***************************
                    Host: %
                    User: maxscale
             Select_priv: N
             Insert_priv: N
             Update_priv: N
             Delete_priv: N
             Create_priv: N
               Drop_priv: N
             Reload_priv: N
           Shutdown_priv: N
            Process_priv: N
               File_priv: N
              Grant_priv: N
         References_priv: N
              Index_priv: N
              Alter_priv: N
            Show_db_priv: Y
              Super_priv: N
   Create_tmp_table_priv: N
        Lock_tables_priv: N
            Execute_priv: N
         Repl_slave_priv: N
        Repl_client_priv: Y
        Create_view_priv: N
          Show_view_priv: N
     Create_routine_priv: N
      Alter_routine_priv: N
        Create_user_priv: N
              Event_priv: N
            Trigger_priv: N
  Create_tablespace_priv: N
                ssl_type: 
              ssl_cipher: 0x
             x509_issuer: 0x
            x509_subject: 0x
           max_questions: 0
             max_updates: 0
         max_connections: 0
    max_user_connections: 0
                  plugin: mysql_native_password
   authentication_string: *2470C0C06DEE42FD1618BB99005ADCA2629D1E19
        password_expired: N
   password_last_changed: 2023-05-08 09:02:05
       password_lifetime: NULL
          account_locked: N
        Create_role_priv: N
          Drop_role_priv: N
  Password_reuse_history: NULL
     Password_reuse_time: NULL
Password_require_current: NULL
         User_attributes: NULL
1 row in set (0.00 sec)

mysql> show grants for 'maxscale'@'%';
+-------------------------------------------------------------------+
| Grants for maxscale@%                                             |
+-------------------------------------------------------------------+
| GRANT SHOW DATABASES, REPLICATION CLIENT ON *.* TO `maxscale`@`%` |
| GRANT SELECT ON `mysql`.`columns_priv` TO `maxscale`@`%`          |
| GRANT SELECT ON `mysql`.`db` TO `maxscale`@`%`                    |
| GRANT SELECT ON `mysql`.`procs_priv` TO `maxscale`@`%`            |
| GRANT SELECT ON `mysql`.`proxies_priv` TO `maxscale`@`%`          |
| GRANT SELECT ON `mysql`.`tables_priv` TO `maxscale`@`%`           |
| GRANT SELECT ON `mysql`.`user` TO `maxscale`@`%`                  |
+-------------------------------------------------------------------+
7 rows in set (0.00 sec)

Maria



 Comments   
Comment by markus makela [ 2023-05-10 ]

Can you try replacing router=readconnroute with router=readwritesplit and then remove router_options from the configuration? The readwritesplit routing module should log a more detailed error whenever a node fails.

I did a quick test with the MySQL 8.0.32 docker image and an identical user but I wasn't able to reproduce the problem. However, it did reveal that readconnroute does not log enough information on the info level which is something that can be improved.

Comment by Kalana Wijethunga [ 2023-05-10 ]

Hi @markus,

I had already tried this with a `router=readwritesplit`. The behaviour was that the MySQL connection would load and just hang in the middle at the following point without no clue on what's going on.

```
➜ ~ mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor. Commands end with ; or \g.
```

However, if I take the replica out of the Maxscale monitoring, it will successfully connect to master which led me to believe that the issue with the replica and led me to bring a 2nd replica to the setup which ultimately gave the same result.

Anyway, after all the changes done to the setup which discussing this question, that issue seems to have gone away but the connection still drops after logging into MySQL. Following is the MaxScale log.

```
2023-05-10 09:55:53
warning: Super user 'root'@'2001:1458:202:56::100:1b' logged in to service 'Read-Write-Service'.
2023-05-10 09:55:53
info : Target connection counts:
2023-05-10 09:55:53
info : current operations : 0 in node-a Master, Running
2023-05-10 09:55:53
info : current operations : 0 in node-b Slave, Running
2023-05-10 09:55:53
info : Connected to 'node-a'
2023-05-10 09:55:53
info : Selected Primary: node-a
2023-05-10 09:55:53
info : Connected to 'node-b'
2023-05-10 09:55:53
info : Selected Replica: node-b
2023-05-10 09:55:53
info : Started Read-Write-Service client session [10] for 'root' from 2001:1458:202:56::100:1b
2023-05-10 09:55:53
info : Primary 'node-a' failed: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused)
2023-05-10 09:55:53
error : Lost connection to the primary server, closing session. Lost connection to primary server while connection was idle. Connection has been idle for 0 seconds. Error caused by: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused). Last close reason: <none>. Last error:
2023-05-10 09:55:53
error : Network error in connection to server 'node-b', session in state 'Stopping session' (DCB::State::POLLING): 111, Connection refused
2023-05-10 09:55:57
info : Stopped Read-Write-Service client session [10]
2023-05-10 09:56:01
info : Found matching user 'root'@'%' for client 'root'@'2001:1458:202:56::100:1b' with sufficient privileges.
2023-05-10 09:56:01
warning: Super user 'root'@'2001:1458:202:56::100:1b' logged in to service 'Read-Write-Service'.
2023-05-10 09:56:01
info : Target connection counts:
2023-05-10 09:56:01
info : current operations : 0 in node-a Master, Running
2023-05-10 09:56:01
info : current operations : 0 in node-b Slave, Running
2023-05-10 09:56:01
info : Connected to 'node-a'
2023-05-10 09:56:01
info : Selected Primary: node-a
2023-05-10 09:56:01
info : Connected to 'node-b'
2023-05-10 09:56:01
info : Selected Replica: node-b
2023-05-10 09:56:01
info : Started Read-Write-Service client session [11] for 'root' from 2001:1458:202:56::100:1b
2023-05-10 09:56:01
info : Primary 'node-a' failed: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused)
2023-05-10 09:56:01
error : Lost connection to the primary server, closing session. Lost connection to primary server while connection was idle. Connection has been idle for 0 seconds. Error caused by: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused). Last close reason: <none>. Last error:
2023-05-10 09:56:01
error : Network error in connection to server 'node-b', session in state 'Stopping session' (DCB::State::POLLING): 111, Connection refused
2023-05-10 09:56:05
info : Stopped Read-Write-Service client session [11]
```

The server immediately disconnects after the authentication and I can't figure out why.

In the meantime could you verify that if MaxScale free version can only be used with 1 MySQL server? Because that could be the reason behind all this.

Comment by markus makela [ 2023-05-10 ]

No, the license if MaxScale does not affect its functionality and thus this is not caused by it. The MaxScale 23.02 license states:

Additional Use Grant: You may use the Licensed Work when your application
uses the Licensed Work with a total of less than three
server instances in production.

I believe that less than three means either one or two servers.

The Connection refused error suggests that there's a problem connecting to the database and the rejection happens at the network level. Oddly enough, it seems like the monitor is able to connect to the servers and thus the state of the servers seems to be OK. Can you verify that you can connect to the database nodes from the MaxScale host? SSH-ing over to the server and then using the mysql command line client to connect to the database using the root user should verify that the network between the nodes is functional.

Another thing to check would be to see if there are multiple network interfaces on the MaxScale host. If so, you could try to bind to a specific interface using the global local_address parameter in MaxScale. Perhaps the network interface that the client connections from MaxScale use isn't able to connect to the database nodes and the connections used by the monitors in MaxScale somehow end up using a different network interface.

Comment by Kalana Wijethunga [ 2023-05-10 ]

Hi,

This brings something interesting to my attention. node1(working primary) is setup on my local machine. node2, node3 and <maxscale-host> are on actual enteprise servers. The authentication is done via SSH keys. So my personal key is added on each machine, but the machines does not share each other's keys. This leads to the fact that I can SSH from node1 to any of the machines but I cant SSH from any other machine to a different machine because the keys are not configured.

However, I can try directly connecting via MySQL -h command to each machine from the node2 which is working fine. At the moment I am reluctant to install mysql-client libraries on <maxscale-host> as it looks like the limitation of 2 servers will stop me from evaluating the product further.

About trying out the local_address, I got the following error on log. I believe the issue could be here. The server machines are not configured to use ipv6 and they use ipv4 only. The reason could be why it can connect to node1(my personal machine) is because it has a ipv4 but the other machines do not. Is there a way to force MaxScale to use ipv4 instead of ipv6 ?

```
2023-05-10 10:37:55
info : Found matching user 'root'@'%' for client 'root'@'<>' with sufficient privileges.
2023-05-10 10:37:55
warning: Super user 'root'@'<>' logged in to service 'Read-Write-Service'.
2023-05-10 10:37:55
info : Target connection counts:
2023-05-10 10:37:55
info : current operations : 0 in node-a Master, Running
2023-05-10 10:37:55
info : current operations : 0 in node-b Slave, Running
2023-05-10 10:37:55
error : Could not bind connecting socket to local address "<>", connecting to server using default local address: Address family not supported by protocol
2023-05-10 10:37:55
info : Connected to 'node-a'
2023-05-10 10:37:55
info : Selected Primary: node-a
2023-05-10 10:37:55
error : Could not bind connecting socket to local address "<>, connecting to server using default local address: Address family not supported by protocol
2023-05-10 10:37:55
info : Connected to 'node-b'
2023-05-10 10:37:55
info : Selected Replica: node-b
2023-05-10 10:37:55
info : Started Read-Write-Service client session [2] for 'root' from 2001:1458:202:56::100:1b
2023-05-10 10:37:55
info : Primary 'node-a' failed: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused)
2023-05-10 10:37:55
error : Lost connection to the primary server, closing session. Lost connection to primary server while connection was idle. Connection has been idle for 0 seconds. Error caused by: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused). Last close reason: <none>. Last error:
2023-05-10 10:37:55
error : Network error in connection to server 'node-b', session in state 'Stopping session' (DCB::State::POLLING): 111, Connection refused
2023-05-10 10:37:58
info : Stopped Read-Write-Service client session [2]

Comment by markus makela [ 2023-05-10 ]

I don't know why or how you have a user with <> as the host but that definitely doesn't seem like something that's a valid network address. What's the value of local_client you used?

So far nothing indicates that IPv6 is the problem. You can bind the listening ports of MaxScale to the IPv4 all interface by using address=0.0.0.0 in the listener sections.

Comment by Kalana Wijethunga [ 2023-05-10 ]

sorry for the confusion. I replaced ip addresses with <> for security reasons. Please see a more sensible log below after adding address = 0.0.0.0 to the listeners and setting ipv4 address of maxscale host to `local_address` when initiating the call to maxscale host from a mysql client in my local machine(node1

```
2023-05-10 10:49:51
info : Found matching user 'root'@'%' for client 'root'@'<node1-ipv4>' with sufficient privileges.
2023-05-10 10:49:51
warning: Super user 'root'@'<node1-ipv4>' logged in to service 'Read-Write-Service'.
2023-05-10 10:49:51
info : Target connection counts:
2023-05-10 10:49:51
info : current operations : 0 in node-a Master, Running
2023-05-10 10:49:51
info : current operations : 0 in node-b Slave, Running
2023-05-10 10:49:51
error : Could not bind connecting socket to local address "<maxscale-host-ipv4>", connecting to server using default local address: Address family not supported by protocol
2023-05-10 10:49:51
info : Connected to 'node-a'
2023-05-10 10:49:51
info : Selected Primary: node-a
2023-05-10 10:49:51
error : Could not bind connecting socket to local address "<maxscale-host-ipv4>", connecting to server using default local address: Address family not supported by protocol
2023-05-10 10:49:51
info : Connected to 'node-b'
2023-05-10 10:49:51
info : Selected Replica: node-b
2023-05-10 10:49:51
info : Started Read-Write-Service client session [2] for 'root' from <node1ipv4>
2023-05-10 10:49:51
info : Primary 'node-a' failed: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused)
2023-05-10 10:49:51
error : Lost connection to the primary server, closing session. Lost connection to primary server while connection was idle. Connection has been idle for 0 seconds. Error caused by: #HY000: Lost connection to backend server: network error (node-a: 111, Connection refused). Last close reason: <none>. Last error:
2023-05-10 10:49:51
error : Network error in connection to server 'node-b', session in state 'Stopping session' (DCB::State::POLLING): 111, Connection refused
2023-05-10 10:49:54
info : Stopped Read-Write-Service client session [2]
```

Comment by markus makela [ 2023-05-10 ]

Are you using hostnames for the address parameters in the server definitions? If possible, could you try if switching them to IP addresses has any effect?

Comment by Kalana Wijethunga [ 2023-05-10 ]

Oh ! shoot. I should have check this at the beginning. Yup, using IPs resolved the problem which means that this should be related to DNS. Thanks a lot for the pointers

Comment by markus makela [ 2023-05-10 ]

Thanks for reporting back. I think we can improve some of the error and info level messages and maybe even show the IP addresses that hostnames resolve to, hopefully this will help with any future DNS related problems.

Comment by markus makela [ 2023-05-10 ]

I'll close this as Not a Bug since it turned out to be a DNS related issue.

Comment by Kalana Wijethunga [ 2023-05-10 ]

Yes, I think improving logging would reduce a lot of debugging effort.

Can you let me know how I can check the the server limitation of 2 servers. I now have 3 servers being monitored by the maxscale monitor.
When i set route_options=master traffic gets routed to node1 and when i set it up route_options=slave it gets routed to node2 which kind of shows that it handle only 2 servers. But could it be that it does not know how to share the traffic between 2 replicas so it keeps sending to the first replica only?

Comment by markus makela [ 2023-05-10 ]

The readconnroute module only does load balancing at the connection level. When a client connects to the readconnroute service, it picks a server and creates a connection there. From that point onwards, the readconnroute module does no load balancing and routes every request, regardless of its type, to the same server.

The readwritesplit module is more versatile and is capable of connecting to multiple servers and routing reads to different nodes based on various load conditions. This is usually what you want since the load conditions at the connection creation time are not always a good representation of the load conditions later on. The readwritesplit router looks at the load conditions of the replicas before each query and routes it to the one that is the leasy busy. This sort of dynamic load balancing is especially important for connections that exist for longer periods of time (e.g. client-side connection pools like HikariCP) but it is equally as important for proper load distribution even for short connections.

Now that the connection problems have been fixed, I'd recommend using the readwritesplit module if you expect individual requests to be load balanced across multiple nodes. Another thing to keep in mind while testing is that if you only have one client connection, the traffic can end up going only to a single node since the same node always appers as the best candidate. For a proper load balancing test, you need more than one client connection and some load to properly stress the system. A program like mysqlslap or sysbench are useful tools for testing load balancing.

Comment by Kalana Wijethunga [ 2023-05-10 ]

Thanks a lot for the pointers and detailed explanations marcus. I will try them out.

Generated at Thu Feb 08 04:29:52 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.