[MXS-4609] MaxScale Connects to MySQL replica but cannot execute queries Created: 2023-05-09 Updated: 2023-05-17 Resolved: 2023-05-10 |
|
| Status: | Closed |
| Project: | MariaDB MaxScale |
| Component/s: | N/A |
| Affects Version/s: | None |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Major |
| Reporter: | Kalana Wijethunga | Assignee: | markus makela |
| Resolution: | Not a Bug | Votes: | 0 |
| Labels: | None | ||
| Description |
|
I am setting up a Master-slave MySQL replication setup using 3 MySQL 8 nodes(Fresh installations). I have setup a master and configured another nodes to replicate from the master which is working fine. I have setup MaxScale free version as a load balancer and I am intending to use it as a load balancer with read write splitting. Following is my setup(domain names are anonymized for privacy reasons).
I can succesfully connect both nodes with `mysql -uroot -proot -h <domain-name>. Following is my MaxScale configuration.
When I try `mysql -uroot -proot -h <maxscale-domain>` Maxscale can succesfully connect to the server, but the connection hangs as follows.
How can i resolve this? I have tried the followings to try to debug this. 1. Using a new read only service with following configuration
This has led me somewhere with `mysql -uroot -proot` as it allows me to connect to both servers succesfully. When `router_options=master` I can successfully connect to master and query the database. However when `router_options=slave` I can connect to the slave but cannot query it.
This led me to believe the issue could be with the node2, but I cannot figure out the issue as it works fine if i connect the mysql client directly to node. 2. To ensure that this is not a issue related to any slave, i removed node1 from the network and added a new node3. Then node2 is setup as the master of node3 with replication enabled. Results for the Read only service is as follows. With `router_options=master` :
It provides the same issue for node2 and 3 both now. With `router_options=slave` :
Note that all 3 are plain MySQL installations with few empty databases added + binlog enabled. Replication works fine and MaxScale GUI reports as such. It looks like the issue comes from MySQL, but actually it should be related to the MaxScale. Any thoughts on the matter are highly appreciated. Edit : Attaching maxscale.log
Edit 2: Attaching mysql general log for reference. Nothing specific to logging or execution of the query gets logged in the mysql general log either. However when i restart the maxscale server it succesfull y connects it seems which is proven by the statement `2023-05-05T11:45:44.693829Z 19 Connect maxscale@<maxscale-host> on using TCP/IP`
Edit 3: Attaching MaxScale info log as suggested by https://stackoverflow.com/users/7633996/markusjm
This is after setting `log_info=true` for maxscale and recreating maxscale MySQL user with `mysql_native_password`. Following is the user info.
Maria |
| Comments |
| Comment by markus makela [ 2023-05-10 ] |
|
Can you try replacing router=readconnroute with router=readwritesplit and then remove router_options from the configuration? The readwritesplit routing module should log a more detailed error whenever a node fails. I did a quick test with the MySQL 8.0.32 docker image and an identical user but I wasn't able to reproduce the problem. However, it did reveal that readconnroute does not log enough information on the info level which is something that can be improved. |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
Hi @markus, I had already tried this with a `router=readwritesplit`. The behaviour was that the MySQL connection would load and just hang in the middle at the following point without no clue on what's going on. ``` However, if I take the replica out of the Maxscale monitoring, it will successfully connect to master which led me to believe that the issue with the replica and led me to bring a 2nd replica to the setup which ultimately gave the same result. Anyway, after all the changes done to the setup which discussing this question, that issue seems to have gone away but the connection still drops after logging into MySQL. Following is the MaxScale log. ``` The server immediately disconnects after the authentication and I can't figure out why. In the meantime could you verify that if MaxScale free version can only be used with 1 MySQL server? Because that could be the reason behind all this. |
| Comment by markus makela [ 2023-05-10 ] |
|
No, the license if MaxScale does not affect its functionality and thus this is not caused by it. The MaxScale 23.02 license states:
I believe that less than three means either one or two servers. The Connection refused error suggests that there's a problem connecting to the database and the rejection happens at the network level. Oddly enough, it seems like the monitor is able to connect to the servers and thus the state of the servers seems to be OK. Can you verify that you can connect to the database nodes from the MaxScale host? SSH-ing over to the server and then using the mysql command line client to connect to the database using the root user should verify that the network between the nodes is functional. Another thing to check would be to see if there are multiple network interfaces on the MaxScale host. If so, you could try to bind to a specific interface using the global local_address parameter in MaxScale. Perhaps the network interface that the client connections from MaxScale use isn't able to connect to the database nodes and the connections used by the monitors in MaxScale somehow end up using a different network interface. |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
Hi, This brings something interesting to my attention. node1(working primary) is setup on my local machine. node2, node3 and <maxscale-host> are on actual enteprise servers. The authentication is done via SSH keys. So my personal key is added on each machine, but the machines does not share each other's keys. This leads to the fact that I can SSH from node1 to any of the machines but I cant SSH from any other machine to a different machine because the keys are not configured. However, I can try directly connecting via MySQL -h command to each machine from the node2 which is working fine. At the moment I am reluctant to install mysql-client libraries on <maxscale-host> as it looks like the limitation of 2 servers will stop me from evaluating the product further. About trying out the local_address, I got the following error on log. I believe the issue could be here. The server machines are not configured to use ipv6 and they use ipv4 only. The reason could be why it can connect to node1(my personal machine) is because it has a ipv4 but the other machines do not. Is there a way to force MaxScale to use ipv4 instead of ipv6 ? ``` |
| Comment by markus makela [ 2023-05-10 ] |
|
I don't know why or how you have a user with <> as the host but that definitely doesn't seem like something that's a valid network address. What's the value of local_client you used? So far nothing indicates that IPv6 is the problem. You can bind the listening ports of MaxScale to the IPv4 all interface by using address=0.0.0.0 in the listener sections. |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
sorry for the confusion. I replaced ip addresses with <> for security reasons. Please see a more sensible log below after adding address = 0.0.0.0 to the listeners and setting ipv4 address of maxscale host to `local_address` when initiating the call to maxscale host from a mysql client in my local machine(node1 ``` |
| Comment by markus makela [ 2023-05-10 ] |
|
Are you using hostnames for the address parameters in the server definitions? If possible, could you try if switching them to IP addresses has any effect? |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
Oh ! shoot. I should have check this at the beginning. Yup, using IPs resolved the problem which means that this should be related to DNS. Thanks a lot for the pointers |
| Comment by markus makela [ 2023-05-10 ] |
|
Thanks for reporting back. I think we can improve some of the error and info level messages and maybe even show the IP addresses that hostnames resolve to, hopefully this will help with any future DNS related problems. |
| Comment by markus makela [ 2023-05-10 ] |
|
I'll close this as Not a Bug since it turned out to be a DNS related issue. |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
Yes, I think improving logging would reduce a lot of debugging effort. Can you let me know how I can check the the server limitation of 2 servers. I now have 3 servers being monitored by the maxscale monitor. |
| Comment by markus makela [ 2023-05-10 ] |
|
The readconnroute module only does load balancing at the connection level. When a client connects to the readconnroute service, it picks a server and creates a connection there. From that point onwards, the readconnroute module does no load balancing and routes every request, regardless of its type, to the same server. The readwritesplit module is more versatile and is capable of connecting to multiple servers and routing reads to different nodes based on various load conditions. This is usually what you want since the load conditions at the connection creation time are not always a good representation of the load conditions later on. The readwritesplit router looks at the load conditions of the replicas before each query and routes it to the one that is the leasy busy. This sort of dynamic load balancing is especially important for connections that exist for longer periods of time (e.g. client-side connection pools like HikariCP) but it is equally as important for proper load distribution even for short connections. Now that the connection problems have been fixed, I'd recommend using the readwritesplit module if you expect individual requests to be load balanced across multiple nodes. Another thing to keep in mind while testing is that if you only have one client connection, the traffic can end up going only to a single node since the same node always appers as the best candidate. For a proper load balancing test, you need more than one client connection and some load to properly stress the system. A program like mysqlslap or sysbench are useful tools for testing load balancing. |
| Comment by Kalana Wijethunga [ 2023-05-10 ] |
|
Thanks a lot for the pointers and detailed explanations marcus. I will try them out. |