[MXS-2503] `Failed to create new routing session` with valid slaves available Created: 2019-05-22  Updated: 2020-11-27  Resolved: 2019-07-04

Status: Closed
Project: MariaDB MaxScale
Component/s: readconnroute
Affects Version/s: 2.3.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Kadir Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: readconnroute
Environment:

FROM centos:7

ENV MAXSCALE_VERSION=2.3.6
RUN curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | bash -s -- --mariadb-maxscale-version="${MAXSCALE_VERSION}" && \
yum -y install maxscale && \
yum clean all


Attachments: File stripped_console.logs     File stripped_maxctrl.log     File stripped_maxscale.config    

 Description   

I am getting the following error even though a slave is available:

error  : Failed to create new routing session. Couldn't find eligible candidate server. Freeing allocated resources.
error  : Failed to create new router session for service 'this-service'. See previous errors for more details.

When I query the server's state, it is shown as online:

// curl -u admin:mariadb localhost:8989/v1/servers/server1
...
"state": "Slave of External Server, Running",
...

Initially, readconnroute was configured for a single slave only, with no master declared. This didn't work, so I tried many different combinations:

  • passive=true + removing monitor
  • include master in monitor
  • setting ignore_external=true
  • trying all combinations of router options: `master,slave`; `master`; `slave`

None of them led to an eligible candidate server being found. I cannot fully verify whether this is a bug, but none of the tested combinations worked on said server. If I try this configuration on an empty MySQL server (docker-compose), it works flawlessly.

Some extra information about the database I point to:

  • AWS RDS MySQL 5.7.16
  • 3612 loaded users with many invalid subnets 255.255.252.0 (I set `skip_authentication=true`)

I also included the full config and logs.



 Comments   
Comment by markus makela [ 2019-05-22 ]

To just get it working, you can use router_options=running for readconnroute.

The actual problem appears to be that the server's state is Slave of External Server instead of Slave, which is probably explained by the fact that it is replicating from a server that the monitor does not know about. One option would be to change readconnroute so that it treats Slave of External Server the same way it treats Slave. The other option is to have the monitor assign the Slave state to that server in addition to Slave of External Server.
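
For illustration, the router_options=running workaround could look roughly like this in the service section of the MaxScale configuration. The service name this-service and the server name server1 are taken from the log output and the REST query in the description; the user and password values are placeholders, since the attached configuration is not reproduced here.

```ini
# Sketch only: service and server names come from the report,
# credentials are placeholder assumptions.
[this-service]
type=service
router=readconnroute
servers=server1
# Accept any server in the Running state, regardless of whether the
# monitor labels it Master, Slave or Slave of External Server.
router_options=running
user=maxscale_user
password=maxscale_password
```

With this setting the router no longer depends on the monitor assigning the Slave state, so it should only be used where every listed server is acceptable as a target.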

Comment by Kadir [ 2019-05-22 ]

Hi Markus, setting the router_options to running fixed the issue!

Do you mind me asking if the behavior I described is intended?

Comment by markus makela [ 2019-05-22 ]

It is expected behavior but not very desirable in this case. I think we'll have to review whether the Slave of External Server state should receive some special handling.

Comment by Richard Lane [ 2020-11-27 ]

Just note that "Slave of External Server" for us means that it is replicating from a remote data center. It is NOT a Slave in the local cluster, but for us the local Master is also marked with "Slave of External Server".
I think it would confuse things to mark this as a "Slave" as well, since that would imply a Slave of a locally monitored node, which it is not.
