[MXS-3331] Could not bind connecting socket to local address
Created: 2020-12-09  Updated: 2021-09-12  Resolved: 2021-09-02

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: 2.5.5
Fix Version/s: 2.5.16

Type: Bug
Priority: Major
Reporter: Michal
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None
Environment: docker container - buster, mariadb backends 10.3.25

Sprint: MXS-SPRINT-139

 Description   

Hello,

We tried to set up MaxScale 2.5.5 in production after successfully setting it up and running it in our LAB environment. The test environment is the same as the production environment.

We had N MaxScale instances running on several servers, with HA provided by keepalived.
When traffic was directed to the keepalived VIP -> MaxScale, the following appeared in the log:

keepalived's vip : 192.168.205.254
maxscale's ip : 192.168.205.10

LOG :

2020-12-09 04:55:20   error  : (3331) (server01) Could not bind connecting socket to local address "192.168.205.10", connecting to server using default local address: Address already in use
2020-12-09 04:55:20   error  : (3331) (server02) Could not bind connecting socket to local address "192.168.205.10", connecting to server using default local address: Address already in use
2020-12-09 04:55:21   error  : (3332) (server03) Could not bind connecting socket to local address "192.168.205.10", connecting t ........
............
............
............
............
............

Config :

[maxscale]
local_address = 192.168.205.10
threads = 5
skip_permission_checks = true
query_retries = 5
admin_secure_gui = false
auth_connect_timeout = 10s
auth_read_timeout = 3s
auth_write_timeout = 6s
max_auth_errors_until_block = 0
admin_host = 192.168.205.10
admin_port = 8990
 
 
[server01]
type = server
address = 192.168.205.10
port = 3306
protocol = MariaDBBackend
 
[server02]
type = server
address = 192.168.205.11
port = 3306
protocol = MariaDBBackend
 
[server03]
type = server
address = 192.168.205.21
port = 3306
protocol = MariaDBBackend
 
[server04]
type = server
address = 192.168.205.20
port = 3306
protocol = MariaDBBackend
 
 
[default-monitor]
type = monitor
module = galeramon
servers = server01,server02
user = root
password = OMMITED
monitor_interval = 2000ms
 
[shardA-monitor]
type = monitor
module = galeramon
servers = server03,server04
user = root
password = OMMITED
monitor_interval = 2000ms
 
 
[default-loadbalancer]
type = service
router = readconnroute
servers = server01,server02
router_options = master
user = root
password = OMMITED
enable_root_user = 1
connection_keepalive = 60s
 
[shardA-loadbalancer]
type = service
router = readconnroute
servers = server03,server04
router_options = master
user = root
password = OMMITED
enable_root_user = 1
connection_keepalive = 60s
 
[ShardRouter]
type = service
router = schemarouter
targets = default-loadbalancer,shardA-loadbalancer
user = root
password = OMMITED
enable_root_user = 1
auth_all_servers = 1
 
 
[ShardListener]
type = listener
service = ShardRouter
protocol = MariaDBClient
address = 192.168.205.254
port = 3306
 
[default-unix-listener]
type = listener
service = default-loadbalancer
protocol = MariaDBClient
socket = /var/lib/kolla/maxscale/default.sock
 
[shardA-unix-listener]
type = listener
service = shardA-loadbalancer
protocol = MariaDBClient
socket = /var/lib/kolla/maxscale/shardA.sock

  • Backends always come in pairs (each pair is a Galera cluster)
  • One cluster holds SOME databases and the second holds the OTHER databases
  • Users exist on every backend, but they have grants only on the cluster where their database is

Is this normal?
Could it be a bug?

Regards,
Michal Arbet



 Comments   
Comment by markus makela [ 2020-12-09 ]

Does it work if you remove local_address from the config?

Comment by Michal [ 2020-12-09 ]

Well, as we were in production yesterday, we did not try to edit the configuration by hand (and started to roll back the MaxScale feature instead).

What I can say is that yes, it works both with and without local_address in our LAB env; I can't confirm that it works in production ... Do you see any potential problem in the code?

Per the documentation, local_address is just the address/interface used to create connections between MaxScale and the backends, isn't it?

Comment by Michal [ 2020-12-09 ]

The only difference I can see is that in LAB we have a single interface, while in production we have a bond.

Comment by markus makela [ 2020-12-09 ]

Yes, it's the address that outbound connections bind to. It's only required if you need to use a specific interface. Most often it's not required.

Comment by markus makela [ 2020-12-09 ]

Looking at the code it seems the error is logged whenever the attempt to bind fails. Based on the actual error returned from the call, it looks like something is already bound to that address.

Comment by Michal [ 2020-12-09 ]

Well, yes, of course, there are several services listening on that address, but is this a problem?
Shouldn't MaxScale be able to choose a free IP:port?

Comment by markus makela [ 2020-12-09 ]

My apologies, I only meant that the error states that something is bound to that specific address/port combination. It of course is possible to bind to the same network interface on different ports.

Comment by Michal [ 2020-12-09 ]

No problem, I just have no idea what to do about it.

Comment by markus makela [ 2020-12-09 ]

It's definitely not something we've seen before and I think we'll have to try and reproduce this on our side.

Comment by Michal [ 2020-12-09 ]

If I am correct, port allocation is handled by bind (https://man7.org/linux/man-pages/man2/bind.2.html), so it is controlled by the OS, isn't it?
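
To illustrate the question above: when an outbound socket is bound to a specific local address with port 0, the kernel picks a free ephemeral port for it. The sketch below is only an illustration in Python, not MaxScale code; the helper name `bind_outbound` is hypothetical and 127.0.0.1 stands in for the configured local_address.

```python
import socket


def bind_outbound(local_ip):
    """Bind a TCP socket to local_ip with port 0.

    Returns the socket and the port the kernel selected for it.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # Port 0 asks the OS to choose a free ephemeral port on local_ip.
        sock.bind((local_ip, 0))
    except OSError:
        sock.close()
        raise
    return sock, sock.getsockname()[1]


if __name__ == "__main__":
    # 127.0.0.1 is a stand-in for local_address (assumption for the demo).
    first, first_port = bind_outbound("127.0.0.1")
    second, second_port = bind_outbound("127.0.0.1")
    print("kernel-selected ports:", first_port, second_port)
    first.close()
    second.close()
```

Because the destination is not yet known when bind is called, the port chosen has to be free on that local address, so the number of such sockets is presumably bounded by the ephemeral port range.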

Comment by markus makela [ 2021-08-30 ]

Have you had a chance to test if this still happens with the latest 2.5 release?

Comment by markus makela [ 2021-08-30 ]

It seems like this might be a limit on the number of sockets being hit. How many connections are you seeing on average, and how long do they last? This blog post as well as this one suggest that this would be the case. I think you can verify this by monitoring the number of network sockets that are open when you see this error.
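
One possible way to do the monitoring suggested above is to count TCP sockets per local address from /proc/net/tcp on Linux. This is a rough sketch, not a MaxScale tool: the function name `count_sockets_by_local_ip` is hypothetical, it only covers IPv4, and it assumes a little-endian host (as on x86) when decoding the hexadecimal addresses.

```python
import socket
import struct
from collections import Counter


def count_sockets_by_local_ip(proc_net_tcp):
    """Count IPv4 TCP sockets per local IP from the text of /proc/net/tcp."""
    counts = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the header
        fields = line.split()
        if len(fields) < 4:
            continue
        # Field 1 is the local endpoint as HEXIP:HEXPORT.
        hex_ip, _hex_port = fields[1].split(":")
        # The address is printed as a host-order integer; "<I" assumes
        # a little-endian machine.
        ip = socket.inet_ntoa(struct.pack("<I", int(hex_ip, 16)))
        counts[ip] += 1
    return counts


if __name__ == "__main__":
    with open("/proc/net/tcp") as proc_file:
        totals = count_sockets_by_local_ip(proc_file.read())
    for ip, total in totals.most_common():
        print(ip, total)
```

Comparing the count for the local_address IP with the range in /proc/sys/net/ipv4/ip_local_port_range at the moment the error is logged should give a hint whether the ephemeral ports are running out.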

If this is indeed the case, the fix should be as simple as adding this:

@@ -609,6 +609,12 @@ int open_network_socket(enum mxs_socket_type type,
 
                         memcpy(&local_address, ai->ai_addr, ai->ai_addrlen);
 
+                        int one = 1;
+                        setsockopt(so, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
 
                         if (bind(so, (struct sockaddr*)&local_address, sizeof(local_address)) == 0)
                         {
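
For readers who want to try the idea behind the patch outside of MaxScale, here is a minimal sketch of the same sequence in Python: set SO_REUSEADDR, bind to the local address, then connect. It is an illustration under assumptions, not the actual MaxScale implementation; `connect_from` is a hypothetical helper, and the fallback on a failed bind only mirrors what the logged error message describes.

```python
import socket


def connect_from(local_ip, remote):
    """Connect to remote (host, port) using local_ip as the source address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Same option the patch sets: it must be enabled before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    try:
        sock.bind((local_ip, 0))  # port 0: let the kernel pick the port
    except OSError as err:
        # Mirror the behaviour in the log: continue with the default address.
        print(f"could not bind to {local_ip}: {err}; using default local address")
    try:
        sock.connect(remote)
    except OSError:
        sock.close()
        raise
    return sock


if __name__ == "__main__":
    # Local listener standing in for a backend server (assumption for the demo).
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    conn = connect_from("127.0.0.1", listener.getsockname())
    print("connected from", conn.getsockname(), "to", conn.getpeername())
    conn.close()
    listener.close()
```

As far as I understand it, with SO_REUSEADDR set the kernel is more lenient about which local ports it considers available at bind time, which is why it is expected to help when many outbound connections share one local address.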
