[MXS-3328] Persistent Connections on Maxscale 2.5 seem to break client authentication Created: 2020-12-08  Updated: 2020-12-11  Resolved: 2020-12-11

Status: Closed
Project: MariaDB MaxScale
Component/s: mariadbbackend, readwritesplit
Affects Version/s: 2.5.5
Fix Version/s: 2.5.6

Type: Bug Priority: Major
Reporter: George Diamantopoulos Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: galera
Environment:

Debian Buster 10.7


Sprint: MXS-SPRINT-121

 Description   

In a testing setup with MaxScale 2.5.5 and a 3-node MariaDB 10.5.8 Galera cluster as backend, I'm seeing strange client authentication failures when persistent connections are enabled. This doesn't seem to happen with 2.4.x.

The issue manifests as follows: soon after MaxScale starts accepting connections, new client connections begin to fail, and the following is logged:

2020-12-08 09:04:01   error  : (79) Invalid authentication message from backend 'gal3.lab'. Error code: 1045, Msg : #28000: Access denied for user 'redmine'@'10.0.63.186' (using password: YES)

However, there is no MySQL client at 10.0.63.186 configured to use the username 'redmine'; that user belongs to a client at another IP. Similar errors appear for that other client's user as well. In short, it seems that usernames and hosts are mixed up by MaxScale when persistent connections are used.

Here's the relevant configuration used for this test:

General:

[maxscale]
threads=auto
admin_host=127.0.0.1
admin_secure_gui=false

Servers:

[gal1]
type=server
address=gal1.lab
port=3306
persistpoolmax=300
persistmaxtime=3600s
proxy_protocol=on
ssl=true
ssl_verify_peer_certificate=true
ssl_verify_peer_host=true
ssl_ca_cert=/etc/maxscale/ssl/gal1.lab_chained_IR.crt
 
[gal2]
type=server
address=gal2.lab
port=3306
persistpoolmax=300
persistmaxtime=3600s
proxy_protocol=on
ssl=true
ssl_verify_peer_certificate=true
ssl_verify_peer_host=true
ssl_ca_cert=/etc/maxscale/ssl/gal2.lab_chained_IR.crt
 
[gal3]
type=server
address=gal3.lab
port=3306
persistpoolmax=300
persistmaxtime=3600s
proxy_protocol=on
ssl=true
ssl_verify_peer_certificate=true
ssl_verify_peer_host=true
ssl_ca_cert=/etc/maxscale/ssl/gal3.lab_chained_IR.crt

Monitors:

[Galera-Monitor]
type=monitor
module=galeramon
servers=gal1, gal2, gal3
user=maxscale-monitor
password=*************
monitor_interval=2000ms
available_when_donor=true

Routers:

[Read-Write-Service]
type=service
router=readwritesplit
servers=gal1, gal2, gal3
user=maxscale
password=****************
master_accept_reads=true
connection_keepalive=300s
master_reconnection=true
master_failure_mode=fail_on_write
max_sescmd_history=1500
prune_sescmd_history=true
session_track_trx_state=true

Listeners:

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
address=10.0.63.250
port=3306

Lastly, I was wondering whether MXS-3275 is in any way related to this?

I'm available for more testing if needed. Thank you.



 Comments   
Comment by markus makela [ 2020-12-09 ]

One possibility is that proxy_protocol is related to this. If you can test whether the behavior changes when you disable it, we'd know more.

Comment by George Diamantopoulos [ 2020-12-09 ]

> One possibility is that proxy_protocol is related to this. If you can test whether the behavior changes when you disable it, we'd know more.

I'm not sure it makes much sense to test for this. The way I understand it, with proxy_protocol disabled, all backend connections will appear to mariadb as originating from maxscale's IP, so the user/host mixup won't happen anyway. Right?

Comment by markus makela [ 2020-12-09 ]

Yeah, I think this would just be to confirm that it's the problem we expect: proxy_protocol causes the client IP to remain the same as it was when the connection was originally created.
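
For reference, the PROXY protocol v1 header is a single text line sent to the backend once, before the normal MySQL handshake. A minimal sketch (purely illustrative helper names, not MaxScale internals) of what gets prepended to a backend connection:

```python
# Sketch of a PROXY protocol v1 header (text format) as a proxy such as
# MaxScale would send it to the backend before the MySQL handshake.
# Function and variable names here are illustrative assumptions.
def proxy_v1_header(client_ip: str, proxy_ip: str,
                    client_port: int, proxy_port: int) -> bytes:
    # Format: "PROXY TCP4 <src-ip> <dst-ip> <src-port> <dst-port>\r\n"
    line = f"PROXY TCP4 {client_ip} {proxy_ip} {client_port} {proxy_port}\r\n"
    return line.encode("ascii")

# The header is written exactly once, when the backend connection is
# created. A pooled connection therefore keeps advertising the ORIGINAL
# client's IP to MariaDB, no matter which client later reuses it:
hdr = proxy_v1_header("10.0.63.186", "10.0.63.250", 40123, 3306)
print(hdr)
```

This is why the backend keeps checking grants against the first client's host after the connection goes into the pool.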

I think we're forced to change it in the current releases so that when proxy_protocol is configured for a server, the persistent connection pool is disabled. This will prevent this problem from happening, along with any other possible problems.

To make it possible to use pooled connections with proxy_protocol, we'll have to partition the connections by the client IP.
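
The partitioning described above can be sketched as a toy pool keyed by client IP (illustrative Python only; MaxScale's actual pool is C++ and differs in detail): with proxy_protocol on, an idle backend connection may only be handed to a client with the same source IP as the one its PROXY header already advertised.

```python
from collections import defaultdict, deque

# Toy connection pool partitioned by client IP. Names are illustrative.
class PartitionedPool:
    def __init__(self):
        self._pools = defaultdict(deque)  # client_ip -> idle connections

    def put(self, client_ip, conn):
        self._pools[client_ip].append(conn)

    def get(self, client_ip):
        pool = self._pools.get(client_ip)
        return pool.popleft() if pool else None

pool = PartitionedPool()
pool.put("10.0.63.186", "conn-A")
# A client at a different IP must NOT get conn-A, because conn-A's PROXY
# header already told the backend the client is at 10.0.63.186:
print(pool.get("10.0.63.10"))   # None -> open a fresh connection instead
print(pool.get("10.0.63.186"))  # conn-A can be reused safely
```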

Comment by George Diamantopoulos [ 2020-12-10 ]

Thank you for clarifying. Here's some more information revealed during testing:

  • Indeed, with proxy_protocol=off we were not able to reproduce this.

To sum up, the following tests have been performed and are presented here for your convenience:

  • Maxscale 2.5.5 + Mariadb 10.5.8, persistent connections enabled, proxy protocol enabled: issue present
  • Maxscale 2.5.5 + Mariadb 10.4.17, persistent connections enabled, proxy protocol enabled: issue present
  • Maxscale 2.5.5 + Mariadb 10.4.17, persistent connections enabled, proxy protocol DISABLED: issue NOT present
  • Maxscale 2.4.14 + Mariadb 10.4.17, persistent connections enabled, proxy protocol enabled: issue NOT present

So if I understand correctly, persistent connections were never meant to be used with proxy protocol at the same time? Or is this something that was introduced during Maxscale 2.5 development?

Comment by markus makela [ 2020-12-10 ]

Yes, persistent connections should never be mixed with the proxy protocol. This should also be the case for 2.4, and it is odd that your testing didn't reveal the same problem there as well (unless it's a typo).

Comment by George Diamantopoulos [ 2020-12-10 ]

Nope, not a typo; I also have no explanation for this if that's the case...

Comment by markus makela [ 2020-12-10 ]

Actually, this should work in 2.4, as it still partitioned the persistent connections by username and client IP address. This was removed in 2.5 because the partitioning is mostly useless: a COM_CHANGE_USER is sent to change the user anyway, so partitioning the pool by username and IP only reduces its effectiveness in normal use-cases. The only exception where this partitioning is required is when proxy_protocol is on, but this was not known at the time the change in 2.5 was made.

I think this is something we should fix in 2.5 by conditionally partitioning the pool if proxy_protocol is on. Since this is what 2.4 does in all cases, the version in 2.5 should still be a performance improvement.
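
The version difference and the proposed fix can be condensed into a small sketch of the pool-key logic (hypothetical function, not MaxScale code): 2.4 always partitions by user and IP, 2.5 shares one pool and relies on COM_CHANGE_USER, and the fix falls back to IP-based partitioning only when proxy_protocol is on.

```python
# Illustrative sketch of the conditional partitioning described above.
# MaxScale's real implementation is in C++ and differs in detail.
def pool_key(user: str, client_ip: str, *,
             version: str, proxy_protocol: bool):
    if version == "2.4":
        return (user, client_ip)   # 2.4: always partitioned
    if proxy_protocol:
        return (client_ip,)        # proposed 2.5 fix: partition by IP
    return ("shared",)             # 2.5: COM_CHANGE_USER switches users

# With proxy_protocol off, every 2.5 client maps to the shared pool,
# which is why the performance improvement is preserved:
print(pool_key("redmine", "10.0.63.186", version="2.5",
               proxy_protocol=False))
```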

Generated at Thu Feb 08 04:20:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.