[MXS-3275] Strange authentication errors with persistent connections after upgrade to 2.5.5 Created: 2020-11-01  Updated: 2021-03-31  Resolved: 2021-03-31

Status: Closed
Project: MariaDB MaxScale
Component/s: Authenticator
Affects Version/s: 2.5.5
Fix Version/s: 2.5.9

Type: Bug Priority: Major
Reporter: Tim Westervoorde Assignee: markus makela
Resolution: Fixed Votes: 0
Labels: None

Sprint: MXS-SPRINT-121

 Description   

A lot of errors are logged after upgrading to 2.5.5 (from 2.3.x)

MariaDB version is 10.3.25

2020-11-01 16:30:24   error  : (511660) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:24   error  : (511660) Invalid authentication message from backend 'db2'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:25   error  : (511660) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:42   error  : (512393) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:42   error  : (512393) Invalid authentication message from backend 'db2'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:43   error  : (512393) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:47   error  : (512527) Invalid authentication message from backend 'db2'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:30:47   error  : (512527) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:31:02   error  : (513425) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:31:02   error  : (513425) Invalid authentication message from backend 'db2'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:31:02   error  : (513426) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:31:02   error  : (513429) Invalid authentication message from backend 'db3'. Error code: 1047, Msg : #08S01: Unknown command
2020-11-01 16:31:02   error  : (513426) Invalid authentication message from backend 'db2'. Error code: 1047, Msg : #08S01: Unknown command

I have virtually no clue why I'm getting these, most connections seem to work fine.



 Comments   
Comment by markus makela [ 2020-11-02 ]

Can you attach your maxscale.cnf with all sensitive information removed?

Comment by Tim Westervoorde [ 2020-11-02 ]

[maxscale]
threads=auto
local_address=10.10.x.x
query_classifier_cache_size=100M
 
[db1]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
persistpoolmax=50
persistmaxtime=900
 
[db2]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
persistpoolmax=50
persistmaxtime=900
 
[db3]
type=server
address=10.10.x.x
port=3307
protocol=MariaDBBackend
proxy_protocol=on
persistpoolmax=50
persistmaxtime=900
 
[Galera-Monitor]
type=monitor
module=galeramon
servers=db1,db2,db3
user=maxscale
password=xxx
monitor_interval=2000
available_when_donor=true
disable_master_failback=true
 
[Read-Write-Service]
type=service
router=readwritesplit
servers=db1,db2,db3
user=maxscale
password=xxxx
connection_keepalive=30
max_sescmd_history=50
disable_sescmd_history=false
master_accept_reads=false
master_failure_mode=fail_on_write
enable_root_user=true
master_reconnection=true
prune_sescmd_history=true
session_track_trx_state=true
causal_reads=true
causal_reads_timeout=2
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=3306

Comment by markus makela [ 2020-11-02 ]

Does the error message go away or change to something else if you remove proxy_protocol=on? Another thing to test would be to disable both persistpoolmax and persistmaxtime.

Comment by Tim Westervoorde [ 2020-11-02 ]

proxy_protocol=off makes no difference

proxy_protocol=on and persistpoolmax+persistmaxtime removed from the config fixes the error, so it's somehow related to persistent connections.

Comment by markus makela [ 2020-11-02 ]

What version of MariaDB are you using with MaxScale?

Comment by Tim Westervoorde [ 2020-11-02 ]

MariaDB version is 10.3.25

Comment by Tim Westervoorde [ 2020-11-04 ]

been running on 2.5.5 for 3 days without persistpoolmax+persistmaxtime without having this issue, so it's definitely related. queryload was approx. 3500q/s 90% write, 10% read

Comment by markus makela [ 2020-11-04 ]

OK, that suggests there's something wrong with the resetting of the connection when it is taken out of the connection pool.

Comment by Anoop P Alias [ 2021-02-25 ]

The same issue with module=mariadbmon also

it seems the persistpoolmax+persistmaxtime in the server is causing the error in all monitors in 2.5.5

The issue is not there in Maxscale 2.4.x and 2.3.x

So either a downgrade of Maxscale or commenting out the persistpoolmax+persistmaxtime fix the error for now

Comment by markus makela [ 2021-03-11 ]

Have you had a chance to upgrade to the latest 2.5 release of MaxScale? I'd be interested in knowing if you still see this problem.

Comment by Tim Westervoorde [ 2021-03-15 ]

Running 2.5.9 for some minutes with persistent connections enabled and haven't seen the issue. Will monitor for some more time and let you know some stats / info later. So far so good.

Comment by Tim Westervoorde [ 2021-03-15 ]

Errors are gone running stable for some hours now. I do have a question though

connection_pool_empty 788003
connections 1446
max_connections 1596
max_pool_size 12
persistent_connections 99
reused_connections 326937
total_connections 1114940

does this mean my persistent settings are too low (100 conn, 15min) because conn_pool_empty is high compared to reused_connections?

Comment by markus makela [ 2021-03-16 ]

I think the problem of low pool usage is already reported as MXS-3141. Since this issue was about the pool not working at all, I'd recommend reporting it there as well.

If you can, let us know how how the latest release behaves over a longer period. I'll leave the issue open for now.

Comment by Tim Westervoorde [ 2021-03-29 ]

haven't seen the auth issues any more, uptime 14 days.

connection_pool_empty 41540154
connections 1444
max_connections 2091
max_pool_size 13
persistent_connections 100
reused_connections 17979572

how do I interpret these numbers?

Comment by markus makela [ 2021-03-29 ]

connection_pool_empty is the number of times a new connection needed to be created and reused_connections is how often a connection was taken from the pool. There are a couple of reasons why a new connection would be created even there are connections in the pool. In the case of proxy_protocol, the address must match to the one in the connection pool. This is done to prevent clients from a certain IP appearing to be from another.

connections is the current number of connections and max_connections is the historical maximum of the current number of connections. max_pool_size is the maximum number of connections that were simultaneously in the pool and persistent_connections is the current number of connections that are in the pool.

Comment by Tim Westervoorde [ 2021-03-29 ]

about those last two, persistent_connections is a static number and max_pool_size the current available?

Comment by markus makela [ 2021-03-30 ]

persistent_connections is the current pool size (a bit of an unfortunate name for the field, should've been current_pool_size) and max_pool_size is the maximum number ever reached. The latter should eventually be the same value as persistpoolmax in the configuration file.

Comment by Tim Westervoorde [ 2021-03-30 ]

then how can max_pool_size < persistent_connections ? or am i missing something

Comment by markus makela [ 2021-03-31 ]

I missed that part and it definitely looks strange: I think the pool maximum might be for a single thread and the total count might be for all the threads. If you can, could you open a separate bug report for this?

Comment by markus makela [ 2021-03-31 ]

I'll close this as Fixed in 2.5.9 since you haven't seen any problems with that release. If you see any problems, let us know and we'll reopen this.

Generated at Thu Feb 08 04:20:14 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.