[MXS-2745] User loading is limited at startup Created: 2019-10-29  Updated: 2020-06-01  Resolved: 2020-06-01

Status: Closed
Project: MariaDB MaxScale
Component/s: Core
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Nedyalko Petrov (Inactive) Assignee: markus makela
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment: SkySQL Dev/Test

 Description   

Original title: SkySQL: Data inconsistencies observed in replicas after upgrade to maxscale:2.4.2

After upgrading from maxscale:2.3.9 to maxscale:2.4.2 in SkySQL, we started observing data inconsistencies in replicas when topology initialization is immediately followed by a data load.

Details of these issues can be found here:
https://jira.mariadb.org/browse/DBAAS-1319
https://jira.mariadb.org/browse/DBAAS-1318

We did some research on our end, investigating possible root causes in the various moving parts (test cluster/topology setup, operator/K8s, etc.), but were not able to find anything obvious apart from the MaxScale migration from 2.3.9 to 2.4.2.

So far we have tried a couple of suggestions from the MaxScale team:

  • Consistent Critical Read Filter
  • Using Causal reads

Unfortunately, neither had much success.

Please advise if you need further details on the issue, or any other form of support we can provide to speed up the resolution process.



 Comments   
Comment by markus makela [ 2019-10-29 ]

This might be caused by MXS-2443. Can you check whether the database is created in one connection and the tables are filled in another one? You can see the session number (MariaDB thread ID) in the log with log_info enabled.

Comment by Petko Vasilev (Inactive) [ 2019-10-29 ]

Further investigation suggests that this bug is not caused by replication.
I can reliably replicate it with a single mariadb server behind the maxscale.
If I do something like:

mysql -h <host> -P <port> -u <user> '-p<pass>' --ssl

and then

use sbtest;

I have no problems.

If I do

mysql -h <host> -P <port> -u <user> '-p<pass>' --ssl --database=sbtest

It almost always fails the first 1 or 2 tries.
Note, this only happens on a fresh install of maxscale and mariadb.
It does NOT happen if I do the calls directly to the mariadb server through 127.0.0.1 (so maxscale matters here).

I think this has something to do with auth/cred caching on the maxscale side. I don't know enough about it though.

Comment by markus makela [ 2019-10-29 ]

It might be that MaxScale for some reason fails to fetch the authentication data from the backend servers. Does the MaxScale log contain any errors or messages when this happens?

Comment by Petko Vasilev (Inactive) [ 2019-10-30 ]

2019-10-30 09:48:40   warning: (2) [Read-Write-Service] Refresh rate limit (once every 30 seconds) exceeded for load of users' table.
2019-10-30 09:48:40   warning: (2) [MariaDBAuth] Read-Write-Service: login attempt for user 'skysql_admin'@[10.0.4.4]:38782, authentication failed. Unknown database: sbtest

And again, it only happens if I start the mysql client with the --database=a parameter.
If I skip --database=a and do "USE a;" after that instead, it works correctly.

Comment by markus makela [ 2019-10-30 ]

I think there might be a problem where the user loading rate limit is triggered too early. If you add users_refresh_time=0 under the [maxscale] section, does it work?
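
For reference, a minimal sketch of that change. It assumes the configuration lives in the default /etc/maxscale.cnf (adjust the path for containerized deployments). Only the users_refresh_time line comes from this thread; any other settings already present in the section stay as they are.

```ini
# /etc/maxscale.cnf (assumed default location)
[maxscale]
# 0 removes the rate limit on reloading the users table.
# The warning in the log above shows a 30-second limit in effect.
users_refresh_time=0
```

Changes made in the configuration file are normally picked up when MaxScale is restarted.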

Comment by Petko Vasilev (Inactive) [ 2019-10-30 ]

users_refresh_time=0 seems to work, thank you.
I'll check if it's a safe setting to have and remove my hacky workaround.
So the only thing left to say here is that whether it's a bug or not, the behavior is different from 2.3.9, where we had no such problem.

Comment by markus makela [ 2019-10-30 ]

From the documentation:

In MaxScale 2.3.9 and older versions, the minimum allowed value was 10 seconds but, due to a bug, the default value was 0 which allowed infinite refreshes.

Comment by markus makela [ 2019-10-30 ]

This is still probably a bug in how MaxScale limits user loading at startup.

Comment by markus makela [ 2020-06-01 ]

There's already code in place that prevents the user loading rate limitation from triggering shortly after startup, which means this is most likely expected behavior and the rate limitation should be disabled.

Generated at Thu Feb 08 04:16:22 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.