We have 6 servers in total, running 2 Galera clusters: one cluster has 4 nodes, and the other has 2 nodes acting as a replication slave of the first cluster. Not that it actually matters.
When any of those servers is restarted, it gets slow like hell for about 30-45 minutes. It seems to load everything into memory before performance recovers.
Queries that later take less than 0.1 seconds take anywhere from several seconds to several minutes. Many queries are stuck in 'closing table', 'cache', or other states.
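To check the "loading everything into memory" suspicion, one thing I could measure is how full the InnoDB buffer pool is after a restart. This is only a rough sketch: it assumes the tables are InnoDB and that the counters `Innodb_buffer_pool_pages_total` and `Innodb_buffer_pool_pages_data` from `SHOW GLOBAL STATUS` have been collected into a dict. The function name and the sample numbers are made up for illustration, not taken from our servers.

```python
def buffer_pool_fill_ratio(status):
    """Return the fraction of InnoDB buffer pool pages that currently hold data.

    `status` maps status variable names to values, as returned by
    SHOW GLOBAL STATUS (values may be strings or integers).
    """
    total = int(status["Innodb_buffer_pool_pages_total"])
    data = int(status["Innodb_buffer_pool_pages_data"])
    if total == 0:
        # Avoid dividing by zero if the counters are not populated yet.
        return 0.0
    return data / total


# Hypothetical values, for illustration only.
just_restarted = {
    "Innodb_buffer_pool_pages_total": "524288",
    "Innodb_buffer_pool_pages_data": "5243",
}
print(f"buffer pool fill: {buffer_pool_fill_ratio(just_restarted):.1%}")
```

If the ratio climbs slowly over the same 30-45 minutes during which queries are slow, that would point to a cold cache rather than to something Galera-specific.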
It also slows down the whole cluster dramatically, causing our app to hang completely for several minutes until I aggressively kill the query processes started from recurring scripts.
The problem is especially apparent when replication is paused for an hour and then restarted: every replicated query (they run one at a time) is stuck for several seconds while executing (insert, delete, or update alike) and then stuck again in 'closing table' for several seconds.
Note that replication was originally made to a single MariaDB server and the same symptoms were visible.
For nearly an hour, Seconds_Behind_Master keeps increasing until the problem resolves itself.
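For anyone who wants to reproduce the lag measurement, here is a minimal sketch of how the value can be pulled out of the text printed by `SHOW SLAVE STATUS\G`. The helper name and the sample output are hypothetical; only the `Seconds_Behind_Master` field name comes from the server.

```python
import re


def seconds_behind_master(slave_status_text):
    """Extract Seconds_Behind_Master from vertical SHOW SLAVE STATUS output.

    Returns an int, or None when the field is missing or reported as NULL
    (which the server does when the replication threads are not running).
    """
    match = re.search(
        r"^\s*Seconds_Behind_Master:\s*(\S+)\s*$",
        slave_status_text,
        re.MULTILINE,
    )
    if match is None or match.group(1).upper() == "NULL":
        return None
    return int(match.group(1))


# Made-up excerpt, for illustration only.
sample = """
        Slave_IO_Running: Yes
       Slave_SQL_Running: Yes
   Seconds_Behind_Master: 1840
"""
print(seconds_behind_master(sample))
```

Sampling this every minute or so after a restart would give a curve of how the lag builds up and when it starts to recover.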
This creates a huge problem when we try to maintain or update servers, as it hangs our applications.
Our current configuration file is attached; everything else is set to default. We started with a blank configuration file and the problem was already there, and tweaking a number of settings made no difference. Duplicating the settings from our old servers (which were not restarted) didn't make a difference either.
I have already played with the following settings, to no avail:
Are there any other settings that affect initial server start and could improve this situation?