[MDEV-20276] MariaDB-Galera 10.3 -> 10.4 | Startup time problem (hangs) Created: 2019-08-07 Updated: 2019-12-11 Resolved: 2019-12-11 |
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Galera, Galera SST, Replication, Server |
| Affects Version/s: | 10.4.7 |
| Fix Version/s: | N/A |
| Type: | Bug | Priority: | Critical |
| Reporter: | Florian Strankowski | Assignee: | Jan Lindström (Inactive) |
| Resolution: | Not a Bug | Votes: | 1 |
| Labels: | galera | ||
| Environment: |
CentOS 7 |
||
| Attachments: |
|
| Description |
|
We've upgraded from MariaDB 10.3 to the latest version of 10.4 (10.4.7). Once the upgrade has been completed we noticed some strange behavior during cluster-initialisation. No matter which node of our 7-node-cluster we start, it takes approx 8 minutes to get the node into the cluster, each time. Another test-cluster of ours has also inreased joining and sync times, but i'll stick to our production cluster here. The first time a "hung up" occurs is at the following point (see attached log): Aug 7 12:59:36 MariaDB-PROD-001 mysqld: 2019-08-07 12:59:36 2 [Note] WSREP: ####### drain monitors upto -1 Getting past this point takes 4 minutes and 33 seconds (273 seconds) The second time during the join-phase the problem occurs here: Aug 7 13:04:10 MariaDB-PROD-001 mysqld: 2019-08-07 13:04:10 0 [Note] WSREP: ####### drain monitors upto 0 This one took 4 minutes and 31 seconds (271 seconds), nearly exactly like the other one before o_Ô. So this might be something to work keep an eye on?! Same problems apply to all and every of our 7 node Cluster, also our 2nd cluster is affected. Regards |
| Comments |
| Comment by Thorsten Krohn [ 2019-08-16 ] | |||||||||
|
Hi, | |||||||||
| Comment by Thorsten Krohn [ 2019-09-06 ] | |||||||||
|
Hi, Can we provide more information to analyze the problem ? | |||||||||
| Comment by Jan Lindström (Inactive) [ 2019-09-06 ] | |||||||||
|
To me it looks like InnoDB is loading buffer pool context from disk:
In this example it used ~20s to load buffer pool. If this is problematic for your system, you could try disable buffer pool loading at startup. | |||||||||
| Comment by Thorsten Krohn [ 2019-09-06 ] | |||||||||
|
Hi, | |||||||||
| Comment by Jan Lindström (Inactive) [ 2019-09-06 ] | |||||||||
|
Can you try to run with --wsrep-debug=server and provide me a full error log. | |||||||||
| Comment by Florian Strankowski [ 2019-09-18 ] | |||||||||
|
I've attached the startup log including wsrep-debug enabled. Sorry for the delay though. | |||||||||
| Comment by Thorsten Krohn [ 2019-10-10 ] | |||||||||
|
Since we have repeatedly encountered problems with the multicast operation here, I have now changed the cluster to unicast operation. Now the nodes start again normally, without hangers. |