Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.20
Environment: rhel 8
Description
We have a Galera replication cluster with 2 DB nodes and 1 arbitrator. We use HAProxy to redirect all transactions to DB node 1 as the primary, while DB node 2 serves as a slave.
Scenario:
- We have JMeter sending transactions to DB node 1. It keeps running throughout the scenario.
- After JMeter has been running for 10 minutes, we shut down the DB node 2 VM.
- Wait 30 minutes.
- Resume the DB node 2 VM and its DB server.
- DB node 2 starts recovering with IST.
- We monitor the writeset queue size and flow control on DB node 2. We have gcs.fc_limit=500.
- The writeset queue size keeps increasing and reaches about 1 million after about 10 minutes. No flow control is triggered during this time.
- Then the writeset queue size starts decreasing (I guess it is processing the IST).
- Before DB node 2 finishes IST, we notice that it triggers flow control from time to time.
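For anyone trying to reproduce the monitoring step, here is a minimal sketch of how two consecutive status samples from DB node 2 could be compared. It assumes the samples are the tab-separated output of `SHOW GLOBAL STATUS LIKE 'wsrep_%'` and that the queue and flow control figures come from the `wsrep_local_recv_queue`, `wsrep_flow_control_sent`, and `wsrep_local_state_comment` status variables. The function names and the sample values are hypothetical and are not taken from the actual test run.

```python
from typing import Dict


def parse_wsrep_status(raw: str) -> Dict[str, str]:
    """Parse tab-separated `name<TAB>value` rows into a dict of wsrep_* variables."""
    status: Dict[str, str] = {}
    for line in raw.splitlines():
        parts = line.strip().split("\t")
        if len(parts) == 2 and parts[0].startswith("wsrep_"):
            status[parts[0]] = parts[1]
    return status


def summarize(prev: Dict[str, str], curr: Dict[str, str], fc_limit: int) -> dict:
    """Compare two samples: queue growth, whether the queue is above
    fc_limit, and how many flow control messages were sent in between."""
    queue = int(curr["wsrep_local_recv_queue"])
    prev_queue = int(prev["wsrep_local_recv_queue"])
    fc_sent = int(curr["wsrep_flow_control_sent"])
    prev_fc_sent = int(prev["wsrep_flow_control_sent"])
    return {
        "state": curr.get("wsrep_local_state_comment", "unknown"),
        "queue": queue,
        "queue_delta": queue - prev_queue,
        "over_fc_limit": queue > fc_limit,
        "fc_sent_delta": fc_sent - prev_fc_sent,
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    earlier = parse_wsrep_status(
        "wsrep_local_recv_queue\t900000\n"
        "wsrep_flow_control_sent\t0\n"
        "wsrep_local_state_comment\tJoined\n"
    )
    later = parse_wsrep_status(
        "wsrep_local_recv_queue\t850000\n"
        "wsrep_flow_control_sent\t3\n"
        "wsrep_local_state_comment\tJoined\n"
    )
    print(summarize(earlier, later, fc_limit=500))
```

Logging the node state next to the queue size and the flow control counter on every sample should make it easier to see which state the node is in when flow control starts to appear.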
I would like to know whether flow control being triggered during IST is normal. My concern is that flow control is not triggered while the writeset queue is growing and has exceeded fc_limit, yet it is triggered while the queue is shrinking. I would also appreciate it if someone could explain the mechanism of Galera recovery (IST and SST) in a situation like this, as I can't find related information on the web.