Details
Description
If galeramon is configured with use_priority=true and the node with the second-best priority has wsrep_local_index=0, the Master Stickiness label is not set for that node. This happens because the code requires a non-zero local index before it assigns the label.
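The check described above can be modelled in isolation. The following is a minimal sketch, not the actual galeramon code: NodeState and annotate_current are hypothetical names, and the condition is an assumption based only on the description in this ticket.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the per-node data the monitor tracks.
struct NodeState
{
    bool is_master;     // node currently holds the Master role
    int  local_index;   // value of wsrep_local_index reported by the node
};

// Assumed shape of the current check: the label is added only when the
// master's local index is non-zero.
std::vector<std::string> annotate_current(const NodeState& node, bool disable_master_failback)
{
    std::vector<std::string> states;
    if (disable_master_failback && node.is_master && node.local_index != 0)
    {
        states.push_back("Master Stickiness");
    }
    return states;
}
```

Under this model, a master that reports wsrep_local_index=0 never receives the label, whatever its priority.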
Original description:
In previous versions of MaxScale (specifically 2.3.12), the master stickiness feature worked by setting a priority using the use_priority option. When a server with priority 1 failed, the system would switch to the next server and apply stickiness to it. Currently, master stickiness is determined by wsrep_local_index, which does not meet our requirements. We request the addition of a configuration option to manage this behavior.
Steps to Reproduce:
- Set up a MaxScale configuration with multiple servers, each assigned a priority.
- Configure the system to use master stickiness based on priority (use_priority).
- Simulate a failure of the server with priority 1.
- Observe that the system no longer applies stickiness based on the set priority, but instead uses wsrep_local_index.
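The behavior we expect from these steps can be sketched as a small selection model. This is an illustration only: Server, pick_by_priority and pick_master are hypothetical names, and the sketch assumes that a lower priority value means a more preferred server, as in the comparison used by our workaround.

```cpp
#include <climits>
#include <string>
#include <vector>

// Hypothetical, simplified view of a monitored server.
struct Server
{
    std::string name;
    int         priority;   // assumption: lower value = preferred
    bool        running;
};

// Index of the running server with the lowest priority value, or -1 if none runs.
int pick_by_priority(const std::vector<Server>& servers)
{
    int best = -1;
    int best_priority = INT_MAX;
    for (int i = 0; i < static_cast<int>(servers.size()); ++i)
    {
        if (servers[i].running && servers[i].priority < best_priority)
        {
            best_priority = servers[i].priority;
            best = i;
        }
    }
    return best;
}

// Sticky selection: keep the current master while it is running,
// otherwise fall back to the priority order.
int pick_master(const std::vector<Server>& servers, int current_master)
{
    if (current_master >= 0 && servers[current_master].running)
    {
        return current_master;
    }
    return pick_by_priority(servers);
}
```

With three servers at priorities 1, 2 and 3, the model picks the first server, moves to the second when the first fails, and stays on the second after the first returns. That last state is the one in which we expect the Master Stickiness label.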
Expected Behavior:
The system should allow the configuration of master stickiness behavior through a standard option, enabling the use of priorities as in version 2.3.12. This should be clearly documented and easily configurable.
Proposed Solution:
- Introduce a new configuration option, such as master_stickiness_priority, in the MaxScale configuration file.
- Update the stickiness logic so that the option selects either priority-based stickiness or the current wsrep_local_index method.
- Ensure that the new option is properly documented in the MaxScale configuration guide.
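One way the proposed switch could look is sketched below. Everything in it is hypothetical: StickinessMode, Node, best_priority_node and is_sticky_master are illustrative names and not part of MaxScale, and the LocalIndex branch only mirrors the non-zero check described in this ticket.

```cpp
#include <climits>
#include <string>
#include <vector>

// Hypothetical node model for illustration.
struct Node
{
    std::string name;
    bool        running;
    bool        is_master;
    int         priority;      // assumption: lower value = preferred
    int         local_index;   // wsrep_local_index
};

// Hypothetical values for an option such as master_stickiness_priority.
enum class StickinessMode { LocalIndex, Priority };

// Running node with the lowest priority value, or nullptr if none is running.
const Node* best_priority_node(const std::vector<Node>& nodes)
{
    const Node* best = nullptr;
    int best_priority = INT_MAX;
    for (const auto& n : nodes)
    {
        if (n.running && n.priority < best_priority)
        {
            best_priority = n.priority;
            best = &n;
        }
    }
    return best;
}

// True if `node` (which must be an element of `nodes`) should be labelled
// "Master Stickiness" under the selected mode.
bool is_sticky_master(const Node& node, const std::vector<Node>& nodes, StickinessMode mode)
{
    if (!node.is_master)
    {
        return false;
    }
    if (mode == StickinessMode::LocalIndex)
    {
        return node.local_index != 0;       // current behavior as described
    }
    return best_priority_node(nodes) != &node;  // priority-based behavior
}
```

For a master with priority 2 and wsrep_local_index=0, next to a running priority-1 node, the LocalIndex mode yields no label while the Priority mode does, which is the difference this request is about.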
Additional Information:
In version 2.3.12, the master stickiness feature worked by setting a priority with use_priority. When the server with priority 1 failed, the system would switch and apply stickiness to the next server based on priority. This behavior should be reintroduced with a configurable option to manage the stickiness mechanism.
We have worked around this in our setup by removing the wsrep_local_index check and adding a priority-based check. The change was made in MaxScale/server/modules/monitor/galeramon/galeramon.cc as follows:
    if (m_config.disable_master_failback && server->server->is_master())
    {
        int highest_priority = INT_MAX;
        MonitorServer* highest_priority_server = nullptr;

        for (const auto& [srv, info] : m_prev_info)
        {
            if (srv->server->is_running() && srv->server->priority() < highest_priority)
            {
                highest_priority = srv->server->priority();
                highest_priority_server = srv;
            }
        }

        if (highest_priority_server != server)
        {
            states.push_back("Master Stickiness");
        }
    }
Rationale:
A configuration option for priority-based master stickiness would let users keep the behavior they relied on in version 2.3.12 without patching galeramon, while leaving the current wsrep_local_index behavior available to those who depend on it.