[MXS-2594] Enabling use_priority for not set priority on server level triggers an election Created: 2019-07-08  Updated: 2020-01-08  Resolved: 2019-07-20

Status: Closed
Project: MariaDB MaxScale
Component/s: galeramon
Affects Version/s: 2.3.9
Fix Version/s: 2.2.22, 2.3.10, 2.4.1

Type: Bug
Priority: Major
Reporter: Wagner Bianchi (Inactive)
Assignee: markus makela
Resolution: Fixed
Votes: 0
Labels: None


 Description   

Folks,

While setting up a pre-production environment with a fresh MaxScale 2.3.9 and a MariaDB Cluster 10.2 (on Debian 9.7), I ran into a situation I got curious about. The MariaDB Cluster was already running with three nodes, and below are their wsrep_local_index values before I started the MaxScale implementation:

changed: [prod_mariadb03] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name   Value
    wsrep_local_index       0
changed: [prod_mariadb02] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name   Value
    wsrep_local_index       1
changed: [prod_mariadb01] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name   Value
    wsrep_local_index       2
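
For reference, the same value can be pulled out of the client output without Ansible. This is only a sketch: `parse_local_index` is a helper name made up for this illustration, and it assumes the whitespace-separated `Variable_name Value` output shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of MaxScale or MariaDB): read the output of
# `show global status like "wsrep_local_index"` on stdin and print only the
# numeric value. The header line is skipped because its first field differs.
parse_local_index() {
    awk '$1 == "wsrep_local_index" { print $2 }'
}

# Example usage on one node (assumes the mysql client can log in locally):
#   mysql -e 'show global status like "wsrep_local_index"' | parse_local_index
```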

The implementation steps were as follows:

1. created a basic configuration file with global and service definitions:

root@prod-maxscale01:~# cat /etc/maxscale.cnf
[maxscale]
threads                     = auto
log_augmentation            = 1
ms_timestamp                = 1
admin_host                  = 0.0.0.0
admin_port                  = 8989
 
[rwsplit-service]
type                        = service
router                      = readwritesplit
user                        = maxusr
password                    = A0FE98035CFA5EB978337B739E949878
causal_reads                = true
causal_reads_timeout        = 30
master_reconnection         = true
max_sescmd_history          = 1000
prune_sescmd_history        = true
master_failure_mode         = fail_on_write

2. created the cluster on MaxScale using the dynamic commands below:

#: task: creating the monitor
maxctrl create monitor replication-cluster-monitor galeramon --monitor-user=maxmon --monitor-password=AFB909850E7181E9906159CE45176FAD
 
#: task: configuring the monitor for the replication cluster
maxctrl alter monitor replication-cluster-monitor monitor_interval          500 
maxctrl alter monitor replication-cluster-monitor disk_space_threshold      /var/lib:85
maxctrl alter monitor replication-cluster-monitor disk_space_check_interval 1000
 
#: task: create a listener
maxctrl create listener rwsplit-service replication-rwsplit-listener 3306
 
#: task: create servers
maxctrl create server prod_mariadb01 10.136.88.50  3306
maxctrl create server prod_mariadb02 10.136.69.104 3306
maxctrl create server prod_mariadb03 10.136.79.28  3306
 
#: task: link servers with the service
maxctrl link service rwsplit-service prod_mariadb01
maxctrl link service rwsplit-service prod_mariadb02
maxctrl link service rwsplit-service prod_mariadb03
 
#: task: link servers with the monitor
maxctrl link monitor replication-cluster-monitor prod_mariadb01
maxctrl link monitor replication-cluster-monitor prod_mariadb02
maxctrl link monitor replication-cluster-monitor prod_mariadb03

Then, checking the logs, I noticed that after creating the .secrets file I had forgotten to fix the file's ownership (chown maxscale:maxscale), which caused the errors below until I adjusted it. Once the galeramon monitor could read the .secrets file, the servers came up online in MaxScale:

2019-07-08 12:58:11.657   error  : (secrets_readKeys): Access for secrets file [/var/lib/maxscale/.secrets] failed. Error 13, Permission denied.
2019-07-08 12:58:12.258   notice : (secrets_readKeys): Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2019-07-08 12:58:12.271   notice : (post_tick): Found cluster members
2019-07-08 12:58:12.272   notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.272   notice : (mon_log_state_change): Server changed state: prod_mariadb02[10.136.69.104:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.273   notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: master_up. [Auth Error, Down] -> [Master, Synced, Running]
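
The ownership problem behind the "Permission denied" line can be checked before starting the monitor. A minimal sketch, assuming GNU `stat` (as on Debian); `owned_by` is a helper name invented for this example, not a MaxScale tool.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file in $1 is owned by the
# "user:group" pair given in $2. Relies on GNU stat's -c format option.
owned_by() {
    [ "$(stat -c '%U:%G' "$1")" = "$2" ]
}

# Example usage with the path from the log above (run as root):
#   owned_by /var/lib/maxscale/.secrets maxscale:maxscale \
#       || chown maxscale:maxscale /var/lib/maxscale/.secrets
```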

The server configurations persisted by the dynamic commands are below:

[prod_mariadb01]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.88.50
 
[prod_mariadb02]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.69.104
 
[prod_mariadb03]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.79.28

All right, that's good. At this point, however, I wanted to enable priorities so that I can better control which node becomes the next master if the current one fails:

root@prod-maxscale01:~# maxctrl alter monitor replication-cluster-monitor use_priority true
OK

Great, priorities are now enabled for the monitor (galeramon). But then, tailing the logs, I see:

2019-07-08 12:59:34.832   notice : (do_alter_monitor): Updated monitor 'replication-cluster-monitor': use_priority=true
2019-07-08 12:59:34.838   notice : (load_server_journal): Loaded server states from journal file: /var/lib/maxscale/replication-cluster-monitor/monitor.dat
2019-07-08 12:59:34.850   notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]
2019-07-08 12:59:34.851   notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]

I hadn't set priorities per server yet, as I was planning to enable use_priority on the monitor first and then set the priorities for the servers. My current master was prod_mariadb03, and I was planning to set priorities like below (after enabling use_priority on the monitor):

# prod_mariadb03 wsrep_local_index=0 priority=1
# prod_mariadb02 wsrep_local_index=1 priority=2
# prod_mariadb01 wsrep_local_index=2 priority=3
--
maxctrl alter server prod_mariadb03 priority 1
maxctrl alter server prod_mariadb02 priority 2
maxctrl alter server prod_mariadb01 priority 3

This way, I was trying to avoid triggering an election. However, merely enabling use_priority triggered an election, even though no priorities were set on any server.

As I understand the theory, if you don't set priorities, galeramon elects the master based on wsrep_local_index. Enabling use_priority on the monitor side neither changes the wsrep_local_index values nor [should] assign priorities to the servers underneath, am I right? Is an election expected in this case?
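
To make that expectation concrete, here is a rough sketch of the selection rule as I understood it. It is NOT the galeramon implementation: `pick_master`, the input format, and the convention that priority 0 means "not set" are all assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch of the EXPECTED behaviour, not MaxScale's actual code.
# stdin: one server per line as "name wsrep_local_index priority",
#        where priority 0 stands for "not set" (assumption of this sketch).
# $1:    "true" or "false", standing in for the monitor's use_priority.
pick_master() {
    awk -v use_priority="$1" '
        {
            # Remember the server with the lowest wsrep_local_index.
            if (idx_name == "" || $2 + 0 < idx_min) {
                idx_min = $2 + 0; idx_name = $1
            }
            # Remember the server with the lowest positive priority, if any.
            if ($3 + 0 > 0 && (prio_name == "" || $3 + 0 < prio_min)) {
                prio_min = $3 + 0; prio_name = $1
            }
        }
        END {
            # Priorities decide only when enabled AND set on some server;
            # otherwise fall back to the lowest wsrep_local_index.
            if (use_priority == "true" && prio_name != "") print prio_name
            else print idx_name
        }'
}

# Example usage with the three nodes from this report, no priorities set:
#   printf '%s\n' 'prod_mariadb01 2 0' 'prod_mariadb02 1 0' 'prod_mariadb03 0 0' \
#       | pick_master true
```

Under these assumptions, the three nodes above with no priorities set yield prod_mariadb03 whether use_priority is "true" or "false", which is why the switch to prod_mariadb01 in the log surprised me.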

Thanks!

