Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version: 2.3.9
Fix Version: None
Description
Folks,
While setting up a pre-production environment with a fresh MaxScale 2.3.9 in front of a MariaDB Cluster 10.2 (on Debian 9.7), I found an interesting situation I got curious about. The MariaDB Cluster already running has three nodes; below are their wsrep_local_index values before starting the MaxScale implementation:
changed: [prod_mariadb03] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    0
changed: [prod_mariadb02] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    1
changed: [prod_mariadb01] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    2
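For quick re-checks outside of Ansible, a plain shell loop gives the same picture (a minimal sketch, assuming passwordless SSH to the three hosts and a usable local mysql client on each):

for host in prod_mariadb01 prod_mariadb02 prod_mariadb03; do
    echo "== $host =="
    ssh "$host" "mysql -e 'show global status like \"wsrep_local_index\"'"
done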
Following the implementation steps, here is what was done:
1. created a basic configuration file with global and service definitions:
root@prod-maxscale01:~# cat /etc/maxscale.cnf
[maxscale]
threads = auto
log_augmentation = 1
ms_timestamp = 1
admin_host = 0.0.0.0
admin_port = 8989

[rwsplit-service]
type = service
router = readwritesplit
user = maxusr
password = A0FE98035CFA5EB978337B739E949878
causal_reads = true
causal_reads_timeout = 30
master_reconnection = true
max_sescmd_history = 1000
prune_sescmd_history = true
master_failure_mode = fail_on_write
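As a side note, the file can be sanity-checked before starting the service with MaxScale's built-in configuration check (assuming the installed build supports the flag; this was not part of the original steps):

root@prod-maxscale01:~# maxscale --config-check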
2. created the cluster on MaxScale using the dynamic commands below:
#: task: creating the monitor
maxctrl create monitor replication-cluster-monitor galeramon --monitor-user=maxmon --monitor-password=AFB909850E7181E9906159CE45176FAD

#: task: configuring the monitor for the replication cluster
maxctrl alter monitor replication-cluster-monitor monitor_interval 500
maxctrl alter monitor replication-cluster-monitor disk_space_threshold /var/lib:85
maxctrl alter monitor replication-cluster-monitor disk_space_check_interval 1000

#: task: create a listener
maxctrl create listener rwsplit-service replication-rwsplit-listener 3306

#: task: create servers
maxctrl create server prod_mariadb01 10.136.88.50 3306
maxctrl create server prod_mariadb02 10.136.69.104 3306
maxctrl create server prod_mariadb03 10.136.79.28 3306

#: task: link servers with the service
maxctrl link service rwsplit-service prod_mariadb01
maxctrl link service rwsplit-service prod_mariadb02
maxctrl link service rwsplit-service prod_mariadb03

#: task: link servers with the monitor
maxctrl link monitor replication-cluster-monitor prod_mariadb01
maxctrl link monitor replication-cluster-monitor prod_mariadb02
maxctrl link monitor replication-cluster-monitor prod_mariadb03
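Not part of the original steps, but at this point the created objects can be verified with the standard maxctrl listing commands:

#: task: verify created objects
maxctrl list servers
maxctrl list services
maxctrl show monitor replication-cluster-monitor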
Then, checking the logs, I noticed that after creating the .secrets file I had forgotten to fix the file's ownership (chown maxscale:maxscale), so I got some silly errors, which were resolved after adjusting it. The point is that, once the GaleraMon monitor could read the .secrets file, we can see below the servers coming up online on MaxScale:
2019-07-08 12:58:11.657 error : (secrets_readKeys): Access for secrets file [/var/lib/maxscale/.secrets] failed. Error 13, Permission denied.
2019-07-08 12:58:12.258 notice : (secrets_readKeys): Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2019-07-08 12:58:12.271 notice : (post_tick): Found cluster members
2019-07-08 12:58:12.272 notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.272 notice : (mon_log_state_change): Server changed state: prod_mariadb02[10.136.69.104:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.273 notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: master_up. [Auth Error, Down] -> [Master, Synced, Running]
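For reference, the permission error at the top of that excerpt was cleared with the ownership fix mentioned above (path taken from the log line):

chown maxscale:maxscale /var/lib/maxscale/.secrets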
The configurations created to persist the dynamic server commands are shown below:
[prod_mariadb01]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.88.50

[prod_mariadb02]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.69.104

[prod_mariadb03]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.79.28
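For context, these generated snippets live in MaxScale's persisted-configuration directory (by default /var/lib/maxscale/maxscale.cnf.d/ in 2.3, assuming the persist directory was not changed), so they can be inspected directly:

root@prod-maxscale01:~# ls -l /var/lib/maxscale/maxscale.cnf.d/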
All right, that's good. However, at this point I decided to enable priorities, so I could better control which node comes next should the current master fail:
root@prod-maxscale01:~# maxctrl alter monitor replication-cluster-monitor use_priority true
OK
Great, priorities were now enabled for the monitor (galeramon), awesome. Then, tailing the logs, I saw:
2019-07-08 12:59:34.832 notice : (do_alter_monitor): Updated monitor 'replication-cluster-monitor': use_priority=true
2019-07-08 12:59:34.838 notice : (load_server_journal): Loaded server states from journal file: /var/lib/maxscale/replication-cluster-monitor/monitor.dat
2019-07-08 12:59:34.850 notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]
2019-07-08 12:59:34.851 notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]
I hadn't set priorities per server yet, as my plan was to enable use_priority on the monitor side first and then set the priorities for the servers. My current master was prod_mariadb03, and I was planning to set priorities as below (after enabling use_priority on the monitor):
# prod_mariadb03 wsrep_local_index=0 priority=1
# prod_mariadb02 wsrep_local_index=1 priority=2
# prod_mariadb01 wsrep_local_index=2 priority=3
--
maxctrl alter server prod_mariadb03 priority 1
maxctrl alter server prod_mariadb02 priority 2
maxctrl alter server prod_mariadb01 priority 3
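Each value can then be double-checked on the server object itself with the standard maxctrl command (the priority should appear among the server parameters once set):

maxctrl show server prod_mariadb03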
This way, I was attempting to avoid triggering an election. Yet, merely by enabling priorities, an election was triggered without any priorities set per server.
Following the theory behind this: if you don't set priorities, galeramon elects the master based on the wsrep_local_index, and enabling priority usage on the monitor side neither changes the wsrep_local_index nor [should it] add per-server priorities underneath, am I right? Is an election expected in this case?
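To make that reading concrete (this is just my interpretation of the documented rules, not MaxScale's actual code), both selection rules should pick the same node given the values in this report:

# candidates listed as: name wsrep_local_index planned_priority
printf '%s\n' \
    'prod_mariadb03 0 1' \
    'prod_mariadb02 1 2' \
    'prod_mariadb01 2 3' > /tmp/candidates

# without priorities: lowest wsrep_local_index wins
sort -k2,2n /tmp/candidates | head -n1    # -> prod_mariadb03

# with use_priority and priorities set: lowest positive priority wins
sort -k3,3n /tmp/candidates | head -n1    # -> prod_mariadb03

So under either rule prod_mariadb03 should remain master, which is why the switch to prod_mariadb01 surprised me.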
Thanks!