Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version: 2.3.9
Fix Version: None
Description
Folks,
While setting up a pre-production environment with a fresh MaxScale 2.3.9 in front of a MariaDB Cluster 10.2 (on Debian 9.7), I found an interesting situation I got curious about. The MariaDB Cluster already running has three nodes; below are their wsrep_local_index values before starting the MaxScale implementation:
changed: [prod_mariadb03] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    0
changed: [prod_mariadb02] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    1
changed: [prod_mariadb01] => changed=true
  cmd: mysql -e 'show global status like "wsrep_local_index"'
  stdout: |-
    Variable_name        Value
    wsrep_local_index    2
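For quick re-checks outside of Ansible, a plain shell loop gives the same picture (a minimal sketch, assuming passwordless SSH to the three hosts and a usable local mysql client on each):

for host in prod_mariadb01 prod_mariadb02 prod_mariadb03; do
    echo "== $host =="
    ssh "$host" "mysql -e 'show global status like \"wsrep_local_index\"'"
done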
Following the implementation steps, here is what was done:
1. created a basic configuration file with global and service definitions:
root@prod-maxscale01:~# cat /etc/maxscale.cnf
[maxscale]
threads = auto
log_augmentation = 1
ms_timestamp = 1
admin_host = 0.0.0.0
admin_port = 8989

[rwsplit-service]
type = service
router = readwritesplit
user = maxusr
password = A0FE98035CFA5EB978337B739E949878
causal_reads = true
causal_reads_timeout = 30
master_reconnection = true
max_sescmd_history = 1000
prune_sescmd_history = true
master_failure_mode = fail_on_write
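As a side note, the file can be sanity-checked before starting the service with MaxScale's built-in configuration check (assuming the installed build supports the flag; this was not part of the original steps):

root@prod-maxscale01:~# maxscale --config-check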
2. created the cluster on MaxScale using the dynamic commands below:
#: task: creating the monitor
maxctrl create monitor replication-cluster-monitor galeramon --monitor-user=maxmon --monitor-password=AFB909850E7181E9906159CE45176FAD

#: task: configuring the monitor for the replication cluster
maxctrl alter monitor replication-cluster-monitor monitor_interval 500
maxctrl alter monitor replication-cluster-monitor disk_space_threshold /var/lib:85
maxctrl alter monitor replication-cluster-monitor disk_space_check_interval 1000

#: task: create a listener
maxctrl create listener rwsplit-service replication-rwsplit-listener 3306

#: task: create servers
maxctrl create server prod_mariadb01 10.136.88.50 3306
maxctrl create server prod_mariadb02 10.136.69.104 3306
maxctrl create server prod_mariadb03 10.136.79.28 3306

#: task: link servers with the service
maxctrl link service rwsplit-service prod_mariadb01
maxctrl link service rwsplit-service prod_mariadb02
maxctrl link service rwsplit-service prod_mariadb03

#: task: link servers with the monitor
maxctrl link monitor replication-cluster-monitor prod_mariadb01
maxctrl link monitor replication-cluster-monitor prod_mariadb02
maxctrl link monitor replication-cluster-monitor prod_mariadb03
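Not part of the original steps, but at this point the created objects can be verified with the standard maxctrl listing commands:

#: task: verify created objects
maxctrl list servers
maxctrl list services
maxctrl show monitor replication-cluster-monitor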
Then, checking the logs, I noticed that after creating the .secrets file I had forgotten to fix the file's ownership (chown maxscale:maxscale), so I got some silly errors, which were resolved after adjusting it. The point is that, once the GaleraMon monitor could read the .secrets file, we can see below the servers coming up online on MaxScale:
2019-07-08 12:58:11.657 error : (secrets_readKeys): Access for secrets file [/var/lib/maxscale/.secrets] failed. Error 13, Permission denied.
2019-07-08 12:58:12.258 notice : (secrets_readKeys): Using encrypted passwords. Encryption key: '/var/lib/maxscale/.secrets'.
2019-07-08 12:58:12.271 notice : (post_tick): Found cluster members
2019-07-08 12:58:12.272 notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.272 notice : (mon_log_state_change): Server changed state: prod_mariadb02[10.136.69.104:3306]: slave_up. [Auth Error, Down] -> [Slave, Synced, Running]
2019-07-08 12:58:12.273 notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: master_up. [Auth Error, Down] -> [Master, Synced, Running]
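For reference, the permission error at the top of that excerpt was cleared with the ownership fix mentioned above (path taken from the log line):

chown maxscale:maxscale /var/lib/maxscale/.secrets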
The configurations created to persist the dynamic server commands are shown below:
[prod_mariadb01]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.88.50

[prod_mariadb02]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.69.104

[prod_mariadb03]
type=server
port=3306
extra_port=0
persistpoolmax=0
persistmaxtime=0
proxy_protocol=false
ssl=false
ssl_version=MAX
ssl_cert_verify_depth=9
ssl_verify_peer_certificate=true
protocol=mariadbbackend
address=10.136.79.28
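For context, these generated snippets live in MaxScale's persisted-configuration directory (by default /var/lib/maxscale/maxscale.cnf.d/ in 2.3, assuming the persist directory was not changed), so they can be inspected directly:

root@prod-maxscale01:~# ls -l /var/lib/maxscale/maxscale.cnf.d/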
All right, that's good. However, at this point I decided to enable priorities, so I could better control which node comes next should the current master fail:
root@prod-maxscale01:~# maxctrl alter monitor replication-cluster-monitor use_priority true
OK
Great, priorities were now enabled for the monitor (galeramon), awesome. Then, tailing the logs, I saw:
2019-07-08 12:59:34.832 notice : (do_alter_monitor): Updated monitor 'replication-cluster-monitor': use_priority=true
2019-07-08 12:59:34.838 notice : (load_server_journal): Loaded server states from journal file: /var/lib/maxscale/replication-cluster-monitor/monitor.dat
2019-07-08 12:59:34.850 notice : (mon_log_state_change): Server changed state: prod_mariadb01[10.136.88.50:3306]: new_master. [Slave, Synced, Running] -> [Master, Synced, Running]
2019-07-08 12:59:34.851 notice : (mon_log_state_change): Server changed state: prod_mariadb03[10.136.79.28:3306]: new_slave. [Master, Synced, Running] -> [Slave, Synced, Running]
I hadn't set priorities per server yet, as my plan was to enable use_priority on the monitor side first and then set the priorities for the servers. My current master was prod_mariadb03, and I was planning to set priorities as below (after enabling use_priority on the monitor):
# prod_mariadb03 wsrep_local_index=0 priority=1
# prod_mariadb02 wsrep_local_index=1 priority=2
# prod_mariadb01 wsrep_local_index=2 priority=3
--
maxctrl alter server prod_mariadb03 priority 1
maxctrl alter server prod_mariadb02 priority 2
maxctrl alter server prod_mariadb01 priority 3
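Each value can then be double-checked on the server object itself with the standard maxctrl command (the priority should appear among the server parameters once set):

maxctrl show server prod_mariadb03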
This way, I was attempting to avoid triggering an election. Yet, merely by enabling priorities, an election was triggered without any priorities set per server.
Following the theory behind this: if you don't set priorities, galeramon elects the master based on the wsrep_local_index, and enabling priority usage on the monitor side neither changes the wsrep_local_index nor [should it] add per-server priorities underneath, am I right? Is an election expected in this case?
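To make that reading concrete (this is just my interpretation of the documented rules, not MaxScale's actual code), both selection rules should pick the same node given the values in this report:

# candidates listed as: name wsrep_local_index planned_priority
printf '%s\n' \
    'prod_mariadb03 0 1' \
    'prod_mariadb02 1 2' \
    'prod_mariadb01 2 3' > /tmp/candidates

# without priorities: lowest wsrep_local_index wins
sort -k2,2n /tmp/candidates | head -n1    # -> prod_mariadb03

# with use_priority and priorities set: lowest positive priority wins
sort -k3,3n /tmp/candidates | head -n1    # -> prod_mariadb03

So under either rule prod_mariadb03 should remain master, which is why the switch to prod_mariadb01 surprised me.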
Thanks!