[MXS-3987] Maxscale doesn't rejoin old master as a slave Created: 2022-02-08  Updated: 2022-02-27  Resolved: 2022-02-24

Status: Closed
Project: MariaDB MaxScale
Component/s: N/A
Affects Version/s: None
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Asel Magzhanova Assignee: Unassigned
Resolution: Not a Bug Votes: 0
Labels: None


 Description   

I have a master (192.168.56.118) and a slave (192.168.56.119).

Master my.cnf:

[mysqld]
 
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci
innodb_file_format=Barracuda
innodb_large_prefix=1
innodb_file_per_table=1
lower_case_table_names=1
 
port = 3306
 
log_bin = mariadb-bin
 
server_id = 1
 
gtid_strict_mode = 1
gtid_domain_id = 1
 
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
 
 
[client-server]
 
 
!includedir /etc/my.cnf.d

Slave my.cnf:

[mysqld]
 
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci
innodb_file_format=Barracuda
innodb_large_prefix=1
innodb_file_per_table=1
lower_case_table_names=1
 
port = 3306
 
log_bin = mariadb-bin
 
server_id = 2
gtid_strict_mode = 1
gtid_domain_id = 1
 
[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid
 
 
[client-server]
 
 
!includedir /etc/my.cnf.d

Replication works correctly; the status command on the slave shows no errors:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.56.118
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 3630
                Relay_Log_File: pg-exercises-relay-bin.000002
                 Relay_Log_Pos: 3544
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 3630
               Relay_Log_Space: 3860
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 1-1-22
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
              Slave_DDL_Groups: 18
Slave_Non_Transactional_Groups: 1
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)

I also have MaxScale, with this maxscale.cnf:

[maxscale]
threads=auto
 
 
[server1]
type=server
address=192.168.56.118
port=3306
protocol=MariaDBBackend
 
 
[server2]
type=server
address=192.168.56.119
port=3306
protocol=MariaDBBackend
 
[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2
user=maxscale
password=maxscale
auto_failover=true
auto_rejoin=true
monitor_interval=100
replication_user=repl
replication_password=lper
 
[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2
user=maxscale
password=maxscale
 
 
[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006

At first it works without problems:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Slave, Running  │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

When the master fails, the slave becomes the master:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Down            │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

But when I start the old master again, I get this situation:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Running         │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

And I found this error in the MaxScale log:
2022-02-08 12:22:05 warning: [mariadbmon] Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors:
gtid_current_pos of 'server1' (1-1-22) is incompatible with gtid_binlog_pos of 'server2' (1-2-8).

Since server1 was previously the master, I thought it was OK that its global.gtid_binlog_pos is higher.
Am I doing something wrong, or is this the standard behavior?



 Comments   
Comment by markus makela [ 2022-02-24 ]

You need to have log_slave_updates enabled for the master to be able to rejoin.
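
A minimal sketch of that change, assuming the my.cnf files from the description: the option goes in the [mysqld] section on both servers, next to the existing log_bin line, and the servers need a restart because the variable cannot be changed at runtime. Only the log_slave_updates line is new; log_bin is copied from the description.

```ini
[mysqld]
log_bin = mariadb-bin

# Also write transactions received through replication into this
# server's own binary log, so gtid_binlog_pos covers them as well.
log_slave_updates = 1
```

With the option off, a slave only logs its own local writes. That would be consistent with the warning above, where server2 reports gtid_binlog_pos 1-2-8 while server1 is at 1-1-22.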

Additionally, it is highly recommended to enable semi-sync replication to make the failover process more robust. However, in this case it's unlikely that the lack of it is what causes the problem.
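
As a rough sketch of the semi-sync suggestion, assuming MariaDB 10.3 or later (where semi-sync is built into the server and needs no plugin), the following could be added to the [mysqld] section on both servers. Enabling both roles on each server is an assumption made here so that either one can act as master or slave after a failover; timeouts and other semi-sync settings are left at their defaults.

```ini
[mysqld]
# Used while this server is the master: wait for a slave to acknowledge
# receipt of a transaction before the commit returns to the client.
rpl_semi_sync_master_enabled = ON

# Used while this server is a slave: send acknowledgements to the master.
rpl_semi_sync_slave_enabled = ON
```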

Comment by Asel Magzhanova [ 2022-02-27 ]

Thank you very much

Generated at Thu Feb 08 04:25:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.