[MXS-3987] Maxscale doesn't rejoin old master as a slave

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Not a Bug
Affects Version/s: None
Fix Version/s: N/A
Component/s: N/A
Labels: None

Description

I have a master (192.168.56.118) and a slave (192.168.56.119).

Master my.cnf:

[mysqld]
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci
innodb_file_format=Barracuda
innodb_large_prefix=1
innodb_file_per_table=1
lower_case_table_names=1
port = 3306
log_bin = mariadb-bin
server_id = 1
gtid_strict_mode = 1
gtid_domain_id = 1

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[client-server]
!includedir /etc/my.cnf.d

Slave my.cnf:

[mysqld]
character_set_server=utf8mb4
collation_server=utf8mb4_unicode_ci
innodb_file_format=Barracuda
innodb_large_prefix=1
innodb_file_per_table=1
lower_case_table_names=1
port = 3306
log_bin = mariadb-bin
server_id = 2
gtid_strict_mode = 1
gtid_domain_id = 1

[mysqld_safe]
log-error=/var/log/mariadb/mariadb.log
pid-file=/var/run/mariadb/mariadb.pid

[client-server]
!includedir /etc/my.cnf.d

Replication works correctly; the status command on the slave shows no errors:

MariaDB [(none)]> show slave status\G
*************************** 1. row ***************************
                Slave_IO_State: Waiting for master to send event
                   Master_Host: 192.168.56.118
                   Master_User: repl
                   Master_Port: 3306
                 Connect_Retry: 60
               Master_Log_File: mariadb-bin.000001
           Read_Master_Log_Pos: 3630
                Relay_Log_File: pg-exercises-relay-bin.000002
                 Relay_Log_Pos: 3544
         Relay_Master_Log_File: mariadb-bin.000001
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
               Replicate_Do_DB:
           Replicate_Ignore_DB:
            Replicate_Do_Table:
        Replicate_Ignore_Table:
       Replicate_Wild_Do_Table:
   Replicate_Wild_Ignore_Table:
                    Last_Errno: 0
                    Last_Error:
                  Skip_Counter: 0
           Exec_Master_Log_Pos: 3630
               Relay_Log_Space: 3860
               Until_Condition: None
                Until_Log_File:
                 Until_Log_Pos: 0
            Master_SSL_Allowed: No
            Master_SSL_CA_File:
            Master_SSL_CA_Path:
               Master_SSL_Cert:
             Master_SSL_Cipher:
                Master_SSL_Key:
         Seconds_Behind_Master: 0
 Master_SSL_Verify_Server_Cert: No
                 Last_IO_Errno: 0
                 Last_IO_Error:
                Last_SQL_Errno: 0
                Last_SQL_Error:
   Replicate_Ignore_Server_Ids:
              Master_Server_Id: 1
                Master_SSL_Crl:
            Master_SSL_Crlpath:
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 1-1-22
       Replicate_Do_Domain_Ids:
   Replicate_Ignore_Domain_Ids:
                 Parallel_Mode: optimistic
                     SQL_Delay: 0
           SQL_Remaining_Delay: NULL
       Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates
              Slave_DDL_Groups: 18
Slave_Non_Transactional_Groups: 1
    Slave_Transactional_Groups: 0
1 row in set (0.000 sec)
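
For anyone who wants to check the same thing programmatically, here is a minimal Python sketch that reads `show slave status\G` style output and tests the fields I looked at. The function names `parse_vertical_status` and `replication_healthy` are hypothetical, and the choice of which fields count as "healthy" is my own assumption, not an official definition.

```python
def parse_vertical_status(text: str) -> dict:
    """Turn 'Key: value' lines of vertical (\\G) output into a dict.

    Lines without a colon (row separators, the prompt, the row count)
    are skipped.
    """
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons.
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def replication_healthy(fields: dict) -> bool:
    """Assumed health rule: both threads run and no error codes are set."""
    return (
        fields.get("Slave_IO_Running") == "Yes"
        and fields.get("Slave_SQL_Running") == "Yes"
        and fields.get("Last_IO_Errno") == "0"
        and fields.get("Last_SQL_Errno") == "0"
    )


# A few lines copied from the status output above.
sample = """
              Slave_IO_Running: Yes
             Slave_SQL_Running: Yes
                 Last_IO_Errno: 0
                Last_SQL_Errno: 0
                    Using_Gtid: Slave_Pos
                   Gtid_IO_Pos: 1-1-22
"""

status = parse_vertical_status(sample)
print(replication_healthy(status))
```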

I also have MaxScale, with this maxscale.cnf:

[maxscale]
threads=auto

[server1]
type=server
address=192.168.56.118
port=3306
protocol=MariaDBBackend

[server2]
type=server
address=192.168.56.119
port=3306
protocol=MariaDBBackend

[MariaDB-Monitor]
type=monitor
module=mariadbmon
servers=server1,server2
user=maxscale
password=maxscale
auto_failover=true
auto_rejoin=true
monitor_interval=100
replication_user=repl
replication_password=lper

[Read-Write-Service]
type=service
router=readwritesplit
servers=server1,server2
user=maxscale
password=maxscale

[Read-Write-Listener]
type=listener
service=Read-Write-Service
protocol=MariaDBClient
port=4006

At first it works without problems:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Slave, Running  │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

When the master fails, the slave becomes the master:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Down            │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

But when I start the old master again, I get this situation:
maxctrl list servers
┌─────────┬────────────────┬──────┬─────────────┬─────────────────┬────────┐
│ Server  │ Address        │ Port │ Connections │ State           │ GTID   │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server1 │ 192.168.56.118 │ 3306 │ 0           │ Running         │ 1-1-22 │
├─────────┼────────────────┼──────┼─────────────┼─────────────────┼────────┤
│ server2 │ 192.168.56.119 │ 3306 │ 0           │ Master, Running │ 1-1-22 │
└─────────┴────────────────┴──────┴─────────────┴─────────────────┴────────┘

And I found this error in the MaxScale log:
2022-02-08 12:22:05 warning: [mariadbmon] Automatic rejoin was not attempted on server 'server1' even though it is a valid candidate. Will keep retrying with this message suppressed for all servers. Errors:
gtid_current_pos of 'server1' (1-1-22) is incompatible with gtid_binlog_pos of 'server2' (1-2-8).

server1 was previously the master, so I thought it was fine for its global.gtid_binlog_pos to be higher.
Perhaps I'm doing something wrong? Or is this the standard behavior?
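
To make the warning easier to reason about, here is a minimal Python sketch of one way such a GTID compatibility check can be modeled. It assumes a rejoin candidate is treated as incompatible when, in a GTID domain shared with the new master, the candidate's sequence number is ahead of the master's. That rule is an assumption for illustration, not MaxScale's actual code, and the names `parse_gtid_list`, `events_ahead` and `can_rejoin` are hypothetical.

```python
from typing import Dict, NamedTuple


class Gtid(NamedTuple):
    domain: int
    server_id: int
    sequence: int


def parse_gtid_list(text: str) -> Dict[int, Gtid]:
    """Parse 'domain-server_id-sequence[,...]' into a dict keyed by domain."""
    result = {}
    for part in text.split(","):
        part = part.strip()
        if not part:
            continue
        domain, server_id, sequence = (int(x) for x in part.split("-"))
        result[domain] = Gtid(domain, server_id, sequence)
    return result


def events_ahead(candidate: str, master: str) -> int:
    """Count events the candidate has beyond the master in shared domains.

    Domains present on only one side are ignored in this simplified model.
    """
    cand = parse_gtid_list(candidate)
    mast = parse_gtid_list(master)
    ahead = 0
    for domain, gtid in cand.items():
        if domain in mast and gtid.sequence > mast[domain].sequence:
            ahead += gtid.sequence - mast[domain].sequence
    return ahead


def can_rejoin(candidate: str, master: str) -> bool:
    """Assumed rule: the candidate may rejoin only if it is not ahead."""
    return events_ahead(candidate, master) == 0


# Values taken from the log message above:
# gtid_current_pos of server1 and gtid_binlog_pos of server2.
print(events_ahead("1-1-22", "1-2-8"))  # 14
print(can_rejoin("1-1-22", "1-2-8"))    # False
```

Under this model, `1-1-22` is 14 events ahead of `1-2-8` in domain 1, so the rejoin is refused, which is consistent with the warning. The `1-2-8` value carries server ID 2, which suggests it comes from events written directly on server2 rather than replicated from server1.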
