[MDEV-10033] Async slave constantly reconnects to master if there is another slave with the same server_id Created: 2016-05-06  Updated: 2016-05-09

Status: Confirmed
Project: MariaDB Server
Component/s: Replication
Affects Version/s: 10.1.13, 5.5, 10.0, 10.1, 10.2
Fix Version/s: 10.2

Type: Bug Priority: Minor
Reporter: Bryan Seitz Assignee: Kristian Nielsen
Resolution: Unresolved Votes: 0
Labels: None
Environment:

CentOS7
Official Packages



 Description   

I have a 13 node Galera cluster, it was at version 10.1.10. I also have 10.1.10 Async slaves. I upgraded one of the slaves to 10.1.13 and now see the following over and over in my error log:

2016-05-06  4:31:55 139622869956352 [Note] Slave: received end packet from server, apparent master shutdown:
2016-05-06  4:31:55 139622869956352 [Note] Slave I/O thread: Failed reading log event, reconnecting to retry, log 'mysql-bin.000001' at position 742332; GTID position '2-12-1300'

I then upgraded the galera cluster to 10.1.13 to see if that fixed my issues but it did not. I tried downgrading the Async slave node to 10.1.10 but experienced the same issues. I even reinstalled this node with 10.1.10 which also did not help. The remaining 10.1.10 async slaves do not seem to exhibit this behavior.

               Slave_IO_State: Reconnecting after a failed master event read
                  Master_Host: 100.65.3.7
                  Master_User: slave_user
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000001
          Read_Master_Log_Pos: 767464
               Relay_Log_File: v-pdnsauth-001-net.016277
                Relay_Log_Pos: 736
        Relay_Master_Log_File: mysql-bin.000001
             Slave_IO_Running: Connecting
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table: powerdns.%
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 767464
              Relay_Log_Space: 1528
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
  Replicate_Ignore_Server_Ids:
             Master_Server_Id: 3
               Master_SSL_Crl:
           Master_SSL_Crlpath:
                   Using_Gtid: Slave_Pos
                  Gtid_IO_Pos: 2-1-1345
      Replicate_Do_Domain_Ids:
  Replicate_Ignore_Domain_Ids:
                Parallel_Mode: conservative



 Comments   
Comment by Bryan Seitz [ 2016-05-06 ]

[ash2] bryan_seitz@v-mysql-003-net:/data/mysql-logs$ sudo mysqlbinlog mysql-bin.000001 > /dev/null
[ash2] bryan_seitz@v-mysql-003-net:/data/mysql-logs$

Comment by Bryan Seitz [ 2016-05-06 ]

Slave Config:

###
# /etc/my.cnf
###
[mysql]
prompt='\U (\p) [\d] > '
 
[mysqld]
server_id                      = 101
datadir                        = /var/lib/mysql
user                           = mysql
default_storage_engine         = InnoDB
max_allowed_packet             = 1073741824
sysdate-is-now                 = 1
skip-name-resolve
 
### Slave
#enforce-gtid-consistency
#skip-slave-start
#gtid-mode                      = on
read-only                      = 1
sync-master-info               = 1
log_bin                        = binlog
sync_binlog                    = 1
log_slave_updates              = 1
binlog_format                  = ROW
replicate-wild-do-table        = powerdns.%
slave_net_timeout              = 3600
master_retry_count             = 86400
slave-transaction-retries      = 64
relay_log                      = v-pdnsauth-001-net.ash2.symcpe.net-relay-bin
expire-logs-days               = 7
 
# CACHES AND LIMITS #
tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-size               = 64M
query-cache-type               = 1
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 32
### InnoDB
innodb_flush_method            = O_DIRECT
innodb_log_files_in_group      = 2
innodb_log-buffer-size         = 64M
innodb_log_file_size           = 256M
innodb_flush_log_at_trx_commit = 1
innodb_file_per_table          = 1
innodb_buffer_pool_size        = 2G
 
# LOGGING #
log-error                      = /var/lib/mysql/v-pdnsauth-001-net.ash2.symcpe.net-mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1
slow-query-log-file            = /var/lib/mysql/v-pdnsauth-001-net.ash2.symcpe.net-mysql-slow.log

Comment by Bryan Seitz [ 2016-05-06 ]

Binlog entries:

mysqlbinlog mysql-bin.000001 --start-position=767464

/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#160506  3:45:56 server id 3  end_log_pos 249 CRC32 0xe8ea51c3  Start: binlog v 4, server v 10.1.13-MariaDB created 160506  3:45:56 at startup
# Warning: this binlog is either in use or was not closed properly.
ROLLBACK/*!*/;
BINLOG '
dBMsVw8DAAAA9QAAAPkAAAABAAQAMTAuMS4xMy1NYXJpYURCAGxvZwAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAB0EyxXEzgNAAgAEgAEBAQEEgAA3QAEGggAAAAICAgCAAAACgoKAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAEEwQAAcNR6ug=
'/*!*/;
# at 767464
#160506  4:34:15 server id 3  end_log_pos 767506 CRC32 0xdfa83ff6       GTID 2-3-1346 trans
/*!100101 SET @@session.skip_parallel_replication=0*//*!*/;
/*!100001 SET @@session.gtid_domain_id=2*//*!*/;
/*!100001 SET @@session.server_id=3*//*!*/;
/*!100001 SET @@session.gtid_seq_no=1346*//*!*/;
BEGIN
/*!*/;
# at 767506
#160506  4:34:15 server id 3  end_log_pos 767592 CRC32 0xa56c9075       Table_map: `designate_powerdns_kilo`.`domains` mapped to number 18
# at 767592
#160506  4:34:15 server id 3  end_log_pos 767934 CRC32 0x64a70ffb       Update_rows: table id 18 flags: STMT_END_F
 
BINLOG '
xx4sVxMDAAAAVgAAAGi2CwAAABIAAAAAAAEAF2Rlc2lnbmF0ZV9wb3dlcmRuc19raWxvAAdkb21h
aW5zAAgDDw8DDwMP/gr9Av0CEgB4AP5g7HWQbKU=
xx4sVxgDAAAAVgEAAL63CwAAABIAAAAAAAEACP//IJEqAAAgAGFtYW4tbWlzaHJhMS5kZXYuYXNo
Mi5zeW1jcGUuY29tJQAxMDAuNzIuMTMwLjkzOjUzNTQsMTAwLjcyLjEzOC41ODo1MzU0tRAsVwVT
TEFWRSBhZWE1ZmQxNTc1MGY0Y2JlOGNmYzNkMzk5ZjQ1OTQ5ZCA3M2UyNjY1MWYxOTY0YjEyYTli
MDU5OTg3Zjg2MDE0ZSCRKgAAIABhbWFuLW1pc2hyYTEuZGV2LmFzaDIuc3ltY3BlLmNvbSUAMTAw
LjcyLjEzMC45Mzo1MzU0LDEwMC43Mi4xMzguNTg6NTM1NMceLFcFU0xBVkUgYWVhNWZkMTU3NTBm
NGNiZThjZmMzZDM5OWY0NTk0OWQgNzNlMjY2NTFmMTk2NGIxMmE5YjA1OTk4N2Y4NjAxNGX7D6dk
'/*!*/;
# at 767934
#160506  4:34:15 server id 3  end_log_pos 767965 CRC32 0xa781af21       Xid = 3033817
COMMIT/*!*/;
# at 767965
#160506  4:34:15 server id 3  end_log_pos 768007 CRC32 0x6c5b2cdb       GTID 2-3-1347 trans
/*!100001 SET @@session.gtid_seq_no=1347*//*!*/;
BEGIN
/*!*/;
# at 768007
#160506  4:34:15 server id 3  end_log_pos 768093 CRC32 0xcf2a81ba       Table_map: `designate_powerdns_kilo`.`domains` mapped to number 18
# at 768093
#160506  4:34:15 server id 3  end_log_pos 768407 CRC32 0xba833a24       Update_rows: table id 18 flags: STMT_END_F
 
BINLOG '
xx4sVxMDAAAAVgAAAF24CwAAABIAAAAAAAEAF2Rlc2lnbmF0ZV9wb3dlcmRuc19raWxvAAdkb21h
aW5zAAgDDw8DDwMP/gr9Av0CEgB4AP5g7LqBKs8=
xx4sVxgDAAAAOgEAAJe5CwAAABIAAAAAAAEACP//IAErAAATAGRldi5waHgyLnN5bWNwZS5jb20k
ADEwMC44MC4xMzAuOTo1MzU0LDEwMC44MC4xMzEuMzk6NTM1NLUQLFcFU0xBVkUgYWVhNWZkMTU3
NTBmNGNiZThjZmMzZDM5OWY0NTk0OWQgN2NmY2UzNzU3NmJkNGM4MzkyYjkxMzhiZjNmMDM3M2Qg
ASsAABMAZGV2LnBoeDIuc3ltY3BlLmNvbSQAMTAwLjgwLjEzMC45OjUzNTQsMTAwLjgwLjEzMS4z
OTo1MzU0xx4sVwVTTEFWRSBhZWE1ZmQxNTc1MGY0Y2JlOGNmYzNkMzk5ZjQ1OTQ5ZCA3Y2ZjZTM3
NTc2YmQ0YzgzOTJiOTEzOGJmM2YwMzczZCQ6g7o=
'/*!*/;
# at 768407
#160506  4:34:15 server id 3  end_log_pos 768438 CRC32 0x100d10b6       Xid = 3033818
COMMIT/*!*/;
# at 768438
#160506  4:34:16 server id 3  end_log_pos 768480 CRC32 0x6e00dd33       GTID 2-3-1348 trans
/*!100001 SET @@session.gtid_seq_no=1348*//*!*/;
BEGIN
/*!*/;
# at 768480
#160506  4:34:16 server id 3  end_log_pos 768566 CRC32 0xdf316157       Table_map: `designate_powerdns_kilo`.`domains` mapped to number 18
# at 768566
#160506  4:34:16 server id 3  end_log_pos 768894 CRC32 0x92b58dda       Update_rows: table id 18 flags: STMT_END_F
 
BINLOG '
yB4sVxMDAAAAVgAAADa6CwAAABIAAAAAAAEAF2Rlc2lnbmF0ZV9wb3dlcmRuc19raWxvAAdkb21h
aW5zAAgDDw8DDwMP/gr9Av0CEgB4AP5g7FdhMd8=
yB4sVxgDAAAASAEAAH67CwAAABIAAAAAAAEACP//IDErAAAZAGtjcHJvamVjdC5hc2gyLnN5bWNw
ZS5jb20lADEwMC43Mi4xMzAuOTM6NTM1NCwxMDAuNzIuMTM4LjU4OjUzNTS1ECxXBVNMQVZFIDMx
ODUwZDQ0MTYyZjRlZmJhNGQ4NWYxYzE2NDcyYTUzIDZiMTUwZjc5NWI4NTRjYmQ5MTM1NWVmMDI1
OWY2YWM4IDErAAAZAGtjcHJvamVjdC5hc2gyLnN5bWNwZS5jb20lADEwMC43Mi4xMzAuOTM6NTM1
NCwxMDAuNzIuMTM4LjU4OjUzNTTIHixXBVNMQVZFIDMxODUwZDQ0MTYyZjRlZmJhNGQ4NWYxYzE2
NDcyYTUzIDZiMTUwZjc5NWI4NTRjYmQ5MTM1NWVmMDI1OWY2YWM42o21kg==
'/*!*/;
# at 768894
#160506  4:34:16 server id 3  end_log_pos 768925 CRC32 0x2ef0765a       Xid = 3033819
COMMIT/*!*/;

Comment by Bryan Seitz [ 2016-05-06 ]

Looks like I had two slaves with the same server_id. While not ideal, it should still work no ?

Comment by Elena Stepanova [ 2016-05-09 ]

Actually it doesn't, both MySQL and MariaDB documentation clearly say:

On the master and each slave, you must use the server-id option to establish a unique replication ID in the range from 1 to 232 − 1. “Unique” means that each ID must be different from every other ID in use by any other replication master or slave.

The server_id must be unique for each server in the replicating group.

The other related nasty side-effect is constantly rotating relay log.
MySQL 5.5 exhibits the same behavior. In MySQL 5.6 and higher it's different, maybe because apart from server-id it also uses server-uuid (or maybe on some other reason).

So, maybe there is a way to do something smarter than endless reconnects; for that, I'll keep the report open.

Generated at Thu Feb 08 07:39:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.