[MCOL-1137] MySQL replication master and slave both set up after a master node failover Created: 2018-01-05  Updated: 2023-10-26  Resolved: 2018-02-01

Status: Closed
Project: MariaDB ColumnStore
Component/s: ?
Affects Version/s: 1.1.2
Fix Version/s: 1.1.3

Type: Bug Priority: Minor
Reporter: David Hill (Inactive) Assignee: Daniel Lee (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

non-root amazon EBS 3pm combo setup


Sprint: 2018-01, 2018-02, 2018-03

 Description   

Started with an Amazon AMI 3pm combo setup using EBS, with replication enabled.
I stopped the pm1 instance, which was the master replication node. pm3 became the master replication node, but it showed as both the master and a slave node. When it was promoted to master, its slave configuration should have been removed.
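
The symptom described above can be checked mechanically. The following is a minimal sketch, not part of ColumnStore: it parses the text printed by the client for `SHOW MASTER STATUS\G` and `SHOW SLAVE STATUS\G` and reports whether a node still carries a slave configuration. The function names `parse_vertical` and `has_stale_slave_config` are hypothetical, and the rule used (a binlog file is reported and the slave status row still names a `Master_Host`) is an assumption about what "still has the slave set up" means. It is only meaningful when run against the node that the system reports as the current primary, because an ordinary slave with binary logging enabled also returns a master status row.

```python
def parse_vertical(text):
    """Parse one row of vertical (backslash-G) client output into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, row separators, prompts, and client error lines.
        if not line or line.startswith("*") or line.startswith("ERROR"):
            continue
        if ":" not in line:
            continue
        name, _, value = line.partition(":")
        fields[name.strip()] = value.strip()
    return fields


def has_stale_slave_config(master_status_text, slave_status_text):
    """Return True if the node reports a binlog file AND a configured master.

    Assumption: a promoted master should return an empty set for
    SHOW SLAVE STATUS, so any Master_Host value counts as stale.
    """
    master = parse_vertical(master_status_text)
    slave = parse_vertical(slave_status_text)
    is_master = bool(master.get("File"))
    has_slave = bool(slave.get("Master_Host"))
    return is_master and has_slave
```

On MariaDB, `STOP SLAVE` followed by `RESET SLAVE ALL` clears the stored master connection so that `SHOW SLAVE STATUS` returns an empty set. Whether the ColumnStore fix uses these statements is not stated in this ticket.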

PM1

MariaDB [david]> show master status\G

*************************** 1. row ***************************
File: mysql-bin.000002
Position: 3214
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

MariaDB [(none)]> show slave status\G
Empty set (0.00 sec)

PM3

MariaDB [(none)]> show master status\G;

*************************** 1. row ***************************
File: mysql-bin.000002
Position: 342
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [(none)]> show slave status\G;

*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 172.30.0.204
Master_User: idbrep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000002
Read_Master_Log_Pos: 2679
Relay_Log_File: relay-bin.000002
Relay_Log_Pos: 555
Relay_Master_Log_File: mysql-bin.000002
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 2679
Relay_Log_Space: 858
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 1
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [(none)]>

-----------------------------------------------------------------------------------------------------------

[mariadb-user@ip-172-30-0-204 ~]$ ma getsystemi
getsysteminfo Fri Jan 5 15:47:17 2018

System 1.1.2

System and Module statuses

Component     Status                      Last Status Change
------------  --------------------------  ------------------------
System        ACTIVE                      Fri Jan 5 15:43:52 2018

Module pm1    ACTIVE                      Fri Jan 5 15:43:48 2018
Module pm2    ACTIVE                      Fri Jan 5 15:43:44 2018
Module pm3    ACTIVE                      Fri Jan 5 15:43:43 2018

Active Parent OAM Performance Module is 'pm1'
Primary Front-End MariaDB ColumnStore Module is 'pm1'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process             Module  Status           Last Status Change        Process ID
------------------  ------  ---------------  ------------------------  ----------
ProcessMonitor      pm1     ACTIVE           Fri Jan 5 15:42:23 2018   1283
ProcessManager      pm1     ACTIVE           Fri Jan 5 15:42:29 2018   1440
DBRMControllerNode  pm1     ACTIVE           Fri Jan 5 15:43:18 2018   2897
ServerMonitor       pm1     ACTIVE           Fri Jan 5 15:43:20 2018   2956
DBRMWorkerNode      pm1     ACTIVE           Fri Jan 5 15:43:20 2018   2996
DecomSvr            pm1     ACTIVE           Fri Jan 5 15:43:24 2018   3159
PrimProc            pm1     ACTIVE           Fri Jan 5 15:43:27 2018   3262
ExeMgr              pm1     ACTIVE           Fri Jan 5 15:43:37 2018   5003
WriteEngineServer   pm1     ACTIVE           Fri Jan 5 15:43:41 2018   5143
DDLProc             pm1     ACTIVE           Fri Jan 5 15:43:45 2018   5333
DMLProc             pm1     ACTIVE           Fri Jan 5 15:43:49 2018   5494
mysqld              pm1     ACTIVE           Fri Jan 5 15:43:41 2018   2696

ProcessMonitor      pm2     ACTIVE           Fri Jan 5 15:43:07 2018   15334
ProcessManager      pm2     COLD_STANDBY     Fri Jan 5 15:43:36 2018
DBRMControllerNode  pm2     COLD_STANDBY     Fri Jan 5 15:43:36 2018
ServerMonitor       pm2     ACTIVE           Fri Jan 5 15:43:22 2018   15820
DBRMWorkerNode      pm2     ACTIVE           Fri Jan 5 15:43:23 2018   15846
DecomSvr            pm2     ACTIVE           Fri Jan 5 15:43:26 2018   15877
PrimProc            pm2     ACTIVE           Fri Jan 5 15:43:30 2018   15885
ExeMgr              pm2     ACTIVE           Fri Jan 5 15:43:39 2018   16794
WriteEngineServer   pm2     ACTIVE           Fri Jan 5 15:43:43 2018   16815
DDLProc             pm2     COLD_STANDBY     Fri Jan 5 15:43:44 2018
DMLProc             pm2     COLD_STANDBY     Fri Jan 5 15:43:44 2018
mysqld              pm2     ACTIVE           Fri Jan 5 15:43:45 2018   15694

ProcessMonitor      pm3     ACTIVE           Fri Jan 5 15:43:08 2018   14322
ProcessManager      pm3     HOT_STANDBY      Fri Jan 5 15:43:12 2018   14457
DBRMControllerNode  pm3     COLD_STANDBY     Fri Jan 5 15:43:24 2018
ServerMonitor       pm3     ACTIVE           Fri Jan 5 15:43:27 2018   14823
DBRMWorkerNode      pm3     ACTIVE           Fri Jan 5 15:43:28 2018   14868
DecomSvr            pm3     ACTIVE           Fri Jan 5 15:43:31 2018   14882
PrimProc            pm3     ACTIVE           Fri Jan 5 15:43:34 2018   14890
ExeMgr              pm3     ACTIVE           Fri Jan 5 15:43:39 2018   14969
WriteEngineServer   pm3     ACTIVE           Fri Jan 5 15:43:43 2018   14990
DDLProc             pm3     COLD_STANDBY     Fri Jan 5 15:43:43 2018
DMLProc             pm3     COLD_STANDBY     Fri Jan 5 15:43:43 2018
mysqld              pm3     ACTIVE           Fri Jan 5 15:43:26 2018   14698

Active Alarm Counts: Critical = 0, Major = 0, Minor = 0, Warning = 0, Info = 0
[mariadb-user@ip-172-30-0-204 ~]$

--------------------------------------------------------------------------------------

System 1.1.2

System and Module statuses

Component     Status                      Last Status Change
------------  --------------------------  ------------------------
System        ACTIVE                      Thu Jan 4 21:36:54 2018

Module pm1    AUTO_DISABLED/DEGRADED      Thu Jan 4 21:35:01 2018
Module pm2    ACTIVE                      Thu Jan 4 21:36:12 2018
Module pm3    ACTIVE                      Thu Jan 4 21:35:38 2018

Active Parent OAM Performance Module is 'pm3'
Primary Front-End MariaDB ColumnStore Module is 'pm3'
MariaDB ColumnStore Replication Feature is enabled

MariaDB ColumnStore Process statuses

Process             Module  Status           Last Status Change        Process ID
------------------  ------  ---------------  ------------------------  ----------
ProcessMonitor      pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
ProcessManager      pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
DBRMControllerNode  pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
ServerMonitor       pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
DBRMWorkerNode      pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
DecomSvr            pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
PrimProc            pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
ExeMgr              pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
WriteEngineServer   pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
DDLProc             pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
DMLProc             pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018
mysqld              pm1     AUTO_OFFLINE     Thu Jan 4 21:35:51 2018

ProcessMonitor      pm2     ACTIVE           Thu Jan 4 21:19:18 2018   3458
ProcessManager      pm2     COLD_STANDBY     Thu Jan 4 21:36:12 2018
DBRMControllerNode  pm2     COLD_STANDBY     Thu Jan 4 21:36:12 2018
ServerMonitor       pm2     ACTIVE           Thu Jan 4 21:19:33 2018   3951
DBRMWorkerNode      pm2     ACTIVE           Thu Jan 4 21:19:34 2018   3963
DecomSvr            pm2     ACTIVE           Thu Jan 4 21:19:37 2018   3995
PrimProc            pm2     ACTIVE           Thu Jan 4 21:19:40 2018   4003
ExeMgr              pm2     ACTIVE           Thu Jan 4 21:19:49 2018   4914
WriteEngineServer   pm2     ACTIVE           Thu Jan 4 21:19:53 2018   4935
DDLProc             pm2     COLD_STANDBY     Thu Jan 4 21:36:12 2018
DMLProc             pm2     COLD_STANDBY     Thu Jan 4 21:36:12 2018
mysqld              pm2     ACTIVE           Thu Jan 4 21:36:14 2018   3825

ProcessMonitor      pm3     ACTIVE           Thu Jan 4 21:19:19 2018   3457
ProcessManager      pm3     ACTIVE           Thu Jan 4 21:36:38 2018   3599
DBRMControllerNode  pm3     ACTIVE           Thu Jan 4 21:35:15 2018   7013
ServerMonitor       pm3     ACTIVE           Thu Jan 4 21:35:17 2018   7029
DBRMWorkerNode      pm3     ACTIVE           Thu Jan 4 21:35:17 2018   7050
DecomSvr            pm3     ACTIVE           Thu Jan 4 21:35:21 2018   7088
PrimProc            pm3     ACTIVE           Thu Jan 4 21:35:23 2018   7106
ExeMgr              pm3     ACTIVE           Thu Jan 4 21:35:27 2018   7177
WriteEngineServer   pm3     ACTIVE           Thu Jan 4 21:35:31 2018   7209
DDLProc             pm3     ACTIVE           Thu Jan 4 21:35:35 2018   7257
DMLProc             pm3     ACTIVE           Thu Jan 4 21:36:54 2018   7320
mysqld              pm3     ACTIVE           Thu Jan 4 21:36:26 2018   6868

Active Alarm Counts: Critical = 3, Major = 1, Minor = 0, Warning = 0, Info = 0
mcsadmin> getstorage
getstorageconfig Thu Jan 4 21:38:31 2018

System Storage Configuration

Performance Module (DBRoot) Storage Type = external
User Module Storage Type = internal
System Assigned DBRoot Count = 3
DBRoot IDs assigned to 'pm1' =
DBRoot IDs assigned to 'pm2' = 2
DBRoot IDs assigned to 'pm3' = 1, 3

--------------------------------------------------------------------------------------------------

mcsmysql
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 30
Server version: 10.2.10-MariaDB-log Columnstore 1.1.2-1

Copyright (c) 2000, 2017, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show master status\G;

*************************** 1. row ***************************
File: mysql-bin.000003
Position: 2679
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [(none)]> show slave status\G;

*************************** 1. row ***************************
Slave_IO_State:
Master_Host: 172.30.0.204
Master_User: idbrep
Master_Port: 3306
Connect_Retry: 60
Master_Log_File:
Read_Master_Log_Pos: 4
Relay_Log_File:
Relay_Log_Pos: 4
Relay_Master_Log_File:
Slave_IO_Running: No
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 0
Relay_Log_Space: 256
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
Replicate_Ignore_Server_Ids:
Master_Server_Id: 0
Master_SSL_Crl:
Master_SSL_Crlpath:
Using_Gtid: No
Gtid_IO_Pos:
Replicate_Do_Domain_Ids:
Replicate_Ignore_Domain_Ids:
Parallel_Mode: conservative
SQL_Delay: 0
SQL_Remaining_Delay: NULL
Slave_SQL_Running_State:
1 row in set (0.00 sec)

ERROR: No query specified

MariaDB [(none)]>



 Comments   
Comment by David Hill (Inactive) [ 2018-01-18 ]

https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/372

Fixed multiple issues

1. After the pm1 outage, the slave was still set up on the new master. Changed the disable-slave script to disable the slave on the master.
2. On an Amazon pm1 stoppage, the procmgr failover code would process the outage, but it would then get reprocessed by the DOWN MODULE code. Added code to prevent this: down-module processing is skipped if the system is on Amazon and the module is already in the DISABLED state.
3. After pm1 recovered, it didn't contain any of the new databases and tables that were created while it was down. Changed the code to do a DIST-DB when pm1 comes back up.
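
Item 2 describes a guard against handling the same outage twice. A minimal sketch of that kind of check follows. It is illustrative Python, not the ProcMgr code from the pull request; the function name `should_process_down_module`, the `is_amazon` flag, and the substring test on the state are assumptions made for the example. The state string `AUTO_DISABLED/DEGRADED` is taken from the status output in the description.

```python
def should_process_down_module(is_amazon: bool, module_state: str) -> bool:
    """Decide whether the down-module path should handle an outage.

    Assumption: on Amazon, the failover path has already handled the
    outage once the module reports any DISABLED state, so the
    down-module path should not process it a second time.
    """
    already_disabled = "DISABLED" in module_state.upper()
    return not (is_amazon and already_disabled)


# Example: pm1 was already disabled by the failover code on Amazon.
if not should_process_down_module(True, "AUTO_DISABLED/DEGRADED"):
    print("skipping down-module processing for pm1")
```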

Comment by Ben Thompson (Inactive) [ 2018-01-19 ]

Reviewed / Merged

Comment by David Hill (Inactive) [ 2018-01-24 ]

Problem detected on a system with 1 um and no slaves.

Comment by David Hill (Inactive) [ 2018-01-24 ]

New failure from the initial code check-in:

Run MariaDB ColumnStore Replication Setup..
ERROR: Error return in running the MariaDB ColumnStore Master replication, check /tmp/master-rep*.logs on um1

MariaDB ColumnStore Replication Setup Failed, check logs

BUG WHEN THERE IS ONLY 1UM WITH NO SLAVES

Comment by David Hill (Inactive) [ 2018-01-24 ]

https://github.com/mariadb-corporation/mariadb-columnstore-engine/pull/385

Comment by Ben Thompson (Inactive) [ 2018-01-24 ]

Reviewed / Merged

Comment by David Hill (Inactive) [ 2018-01-24 ]

With the latest change, also test a 1um/2pm system where no slave nodes are configured initially.

Comment by Daniel Lee (Inactive) [ 2018-01-25 ]

Build verified: 1.1.3-1 created on 01/24/2018, ami-99b40be1

Encountered a few failover issues on a 3pm combo system with external storage, including:

1) After stopping the PM1 instance, PM2 failed to take over due to a dbroot 1 mounting issue.
2) After starting PM1, PM1 tried to start up all processes, failed in a loop, and eventually stopped trying.

In a different test:

1) After failover seemed to be working, could not create a table on PM2 (MCOL-1034).
2) After starting PM1, could not create a table because it was not the replication master, even though it was the active PM node.
3) Also noticed that all three server nodes had server-id=1 in my.cnf.

We need to fix all failover issues before testing this one again.
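
The duplicate `server-id` observation can be checked with a short script. The sketch below is an illustration only: the function name `find_duplicate_server_ids` is hypothetical, the my.cnf contents are passed in as text (how they are collected from each node is left out), and only the first uncommented `server-id` or `server_id` line in each file is considered.

```python
import re
from collections import defaultdict

# Matches an uncommented "server-id = N" or "server_id=N" line.
SERVER_ID_RE = re.compile(r"^\s*server[-_]id\s*=\s*(\d+)", re.MULTILINE)


def find_duplicate_server_ids(configs):
    """Map each server id used by more than one node to those nodes.

    configs: dict of node name -> text of that node's my.cnf.
    """
    by_id = defaultdict(list)
    for node, text in configs.items():
        match = SERVER_ID_RE.search(text)
        if match:
            by_id[int(match.group(1))].append(node)
    return {sid: sorted(nodes) for sid, nodes in by_id.items() if len(nodes) > 1}


# Example mirroring the situation reported above.
configs = {
    "pm1": "[mysqld]\nserver-id = 1\n",
    "pm2": "[mysqld]\nserver-id = 1\n",
    "pm3": "[mysqld]\nserver-id = 1\n",
}
print(find_duplicate_server_ids(configs))  # {1: ['pm1', 'pm2', 'pm3']}
```

Replication between nodes that share a server id is unreliable, since a slave by default ignores events that carry its own server id, so a non-empty result is worth resolving before retesting failover.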

Comment by David Hill (Inactive) [ 2018-01-30 ]

Retesting with the changes from MCOL-1034.

Comment by David Hill (Inactive) [ 2018-01-31 ]

Should be retested with the fix for MCOL-1034, which is now merged.

Comment by Daniel Lee (Inactive) [ 2018-02-01 ]

Build verified: mcs-1.1.3 ami (ami-7e2b9006) released to QA on 02/01/2018.

Generated at Thu Feb 08 02:26:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.