Details
-
Bug
-
Status: Closed (View Workflow)
-
Minor
-
Resolution: Fixed
-
1.2.1
-
None
Description
In some cases, MaxScale will not choose a node as master, and it logs a confusing error message that says it can't do so because the node is "RUNNING MASTER".
2015-12-30 15:06:46 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
This error message is a bit confusing. Can this error message be clarified at all?
This was seen in the following context:
A user is using MaxScale's ReadWriteSplit router along with a 3-node Galera Cluster.
At one point, the cluster's master hung. At that point, MaxScale could no longer connect to the master.
MaxScale log:
2015-12-30 14:32:51 Error : Monitor was unable to connect to server 10.10.10.130:3306 : "Can't connect to MySQL server on '10.10.10.130' (4)"
|
2015-12-30 14:32:51 Error : server 10.10.10.130:3306 lost the master status. Readwritesplit service can't locate the master. Client sessions will be closed.
|
The user brought up the hung node shortly after.
Galera Cluster log:
2015-12-30 15:05:39 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
|
2015-12-30 15:05:39 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56-log) starting as process 5932 ...
|
2015-12-30 15:05:39 5932 [Note] WSREP: Read nil XID from storage engines, skipping position init
|
2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
|
2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): Galera 3.12.2(rf3e626d) by Codership Oy <info@codership.com> loaded successfully.
|
The node needed a state transfer, but it was able to synchronize with the group fairly quickly:
Galera Cluster log:
2015-12-30 15:05:39 5932 [Warning] WSREP: Gap in state sequence. Need state transfer.
|
2015-12-30 15:05:39 5932 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.10.130' --datadir '/home/galera/data/' --defaults-file '/home/galera/conf/my.cnf' --defaults-group-suffix '' --parent '5932' --binlog '/home/galera/logdata/mysql-bin' '
|
WSREP_SST: [INFO] Streaming with xbstream (20151230 15:05:40.004)
|
WSREP_SST: [INFO] Using socat as streamer (20151230 15:05:40.006)
|
WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151230 15:05:40.077)
|
2015-12-30 15:05:40 5932 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.10.10.130:4444/xtrabackup_sst//1
|
2015-12-30 15:05:40 5932 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.
|
2015-12-30 15:05:40 5932 [Note] WSREP: Assign initial position for certification: 2604394632, protocol version: 3
|
2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.
|
2015-12-30 15:05:40 5932 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.10.10.130:4568
|
2015-12-30 15:05:40 5932 [Note] WSREP: Member 0.0 (node2) requested state transfer from '*any*'. Selected 1.0 (node3)(SYNCED) as donor.
|
2015-12-30 15:05:40 5932 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2604394640)
|
2015-12-30 15:05:40 5932 [Note] WSREP: Requesting state transfer: success, donor: 1
|
2015-12-30 15:05:41 5932 [Note] WSREP: 1.0 (node3): State transfer to 0.0 (node2) complete.
|
2015-12-30 15:05:41 5932 [Note] WSREP: Member 1.0 (node3) synced with group.
|
WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20151230 15:05:41.583)
|
WSREP_SST: [INFO] Galera co-ords from recovery: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415 (20151230 15:05:41.585)
|
WSREP_SST: [INFO] Total time on joiner: 0 seconds (20151230 15:05:41.587)
|
WSREP_SST: [INFO] Removing the sst_in_progress file (20151230 15:05:41.588)
|
2015-12-30 15:05:41 5932 [Note] WSREP: SST complete, seqno: 2604252415
|
2015-12-30 15:05:41 5932 [Note] Plugin 'FEDERATED' is disabled.
|
2015-12-30 15:05:41 5932 [Warning] Using unique option prefix innodb_read_ahead instead of innodb-read-ahead-threshold is deprecated and will be removed in a future release. Please use the full name instead.
|
2015-12-30 15:05:41 7f1ed49e8720 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Using atomics to ref count buffer pool pages
|
2015-12-30 15:05:41 5932 [Note] InnoDB: The InnoDB memory heap is disabled
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Memory barrier is not used
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Using Linux native AIO
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Using CPU crc32 instructions
|
2015-12-30 15:05:41 5932 [Note] InnoDB: Initializing buffer pool, size = 208.0G
|
2015-12-30 15:05:42 5932 [Note] WSREP: (243d391e, 'tcp://0.0.0.0:4567') turning message relay requesting off
|
2015-12-30 15:05:51 5932 [Note] InnoDB: Completed initialization of buffer pool
|
2015-12-30 15:05:53 5932 [Note] InnoDB: Highest supported file format is Barracuda.
|
2015-12-30 15:05:56 5932 [Note] InnoDB: 128 rollback segment(s) are active.
|
2015-12-30 15:05:56 5932 [Note] InnoDB: Waiting for purge to start
|
2015-12-30 15:05:56 5932 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 2911365201969
|
2015-12-30 15:05:56 7ee2b24bf700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
|
2015-12-30 15:05:56 5932 [Note] RSA private key file not found: /home/galera/data//private_key.pem. Some authentication plugins will not work.
|
2015-12-30 15:05:56 5932 [Note] RSA public key file not found: /home/galera/data//public_key.pem. Some authentication plugins will not work.
|
2015-12-30 15:05:56 5932 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
|
2015-12-30 15:05:56 5932 [Note] - '0.0.0.0' resolves to '0.0.0.0';
|
2015-12-30 15:05:56 5932 [Note] Server socket created on IP: '0.0.0.0'.
|
2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/user' is marked as crashed and should be repaired
|
2015-12-30 15:05:56 5932 [Warning] Checking table: './mysql/user'
|
2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly
|
2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/db' is marked as crashed and should be repaired
|
2015-12-30 15:05:56 5932 [Warning] Checking table: './mysql/db'
|
2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly
|
2015-12-30 15:05:56 5932 [Note] Event Scheduler: Loaded 0 events
|
2015-12-30 15:05:56 5932 [Note] WSREP: Signalling provider to continue.
|
2015-12-30 15:05:56 5932 [Note] WSREP: Initialized wsrep sidno 3
|
2015-12-30 15:05:56 5932 [Note] WSREP: SST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415
|
2015-12-30 15:05:56 5932 [Note] WSREP: Receiving IST: 142217 writesets, seqnos 2604252415-2604394632
|
2015-12-30 15:05:56 5932 [Note] /usr/sbin/mysqld: ready for connections.
|
Version: '5.6.26-74.0-56-log' socket: '/home/galera/run/mysqld.sock' port: 3306 Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12
|
2015-12-30 15:06:41 5932 [Note] WSREP: IST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604394632
|
2015-12-30 15:06:41 5932 [Note] WSREP: 0.0 (node2): State transfer from 1.0 (node3) complete.
|
2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINER -> JOINED (TO: 2604395006)
|
2015-12-30 15:06:41 5932 [Note] WSREP: Member 0.0 (node2) synced with group.
|
2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2604395009)
|
2015-12-30 15:06:41 5932 [Note] WSREP: Synchronized with group, ready for connections
|
2015-12-30 15:35:16 7ee2b24bf700 InnoDB: Buffer pool(s) load completed at 151230 15:35:16
|
After the node synchronizes with the group, I would expect MaxScale to fail back to the old master. However, MaxScale would not choose the node as master.
MaxScale log:
2015-12-30 15:06:46 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:49 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:50 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:51 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:52 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:53 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:55 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:55 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:58 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:06:59 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:00 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:01 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:03 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:03 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:04 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 15:07:52 Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
|
2015-12-30 19:05:46 MaxScale received signal SIGTERM. Exiting.
|
2015-12-30 19:05:46 MaxScale is shut down.
|
After MaxScale was restarted, it had no problem choosing the node as master.