[MXS-511] ReadWriteSplit router won't choose node as master and logs confusing "RUNNING MASTER" error message - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Minor
Resolution: Fixed
Affects Version/s: 1.2.1
Fix Version/s: 1.4.0
Component/s: readwritesplit
Labels:
None

Description

In some cases, MaxScale will not choose a node as master, and it logs a confusing error message that says it can't do so because the node is "RUNNING MASTER".

2015-12-30 15:06:46   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

This error message is a bit confusing. Can this error message be clarified at all?

This was seen in the following context:

A user is using MaxScale's ReadWriteSplit router along with a 3-node Galera Cluster.

At one point, the cluster's master hung. At that point, MaxScale could no longer connect to the master.

MaxScale log:

2015-12-30 14:32:51   Error : Monitor was unable to connect to server 10.10.10.130:3306 : "Can't connect to MySQL server on '10.10.10.130' (4)"

2015-12-30 14:32:51   Error : server 10.10.10.130:3306 lost the master status. Readwritesplit service can't locate the master. Client sessions will be closed.

The user brought up the hung node shortly after.

Galera Cluster log:

2015-12-30 15:05:39 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).

2015-12-30 15:05:39 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56-log) starting as process 5932 ...

2015-12-30 15:05:39 5932 [Note] WSREP: Read nil XID from storage engines, skipping position init

2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'

2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): Galera 3.12.2(rf3e626d) by Codership Oy <info@codership.com> loaded successfully.

The node needed a state transfer, but it was able to synchronize with the group fairly quickly:

Galera Cluster log:

2015-12-30 15:05:39 5932 [Warning] WSREP: Gap in state sequence. Need state transfer.

2015-12-30 15:05:39 5932 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.10.130' --datadir '/home/galera/data/' --defaults-file '/home/galera/conf/my.cnf' --defaults-group-suffix '' --parent '5932' --binlog '/home/galera/logdata/mysql-bin' '

WSREP_SST: [INFO] Streaming with xbstream (20151230 15:05:40.004)

WSREP_SST: [INFO] Using socat as streamer (20151230 15:05:40.006)

WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151230 15:05:40.077)

2015-12-30 15:05:40 5932 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.10.10.130:4444/xtrabackup_sst//1

2015-12-30 15:05:40 5932 [Note] WSREP: REPL Protocols: 7 (3, 2)

2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.

2015-12-30 15:05:40 5932 [Note] WSREP: Assign initial position for certification: 2604394632, protocol version: 3

2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.

2015-12-30 15:05:40 5932 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.10.10.130:4568

2015-12-30 15:05:40 5932 [Note] WSREP: Member 0.0 (node2) requested state transfer from '*any*'. Selected 1.0 (node3)(SYNCED) as donor.

2015-12-30 15:05:40 5932 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2604394640)

2015-12-30 15:05:40 5932 [Note] WSREP: Requesting state transfer: success, donor: 1

2015-12-30 15:05:41 5932 [Note] WSREP: 1.0 (node3): State transfer to 0.0 (node2) complete.

2015-12-30 15:05:41 5932 [Note] WSREP: Member 1.0 (node3) synced with group.

WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20151230 15:05:41.583)

WSREP_SST: [INFO] Galera co-ords from recovery: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415 (20151230 15:05:41.585)

WSREP_SST: [INFO] Total time on joiner: 0 seconds (20151230 15:05:41.587)

WSREP_SST: [INFO] Removing the sst_in_progress file (20151230 15:05:41.588)

2015-12-30 15:05:41 5932 [Note] WSREP: SST complete, seqno: 2604252415

2015-12-30 15:05:41 5932 [Note] Plugin 'FEDERATED' is disabled.

2015-12-30 15:05:41 5932 [Warning] Using unique option prefix innodb_read_ahead instead of innodb-read-ahead-threshold is deprecated and will be removed in a future release. Please use the full name instead.

2015-12-30 15:05:41 7f1ed49e8720 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.

2015-12-30 15:05:41 5932 [Note] InnoDB: Using atomics to ref count buffer pool pages

2015-12-30 15:05:41 5932 [Note] InnoDB: The InnoDB memory heap is disabled

2015-12-30 15:05:41 5932 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2015-12-30 15:05:41 5932 [Note] InnoDB: Memory barrier is not used

2015-12-30 15:05:41 5932 [Note] InnoDB: Compressed tables use zlib 1.2.7

2015-12-30 15:05:41 5932 [Note] InnoDB: Using Linux native AIO

2015-12-30 15:05:41 5932 [Note] InnoDB: Using CPU crc32 instructions

2015-12-30 15:05:41 5932 [Note] InnoDB: Initializing buffer pool, size = 208.0G

2015-12-30 15:05:42 5932 [Note] WSREP: (243d391e, 'tcp://0.0.0.0:4567') turning message relay requesting off

2015-12-30 15:05:51 5932 [Note] InnoDB: Completed initialization of buffer pool

2015-12-30 15:05:53 5932 [Note] InnoDB: Highest supported file format is Barracuda.

2015-12-30 15:05:56 5932 [Note] InnoDB: 128 rollback segment(s) are active.

2015-12-30 15:05:56 5932 [Note] InnoDB: Waiting for purge to start

2015-12-30 15:05:56 5932 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 2911365201969

2015-12-30 15:05:56 7ee2b24bf700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool

2015-12-30 15:05:56 5932 [Note] RSA private key file not found: /home/galera/data//private_key.pem. Some authentication plugins will not work.

2015-12-30 15:05:56 5932 [Note] RSA public key file not found: /home/galera/data//public_key.pem. Some authentication plugins will not work.

2015-12-30 15:05:56 5932 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306

2015-12-30 15:05:56 5932 [Note]   - '0.0.0.0' resolves to '0.0.0.0';

2015-12-30 15:05:56 5932 [Note] Server socket created on IP: '0.0.0.0'.

2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/user' is marked as crashed and should be repaired

2015-12-30 15:05:56 5932 [Warning] Checking table:   './mysql/user'

2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly

2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/db' is marked as crashed and should be repaired

2015-12-30 15:05:56 5932 [Warning] Checking table:   './mysql/db'

2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly

2015-12-30 15:05:56 5932 [Note] Event Scheduler: Loaded 0 events

2015-12-30 15:05:56 5932 [Note] WSREP: Signalling provider to continue.

2015-12-30 15:05:56 5932 [Note] WSREP: Initialized wsrep sidno 3

2015-12-30 15:05:56 5932 [Note] WSREP: SST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415

2015-12-30 15:05:56 5932 [Note] WSREP: Receiving IST: 142217 writesets, seqnos 2604252415-2604394632

2015-12-30 15:05:56 5932 [Note] /usr/sbin/mysqld: ready for connections.

Version: '5.6.26-74.0-56-log'  socket: '/home/galera/run/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12

2015-12-30 15:06:41 5932 [Note] WSREP: IST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604394632

2015-12-30 15:06:41 5932 [Note] WSREP: 0.0 (node2): State transfer from 1.0 (node3) complete.

2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINER -> JOINED (TO: 2604395006)

2015-12-30 15:06:41 5932 [Note] WSREP: Member 0.0 (node2) synced with group.

2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2604395009)

2015-12-30 15:06:41 5932 [Note] WSREP: Synchronized with group, ready for connections

2015-12-30 15:35:16 7ee2b24bf700 InnoDB: Buffer pool(s) load completed at 151230 15:35:16

After the node synchronizes with the group, I would expect MaxScale to fail back to the old master. However, MaxScale would not choose the node as master.

MaxScale log:

2015-12-30 15:06:46   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:49   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:50   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:51   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:52   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:53   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:55   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:55   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:58   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:06:59   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:00   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:01   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:03   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:03   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:04   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 15:07:52   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

2015-12-30 19:05:46   MaxScale received signal SIGTERM. Exiting.

2015-12-30 19:05:46   MaxScale is shut down.

After MaxScale was restarted, it had no problem choosing the node as master.

Attachments

Activity

People

Assignee:: markus makela

Reporter:: Geoff Montee (Inactive)

Votes:: 1 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2016-01-04 20:09

Updated:: 2016-02-15 17:27

Resolved:: 2016-02-15 17:27

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.