Uploaded image for project: 'MariaDB MaxScale'
  1. MariaDB MaxScale
  2. MXS-511

ReadWriteSplit router won't choose node as master and logs confusing "RUNNING MASTER" error message

    XMLWordPrintable

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.2.1
    • Fix Version/s: 1.4.0
    • Component/s: readwritesplit
    • Labels:
      None

      Description

      In some cases, MaxScale will not choose a node as master, and it logs a confusing error message that says it can't do so because the node is "RUNNING MASTER".

      2015-12-30 15:06:46   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.

      This error message is a bit confusing. Can this error message be clarified at all?

      This was seen in the following context:

      A user is using MaxScale's ReadWriteSplit router along with a 3-node Galera Cluster.

      At one point, the cluster's master hung. At that point, MaxScale could no longer connect to the master.

      MaxScale log:

      2015-12-30 14:32:51   Error : Monitor was unable to connect to server 10.10.10.130:3306 : "Can't connect to MySQL server on '10.10.10.130' (4)"
      2015-12-30 14:32:51   Error : server 10.10.10.130:3306 lost the master status. Readwritesplit service can't locate the master. Client sessions will be closed.

      The user brought up the hung node shortly after.

      Galera Cluster log:

      2015-12-30 15:05:39 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
      2015-12-30 15:05:39 0 [Note] /usr/sbin/mysqld (mysqld 5.6.26-74.0-56-log) starting as process 5932 ...
      2015-12-30 15:05:39 5932 [Note] WSREP: Read nil XID from storage engines, skipping position init
      2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib/libgalera_smm.so'
      2015-12-30 15:05:39 5932 [Note] WSREP: wsrep_load(): Galera 3.12.2(rf3e626d) by Codership Oy <info@codership.com> loaded successfully.

      The node needed a state transfer, but it was able to synchronize with the group fairly quickly:

      Galera Cluster log:

      2015-12-30 15:05:39 5932 [Warning] WSREP: Gap in state sequence. Need state transfer.
      2015-12-30 15:05:39 5932 [Note] WSREP: Running: 'wsrep_sst_xtrabackup-v2 --role 'joiner' --address '10.10.10.130' --datadir '/home/galera/data/' --defaults-file '/home/galera/conf/my.cnf' --defaults-group-suffix '' --parent '5932' --binlog '/home/galera/logdata/mysql-bin' '
      WSREP_SST: [INFO] Streaming with xbstream (20151230 15:05:40.004)
      WSREP_SST: [INFO] Using socat as streamer (20151230 15:05:40.006)
      WSREP_SST: [INFO] Evaluating timeout -k 110 100 socat -u TCP-LISTEN:4444,reuseaddr stdio | xbstream -x; RC=( ${PIPESTATUS[@]} ) (20151230 15:05:40.077)
      2015-12-30 15:05:40 5932 [Note] WSREP: Prepared SST request: xtrabackup-v2|10.10.10.130:4444/xtrabackup_sst//1
      2015-12-30 15:05:40 5932 [Note] WSREP: REPL Protocols: 7 (3, 2)
      2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.
      2015-12-30 15:05:40 5932 [Note] WSREP: Assign initial position for certification: 2604394632, protocol version: 3
      2015-12-30 15:05:40 5932 [Note] WSREP: Service thread queue flushed.
      2015-12-30 15:05:40 5932 [Note] WSREP: Prepared IST receiver, listening at: tcp://10.10.10.130:4568
      2015-12-30 15:05:40 5932 [Note] WSREP: Member 0.0 (node2) requested state transfer from '*any*'. Selected 1.0 (node3)(SYNCED) as donor.
      2015-12-30 15:05:40 5932 [Note] WSREP: Shifting PRIMARY -> JOINER (TO: 2604394640)
      2015-12-30 15:05:40 5932 [Note] WSREP: Requesting state transfer: success, donor: 1
      2015-12-30 15:05:41 5932 [Note] WSREP: 1.0 (node3): State transfer to 0.0 (node2) complete.
      2015-12-30 15:05:41 5932 [Note] WSREP: Member 1.0 (node3) synced with group.
      WSREP_SST: [INFO] xtrabackup_ist received from donor: Running IST (20151230 15:05:41.583)
      WSREP_SST: [INFO] Galera co-ords from recovery: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415 (20151230 15:05:41.585)
      WSREP_SST: [INFO] Total time on joiner: 0 seconds (20151230 15:05:41.587)
      WSREP_SST: [INFO] Removing the sst_in_progress file (20151230 15:05:41.588)
      2015-12-30 15:05:41 5932 [Note] WSREP: SST complete, seqno: 2604252415
      2015-12-30 15:05:41 5932 [Note] Plugin 'FEDERATED' is disabled.
      2015-12-30 15:05:41 5932 [Warning] Using unique option prefix innodb_read_ahead instead of innodb-read-ahead-threshold is deprecated and will be removed in a future release. Please use the full name instead.
      2015-12-30 15:05:41 7f1ed49e8720 InnoDB: Warning: Using innodb_locks_unsafe_for_binlog is DEPRECATED. This option may be removed in future releases. Please use READ COMMITTED transaction isolation level instead, see http://dev.mysql.com/doc/refman/5.6/en/set-transaction.html.
      2015-12-30 15:05:41 5932 [Note] InnoDB: Using atomics to ref count buffer pool pages
      2015-12-30 15:05:41 5932 [Note] InnoDB: The InnoDB memory heap is disabled
      2015-12-30 15:05:41 5932 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
      2015-12-30 15:05:41 5932 [Note] InnoDB: Memory barrier is not used
      2015-12-30 15:05:41 5932 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2015-12-30 15:05:41 5932 [Note] InnoDB: Using Linux native AIO
      2015-12-30 15:05:41 5932 [Note] InnoDB: Using CPU crc32 instructions
      2015-12-30 15:05:41 5932 [Note] InnoDB: Initializing buffer pool, size = 208.0G
      2015-12-30 15:05:42 5932 [Note] WSREP: (243d391e, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2015-12-30 15:05:51 5932 [Note] InnoDB: Completed initialization of buffer pool
      2015-12-30 15:05:53 5932 [Note] InnoDB: Highest supported file format is Barracuda.
      2015-12-30 15:05:56 5932 [Note] InnoDB: 128 rollback segment(s) are active.
      2015-12-30 15:05:56 5932 [Note] InnoDB: Waiting for purge to start
      2015-12-30 15:05:56 5932 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.26-74.0 started; log sequence number 2911365201969
      2015-12-30 15:05:56 7ee2b24bf700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
      2015-12-30 15:05:56 5932 [Note] RSA private key file not found: /home/galera/data//private_key.pem. Some authentication plugins will not work.
      2015-12-30 15:05:56 5932 [Note] RSA public key file not found: /home/galera/data//public_key.pem. Some authentication plugins will not work.
      2015-12-30 15:05:56 5932 [Note] Server hostname (bind-address): '0.0.0.0'; port: 3306
      2015-12-30 15:05:56 5932 [Note]   - '0.0.0.0' resolves to '0.0.0.0';
      2015-12-30 15:05:56 5932 [Note] Server socket created on IP: '0.0.0.0'.
      2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/user' is marked as crashed and should be repaired
      2015-12-30 15:05:56 5932 [Warning] Checking table:   './mysql/user'
      2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly
      2015-12-30 15:05:56 5932 [ERROR] /usr/sbin/mysqld: Table './mysql/db' is marked as crashed and should be repaired
      2015-12-30 15:05:56 5932 [Warning] Checking table:   './mysql/db'
      2015-12-30 15:05:56 5932 [ERROR] 1 client is using or hasn't closed the table properly
      2015-12-30 15:05:56 5932 [Note] Event Scheduler: Loaded 0 events
      2015-12-30 15:05:56 5932 [Note] WSREP: Signalling provider to continue.
      2015-12-30 15:05:56 5932 [Note] WSREP: Initialized wsrep sidno 3
      2015-12-30 15:05:56 5932 [Note] WSREP: SST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604252415
      2015-12-30 15:05:56 5932 [Note] WSREP: Receiving IST: 142217 writesets, seqnos 2604252415-2604394632
      2015-12-30 15:05:56 5932 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '5.6.26-74.0-56-log'  socket: '/home/galera/run/mysqld.sock'  port: 3306  Percona XtraDB Cluster (GPL), Release rel74.0, Revision 624ef81, WSREP version 25.12, wsrep_25.12
      2015-12-30 15:06:41 5932 [Note] WSREP: IST received: aed16fff-961b-11e3-b31d-ebb207beff83:2604394632
      2015-12-30 15:06:41 5932 [Note] WSREP: 0.0 (node2): State transfer from 1.0 (node3) complete.
      2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINER -> JOINED (TO: 2604395006)
      2015-12-30 15:06:41 5932 [Note] WSREP: Member 0.0 (node2) synced with group.
      2015-12-30 15:06:41 5932 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 2604395009)
      2015-12-30 15:06:41 5932 [Note] WSREP: Synchronized with group, ready for connections
      2015-12-30 15:35:16 7ee2b24bf700 InnoDB: Buffer pool(s) load completed at 151230 15:35:16

      After the node synchronizes with the group, I would expect MaxScale to fail back to the old master. However, MaxScale would not choose the node as master.

      MaxScale log:

      2015-12-30 15:06:46   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:49   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:50   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:51   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:52   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:53   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:55   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:55   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:58   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:06:59   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:00   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:01   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:03   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:03   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:04   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 15:07:52   Error : Server at 10.10.10.130:3306 should be master but is RUNNING MASTER instead and can't be chosen to master.
      2015-12-30 19:05:46   MaxScale received signal SIGTERM. Exiting.
      2015-12-30 19:05:46   MaxScale is shut down.

      After MaxScale was restarted, it had no problem choosing the node as master.

        Attachments

          Activity

            People

            Assignee:
            markus makela markus makela
            Reporter:
            GeoffMontee Geoff Montee
            Votes:
            1 Vote for this issue
            Watchers:
            4 Start watching this issue

              Dates

              Created:
              Updated:
              Resolved: