We got the same error on MariaDB 10.2.14 + Galera 25.3.23.
This is a 3-node cluster. Node 1 received the query, and the error occurred on nodes 2 and 3, where the daemon terminated as shown in the logs below. At that point node 1 became unresponsive (wsrep_cluster_status = non-Primary), and nodes 2 and 3 refused to start since there was no cluster available to connect to.
This is Scenario 5 described by the great people at Percona in https://www.percona.com/blog/2014/09/01/galera-replication-how-to-recover-a-pxc-cluster/, and we recovered by following the instructions at http://galeracluster.com/documentation-webpages/quorumreset.html
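For anyone landing here in the same state: the quorum-reset procedure we followed boils down to forcing the one surviving node back into Primary state and then restarting the others. A rough sketch (addresses and service names are from our setup; adapt to yours):

```sql
-- On the surviving node (node 1 in our case), while it is non-Primary:
SET GLOBAL wsrep_provider_options='pc.bootstrap=true';
-- The node becomes Primary again and starts accepting writes.
-- Then start mariadb normally on the other nodes; they rejoin
-- the cluster and catch up via IST/SST.
```

This is just the essence of the galeracluster.com page linked above, not a substitute for reading it.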
Below are the logs for each of the 3 nodes.
node 1:
2018-08-21 16:43:11 140170342946560 [Note] WSREP: declaring ae4dc100 at tcp://93.189.38.31:4567 stable
2018-08-21 16:43:11 140170342946560 [Note] WSREP: forgetting 4696366d (tcp://93.189.38.29:4567)
2018-08-21 16:43:11 140170342946560 [Note] WSREP: Node ae4dc100 state prim
2018-08-21 16:43:11 140170342946560 [Note] WSREP: view(view_id(NON_PRIM,ae4dc100,347) memb {
cf088599,0
} joined {
} left {
} partitioned {
4696366d,0
ae4dc100,0
})
2018-08-21 16:43:11 140170342946560 [Note] WSREP: forgetting ae4dc100 (tcp://93.189.38.31:4567)
2018-08-21 16:43:11 140170334553856 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-08-21 16:43:11 140170342946560 [Note] WSREP: view(view_id(NON_PRIM,cf088599,348) memb {
cf088599,0
} joined {
} left {
} partitioned {
4696366d,0
ae4dc100,0
})
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Received NON-PRIMARY.
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 43240555)
2018-08-21 16:43:11 140170334553856 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:11 140170334553856 [Note] WSREP: Received NON-PRIMARY.
...
node 2:
2018-08-21 16:43:11 140479239440128 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140479239440128 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2018-08-21 16:43:11 140479239440128 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140479239440128 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2018-08-21 16:43:11 140479239440128 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140479239440128 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2018-08-21 16:43:11 140479239440128 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140479239440128 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140479239440128 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140479239440128 [ERROR] WSREP: Failed to apply trx: source: cf088599-a499-11e8-9c9f-3328c5765a46 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 1784796 trx_id: 91827000 seqnos (l: 34501412, g: 43240555, s: 43240554, d: 43240509, ts: 6600880066020902)
2018-08-21 16:43:11 140479239440128 [ERROR] WSREP: Failed to apply trx 43240555 4 times
2018-08-21 16:43:11 140479239440128 [ERROR] WSREP: Node consistency compromised, aborting...
2018-08-21 16:43:11 140479239440128 [Note] WSREP: Closing send monitor...
2018-08-21 16:43:11 140479239440128 [Note] WSREP: Closed send monitor.
2018-08-21 16:43:11 140479239440128 [Note] WSREP: gcomm: terminating thread
2018-08-21 16:43:11 140479239440128 [Note] WSREP: gcomm: joining thread
2018-08-21 16:43:11 140479239440128 [Note] WSREP: gcomm: closing backend
2018-08-21 16:43:11 140479239440128 [Note] WSREP: view(view_id(NON_PRIM,4696366d,346) memb {
4696366d,0
} joined {
} left {
} partitioned {
ae4dc100,0
cf088599,0
})
2018-08-21 16:43:11 140479239440128 [Note] WSREP: view((empty))
2018-08-21 16:43:11 140479239440128 [Note] WSREP: gcomm: closed
2018-08-21 16:43:11 140444625266432 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Received NON-PRIMARY.
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 43240555)
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Received self-leave message.
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-08-21 16:43:11 140444625266432 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 43240555)
2018-08-21 16:43:11 140444625266432 [Note] WSREP: RECV thread exiting 0: Success
2018-08-21 16:43:11 140479239440128 [Note] WSREP: recv_thread() joined.
2018-08-21 16:43:11 140479239440128 [Note] WSREP: Closing replication queue.
2018-08-21 16:43:11 140479239440128 [Note] WSREP: Closing slave action queue.
2018-08-21 16:43:11 140479239440128 [Note] WSREP: /usr/sbin/mysqld: Terminated.
node 3:
2018-08-21 16:43:11 140260152989440 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140260152989440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
2018-08-21 16:43:11 140260152989440 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140260152989440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
2018-08-21 16:43:11 140260152989440 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140260152989440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: Failed to apply app buffer: seqno: 43240555, status: 1
at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
2018-08-21 16:43:11 140260152989440 [ERROR] mysqld: Can't find record in 'coche_busquedas'
2018-08-21 16:43:11 140260152989440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table coches_pro.coche_busquedas; Can't find record in 'coche_busquedas', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 536, Internal MariaDB error code: 1032
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: RBR event 3 Delete_rows_v1 apply warning: 120, 43240555
2018-08-21 16:43:11 140260152989440 [ERROR] WSREP: Failed to apply trx: source: cf088599-a499-11e8-9c9f-3328c5765a46 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 1784796 trx_id: 91827000 seqnos (l: 29463206, g: 43240555, s: 43240554, d: 43240509, ts: 6600880066020902)
2018-08-21 16:43:11 140260152989440 [ERROR] WSREP: Failed to apply trx 43240555 4 times
2018-08-21 16:43:11 140260152989440 [ERROR] WSREP: Node consistency compromised, aborting...
2018-08-21 16:43:11 140260152989440 [Note] WSREP: Closing send monitor...
2018-08-21 16:43:11 140260152989440 [Note] WSREP: Closed send monitor.
2018-08-21 16:43:11 140260152989440 [Note] WSREP: gcomm: terminating thread
2018-08-21 16:43:11 140260152989440 [Note] WSREP: gcomm: joining thread
2018-08-21 16:43:11 140260152989440 [Note] WSREP: gcomm: closing backend
2018-08-21 16:43:11 140260152989440 [Note] WSREP: declaring cf088599 at tcp://93.189.38.27:4567 stable
2018-08-21 16:43:11 140260152989440 [Note] WSREP: forgetting 4696366d (tcp://93.189.38.29:4567)
2018-08-21 16:43:11 140260152989440 [Note] WSREP: Node ae4dc100 state prim
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: user message in state LEAVING
2018-08-21 16:43:11 140260152989440 [Warning] WSREP: ae4dc100 sending install message failed: Transport endpoint is not connected
2018-08-21 16:43:14 140260152989440 [Note] WSREP: (ae4dc100, 'tcp://0.0.0.0:4567') connection to peer cf088599 with addr tcp://93.189.38.27:4567 timed out, no messages seen in PT3S
2018-08-21 16:43:14 140260152989440 [Note] WSREP: (ae4dc100, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://93.189.38.27:4567
2018-08-21 16:43:16 140260152989440 [Note] WSREP: (ae4dc100, 'tcp://0.0.0.0:4567') reconnecting to cf088599 (tcp://93.189.38.27:4567), attempt 0
2018-08-21 16:43:16 140260152989440 [Note] WSREP: cleaning up 4696366d (tcp://93.189.38.29:4567)
2018-08-21 16:43:16 140260152989440 [Note] WSREP: evs::proto(ae4dc100, LEAVING, view_id(REG,ae4dc100,347)) suspecting node: cf088599
2018-08-21 16:43:16 140260152989440 [Note] WSREP: evs::proto(ae4dc100, LEAVING, view_id(REG,ae4dc100,347)) suspected node without join message, declaring inactive
2018-08-21 16:43:16 140260152989440 [Note] WSREP: view(view_id(NON_PRIM,ae4dc100,347) memb {
ae4dc100,0
} joined {
} left {
} partitioned {
4696366d,0
cf088599,0
})
2018-08-21 16:43:16 140260152989440 [Note] WSREP: view((empty))
2018-08-21 16:43:16 140260152989440 [Note] WSREP: gcomm: closed
2018-08-21 16:43:16 140225539970816 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Received NON-PRIMARY.
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 43240555)
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Received self-leave message.
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Flow-control interval: [253, 256]
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Trying to continue unpaused monitor
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2018-08-21 16:43:16 140225539970816 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 43240555)
2018-08-21 16:43:16 140225539970816 [Note] WSREP: RECV thread exiting 0: Success
2018-08-21 16:43:16 140260152989440 [Note] WSREP: recv_thread() joined.
2018-08-21 16:43:16 140260152989440 [Note] WSREP: Closing replication queue.
2018-08-21 16:43:16 140260152989440 [Note] WSREP: Closing slave action queue.
2018-08-21 16:43:16 140260152989440 [Note] WSREP: /usr/sbin/mysqld: Terminated.
jeetupatil,
From your log, the node didn't crash; it shut down gracefully, as it should when node consistency is compromised, which, as you yourself said, happened not due to cluster misbehavior but due to manual intervention. What result would you expect from the cluster instead?
jonhattan,
Unlike the initial report, your comment doesn't contain any information on how the data sets diverged, so there isn't anything to tell. And yes, if two of three nodes fail, the remaining node won't work, since it constitutes only a minority of the cluster.
For both reports, full error logs from all nodes would have been useful; with the provided information it is not possible to analyze why cluster consistency was compromised. Did you run some local transactions on one of the nodes? I suggest that you upgrade to a more recent server version and, if the issue is repeatable, provide full error logs from all nodes and check whether consistency on the dropped node really is compromised and why.
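A quick, admittedly crude way to check whether a dropped node's copy of the affected table really diverged (table name taken from the first report here; compare the output of the same statements on each node while no writes are running):

```sql
-- Differing checksums between nodes confirm the data sets diverged:
CHECKSUM TABLE coches_pro.coche_busquedas;
-- A cheap first pass is just comparing row counts:
SELECT COUNT(*) FROM coches_pro.coche_busquedas;
```

CHECKSUM TABLE does a full scan, so on large tables expect it to take a while.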
I'm having the same error with MariaDB 10.1.31 and Galera 25.3.18.
I have 4 nodes: one is the master and the other 3 are slaves. Only the master receives SQL operations (insert, update, delete, select, ...).
I use this cluster for Zabbix, a monitoring tool. The size of the database is 1.2 TB.
Recently this problem started to happen. The 3 slave nodes try to execute a "Delete_rows_v1" operation on a table and, as they don't manage to, they go down. The master becomes non-Primary and Zabbix stops.
A very serious problem.
Has anyone solved this? Any help?
This is the log from one slave node. The other slave nodes have the same messages at the same time:
======
2019-02-14 16:27:01 139992771357440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 504698, Internal MariaDB error code: 1032
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: RBR event 63 Delete_rows_v1 apply warning: 120, 1859890715
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: Failed to apply app buffer: seqno: 1859890715, status: 1
2019-02-14 16:27:01 139992771357440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 504698, Internal MariaDB error code: 1032
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: RBR event 63 Delete_rows_v1 apply warning: 120, 1859890715
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: Failed to apply app buffer: seqno: 1859890715, status: 1
2019-02-14 16:27:01 139992771357440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 504698, Internal MariaDB error code: 1032
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: RBR event 63 Delete_rows_v1 apply warning: 120, 1859890715
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: Failed to apply app buffer: seqno: 1859890715, status: 1
2019-02-14 16:27:01 139939014506240 [Note] WSREP: declaring b46992ff at tcp://10.30.138.98:4567 stable
2019-02-14 16:27:01 139939014506240 [Note] WSREP: declaring dc780cbe at tcp://10.152.27.200:4567 stable
2019-02-14 16:27:01 139939014506240 [Note] WSREP: forgetting 88813637 (tcp://10.152.28.140:4567)
2019-02-14 16:27:01 139992771357440 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table zabbix.problem; Can't find record in 'problem', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log FIRST, end_log_pos 504698, Internal MariaDB error code: 1032
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: RBR event 63 Delete_rows_v1 apply warning: 120, 1859890715
2019-02-14 16:27:01 139992771357440 [ERROR] WSREP: Failed to apply trx: source: b46992ff-dc31-11e8-a16a-fb4e0972de17 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 55934263 trx_id: 11905597584 seqnos (l: 5290198, g: 1859890715, s: 1859890713, d: 1859890695, ts: 9309730685227667)
2019-02-14 16:27:01 139992771357440 [ERROR] WSREP: Failed to apply trx 1859890715 4 times
2019-02-14 16:27:01 139992771357440 [ERROR] WSREP: Node consistency compromised, aborting...
2019-02-14 16:27:01 139992771357440 [Note] WSREP: Closing send monitor...
2019-02-14 16:27:01 139992771357440 [Note] WSREP: Closed send monitor.
2019-02-14 16:27:01 139992771357440 [Note] WSREP: gcomm: terminating thread
2019-02-14 16:27:01 139992771357440 [Note] WSREP: gcomm: joining thread
2019-02-14 16:27:01 139992771357440 [Note] WSREP: gcomm: closing backend
2019-02-14 16:27:01 139992771357440 [Note] WSREP: Node a4d90112 state prim
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: user message in state LEAVING
2019-02-14 16:27:01 139992771357440 [Warning] WSREP: a4d90112 sending install message failed: Transport endpoint is not connected
2019-02-14 16:27:01 139992771357440 [Note] WSREP: view(view_id(NON_PRIM,a4d90112,144) memb {
2019-02-14 16:27:01 139992771357440 [Note] WSREP: view((empty))
2019-02-14 16:27:01 139992771357440 [Note] WSREP: gcomm: closed
2019-02-14 16:27:01 139939006113536 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Flow-control interval: [1000000000, 1000000000]
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Trying to continue unpaused monitor
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Received NON-PRIMARY.
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1859890747)
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Received self-leave message.
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Flow-control interval: [1000000000, 1000000000]
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Trying to continue unpaused monitor
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Received SELF-LEAVE. Closing connection.
2019-02-14 16:27:01 139939006113536 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 1859890747)
2019-02-14 16:27:01 139939006113536 [Note] WSREP: RECV thread exiting 0: Success
2019-02-14 16:27:01 139992771357440 [Note] WSREP: recv_thread() joined.
2019-02-14 16:27:01 139992771357440 [Note] WSREP: Closing replication queue.
2019-02-14 16:27:01 139992771357440 [Note] WSREP: Closing slave action queue.
2019-02-14 16:27:01 139992771357440 [Note] WSREP: /usr/sbin/mysqld: Terminated.
======
This is the master log at this time:
=====================
2019-02-14 16:27:01 140512887568128 [Note] WSREP: (b46992ff, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.152.28.140:4567
2019-02-14 16:27:01 140512887568128 [Note] WSREP: declaring a4d90112 at tcp://10.30.139.247:4567 stable
2019-02-14 16:27:01 140512887568128 [Note] WSREP: declaring dc780cbe at tcp://10.152.27.200:4567 stable
2019-02-14 16:27:01 140512887568128 [Note] WSREP: forgetting 88813637 (tcp://10.152.28.140:4567)
2019-02-14 16:27:01 140512887568128 [Note] WSREP: (b46992ff, 'tcp://0.0.0.0:4567') turning message relay requesting off
2019-02-14 16:27:01 140512887568128 [Note] WSREP: Node a4d90112 state prim
2019-02-14 16:27:01 140512887568128 [Note] WSREP: declaring dc780cbe at tcp://10.152.27.200:4567 stable
2019-02-14 16:27:01 140512887568128 [Note] WSREP: forgetting a4d90112 (tcp://10.30.139.247:4567)
2019-02-14 16:27:01 140512887568128 [Note] WSREP: Node b46992ff state prim
2019-02-14 16:27:01 140512887568128 [Note] WSREP: view(view_id(PRIM,b46992ff,145) memb {
2019-02-14 16:27:01 140512887568128 [Note] WSREP: save pc into disk
2019-02-14 16:27:01 140512887568128 [Note] WSREP: forgetting 88813637 (tcp://10.152.28.140:4567)
2019-02-14 16:27:01 140512887568128 [Note] WSREP: forgetting a4d90112 (tcp://10.30.139.247:4567)
2019-02-14 16:27:01 140512879175424 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2019-02-14 16:27:01 140512879175424 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 1f240ef5-3086-11e9-a861-d7fde14a40b6
2019-02-14 16:27:01 140512879175424 [Note] WSREP: STATE EXCHANGE: sent state msg: 1f240ef5-3086-11e9-a861-d7fde14a40b6
2019-02-14 16:27:01 140512879175424 [Note] WSREP: STATE EXCHANGE: got state msg: 1f240ef5-3086-11e9-a861-d7fde14a40b6 from 0 (s6006as2400)
2019-02-14 16:27:01 140512879175424 [Note] WSREP: STATE EXCHANGE: got state msg: 1f240ef5-3086-11e9-a861-d7fde14a40b6 from 1 (s602das490)
2019-02-14 16:27:01 140512879175424 [Note] WSREP: Quorum results:
2019-02-14 16:27:01 140512879175424 [Note] WSREP: Flow-control interval: [1000000000, 1000000000]
2019-02-14 16:27:01 140512879175424 [Note] WSREP: Trying to continue unpaused monitor
2019-02-14 16:27:01 140512943704832 [Note] WSREP: New cluster view: global state: 84d4d8c6-7c93-11e7-9bed-3e4a4b9dd52a:1859890747, view# 19: Primary, number of nodes: 2, my index: 0, protocol version 3
2019-02-14 16:27:01 140512943704832 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-02-14 16:27:01 140512943704832 [Note] WSREP: REPL Protocols: 7 (3, 2)
2019-02-14 16:27:01 140512943704832 [Note] WSREP: Assign initial position for certification: 1859890747, protocol version: 3
2019-02-14 16:27:01 140512937899776 [Note] WSREP: Service thread queue flushed.
2019-02-14 16:27:03 140512887568128 [Note] WSREP: (b46992ff, 'tcp://0.0.0.0:4567') reconnecting to dc780cbe (tcp://10.152.27.200:4567), attempt 0
2019-02-14 16:27:06 140512887568128 [Note] WSREP: cleaning up 88813637 (tcp://10.152.28.140:4567)
2019-02-14 16:27:06 140512887568128 [Note] WSREP: cleaning up a4d90112 (tcp://10.30.139.247:4567)
2019-02-14 16:27:07 140512887568128 [Note] WSREP: evs::proto(b46992ff, OPERATIONAL, view_id(REG,b46992ff,145)) suspecting node: dc780cbe
2019-02-14 16:27:07 140512887568128 [Note] WSREP: evs::proto(b46992ff, OPERATIONAL, view_id(REG,b46992ff,145)) suspected node without join message, declaring inactive
2019-02-14 16:27:08 140512887568128 [Note] WSREP: view(view_id(NON_PRIM,b46992ff,145) memb {
2019-02-14 16:27:08 140512887568128 [Note] WSREP: view(view_id(NON_PRIM,b46992ff,146) memb {
2019-02-14 16:27:08 140512879175424 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
2019-02-14 16:27:08 140512879175424 [Note] WSREP: Flow-control interval: [1000000000, 1000000000]
2019-02-14 16:27:08 140512879175424 [Note] WSREP: Trying to continue unpaused monitor
2019-02-14 16:27:08 140512879175424 [Note] WSREP: Received NON-PRIMARY.
2019-02-14 16:27:08 140512879175424 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 1859890747)
2019-02-14 16:27:08 140512943704832 [Note] WSREP: New cluster view: global state: 84d4d8c6-7c93-11e7-9bed-3e4a4b9dd52a:1859890747, view# -1: non-Primary, number of nodes: 1, my index: 0, protocol version 3
2019-02-14 16:27:08 140512943704832 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2019-02-14 16:27:08 140474743606016 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-02-14 16:27:08 140474742696704 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-02-14 16:27:08 140473480563456 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-02-14 16:27:08 140473752890112 [Note] WSREP: cluster conflict due to certification failure for threads:
2019-02-14 16:27:08 140473752890112 [Note] WSREP: Victim thread:
2019-02-14 16:27:08 140474743606016 [Note] WSREP: Victim thread:
2019-02-14 16:27:08 140474742696704 [Note] WSREP: Victim thread:
2019-02-14 16:27:08 140462563953408 [Note] WSREP: cluster conflict due to certification failure for threads:
===============
Can I have the output of show create table zabbix.problem; and, if it has foreign keys, also the output for the parent (and grandparent) tables?
Sorry for the delay.
Here is the show create table output:
=============================
MariaDB [zabbix]> show create table problem\G
Table: problem
Create Table: CREATE TABLE `problem` (
`eventid` bigint(20) unsigned NOT NULL,
`source` int(11) NOT NULL DEFAULT '0',
`object` int(11) NOT NULL DEFAULT '0',
`objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
`clock` int(11) NOT NULL DEFAULT '0',
`ns` int(11) NOT NULL DEFAULT '0',
`r_eventid` bigint(20) unsigned DEFAULT NULL,
`r_clock` int(11) NOT NULL DEFAULT '0',
`r_ns` int(11) NOT NULL DEFAULT '0',
`correlationid` bigint(20) unsigned DEFAULT NULL,
`userid` bigint(20) unsigned DEFAULT NULL,
`name` varchar(2048) COLLATE utf8_bin NOT NULL DEFAULT '',
`acknowledged` int(11) NOT NULL DEFAULT '0',
`severity` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`eventid`),
KEY `problem_1` (`source`,`object`,`objectid`),
KEY `problem_2` (`r_clock`),
KEY `problem_3` (`r_eventid`),
CONSTRAINT `c_problem_1` FOREIGN KEY (`eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE,
CONSTRAINT `c_problem_2` FOREIGN KEY (`r_eventid`) REFERENCES `events` (`eventid`) ON DELETE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
MariaDB [zabbix]> show create table events\G
Table: events
Create Table: CREATE TABLE `events` (
`eventid` bigint(20) unsigned NOT NULL,
`source` int(11) NOT NULL DEFAULT '0',
`object` int(11) NOT NULL DEFAULT '0',
`objectid` bigint(20) unsigned NOT NULL DEFAULT '0',
`clock` int(11) NOT NULL DEFAULT '0',
`value` int(11) NOT NULL DEFAULT '0',
`acknowledged` int(11) NOT NULL DEFAULT '0',
`ns` int(11) NOT NULL DEFAULT '0',
`name` varchar(2048) COLLATE utf8_bin NOT NULL DEFAULT '',
`severity` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`eventid`),
KEY `events_1` (`source`,`object`,`objectid`,`clock`),
KEY `events_2` (`source`,`object`,`clock`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_bin
==========================
Some more information:
1) The events table is very big: more than 36 million rows.
2) We were having performance problems with the cluster, and they were only solved when we completely disabled swap on the servers.
Thanks in advance.
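One detail worth noting from the schema above: problem has ON DELETE CASCADE foreign keys to events, so rows in problem can be deleted as a side effect of deletes on events, without any explicit DELETE against problem. A purely illustrative example (the eventid value is made up):

```sql
-- Deleting a parent row in `events` also deletes matching child
-- rows in `problem` via c_problem_1 / c_problem_2:
DELETE FROM events WHERE eventid = 12345;
-- InnoDB removes rows in `problem` where eventid = 12345 or
-- r_eventid = 12345, as a cascaded side effect.
```

If the child rows ever differ between nodes for any reason, a replicated row-based delete against problem would hit exactly the "Can't find record" error shown in the logs. Whether that is what happened here would need the full logs the developers asked for.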
Reading the last comment gave me a déjà-vu.
I have a cluster that crashed several times. It has more than 2000 InnoDB tables in a number of databases. One database is for JetBrains TeamCity. I remember one of these tables being listed in the logs at least twice: teamcity.node_events
My first thought: is there anything special about tables matching '%events' that makes the rows become inconsistent across cluster nodes?
Currently the table has 1003 rows, so it is not very large. The structure looks like this:
CREATE TABLE `node_events` (
`id` bigint(20) NOT NULL,
`name` varchar(64) DEFAULT NULL,
`long_arg1` bigint(20) DEFAULT NULL,
`long_arg2` bigint(20) DEFAULT NULL,
`str_arg` varchar(255) DEFAULT NULL,
`node_id` varchar(80) DEFAULT NULL,
`created` datetime DEFAULT NULL,
PRIMARY KEY (`id`),
KEY `node_events_created_idx` (`created`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
Just had another crash; this time the table name is crmnet.t_mn_eventperiods_audiences.
Again it contains "event"...