[MDEV-8744] galera nodes terminate on duplicate key found Created: 2015-09-04  Updated: 2020-12-07  Resolved: 2020-12-07

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.0.21-galera
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Jan Lindström (Inactive)
Resolution: Won't Fix Votes: 2
Labels: galera
Environment:

rhe6, 3 node galera cluser, db01 as replication slave, db02/03 as other members


Issue Links:
Relates
relates to MDEV-8223 MariaDB Galera cluster crashed Closed

 Description   

Both nodes db02/db03 where initialized on a read only cluster using SST the day before, db01 was the donor for db02 and db02 was the donor for db03 after that.

The db01 was a replication slave to a non-galera 10.0.17 instance applying a bunch of binary logs.

It seems the writeset passed certification and then failed on apply.

The galera nodes 02/03 that applied the writeset terminated when a duplicate key was found.

So when this duplicate key happened two nodes just terminated. More preferable options in priority order are:

  • the row is being overwritten anyway, so just write a warning to the log
  • some table level repair
  • instigating a SST

Here are the error replication logs for the few hours where it was a replication slave.

db03

150903  9:21:55 [Warning] WSREP: Failed to report last committed 6749189, -4 (Interrupted system call)
150903  9:22:50 [Warning] WSREP: Failed to report last committed 6756265, -4 (Interrupted system call)
150903  9:26:16 [Warning] WSREP: Failed to report last committed 6776709, -4 (Interrupted system call)
150903  9:31:02 [Warning] WSREP: Failed to report last committed 6824634, -4 (Interrupted system call)
150903  9:32:03 [Warning] WSREP: Failed to report last committed 6833378, -4 (Interrupted system call)
150903  9:33:38 [Warning] WSREP: Failed to report last committed 6844334, -4 (Interrupted system call)
150903  9:35:44 [Warning] WSREP: Failed to report last committed 6860131, -4 (Interrupted system call)
150903  9:45:13 [Warning] WSREP: Failed to report last committed 6949704, -4 (Interrupted system call)
150903  9:45:46 [Warning] WSREP: Failed to report last committed 6955695, -4 (Interrupted system call)
150903  9:49:47 [Warning] WSREP: Failed to report last committed 6996438, -4 (Interrupted system call)
150903  9:50:20 [Warning] WSREP: Failed to report last committed 7002206, -4 (Interrupted system call)
150903  9:51:25 [Warning] WSREP: Failed to report last committed 7012730, -4 (Interrupted system call)
150903  9:52:02 [Warning] WSREP: Failed to report last committed 7018641, -4 (Interrupted system call)
150903  9:54:39 [Warning] WSREP: Failed to report last committed 7047556, -4 (Interrupted system call)
150903  9:59:29 [Warning] WSREP: Failed to report last committed 7097686, -4 (Interrupted system call)
150903 10:00:06 [Warning] WSREP: Failed to report last committed 7104670, -4 (Interrupted system call)
150903 10:10:56 [Warning] WSREP: Failed to report last committed 7236385, -4 (Interrupted system call)
150903 10:22:12 [Warning] WSREP: Failed to report last committed 7718826, -4 (Interrupted system call)
150903 10:22:14 [Warning] WSREP: Failed to report last committed 7718948, -4 (Interrupted system call)
150903 10:22:32 [Warning] WSREP: Failed to report last committed 7737252, -4 (Interrupted system call)
150903 10:22:36 [Warning] WSREP: Failed to report last committed 7738864, -4 (Interrupted system call)
150903 10:24:12 [Warning] WSREP: Failed to report last committed 7818199, -4 (Interrupted system call)
150903 10:24:28 [Warning] WSREP: Failed to report last committed 7828066, -4 (Interrupted system call)
150903 10:25:26 [Warning] WSREP: Failed to report last committed 7871014, -4 (Interrupted system call)
150903 10:28:03 [Warning] WSREP: Failed to report last committed 8001517, -4 (Interrupted system call)
150903 10:28:05 [Warning] WSREP: Failed to report last committed 8002691, -4 (Interrupted system call)
150903 10:28:37 [Warning] WSREP: Failed to report last committed 8019473, -4 (Interrupted system call)
150903 10:28:39 [Warning] WSREP: Failed to report last committed 8019630, -4 (Interrupted system call)
150903 10:28:42 [Warning] WSREP: Failed to report last committed 8020407, -4 (Interrupted system call)
150903 10:29:22 [Warning] WSREP: Failed to report last committed 8028512, -4 (Interrupted system call)
150903 10:32:10 [Warning] WSREP: Failed to report last committed 8116509, -4 (Interrupted system call)
150903 10:34:26 [Warning] WSREP: Failed to report last committed 8247508, -4 (Interrupted system call)
150903 10:34:28 [Warning] WSREP: Failed to report last committed 8247561, -4 (Interrupted system call)
150903 10:34:35 [Warning] WSREP: Failed to report last committed 8247729, -4 (Interrupted system call)
150903 10:34:43 [Warning] WSREP: Failed to report last committed 8247830, -4 (Interrupted system call)
150903 10:34:47 [Warning] WSREP: Failed to report last committed 8249538, -4 (Interrupted system call)
150903 10:36:40 [Warning] WSREP: Failed to report last committed 8284578, -4 (Interrupted system call)
150903 10:36:40 [Warning] WSREP: Failed to report last committed 8284578, -4 (Interrupted system call)
150903 10:36:54 [Warning] WSREP: Failed to report last committed 8293440, -4 (Interrupted system call)
150903 10:37:54 [Warning] WSREP: Failed to report last committed 8323560, -4 (Interrupted system call)
150903 10:38:48 [Warning] WSREP: Failed to report last committed 8345208, -4 (Interrupted system call)
150903 10:38:52 [Warning] WSREP: Failed to report last committed 8345452, -4 (Interrupted system call)
150903 10:39:31 [Warning] WSREP: Failed to report last committed 8369466, -4 (Interrupted system call)
150903 10:40:49 [Warning] WSREP: Failed to report last committed 8392110, -4 (Interrupted system call)
150903 10:41:40 [Warning] WSREP: Failed to report last committed 8412504, -4 (Interrupted system call)
150903 10:42:15 [Warning] WSREP: Failed to report last committed 8427568, -4 (Interrupted system call)
150903 10:43:58 [Warning] WSREP: Failed to report last committed 8453216, -4 (Interrupted system call)
150903 10:44:24 [Warning] WSREP: Failed to report last committed 8466044, -4 (Interrupted system call)
150903 10:45:00 [Warning] WSREP: Failed to report last committed 8479319, -4 (Interrupted system call)
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [ERROR] WSREP: Failed to apply trx: source: 1a8107d1-4c79-11e5-99e7-eae8116e7712 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 210967 trx_id: 10965163191 seqnos (l: 8249736, g: 10976748, s: 10976744, d: 10976747, ts: 1272223709941162)
150903 12:10:57 [ERROR] WSREP: Failed to apply trx 10976748 4 times
150903 12:10:57 [ERROR] WSREP: Node consistency compromized, aborting...
150903 12:10:57 [Note] WSREP: Closing send monitor...
150903 12:10:57 [Note] WSREP: Closed send monitor.
150903 12:10:57 [Note] WSREP: gcomm: terminating thread
150903 12:10:57 [Note] WSREP: gcomm: joining thread
150903 12:10:57 [Note] WSREP: gcomm: closing backend
150903 12:10:58 [Note] WSREP: view(view_id(NON_PRIM,01566c03,23) memb {
        01566c03,0
} joined {
} left {
} partitioned {
        1a8107d1,0
        89d94a4c,0
})
150903 12:10:58 [Note] WSREP: view((empty))
150903 12:10:58 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150903 12:10:58 [Note] WSREP: gcomm: closed
150903 12:10:58 [Note] WSREP: Flow-control interval: [1024, 1024]
150903 12:10:58 [Note] WSREP: Received NON-PRIMARY.
150903 12:10:58 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 10976790)
150903 12:10:58 [Note] WSREP: Received self-leave message.
150903 12:10:58 [Note] WSREP: Flow-control interval: [0, 0]
150903 12:10:58 [Note] WSREP: Received SELF-LEAVE. Closing connection.
150903 12:10:58 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 10976790)
150903 12:10:58 [Note] WSREP: RECV thread exiting 0: Success
150903 12:10:58 [Note] WSREP: recv_thread() joined.
150903 12:10:58 [Note] WSREP: Closing replication queue.
150903 12:10:58 [Note] WSREP: Closing slave action queue.
150903 12:10:58 [Note] WSREP: /usr/sbin/mysqld: Terminated.
150903 12:11:00 mysqld_safe Number of processes running now: 0
150903 12:11:00 mysqld_safe WSREP: not restarting wsrep node automatically
150903 12:11:00 mysqld_safe mysqld from pid file /var/lib/mysql/mysqld.pid ended

db02

150903  6:47:28 [Warning] WSREP: Failed to report last committed 2879272, -4 (Interrupted system call)
150903  9:28:50 [Warning] WSREP: Failed to report last committed 6801022, -4 (Interrupted system call)
150903  9:29:54 [Warning] WSREP: Failed to report last committed 6812372, -4 (Interrupted system call)
150903  9:32:03 [Warning] WSREP: Failed to report last committed 6831538, -4 (Interrupted system call)
150903  9:35:11 [Warning] WSREP: Failed to report last committed 6853713, -4 (Interrupted system call)
150903  9:35:13 [Warning] WSREP: Failed to report last committed 6853714, -4 (Interrupted system call)
150903  9:37:54 [Warning] WSREP: Failed to report last committed 6879762, -4 (Interrupted system call)
150903  9:40:31 [Warning] WSREP: Failed to report last committed 6900523, -4 (Interrupted system call)
150903  9:42:49 [Warning] WSREP: Failed to report last committed 6922403, -4 (Interrupted system call)
150903  9:42:55 [Warning] WSREP: Failed to report last committed 6923162, -4 (Interrupted system call)
150903  9:44:39 [Warning] WSREP: Failed to report last committed 6942183, -4 (Interrupted system call)
150903  9:45:45 [Warning] WSREP: Failed to report last committed 6953773, -4 (Interrupted system call)
150903  9:48:04 [Warning] WSREP: Failed to report last committed 6977037, -4 (Interrupted system call)
150903  9:52:10 [Warning] WSREP: Failed to report last committed 7018054, -4 (Interrupted system call)
150903  9:58:25 [Warning] WSREP: Failed to report last committed 7084484, -4 (Interrupted system call)
150903  9:59:29 [Warning] WSREP: Failed to report last committed 7095783, -4 (Interrupted system call)
150903 10:00:00 [Warning] WSREP: Failed to report last committed 7101989, -4 (Interrupted system call)
150903 10:06:09 [Warning] WSREP: Failed to report last committed 7175600, -4 (Interrupted system call)
150903 10:10:57 [Warning] WSREP: Failed to report last committed 7234366, -4 (Interrupted system call)
150903 10:24:24 [Warning] WSREP: Failed to report last committed 7825598, -4 (Interrupted system call)
150903 10:25:02 [Warning] WSREP: Failed to report last committed 7853232, -4 (Interrupted system call)
150903 10:25:04 [Warning] WSREP: Failed to report last committed 7854483, -4 (Interrupted system call)
150903 10:25:07 [Warning] WSREP: Failed to report last committed 7854811, -4 (Interrupted system call)
150903 10:26:00 [Warning] WSREP: Failed to report last committed 7895539, -4 (Interrupted system call)
150903 10:29:39 [Warning] WSREP: Failed to report last committed 8028659, -4 (Interrupted system call)
150903 10:32:36 [Warning] WSREP: Failed to report last committed 8149913, -4 (Interrupted system call)
150903 10:33:16 [Warning] WSREP: Failed to report last committed 8198141, -4 (Interrupted system call)
150903 10:34:20 [Warning] WSREP: Failed to report last committed 8247333, -4 (Interrupted system call)
150903 10:34:22 [Warning] WSREP: Failed to report last committed 8247378, -4 (Interrupted system call)
150903 10:34:32 [Warning] WSREP: Failed to report last committed 8247561, -4 (Interrupted system call)
150903 10:35:18 [Warning] WSREP: Failed to report last committed 8257479, -4 (Interrupted system call)
150903 10:35:31 [Warning] WSREP: Failed to report last committed 8261964, -4 (Interrupted system call)
150903 10:35:33 [Warning] WSREP: Failed to report last committed 8262357, -4 (Interrupted system call)
150903 10:35:55 [Warning] WSREP: Failed to report last committed 8263723, -4 (Interrupted system call)
150903 10:35:59 [Warning] WSREP: Failed to report last committed 8264477, -4 (Interrupted system call)
150903 10:39:20 [Warning] WSREP: Failed to report last committed 8367706, -4 (Interrupted system call)
150903 10:41:40 [Warning] WSREP: Failed to report last committed 8412446, -4 (Interrupted system call)
150903 10:42:23 [Warning] WSREP: Failed to report last committed 8428644, -4 (Interrupted system call)
150903 10:42:25 [Warning] WSREP: Failed to report last committed 8428798, -4 (Interrupted system call)
150903 10:42:56 [Warning] WSREP: Failed to report last committed 8434220, -4 (Interrupted system call)
150903 10:43:29 [Warning] WSREP: Failed to report last committed 8442515, -4 (Interrupted system call)
150903 10:43:57 [Warning] WSREP: Failed to report last committed 8453532, -4 (Interrupted system call)
150903 10:44:01 [Warning] WSREP: Failed to report last committed 8455945, -4 (Interrupted system call)
150903 10:44:36 [Warning] WSREP: Failed to report last committed 8471815, -4 (Interrupted system call)
150903 10:44:49 [Warning] WSREP: Failed to report last committed 8475309, -4 (Interrupted system call)
150903 10:44:50 [Warning] WSREP: Failed to report last committed 8476501, -4 (Interrupted system call)
150903 10:45:09 [Warning] WSREP: Failed to report last committed 8483399, -4 (Interrupted system call)
150903 10:46:13 [Warning] WSREP: Failed to report last committed 8507041, -4 (Interrupted system call)
150903 10:46:20 [Warning] WSREP: Failed to report last committed 8508579, -4 (Interrupted system call)
150903 11:57:37 [Warning] WSREP: Failed to report last committed 10422316, -4 (Interrupted system call)
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 2th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 3th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [Warning] WSREP: Failed to apply app buffer: seqno: 10976748, status: 1
         at galera/src/trx_handle.cpp:apply():351
Retrying 4th time
150903 12:10:57 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table radio.feeds_bollywood; Duplicate entry '7068111' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log FIRST, end_log_pos 288, Internal MariaDB error code: 1062
150903 12:10:57 [Warning] WSREP: RBR event 2 Write_rows_v1 apply warning: 121, 10976748
150903 12:10:57 [ERROR] WSREP: Failed to apply trx: source: 1a8107d1-4c79-11e5-99e7-eae8116e7712 version: 3 local: 0 state: APPLYING flags: 1 conn_id: 210967 trx_id: 10965163191 seqnos (l: 8249741, g: 10976748, s: 10976744, d: 10976747, ts: 1272223709941162)
150903 12:10:57 [ERROR] WSREP: Failed to apply trx 10976748 4 times
150903 12:10:57 [ERROR] WSREP: Node consistency compromized, aborting...
150903 12:10:57 [Note] WSREP: Closing send monitor...
150903 12:10:57 [Note] WSREP: Closed send monitor.
150903 12:10:57 [Note] WSREP: gcomm: terminating thread
150903 12:10:57 [Note] WSREP: gcomm: joining thread
150903 12:10:57 [Note] WSREP: gcomm: closing backend
150903 12:10:58 [Note] WSREP: view(view_id(NON_PRIM,01566c03,23) memb {
        89d94a4c,0
} joined {
} left {
} partitioned {
        01566c03,0
        1a8107d1,0
})
150903 12:10:58 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
150903 12:10:58 [Note] WSREP: view((empty))
150903 12:10:58 [Note] WSREP: gcomm: closed

db01 log

 
150901 11:17:19 [Note] WSREP: 2.0 (galera02): State transfer to 0.0 (galera03) complete.
150901 11:17:19 [Note] WSREP: Member 2.0 (galera02) synced with group.
150901 11:18:00 [Note] WSREP: 0.0 (galera03): State transfer from 2.0 (galera02) complete.
150901 11:18:00 [Note] WSREP: Member 0.0 (galera03) synced with group.
150902 16:46:04 [Warning] Access denied for user 'root'@'localhost' (using password: NO)
150903  6:46:27 [Note] WSREP: ready state reached
150903  6:46:27 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.034775' at position 53798665, relay log './mysqld-relay-bin.000201' position: 53798956
150903  9:42:52 [Warning] WSREP: Failed to report last committed 6924409, -4 (Interrupted system call)
150903  9:42:58 [Warning] WSREP: Failed to report last committed 6925103, -4 (Interrupted system call)
150903  9:54:10 [Warning] WSREP: Failed to report last committed 7041944, -4 (Interrupted system call)
150903 10:25:05 [Warning] WSREP: Failed to report last committed 7854855, -4 (Interrupted system call)
150903 10:32:13 [Warning] WSREP: Failed to report last committed 8116744, -4 (Interrupted system call)
150903 10:35:40 [Warning] WSREP: Failed to report last committed 8264009, -4 (Interrupted system call)
150903 10:36:42 [Warning] WSREP: Failed to report last committed 8284577, -4 (Interrupted system call)
150903 10:43:11 [Warning] WSREP: Failed to report last committed 8436680, -4 (Interrupted system call)
150903 12:11:01 [Note] WSREP: forgetting 01566c03 (tcp://10.131.27.6:4567)
150903 12:11:01 [Note] WSREP: forgetting 89d94a4c (tcp://10.131.27.69:4567)
150903 12:11:01 [Note] WSREP: Node 1a8107d1 state prim
150903 12:11:01 [Note] WSREP: view(view_id(PRIM,1a8107d1,24) memb {
        1a8107d1,0
} joined {
} left {
} partitioned {
        01566c03,0
        89d94a4c,0
})
150903 12:11:01 [Note] WSREP: save pc into disk
150903 12:11:01 [Note] WSREP: forgetting 01566c03 (tcp://10.131.27.6:4567)
150903 12:11:01 [Note] WSREP: forgetting 89d94a4c (tcp://10.131.27.69:4567)
150903 12:11:01 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
150903 12:11:01 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 67d30b51-51e9-11e5-aca5-9bfe9792b9bf
150903 12:11:01 [Note] WSREP: STATE EXCHANGE: sent state msg: 67d30b51-51e9-11e5-aca5-9bfe9792b9bf
150903 12:11:01 [Note] WSREP: STATE EXCHANGE: got state msg: 67d30b51-51e9-11e5-aca5-9bfe9792b9bf from 0 (galera01)
150903 12:11:01 [Note] WSREP: Quorum results:
        version    = 3,
        component  = PRIMARY,
        conf_id    = 23,
        members    = 1/1 (joined/total),
        act_id     = 10976800,
        last_appl. = 10976715,
        protocols  = 0/7/3 (gcs/repl/appl),
        group UUID = 1a822538-4c79-11e5-b834-1a7ed5bac453
150903 12:11:01 [Note] WSREP: Flow-control interval: [1024, 1024]
150903 12:11:01 [Note] WSREP: New cluster view: global state: 1a822538-4c79-11e5-b834-1a7ed5bac453:10976800, view# 24: Primary, number of nodes: 1, my index: 0, protocol version 3
150903 12:11:01 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
150903 12:11:01 [Note] WSREP: REPL Protocols: 7 (3, 2)
150903 12:11:01 [Note] WSREP: Service thread queue flushed.
150903 12:11:01 [Note] WSREP: Assign initial position for certification: 10976800, protocol version: 3
150903 12:11:01 [Note] WSREP: Service thread queue flushed.
150903 12:11:06 [Note] WSREP:  cleaning up 89d94a4c (tcp://10.131.27.69:4567)
150903 12:11:06 [Note] WSREP:  cleaning up 01566c03 (tcp://10.131.27.6:4567)
150903 12:30:39 [Note] Slave SQL thread stopped because it reached its UNTIL position 298100

CREATE TABLE `feeds_bollywood` (
`feeds_bollywood_id` int(11) NOT NULL AUTO_INCREMENT,
`title` varchar(250) NOT NULL,
`artist` varchar(250) NOT NULL,
`status` varchar(50) NOT NULL,
`rating` varchar(10) DEFAULT NULL,
`time` varchar(50) NOT NULL,
`arrival_time` varchar(50) DEFAULT NULL,
`duration` varchar(50) NOT NULL,
`url` varchar(500) DEFAULT NULL,
`media_library_id` int(11) DEFAULT NULL,
`filename` varchar(250) DEFAULT NULL,
`delete` tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (`feeds_bollywood_id`),
KEY `media_library_id` (`media_library_id`),
KEY `delete` (`delete`)
) ENGINE=InnoDB AUTO_INCREMENT=7070722 DEFAULT CHARSET=utf8

I restarted node 02 without wsrep_provider set and it did have this row.
MariaDB [radio]> select * from feeds_bollywood where feeds_bollywood_id=7068111;
1 row in set (0.02 sec)

The node01, continued as a primary (was started with --wsrep-new-cluster), and continued as a replication slave to the other mariadb-10.0.17 master. at 12:30 there was no for PK 7068111 in feeds_bollywood on the db01.

db02:

# ls -la mysql-bin.00033*
-rw-rw---- 1 mysql mysql 104857985 Sep  3 12:08 mysql-bin.000330
-rw-rw---- 1 mysql mysql 104918601 Sep  3 12:08 mysql-bin.000331
-rw-rw---- 1 mysql mysql 104867091 Sep  3 12:09 mysql-bin.000332
-rw-rw---- 1 mysql mysql 104965528 Sep  3 12:09 mysql-bin.000333
-rw-rw---- 1 mysql mysql 104949204 Sep  3 12:10 mysql-bin.000334
-rw-rw---- 1 mysql mysql 104861872 Sep  3 12:10 mysql-bin.000335
-rw-rw---- 1 mysql mysql  62606947 Sep  3 12:10 mysql-bin.000336

# mysqlbinlog -vv --start-position 62398663  mysql-bin.000336 
 
/*!50530 SET @@SESSION.PSEUDO_SLAVE_MODE=1*/;
/*!40019 SET @@session.max_insert_delayed_threads=0*/;
/*!50003 SET @OLD_COMPLETION_TYPE=@@COMPLETION_TYPE,COMPLETION_TYPE=0*/;
DELIMITER /*!*/;
# at 4
#150902 17:24:38 server id 1  end_log_pos 248 CRC32 0x768272b4  Start: binlog v 4, server v 10.0.21-MariaDB-wsrep-log created 150902 17:24:38
# Warning: this binlog is either in use or was not closed properly.
BINLOG '
RrLmVQ8BAAAA9AAAAPgAAAABAAQAMTAuMC4yMS1NYXJpYURCLXdzcmVwLWxvZwAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAEzgNAAgAEgAEBAQEEgAA3AAEGggAAAAICAgCAAAACgoKAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
AAAAAAAAAAAEEwQBtHKCdg==
'/*!*/;
# at 62398663
# at 62398752
#150902 17:33:03 server id 8  end_log_pos 62398752 CRC32 0x9719e342     Table_map: `radio`.`feeds_bollywood` mapped to number 164
#150902 17:33:03 server id 8  end_log_pos 62398959 CRC32 0x5914b29b     Write_rows: table id 164 flags: STMT_END_F
 
BINLOG '
P7TmVRMIAAAAWQAAACAhuAMAAKQAAAAAAAEABXJhZGlvAA9mZWVkc19ib2xseXdvb2QADAMPDw8P
Dw8PDwMPARLuAu4ClgAeAJYAlgCWANwF7gJQB0LjGZc=
P7TmVRcIAAAAzwAAAO8huAMAAKQAAAAAAAEADP//APDP2WsADwBGZWVsIGl0IFNvaG5peWUIAEJv
bmFmaWRlB2hpc3RvcnkBNBMyMDE1LTA5LTAyIDE4OjI5OjAxA04vQQgwMDowMzo1NlgAaHR0cHM6
Ly9pdHVuZXMuYXBwbGUuY29tL2F1L2FsYnVtL2ZlZWwtaXQtc29obml5ZS9pZDQ4NzIxNDY5OD9p
PTQ4NzIxNDcwMSZ1bz00JmF0PTEwbDhyZgAAAAAAAACbshRZ
'/*!*/;
### INSERT INTO `radio`.`feeds_bollywood`
### SET
###   @1=7068111 /* INT meta=0 nullable=0 is_null=0 */
###   @2='Feel it Sohniye' /* VARSTRING(750) meta=750 nullable=0 is_null=0 */
###   @3='Bonafide' /* VARSTRING(750) meta=750 nullable=0 is_null=0 */
###   @4='history' /* VARSTRING(150) meta=150 nullable=0 is_null=0 */
###   @5='4' /* VARSTRING(30) meta=30 nullable=1 is_null=0 */
###   @6='2015-09-02 18:29:01' /* VARSTRING(150) meta=150 nullable=0 is_null=0 */
###   @7='N/A' /* VARSTRING(150) meta=150 nullable=1 is_null=0 */
###   @8='00:03:56' /* VARSTRING(150) meta=150 nullable=0 is_null=0 */
...

I looked though the entire db01 mysql-bin.00033*, the only operation on that row 7068111 is the above entry.



 Comments   
Comment by Daniel Black [ 2015-09-04 ]

Looked at db01 binary log - found the following which explains it missing on db01. It did occur 3 mins after the insert and the delete entry wasn't in the binary log (log-slave-updates) of 02.

/*!*/;
# at 33032862251
# at 33032862340
# at 33032863313
#150902 17:36:45 server id 8  end_log_pos 2968091268 CRC32 0x74e9dcd3   Table_map: `radio`.`feeds_bollywood` mapped to number 107266
#150902 17:36:45 server id 8  end_log_pos 2968092241 CRC32 0xb838df3f   Delete_rows: table id 107266
#150902 17:36:45 server id 8  end_log_pos 2968093013 CRC32 0x2d91fd1d   Delete_rows: table id 107266 flags: STMT_END_F
 
BINLOG '
HbXmVRMIAAAAWQAAAIR66bAAAAKjAQAAAAEABXJhZGlvAA9mZWVkc19ib2xseXdvb2QADAMPDw8P
Dw8PDwMPARLuAu4ClgAeAJYAlgCWANwF7gJQB9Pc6XQ=
HbXmVRkIAAAAzQMAAFF+6bAAAAKjAQAAAAAADP//APDP2WsADwBGZWVsIGl0IFNvaG5peWUIAEJv
bmFmaWRlB2hpc3RvcnkBNBMyMDE1LTA5LTAyIDE4OjI5OjAxA04vQQgwMDowMzo1NlgAaHR0cHM6
Ly9pdHVuZXMuYXBwbGUuY29tL2F1L2FsYnVtL2ZlZWwtaXQtc29obml5ZS9pZDQ4NzIxNDY5OD9p
PTQ4NzIxNDcwMSZ1bz00JmF0PTEwbDhyZgAAAAAAAAEA8NHZawAhAEJlcGFyd2FoIChmZWF0LiBB
cGVrc2hhIERhbmRla2FyKRAATWVldCBCcm9zIEFuamphbgdoaXN0b3J5ATQTMjAxNS0wOS0wMiAx
ODoyMzo1MQNOL0EIMDA6MDU6MDRoAGh0dHBzOi8vaXR1bmVzLmFwcGxlLmNvbS9hdS9hbGJ1bS9i
ZXBhcndhaC1mZWF0Li1hcGVrc2hhLWRhbmRla2FyL2lkOTU2MDY2ODgyP2k9OTU2MDY2ODg0JnVv
PTQmYXQ9MTBsOHJmAAAAAAAAAQDw09lrAAkATWFkYW1peWFuGQBNaWthIFNpbmdoICYgTWFtdGEg
U2hhcm1hB2hpc3RvcnkBNBMyMDE1LTA5LTAyIDE4OjE4OjQ5A04vQQgwMDowNDo1OFIAaHR0cHM6
Ly9pdHVuZXMuYXBwbGUuY29tL2F1L2FsYnVtL21hZGFtaXlhbi9pZDk0ODc0NDk0MT9pPTk0ODc0
NDk3MiZ1bz00JmF0PTEwbDhyZgAAAAAAAAEA8NXZawAWAEJhZG1hc2hpeWFhbiAoTWFzaCBVcCkI
AERKIEtpcmFuB3BsYXlpbmcBNBMyMDE1LTA5LTAyIDE4OjMzOjAzEzIwMTUtMDktMDIgMTg6MzM6
MDQIMDA6MDM6MzRdAGh0dHBzOi8vaXR1bmVzLmFwcGxlLmNvbS9hdS9hbGJ1bS9iYWRtYXNoaXlh
YW4tbWFzaC11cC9pZDk2NTE2NTU5OT9pPTk2NTE2NTk1NiZ1bz00JmF0PTEwbDhyZgAAAAAAAAEA
8NfZawANAERvbGx5IEtpIERvbGkLAERpdnlhIEt1bWFyB2hpc3RvcnkBNBMyMDE1LTA5LTAyIDE4
OjEzOjU0A04vQQgwMDowNDo1MlYAaHR0cHM6Ly9pdHVuZXMuYXBwbGUuY29tL2F1L2FsYnVtL2Rv
bGx5LWtpLWRvbGkvaWQ5NTUxMjMyNTI/aT05NTUxMjMyNTYmdW89NCZhdD0xMGw4cmYAAAAAAAAB
P984uA==
### DELETE FROM `radio`.`feeds_bollywood`
### WHERE
###   @1=7068111 /* INT meta=0 nullable=0 is_null=0 */

Comment by Daniel Black [ 2015-09-04 ]

at time of abort 02/03 both had set global sync_binlog=0, global innodb_flush_log_at_trx_commit=0;
not sure if that changes any observations much.

Comment by Daniel Black [ 2015-09-25 ]

folks, this was a pretty important one that cause all nodes except the one that received the writeset to terminate and not restart.

Comment by Nirbhay Choubey (Inactive) [ 2015-09-25 ]

danblack hmm.. could this be related to http://bugs.mysql.com/bug.php?id=76872 ?

Comment by Daniel Black [ 2015-09-26 ]

> could this be related to http://bugs.mysql.com/bug.php?id=76872 ?

I'm thinking no based on it looks like this occurred when a auto increment/offset has been changed based on [3 Sep 11:16] Laurynas Biveinis analysis. For the case I experienced, the logs indicate this was in a stable 3 node galera cluster receiving lots (like a several days worth from a busy server) of update events including lots of inserts into auto-increment tables and it was only after ~5 hrs that this fault occurred. The [20 Aug 13:08] Rolf Martin-Hoster comment in the bug seems to indicate that this occurs quite frequently would indicate that frequently changing auto_increment values so there could be more to the story if this isn't the case. So if Laurynas's analysis is correct its probably a this is a different bug unless there is more to the story.

Regardless of if http://bugs.mysql.com/bug.php?id=76872 is the cause, the error handling in galera needs improving. My current work around is slave-skip-errors=all which I'm not entirely comfortable with.

Comment by MG [ 2015-10-18 ]

Dan,

I was reading your bug and had some ideas to look at or clarify in the bug just for documentation.

1) Your traditional master uses binlog_format=statement?

2) You have no (write) activity on the galera nodes except the replication threads on db1?

3) Does replica / galera node db1 have slave_parallel_threads>0

4) Do tables on db1 have triggers?

5) I guess you set slave skip errors on db2 and db3? Did you try slave_exec_mode=idempotent?

Comment by Daniel Black [ 2015-10-18 ]

> 1) Your traditional master uses binlog_format=statement?

no, mixed

> 2) You have no (write) activity on the galera nodes except the replication threads on db1?

correct.

> 3) Does replica / galera node db1 have slave_parallel_threads>0

> 4) Do tables on db1 have triggers?

sys schema only

> 5) I guess you set slave skip errors on db2 and db3?

yes

> Did you try slave_exec_mode=idempotent?

no. Its probably a better setting however.

Comment by Daniel Black [ 2015-10-19 ]

@@slave_parallel_threads =20. db1 being a galera and auto values on auto_inc*. its master has auto_increment_offset=increment=2

Comment by Seppo Jaakola [ 2016-02-12 ]

Reading through these logs it seems that the ultimate problem is delete statement for row with ID 7068111,
issued on node db1, and that this delete is never replicated to other nodes.

Can you figure out why this delete did not replicate? Did it come through mysql replication to node db1, or was it ad hoc query to the node, maybe even under wsrep_on=OFF setting? Are there any signs in db2 and db3 logs around the time of this delete operation? Do binlogs on db2 and db3 show that other statements (bebfore and after the delete) were replicated to these nodes, and we only miss this single delete?

Comment by Daniel Black [ 2016-02-29 ]

> Reading through these logs it seems that the ultimate problem is delete statement for row with ID 7068111, issued on node db1, and that this delete is never replicated to other nodes.

yes.

> Can you figure out why this delete did not replicate?

no.

> Did it come through mysql replication to node db1, or was it ad hoc query to the node, maybe even under wsrep_on=OFF setting?

via async replication. wsrep_on=OFF was never used.

> Are there any signs in db2 and db3 logs around the time of this delete operation?

No. The above db2/db3 logs are complete for the period quoted.

> Do binlogs on db2 and db3 show that other statements (bebfore and after the delete) were replicated to these nodes, and we only miss this single delete?

There was other statements there replicated. I don't have access to the binlogs now. Might have been an older wsrep version.

Generated at Thu Feb 08 07:29:27 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.