Details
Description
A user encountered the following sequence of events.
The server was starting up with --wsrep-new-cluster:
2018-02-22 23:47:41 139925613557888 [Note] WSREP: Read nil XID from storage engines, skipping position init
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: wsrep_load(): loading provider library '/usr/lib64/galera/libgalera_smm.so'
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: wsrep_load(): Galera 25.3.22(r3764) by Codership Oy <info@codership.com> loaded successfully.
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: CRC-32C: using hardware acceleration.
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: Found saved state: 4ee72403-e1e2-11e7-998a-73a130e7e5cb:-1, safe_to_bootstrap: 1
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: Recovering GCache ring buffer: version: 1, UUID: 4ee72403-e1e2-11e7-998a-73a130e7e5cb, offset: -1
|
2018-02-22 23:47:41 139925613557888 [Note] WSREP: GCache::RingBuffer initial scan... 0.0% ( 0/2147483688 bytes) complete.
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache::RingBuffer initial scan...100.0% (2147483688/2147483688 bytes) complete.
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Recovering GCache ring buffer: found gapless sequence 520442-551341
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache::RingBuffer unused buffers scan... 0.0% ( 0/547216918 bytes) complete.
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache::RingBuffer unused buffers scan...100.0% (547216918/547216918 bytes) complete.
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): found 4/30904 locked buffers
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache DEBUG: RingBuffer::recover(): used space: 547216918/2147483648
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Passing config to GCS: base_dir = /var/lib/mysql/; base_host = 172.33.40.143; base_port = 4567; cert.log_conflicts = no; debug = no; evs.auto_evict = 0; evs.delay_margin = PT1S; evs.delayed_keep_period = PT30S; evs.inactive_check_period = PT0.5S; evs.inactive_timeout = PT1M; evs.install_timeout = PT1M; evs.join_retrans_period = PT0.5S; evs.keepalive_period = PT3S; evs.max_install_timeouts = 3; evs.send_window = 1024; evs.stats_report_period = PT1M; evs.suspect_timeout = PT30S; evs.user_send_window = 512; evs.view_forget_timeout = PT24H; gcache.dir = /var/lib/mysql/; gcache.keep_pages_size = 0; gcache.mem_size = 0; gcache.name = /var/lib/mysql//galera.cache; gcache.page_size = 128M; gcache.recover = yes; gcache.size = 2G; gcomm.thread_prio = ; gcs.fc_debug = 0; gcs.fc_factor = 1.0; gcs.fc_limit = 16; gcs.fc_master_slave = no; gcs.max_packet_size = 64500; gcs.max_throttle = 0.25; gcs.recv_q_hard_limit = 9223372036854775807; gcs.recv_q_soft_limit = 0.25; gcs.sync_donor = no; gmcast.segment = 0; gmcast.ve
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache history reset: 4ee72403-e1e2-11e7-998a-73a130e7e5cb:551341 -> 4ee72403-e1e2-11e7-998a-73a130e7e5cb:551339
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GCache DEBUG: RingBuffer::seqno_reset(): full reset
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Assign initial position for certification: 551339, protocol version: -1
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: wsrep_sst_grab()
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Start replication
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: 'wsrep-new-cluster' option used, bootstrapping the cluster
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Setting initial position to 4ee72403-e1e2-11e7-998a-73a130e7e5cb:551339
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: protonet asio version 0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Using CRC-32C for message checksums.
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: backend: asio
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: gcomm thread scheduling priority set to other:0
|
2018-02-22 23:47:51 139925613557888 [Warning] WSREP: access file(/var/lib/mysql//gvwstate.dat) failed(No such file or directory)
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: restore pc from disk failed
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: GMCast version 0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: (cb295870, 'tcp://0.0.0.0:4567') listening at tcp://0.0.0.0:4567
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: (cb295870, 'tcp://0.0.0.0:4567') multicast: , ttl: 1
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: EVS version 0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: gcomm: bootstrapping new group 'galera_cluster_01'
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: start_prim is enabled, turn off pc_recovery
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Node cb295870 state prim
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: view(view_id(PRIM,cb295870,1) memb {
|
cb295870,0
|
} joined {
|
} left {
|
} partitioned {
|
})
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: save pc into disk
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr without UUID: tcp://172.31.29.97:4567
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr proto entry 0x5568e6fc6cd0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr without UUID: tcp://172.32.10.9:4567
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr proto entry 0x5568e6fcf4d0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr without UUID: tcp://172.33.40.143:4567
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: discarding pending addr proto entry 0x5568e6fd7ce0
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: gcomm: connected
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Changing maximum packet size to 64500, resulting msg size: 32636
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Shifting CLOSED -> OPEN (TO: 0)
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Opened channel 'galera_cluster_01'
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 1
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: Waiting for SST to complete.
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: STATE_EXCHANGE: sent state UUID: cb299881-182a-11e8-9d9c-f6ef07f79521
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: STATE EXCHANGE: sent state msg: cb299881-182a-11e8-9d9c-f6ef07f79521
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: STATE EXCHANGE: got state msg: cb299881-182a-11e8-9d9c-f6ef07f79521 from 0 (dbc03)
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Quorum results:
|
version = 4,
|
component = PRIMARY,
|
conf_id = 0,
|
members = 1/1 (joined/total),
|
act_id = 551339,
|
last_appl. = -1,
|
protocols = 0/7/3 (gcs/repl/appl),
|
group UUID = 4ee72403-e1e2-11e7-998a-73a130e7e5cb
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Flow-control interval: [16, 16]
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Trying to continue unpaused monitor
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Restored state OPEN -> JOINED (551339)
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Member 0.0 (dbc03) synced with group.
|
2018-02-22 23:47:51 139922778347264 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 551339)
|
2018-02-22 23:47:51 139925002704640 [Note] WSREP: New cluster view: global state: 4ee72403-e1e2-11e7-998a-73a130e7e5cb:551339, view# 1: Primary, number of nodes: 1, my index: 0, protocol version 3
|
2018-02-22 23:47:51 139925613557888 [Note] WSREP: SST complete, seqno: 551339
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: For Galera, using innodb_lock_schedule_algorithm=fcfs
|
2018-02-22 23:47:51 139925613557888 [Warning] InnoDB: Using innodb_file_format is deprecated and the parameter may be removed in future releases. See http://dev.mysql.com/doc/refman/5.7/en/innodb-file-format.html
|
2018-02-22 23:47:51 139925613557888 [Warning] InnoDB: Using innodb_file_format_max is deprecated and the parameter may be removed in future releases. See http://dev.mysql.com/doc/refman/5.7/en/innodb-file-format.html
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Uses event mutexes
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Compressed tables use zlib 1.2.7
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Using Linux native AIO
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Number of pools: 1
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Using SSE2 crc32 instructions
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Initializing buffer pool, total size = 20G, instances = 20, chunk size = 128M
|
2018-02-22 23:47:51 139925613557888 [Note] InnoDB: Completed initialization of buffer pool
|
2018-02-22 23:47:51 139922786739968 [Note] WSREP: (cb295870, 'tcp://0.0.0.0:4567') connection established to e2e8f67a tcp://172.31.24.27:4567
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: (cb295870, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers:
|
2018-02-22 23:47:52 139899263854336 [Note] InnoDB: If the mysqld execution user is authorized, page cleaner thread priority can be changed. See the man page of setpriority().
|
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: Highest supported file format is Barracuda.
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: declaring e2e8f67a at tcp://172.31.24.27:4567 stable
|
And then the Galera provider received an exception and it died:
2018-02-22 23:47:52 139922786739968 [ERROR] WSREP: caught exception in PC, state dump to stderr follows:
|
pc::Proto{uuid=cb295870,start_prim=1,npvo=0,ignore_sb=0,ignore_quorum=0,state=1,last_sent_seq=3,checksum=0,instances=
|
cb295870,prim=1,un=0,last_seq=3,last_prim=view_id(PRIM,cb295870,1),to_seq=2,weight=1,segment=0
|
,state_msgs=
|
cb295870,pcmsg{ type=STATE, seq=0, flags= 0, node_map { cb295870,prim=1,un=0,last_seq=3,last_prim=view_id(PRIM,cb295870,1),to_seq=2,weight=1,segment=0
|
}}
|
,current_view=view(view_id(REG,cb295870,34) memb {
|
cb295870,0
|
e2e8f67a,0
|
} joined {
|
e2e8f67a,0
|
} left {
|
} partitioned {
|
}),pc_view=view(view_id(PRIM,cb295870,1) memb {
|
cb295870,0
|
} joined {
|
} left {
|
} partitioned {
|
}),mtu=32636}
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: {v=0,t=1,ut=255,o=4,s=0,sr=0,as=-1,f=4,src=e2e8f67a,srcvid=view_id(REG,cb295870,34),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=340547,nl=(
|
)
|
} 64
|
2018-02-22 23:47:52 139922786739968 [ERROR] WSREP: exception caused by message: {v=0,t=3,ut=255,o=1,s=0,sr=-1,as=0,f=4,src=e2e8f67a,srcvid=view_id(REG,cb295870,34),insvid=view_id(UNKNOWN,00000000,0),ru=00000000,r=[-1,-1],fs=340549,nl=(
|
)
|
}
|
state after handling message: evs::proto(evs::proto(cb295870, OPERATIONAL, view_id(REG,cb295870,34)), OPERATIONAL) {
|
current_view=view(view_id(REG,cb295870,34) memb {
|
cb295870,0
|
e2e8f67a,0
|
} joined {
|
} left {
|
} partitioned {
|
}),
|
input_map=evs::input_map: {aru_seq=0,safe_seq=0,node_index=node: {idx=0,range=[1,0],safe_seq=0} node: {idx=1,range=[1,0],safe_seq=0} },
|
fifo_seq=17,
|
last_sent=0,
|
known:
|
cb295870 at
|
{o=1,s=0,i=1,fs=-1,}
|
e2e8f67a at tcp://172.31.24.27:4567
|
{o=1,s=0,i=1,fs=340549,}
|
}2018-02-22 23:47:52 139922786739968 [ERROR] WSREP: exception from gcomm, backend must be restarted: cb295870 aborting due to conflicting prims: older overrides (FATAL)
|
at gcomm/src/pc_proto.cpp:handle_state():982
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: gcomm: terminating thread
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: gcomm: joining thread
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: gcomm: closing backend
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: Forced PC close
|
2018-02-22 23:47:52 139922786739968 [Warning] WSREP: discarding 1 messages from message index
|
2018-02-22 23:47:52 139922786739968 [Note] WSREP: gcomm: closed
|
2018-02-22 23:47:52 139922778347264 [ERROR] WSREP: gcs/src/gcs_core.cpp:core_handle_comp_msg():718: Malformed component message. Ignoring
|
But MariaDB and InnoDB kept going:
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: 128 out of 128 rollback segments are active.
|
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: Creating shared tablespace for temporary tables
|
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
|
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
|
2018-02-22 23:47:52 139925613557888 [Note] InnoDB: 5.7.20 started; log sequence number 6217279687
|
2018-02-22 23:47:52 139899550291712 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
|
180222 23:47:52 server_audit: MariaDB Audit Plugin version 1.4.3 STARTED.
|
2018-02-22 23:47:52 139925613557888 [Note] Server socket created on IP: '::'.
|
2018-02-22 23:47:53 139922720352000 [Note] Event Scheduler: scheduler thread started with id 10
|
2018-02-22 23:47:53 139925613557888 [Note] Reading of all Master_info entries succeded
|
2018-02-22 23:47:53 139925613557888 [Note] Added new Master_info '' to hash table
|
2018-02-22 23:47:53 139925613557888 [Note] /usr/sbin/mysqld: ready for connections.
|
Version: '10.2.12-MariaDB-log' socket: '/var/lib/mysql/mysql.sock' port: 3306 MariaDB Server
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: REPL Protocols: 7 (3, 2)
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: Assign initial position for certification: 551339, protocol version: 3
|
2018-02-22 23:47:53 139925036578560 [Note] WSREP: Service thread queue flushed.
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: GCache history reset: 4ee72403-e1e2-11e7-998a-73a130e7e5cb:0 -> 4ee72403-e1e2-11e7-998a-73a130e7e5cb:551339
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: Synchronized with group, ready for connections
|
2018-02-22 23:47:53 139925002704640 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
|
But all transactions failed due to an ambiguous "Software caused connection abort" error:
2018-02-22 23:48:04 139899236554496 [Warning] WSREP: Send action {(nil), 1082, TORDERED} returned -103 (Software caused connection abort)
|
2018-02-22 23:48:04 139899236554496 [Note] WSREP: cluster conflict due to certification failure for threads:
|
2018-02-22 23:48:04 139899236554496 [Note] WSREP: Victim thread:
|
THD: 30, mode: local, state: executing, conflict: cert failure, seqno: -1
|
SQL: COMMIT
|
2018-02-22 23:48:13 139899236251392 [Warning] WSREP: Send action {(nil), 1133, TORDERED} returned -103 (Software caused connection abort)
|
2018-02-22 23:48:13 139899236251392 [Note] WSREP: cluster conflict due to certification failure for threads:
|
2018-02-22 23:48:13 139899236251392 [Note] WSREP: Victim thread:
|
THD: 31, mode: local, state: executing, conflict: cert failure, seqno: -1
|
SQL: COMMIT
|
2018-02-22 23:48:13 139899236251392 [Warning] WSREP: Send action {(nil), 624, TORDERED} returned -103 (Software caused connection abort)
|
2018-02-22 23:48:13 139899236251392 [Note] WSREP: cluster conflict due to certification failure for threads:
|
2018-02-22 23:48:13 139899236251392 [Note] WSREP: Victim thread:
|
THD: 31, mode: local, state: executing, conflict: cert failure, seqno: -1
|
SQL: COMMIT
|
The user tried to shutdown mysqld, but it hung during the shutdown process, so mysqld had to be killed.
It looks like the problem was caused by Galera receiving unexpected messages over the network. Shouldn't the whole server have been terminated in this case, instead of just the Galera threads?
Attachments
Issue Links
- relates to
-
MDEV-14971 Mariadb stops working when second galera nodes joins
-
- Closed
-
- links to