Details
Type: Bug
Status: Open
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.6.15
None
None
Environment:
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
Description
We observed the following warning, "WSREP: gcs_caused() returned -103 (Software caused connection abort)", on one of the nodes in a 3-node Galera cluster. After that, the node was not able to rejoin the cluster, and we had to restart it manually to add it back.
Logs:
Node 1 (flx11):
2024-07-16 6:55:13 14 [Note] WSREP: MDL conflict db=db_01 table=GD_Change ticket=3 solved by abort
2024-07-16 6:55:14 2 [Note] WSREP: MDL BF-BF conflict
schema: db_01
request: (2 seqno 15519555 wsrep (high priority, exec, executing) cmd 0 161 UPDATE `GD_Change` SET updated_at='2024-07-16 06:55:14.764867', gd_change_id=20188 WHERE `GD_Change`.id = 1R?f)
granted: (9 seqno 15519554 wsrep (toi, exec, committed) cmd 0 9 DROP TABLE `GD_Config`)
2024-07-16 6:55:14 2 [ERROR] Aborting
2024-07-16 6:55:58 116470 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2024-07-16 6:56:01 116472 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2024-07-16 6:56:04 116474 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2024-07-16 6:56:17 116479 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
2024-07-16 6:56:19 116480 [Warning] WSREP: gcs_caused() returned -103 (Software caused connection abort)
Node 2 (flx01):
2024-07-16 6:55:14 127263 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
2024-07-16 6:55:14 127263 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
2024-07-16 6:55:14 0 [Note] WSREP: declaring ad6be48b-9120 at ssl://172.27.97.135:4567 stable
2024-07-16 6:55:14 0 [Note] WSREP: forgetting 8fbd9334-8c5b (ssl://172.27.164.171:4567)
2024-07-16 6:55:14 0 [Note] WSREP: Node 42b71455-b540 state prim
2024-07-16 6:55:14 0 [Note] WSREP: view(view_id(PRIM,42b71455-b540,1488) memb {
    42b71455-b540,0
    ad6be48b-9120,0
} joined {
} left {
} partitioned {
    8fbd9334-8c5b,0
})
2024-07-16 6:55:14 0 [Note] WSREP: save pc into disk
2024-07-16 6:55:14 0 [Note] WSREP: forgetting 8fbd9334-8c5b (ssl://172.27.164.171:4567)
2024-07-16 6:55:14 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2024-07-16 6:55:14 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e
2024-07-16 6:55:14 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e
2024-07-16 6:55:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e from 0 (flx01)
2024-07-16 6:55:14 0 [Note] WSREP: STATE EXCHANGE: got state msg: 5aa71c0c-4340-11ef-ad1c-9ed910f5095e from 1 (garb)
2024-07-16 6:55:14 0 [Note] WSREP: 'garb' demoted SYNCED->PRIMARY due to gap in history: 15398793 - 15519556
2024-07-16 6:55:14 0 [Note] WSREP: Quorum results:
    version = 6,
    component = PRIMARY,
    conf_id = 18,
    members = 1/2 (joined/total),
    act_id = 15519556,
    last_appl. = 15519482,
    protocols = 2/10/4 (gcs/repl/appl),
    vote policy= 0,
    group UUID = 401d046b-ebcd-11ec-9284-2e763fda7f1a
2024-07-16 6:55:14 0 [Note] WSREP: Flow-control interval: [424, 424]
2024-07-16 6:55:14 16 [Note] WSREP: ####### processing CC 15519557, local, ordered
2024-07-16 6:55:14 0 [Note] WSREP: Member 1.0 (garb) requested state transfer from '*any*'. Selected 0.0 (flx01)(SYNCED) as donor.
2024-07-16 6:55:14 0 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 15519557)
2024-07-16 6:55:14 0 [Note] WSREP: 1.0 (garb): State transfer from 0.0 (flx01) complete.
2024-07-16 6:55:14 0 [Note] WSREP: Member 1.0 (garb) synced with group.
2024-07-16 6:55:14 16 [Note] WSREP: ####### My UUID: 42b71455-3906-11ef-b540-2aac76f44406
2024-07-16 6:55:14 16 [Note] WSREP: Skipping cert index reset
2024-07-16 6:55:14 16 [Note] WSREP: REPL Protocols: 10 (5)
2024-07-16 6:55:14 16 [Note] WSREP: ####### Adjusting cert position: 15519556 -> 15519557
2024-07-16 6:55:14 0 [Note] WSREP: Service thread queue flushed.
2024-07-16 6:55:15 16 [Note] WSREP: ================================================
View:
  id: 401d046b-ebcd-11ec-9284-2e763fda7f1a:15519557
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(2):
    0: 42b71455-3906-11ef-b540-2aac76f44406, flx01.bos01.corp.akama
    1: ad6be48b-388e-11ef-9120-ff0c187ecadb, garb
=================================================
2024-07-16 6:55:15 16 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-07-16 6:55:15 16 [Note] WSREP: Lowest cert index boundary for CC from group: 15519483
2024-07-16 6:55:15 16 [Note] WSREP: Min available from gcache for CC from group: 15282316
2024-07-16 6:55:15 16 [Note] WSREP: Detected STR version: 0, req_len: 9, req: trivial
2024-07-16 6:55:15 0 [Note] WSREP: 0.0 (flx01): State transfer to 1.0 (garb) complete.
2024-07-16 6:55:15 0 [Note] WSREP: Shifting DONOR/DESYNCED -> JOINED (TO: 15519558)
2024-07-16 6:55:15 0 [Note] WSREP: Processing event queue:... -nan% (0/0 events) complete.
2024-07-16 6:55:15 0 [Note] WSREP: Member 0.0 (flx01) synced with group.
2024-07-16 6:55:15 0 [Note] WSREP: Processing event queue:...100.0% (1/1 events) complete.
2024-07-16 6:55:15 0 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 15519558)
2024-07-16 6:55:15 13 [Note] WSREP: Server flx01 synced with group
2024-07-16 6:55:20 0 [Note] WSREP: cleaning up 8fbd9334-8c5b (ssl://172.27.164.171:4567)
2024-07-16 6:56:03 127271 [Note] WSREP: MDL conflict db=db_01 table=GD_Config ticket=3 solved by abort
Some more observations from Node 1 (flx11):
root@flx11:~# ps -ef | grep mysql
root 3075 2583 0 09:36 pts/0 00:00:00 grep --color=auto mysql
mysql 9511 1 0 Jul03 ? 00:21:26 /usr/sbin/mariadbd --wsrep_start_position=401d046b-ebcd-11ec-9284-2e763fda7f1a:15262135,0-1-12403509
root@flx11:~#

root@flx11:~# mysql -uroot -p'XXXX' -A --protocol=TCP -P3308 -e "show processlist; "
+--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
| 1 | system user | | NULL | Sleep | 1132171 | wsrep aborter idle | NULL | 0.000 |
| 2 | system user | | NULL | Sleep | 9557 | Opening tables | UPDATE `GD_Change` SET updated_at='2024-07-16 06:55:14.764867', gd_change_id=20188 WHERE ` | 0.000 |
| 6 | system user | | NULL | Sleep | 9557 | After apply log event | NULL | 0.000 |
| 7 | system user | | NULL | Sleep | 9557 | wsrep applier committed | NULL | 0.000 |
| 8 | system user | | NULL | Sleep | 9557 | After apply log event | NULL | 0.000 |
| 10 | system user | | NULL | Sleep | 9557 | wsrep applier committed | NULL | 0.000 |
| 9 | system user | | db_01 | Sleep | 9557 | Commit implicit | DROP TABLE `GD_Config` | 0.000 |
| 13 | system user | | NULL | Sleep | 9557 | After apply log event | NULL | 0.000 |
| 14 | system user | | NULL | Sleep | 9557 | After apply log event | NULL | 0.000 |
| 116468 | _sentinel | flx11:34148 | NULL | Sleep | 20 | | NULL | 0.000 |
| 117153 | db_02 | 198.19.18.88:43795 | db_02 | Sleep | 202 | | NULL | 0.000 |
| 117400 | root | localhost:36090 | NULL | Query | 0 | starting | show processlist | 0.000 |
+--------+------------------+--------------------------------------------+------------------+---------+---------+-------------------------+------------------------------------------------------------------------------------------------------+----------+
root@flx11:~#

root@flx11:~# mysql -uroot -p'XXXX' -A --protocol=TCP -P3308 -e "show global status like 'wsrep%'"
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| Variable_name | Value |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
| wsrep_local_state_uuid | 401d046b-ebcd-11ec-9284-2e763fda7f1a |
| wsrep_protocol_version | 10 |
| wsrep_last_committed | 15519553 |
| wsrep_replicated | 120763 |
| wsrep_replicated_bytes | 139136808 |
| wsrep_repl_keys | 1413208 |
| wsrep_repl_keys_bytes | 14204600 |
| wsrep_repl_data_bytes | 116989796 |
| wsrep_repl_other_bytes | 0 |
| wsrep_received | 139487 |
| wsrep_received_bytes | 150254984 |
| wsrep_local_commits | 101989 |
| wsrep_local_cert_failures | 2 |
| wsrep_local_replays | 9 |
| wsrep_local_send_queue | 0 |
| wsrep_local_send_queue_max | 2 |
| wsrep_local_send_queue_min | 0 |
| wsrep_local_send_queue_avg | 0.000675643 |
| wsrep_local_recv_queue | 1 |
| wsrep_local_recv_queue_max | 5 |
| wsrep_local_recv_queue_min | 0 |
| wsrep_local_recv_queue_avg | 0.00232995 |
| wsrep_local_cached_downto | 15283108 |
| wsrep_flow_control_paused_ns | 0 |
| wsrep_flow_control_paused | 0 |
| wsrep_flow_control_sent | 0 |
| wsrep_flow_control_recv | 0 |
| wsrep_flow_control_active | false |
| wsrep_flow_control_requested | false |
| wsrep_cert_deps_distance | 12.5772 |
| wsrep_apply_oooe | 0.122729 |
| wsrep_apply_oool | 0.00144511 |
| wsrep_apply_window | 1.12939 |
| wsrep_apply_waits | 384 |
| wsrep_commit_oooe | 0 |
| wsrep_commit_oool | 0 |
| wsrep_commit_window | 1.03898 |
| wsrep_local_state | 4 |
| wsrep_local_state_comment | Synced |
| wsrep_cert_index_size | 46 |
| wsrep_causal_reads | 194551 |
| wsrep_cert_interval | 771.058 |
| wsrep_open_transactions | 0 |
| wsrep_open_connections | 0 |
| wsrep_incoming_addresses | 172.27.97.134:3308,172.27.164.171:3308, |
| wsrep_applier_thread_count | 8 |
| wsrep_cluster_capabilities | |
| wsrep_cluster_conf_id | 18 |
| wsrep_cluster_size | 3 |
| wsrep_cluster_state_uuid | 401d046b-ebcd-11ec-9284-2e763fda7f1a |
| wsrep_cluster_status | non-Primary |
| wsrep_connected | ON |
| wsrep_local_bf_aborts | 54 |
| wsrep_local_index | 1 |
| wsrep_provider_capabilities | :MULTI_MASTER:CERTIFICATION:PARALLEL_APPLYING:TRX_REPLAY:ISOLATION:PAUSE:CAUSAL_READS:INCREMENTAL_WRITESET:UNORDERED:PREORDERED:STREAMING:NBO: |
| wsrep_provider_name | Galera |
| wsrep_provider_vendor | Codership Oy info@codership.com |
| wsrep_provider_version | 26.4.14(r06a0c285) |
| wsrep_ready | OFF |
| wsrep_rollbacker_thread_count | 1 |
| wsrep_thread_count | 9 |
+-------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------+
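The flx11 status output above contains an inconsistent combination. `wsrep_local_state_comment` still reads `Synced`, but `wsrep_cluster_status` is `non-Primary` and `wsrep_ready` is `OFF`. The following Python sketch shows one way to catch that state automatically. It is a hypothetical helper, not part of MariaDB or Galera. The function names `parse_status_table` and `diagnose`, and the choice of which variables to check, are assumptions made for illustration. It parses the bordered table printed by `mysql -e "show global status like 'wsrep%'"` and returns a list of problems.

```python
from typing import Dict, List


# Hypothetical helper (not part of MariaDB/Galera): turns the bordered table
# printed by `mysql -e "show global status like 'wsrep%'"` into a dict.
def parse_status_table(text: str) -> Dict[str, str]:
    status: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Data rows look like "| name | value |"; border rows start with "+".
        if not line.startswith("|"):
            continue
        cells = [c.strip() for c in line.strip("|").split("|")]
        # Skip the header row and anything that is not a name/value pair.
        if len(cells) != 2 or cells[0] == "Variable_name":
            continue
        status[cells[0]] = cells[1]
    return status


# Hypothetical check using assumed criteria: an empty result means the node
# looks healthy by these checks only.
def diagnose(status: Dict[str, str]) -> List[str]:
    problems: List[str] = []
    cluster_status = status.get("wsrep_cluster_status")
    if cluster_status != "Primary":
        problems.append(f"wsrep_cluster_status is {cluster_status!r}, expected 'Primary'")
    if status.get("wsrep_ready") != "ON":
        problems.append(f"wsrep_ready is {status.get('wsrep_ready')!r}, expected 'ON'")
    if status.get("wsrep_connected") != "ON":
        problems.append(f"wsrep_connected is {status.get('wsrep_connected')!r}, expected 'ON'")
    # The combination observed on flx11: the local state still says Synced
    # although the node is outside the Primary component.
    if status.get("wsrep_local_state_comment") == "Synced" and cluster_status != "Primary":
        problems.append("wsrep_local_state_comment is 'Synced' but the node is not in a Primary component")
    return problems


if __name__ == "__main__":
    # Excerpt of the flx11 output shown above.
    sample = """
+-------------------------------+--------------+
| Variable_name                 | Value        |
+-------------------------------+--------------+
| wsrep_local_state_comment     | Synced       |
| wsrep_cluster_status          | non-Primary  |
| wsrep_connected               | ON           |
| wsrep_ready                   | OFF          |
+-------------------------------+--------------+
"""
    for problem in diagnose(parse_status_table(sample)):
        print("WARN:", problem)
```

The bordered format is parsed here only because it matches the output in this report. Running the client with `-N -B` (no column names, tab-separated batch output) would make parsing simpler.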
Can you please help us understand what happened here?
Thank you.