Details
Description
Message from the customer:
Description: Hello,

One of our heavily loaded 3-node Galera clusters has run into an inconsistency issue on 2 of its 3 nodes.

MariaDB CS 10.6.16, Galera Provider 26.4.16.

The nodes have the following names and roles:

db3 - Application Master Node for DMLs
db4 - Wsrep replicator for 99.99% of the queries, apart from the Aggregation DMLs, which are "offloaded" to it
db5 - Wsrep replicator and Async Replication Master

All application traffic goes to node db3.

We use node db4 to offload the data aggregation functions, which run DMLs on dedicated tables; no other functionality changes those Aggr tables.

Today we started our standard procedure to perform a Live Alter on one of the Aggr tables. The Live Alter script was executed on db3 instead of db4. 10 minutes after the Live Alter was started, node db4 became Inconsistent.

The Live Alter script continued to run on db3.

The background Aggr functions started using db5 for the aggregation DMLs. 10 minutes after that, db5 also became Inconsistent.

Live Alter is used to change the DDL of big tables without hanging the cluster. It works with triggers and INSERT SELECT.

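The customer's Live Alter script is not part of the ticket. Purely to illustrate the general "triggers plus INSERT SELECT" technique the message refers to, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for MariaDB. The table, column, and trigger names are hypothetical, and the sketch makes no claim about how the real script behaves under Galera replication.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical aggregation table, plus a shadow table carrying the new column.
cur.executescript("""
CREATE TABLE aggr (id INTEGER PRIMARY KEY, total INTEGER);
INSERT INTO aggr VALUES (1, 10), (2, 20), (3, 30);

CREATE TABLE aggr_new (id INTEGER PRIMARY KEY, total INTEGER, note TEXT DEFAULT '');

-- Triggers mirror writes on the original table into the shadow table.
CREATE TRIGGER aggr_ai AFTER INSERT ON aggr BEGIN
    INSERT OR REPLACE INTO aggr_new (id, total) VALUES (NEW.id, NEW.total);
END;
CREATE TRIGGER aggr_au AFTER UPDATE ON aggr BEGIN
    INSERT OR REPLACE INTO aggr_new (id, total) VALUES (NEW.id, NEW.total);
END;
CREATE TRIGGER aggr_ad AFTER DELETE ON aggr BEGIN
    DELETE FROM aggr_new WHERE id = OLD.id;
END;
""")

# Writes that arrive while the schema change is still in progress.
cur.execute("UPDATE aggr SET total = 25 WHERE id = 2")
cur.execute("INSERT INTO aggr VALUES (4, 40)")
cur.execute("DELETE FROM aggr WHERE id = 1")

# Backfill with INSERT ... SELECT; OR IGNORE keeps rows the triggers already copied.
cur.execute("INSERT OR IGNORE INTO aggr_new (id, total) SELECT id, total FROM aggr")

# Drop the triggers and swap the tables.
cur.executescript("""
DROP TRIGGER aggr_ai;
DROP TRIGGER aggr_au;
DROP TRIGGER aggr_ad;
ALTER TABLE aggr RENAME TO aggr_old;
ALTER TABLE aggr_new RENAME TO aggr;
""")

rows = cur.execute("SELECT id, total, note FROM aggr ORDER BY id").fetchall()
print(rows)
```

In this toy run the swapped-in table contains the backfilled row and the rows written through the triggers, all with the new column present.
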
Judging by the customer's logs, we are dealing with the following failure:
From db4:
2024-02-16 12:05:30 6 [ERROR] WSREP: Failed to apply write set: gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662493750 server_id: 269fd913-9633-11ee-9629-87d0be11dc45 client_id: 18446744073709551615 trx_id: 48609383118 flags: 20 (rollback | pa_unsafe)
...

From db5:
2024-02-16 12:05:30 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
2024-02-16 12:05:30 0 [Note] WSREP: STATE_EXCHANGE: sent state UUID: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d
2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: sent state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d
2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: got state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d from 0 (fx112_db5)
2024-02-16 12:05:30 0 [Note] WSREP: STATE EXCHANGE: got state msg: eab3a4dc-ccb2-11ee-a3ba-c612c5ccc31d from 1 (fx112_db3)
2024-02-16 12:05:30 0 [Note] WSREP: Quorum results:
    version = 6,
    component = PRIMARY,
    conf_id = 281,
    members = 2/2 (joined/total),
    act_id = 45662493751,
    last_appl. = 45662493731,
    protocols = 2/10/4 (gcs/repl/appl),
    vote policy= 0,
    group UUID = a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6
2024-02-16 12:05:30 0 [Note] WSREP: Flow-control interval: [6000, 6000]
2024-02-16 12:05:30 21 [Note] WSREP: ####### processing CC 45662493752, local, ordered
2024-02-16 12:05:30 21 [Note] WSREP: ####### My UUID: 184a029b-9622-11ee-8c61-6f91e19335c1
2024-02-16 12:05:30 21 [Note] WSREP: Skipping cert index reset
2024-02-16 12:05:30 21 [Note] WSREP: REPL Protocols: 10 (5)
2024-02-16 12:05:30 21 [Note] WSREP: ####### Adjusting cert position: 45662493751 -> 45662493752
2024-02-16 12:05:30 0 [Note] WSREP: Service thread queue flushed.
2024-02-16 12:05:30 21 [Note] WSREP: ================================================
View:
  id: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662493752
  status: primary
  protocol_version: 4
  capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
  final: no
  own_index: 0
  members(2):
    0: 184a029b-9622-11ee-8c61-6f91e19335c1, fx112_db5
    1: b8c19c60-962e-11ee-a025-af00b051f68f, fx112_db3
=================================================
2024-02-16 12:05:30 21 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
2024-02-16 12:05:30 21 [Note] WSREP: Lowest cert index boundary for CC from group: 45662493732
2024-02-16 12:05:30 21 [Note] WSREP: Min available from gcache for CC from group: 45614618803
2024-02-16 12:05:36 0 [Note] WSREP: cleaning up 269fd913-9629 (tcp://xxx.xxx.xxx.xxx:yyyy)
2024-02-16 12:15:46 35 [ERROR] WSREP: Failed to apply write set: gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662930080 server_id: 184a029b-9622-11ee-8c61-6f91e19335c1 client_id: 18446744073709551615 trx_id: 81448619022 flags: 20 (rollback | pa_unsafe)
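
To compare the two "Failed to apply write set" errors side by side, the lines can be broken into fields. The following is a minimal sketch, assuming the line layout seen in the excerpts above. The regular expression, the helper name `parse_apply_error`, and the `MEMBERS` mapping (copied from the View block in db5's log) are illustrative and not part of any MariaDB or Galera tooling.

```python
import re
from datetime import datetime

# Pattern modelled on the two error lines quoted in this ticket (an assumption,
# not an official log format specification).
APPLY_ERROR = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[ERROR\] WSREP: "
    r"Failed to apply write set: "
    r"gtid: (?P<cluster_uuid>[0-9a-f-]{36}):(?P<seqno>\d+) "
    r"server_id: (?P<server_id>[0-9a-f-]{36}) "
    r"client_id: (?P<client_id>\d+) "
    r"trx_id: (?P<trx_id>\d+) "
    r"flags: (?P<flags>\d+) \((?P<flag_names>[^)]*)\)"
)


def parse_apply_error(line):
    """Return the fields of an apply-error line as a dict, or None if no match."""
    m = APPLY_ERROR.search(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%Y-%m-%d %H:%M:%S")
    for key in ("seqno", "client_id", "trx_id", "flags"):
        rec[key] = int(rec[key])
    rec["flag_names"] = [name.strip() for name in rec["flag_names"].split("|")]
    return rec


# Node UUID -> node name, taken from the View block in db5's log.
MEMBERS = {
    "184a029b-9622-11ee-8c61-6f91e19335c1": "fx112_db5",
    "b8c19c60-962e-11ee-a025-af00b051f68f": "fx112_db3",
}

DB4_LINE = (
    "2024-02-16 12:05:30 6 [ERROR] WSREP: Failed to apply write set: "
    "gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662493750 "
    "server_id: 269fd913-9633-11ee-9629-87d0be11dc45 "
    "client_id: 18446744073709551615 trx_id: 48609383118 "
    "flags: 20 (rollback | pa_unsafe)"
)
DB5_LINE = (
    "2024-02-16 12:15:46 35 [ERROR] WSREP: Failed to apply write set: "
    "gtid: a8f0f00f-842d-11eb-b5a7-7a1763e4c1e6:45662930080 "
    "server_id: 184a029b-9622-11ee-8c61-6f91e19335c1 "
    "client_id: 18446744073709551615 trx_id: 81448619022 "
    "flags: 20 (rollback | pa_unsafe)"
)

db4_err = parse_apply_error(DB4_LINE)
db5_err = parse_apply_error(DB5_LINE)

seqno_gap = db5_err["seqno"] - db4_err["seqno"]
seconds_gap = (db5_err["ts"] - db4_err["ts"]).total_seconds()
db5_origin = MEMBERS.get(db5_err["server_id"], "not in this view")
db4_origin = MEMBERS.get(db4_err["server_id"], "not in this view")

print(seqno_gap, seconds_gap, db5_origin, db4_origin)
```

With these two lines, the errors are 616 seconds and 436330 sequence numbers apart, which fits the customer's "10 minutes after that" timeline. The `server_id` of the write set that failed on db5 matches the UUID listed for fx112_db5 in the view. The `server_id` in db4's error is not in the two-member view, and its first and fourth groups match the `269fd913-9629` identifier that db5 logs as being cleaned up.
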