Details
-
Bug
-
Status: Open
-
Major
-
Resolution: Unresolved
-
10.11.5, 11.1.2
-
None
-
Debian 12, Rocky Linux 8, n.a.
Description
We have a cluster consisting of 3 nodes and 1 arbitrator (yes, I know, this is bad!). The cluster is split into 2 segments (gmcast.segment=n). One segment contains 2 nodes; the other contains the third node and the arbitrator.
When I restart the node in the arbitrator's segment and force an SST (rm grastate.dat), the SST does not happen:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 0 (Oli)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 1 (christopher)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 2 (klaus)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 3 (garb)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Quorum results:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: version = 6,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: component = PRIMARY,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: conf_id = 11,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: members = 3/4 (joined/total),
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: act_id = 2549250,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: last_appl. = 2549239,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: protocols = 2/10/4 (gcs/repl/appl),
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: vote policy= 0,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: group UUID = ff1b0394-82fc-11ee-9916-c27ee6696a8b
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Flow-control interval: [32, 32]
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2549251)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### processing CC 2549251, local, ordered
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Process first view: ff1b0394-82fc-11ee-9916-c27ee6696a8b my uuid: 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server Oli connected to cluster at position ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251 with ID 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change disconnected -> connected
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### My UUID: 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Service thread queue flushed.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: State transfer required:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Group state: ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Local state: 00000000-0000-0000-0000-000000000000:-1
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change connected -> joiner
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Joiner monitor thread started to monitor
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.200.41' --datadir '/var/lib/mysql/' --parent 1778322 --progress 0 --mysql>
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778341]: WSREP_SST: [INFO] rsync SST started on joiner (20231115 17:29:08.160)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de rsyncd[1778466]: rsyncd version 3.1.3 starting, listening on port 4444
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 2549251, STRv: 3
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: IST receiver addr using tcp://192.168.200.41:4568
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Prepared IST receiver for 0-2549251, listening at: tcp://192.168.200.41:4568
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Note] WSREP: (182277f3-bb8f, 'tcp://0.0.0.0:4567') turning message relay requesting off
Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:10 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:10 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:11 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:11 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
...
The node neither recovers nor fails; it hangs endlessly (> 10 min).
I would expect the cluster to detect that garbd cannot serve as an SST donor and to elect a node from the other segment as the donor instead.
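To make the expected behaviour concrete, here is a minimal Python sketch of the donor selection I have in mind. It is a toy model, not Galera's actual implementation: the `Member` class, the `pick_donor` function, and the segment numbers are assumptions made up for illustration; only the member names are taken from the log above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Member:
    name: str
    segment: int              # assumed segment numbers, not taken from the real config
    arbitrator: bool = False  # garbd holds no data, so it can never be a donor


def pick_donor(joiner: Member, members: List[Member]) -> Optional[Member]:
    """Hypothetical selection: prefer a data node in the joiner's segment,
    otherwise fall back to a data node in any other segment."""
    candidates = [m for m in members if m != joiner and not m.arbitrator]
    same_segment = [m for m in candidates if m.segment == joiner.segment]
    pool = same_segment or candidates
    return pool[0] if pool else None


# Topology from this report: Oli shares a segment with garbd,
# christopher and klaus sit in the other segment.
oli = Member("Oli", segment=2)
members = [
    oli,
    Member("christopher", segment=1),
    Member("klaus", segment=1),
    Member("garb", segment=2, arbitrator=True),
]

donor = pick_donor(oli, members)
print(donor.name if donor else "no donor available")
```

With this topology the only other member of the joiner's segment is the arbitrator, so the sketch falls back to a data node in the other segment instead of retrying forever.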