We have a cluster consisting of 3 nodes and 1 arbitrator (yes, I know, this is bad!). The cluster is split into 2 segments (gmcast.segment=n): one segment contains 2 nodes, the other contains the third node and the arbitrator.
When I restart the node in the arbitrator's segment and force an SST (rm grastate.dat), the SST does not happen:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 0 (Oli)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 1 (christopher)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 2 (klaus)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 3 (garb)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Quorum results:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: version = 6,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: component = PRIMARY,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: conf_id = 11,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: members = 3/4 (joined/total),
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: act_id = 2549250,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: last_appl. = 2549239,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: protocols = 2/10/4 (gcs/repl/appl),
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: vote policy= 0,
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: group UUID = ff1b0394-82fc-11ee-9916-c27ee6696a8b
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Flow-control interval: [32, 32]
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2549251)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### processing CC 2549251, local, ordered
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Process first view: ff1b0394-82fc-11ee-9916-c27ee6696a8b my uuid: 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server Oli connected to cluster at position ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251 with ID 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change disconnected -> connected
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### My UUID: 182277f3-83d4-11ee-bb8f-1f581c4c5737
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Service thread queue flushed.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: State transfer required:
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Group state: ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Local state: 00000000-0000-0000-0000-000000000000:-1
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change connected -> joiner
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Joiner monitor thread started to monitor
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.200.41' --datadir '/var/lib/mysql/' --parent 1778322 --progress 0 --mysql>
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778341]: WSREP_SST: [INFO] rsync SST started on joiner (20231115 17:29:08.160)
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de rsyncd[1778466]: rsyncd version 3.1.3 starting, listening on port 4444
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 2549251, STRv: 3
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: IST receiver addr using tcp://192.168.200.41:4568
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Prepared IST receiver for 0-2549251, listening at: tcp://192.168.200.41:4568
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Note] WSREP: (182277f3-bb8f, 'tcp://0.0.0.0:4567') turning message relay requesting off
Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:10 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:10 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
Nov 15 17:29:11 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:11 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
...
The node neither recovers nor fails; it hangs endlessly (> 10 min).
I would expect the cluster to detect that garbd cannot serve as an SST donor and to select a node from the other segment as the donor instead.
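To make the expected behaviour concrete, here is a minimal Python sketch of the fallback I have in mind. It is not Galera's actual donor selection code. The `Member` class, the `can_donate` flag, the `pick_donor` function and the segment numbers are all hypothetical and only illustrate the idea: prefer a donor in the joiner's own segment, skip members that cannot donate (such as garbd), and otherwise fall back to a node in another segment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Member:
    name: str
    segment: int
    can_donate: bool  # hypothetical flag: False for garbd or a non-synced node


def pick_donor(joiner: Member, members: list[Member]) -> Optional[Member]:
    """Return a donor for `joiner`, preferring its own segment, or None."""
    candidates = [m for m in members if m.name != joiner.name and m.can_donate]
    # Same-segment donors sort first; the sort is stable, so ties keep their order.
    candidates.sort(key=lambda m: m.segment != joiner.segment)
    return candidates[0] if candidates else None


# Topology from this report; the segment numbers are assumed for illustration.
oli = Member("Oli", segment=2, can_donate=False)  # the joiner itself
members = [
    oli,
    Member("christopher", segment=1, can_donate=True),
    Member("klaus", segment=1, can_donate=True),
    Member("garb", segment=2, can_donate=False),  # arbitrator, holds no data
]

donor = pick_donor(oli, members)
print(donor.name if donor else "no donor available")  # prints "christopher"
```

With this logic the joiner would get a donor from the other segment instead of retrying forever because the only other member of its own segment is the arbitrator.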