MariaDB Server / MDEV-32812

SST fails if node joins to cluster segment where only 1 arbitrator resides

    Description

      We have a cluster consisting of 3 nodes and 1 arbitrator (yes, I know, this is bad!). The cluster is split into 2 segments (gmcast.segment=n): one segment holds 2 nodes, the other holds the third node and the arbitrator.
      When I restart the node in the arbitrator's segment and force an SST (rm grastate.dat), the SST does not happen:

      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: sent state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 0 (Oli)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 1 (christopher)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 2 (klaus)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: STATE EXCHANGE: got state msg: 19a0632a-83d4-11ee-87cd-de2bb052f838 from 3 (garb)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Quorum results:
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: version = 6,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: component = PRIMARY,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: conf_id = 11,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: members = 3/4 (joined/total),
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: act_id = 2549250,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: last_appl. = 2549239,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: protocols = 2/10/4 (gcs/repl/appl),
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: vote policy= 0,
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: group UUID = ff1b0394-82fc-11ee-9916-c27ee6696a8b
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Flow-control interval: [32, 32]
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Shifting OPEN -> PRIMARY (TO: 2549251)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### processing CC 2549251, local, ordered
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Process first view: ff1b0394-82fc-11ee-9916-c27ee6696a8b my uuid: 182277f3-83d4-11ee-bb8f-1f581c4c5737
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server Oli connected to cluster at position ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251 with ID 182277f3-83d4-11ee-bb8f-1f581c4c5737
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change disconnected -> connected
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### My UUID: 182277f3-83d4-11ee-bb8f-1f581c4c5737
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Cert index reset to 00000000-0000-0000-0000-000000000000:-1 (proto: 10), state transfer needed: yes
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Service thread queue flushed.
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### Assign initial position for certification: 00000000-0000-0000-0000-000000000000:-1, protocol version: -1
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: State transfer required:
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Group state: ff1b0394-82fc-11ee-9916-c27ee6696a8b:2549251
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: Local state: 00000000-0000-0000-0000-000000000000:-1
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Server status change connected -> joiner
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Joiner monitor thread started to monitor
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Note] WSREP: Running: 'wsrep_sst_rsync --role 'joiner' --address '192.168.200.41' --datadir '/var/lib/mysql/' --parent 1778322 --progress 0 --mysql>
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778341]: WSREP_SST: [INFO] rsync SST started on joiner (20231115 17:29:08.160)
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de rsyncd[1778466]: rsyncd version 3.1.3 starting, listening on port 4444
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: ####### IST uuid:00000000-0000-0000-0000-000000000000 f: 0, l: 2549251, STRv: 3
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: IST receiver addr using tcp://192.168.200.41:4568
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Prepared IST receiver for 0-2549251, listening at: tcp://192.168.200.41:4568
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
      Nov 15 17:29:08 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:08 2 [Note] WSREP: Requesting state transfer failed: -11(Resource temporarily unavailable). Will keep retrying every 1 second(s)
      Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Note] WSREP: (182277f3-bb8f, 'tcp://0.0.0.0:4567') turning message relay requesting off
      Nov 15 17:29:09 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:09 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
      Nov 15 17:29:10 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:10 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
      Nov 15 17:29:11 tn01-olive.heinlein-akademie.de mariadbd[1778322]: 2023-11-15 17:29:11 0 [Warning] WSREP: Member 0.2 (Oli) requested state transfer from 'any', but it is impossible to select State Transfer donor: Resource temporarily u>
      ...

      The node neither recovers nor fails; it keeps retrying endlessly (> 10 min).
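To quantify the retry loop from a saved journal excerpt, something like the following could be used. It is only a sketch: the function names `donor_failures` and `retry_summary` are made up for illustration, and the pattern assumes the warning text and timestamp layout seen in the log above.

```python
import re
from datetime import datetime

# Matches the MariaDB timestamp that precedes the WSREP donor-selection warning.
DONOR_FAIL = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \d+ \[Warning\] WSREP: .*"
    r"impossible to select State Transfer donor"
)


def donor_failures(lines):
    """Return the timestamp of every failed donor selection found in `lines`."""
    stamps = []
    for line in lines:
        match = DONOR_FAIL.search(line)
        if match:
            stamps.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S"))
    return stamps


def retry_summary(lines):
    """Return (number of failed attempts, seconds between first and last attempt)."""
    stamps = donor_failures(lines)
    if not stamps:
        return 0, 0.0
    return len(stamps), (stamps[-1] - stamps[0]).total_seconds()
```

Calling `retry_summary` on an open log file (any iterable of lines works) gives the number of failed attempts and how long the node has been stuck in the loop.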

      I would expect the cluster to detect that garbd cannot serve as an SST donor and to select a node from the other segment as the donor instead.
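The expected behaviour can be sketched roughly as below. This is not Galera's actual donor-selection code; `Member` and `pick_donor` are hypothetical names, and the segment numbers 1 and 2 are assumed for illustration. The idea is simply: never consider an arbitrator, prefer a synced node in the joiner's own segment, and otherwise fall back to a synced node in any other segment.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Member:
    name: str
    segment: int              # value of gmcast.segment (numbers assumed here)
    synced: bool = True
    arbitrator: bool = False  # garbd holds no data, so it can never donate


def pick_donor(joiner: Member, members: Sequence[Member]) -> Optional[Member]:
    """Prefer a synced data node in the joiner's segment, else one from any segment."""
    candidates = [
        m for m in members
        if m != joiner and m.synced and not m.arbitrator
    ]
    local = [m for m in candidates if m.segment == joiner.segment]
    pool = local or candidates  # fall back across segments when none is local
    return pool[0] if pool else None


# Topology from this report: the joiner shares its segment only with garbd.
cluster = [
    Member("Oli", 2, synced=False),
    Member("christopher", 1),
    Member("klaus", 1),
    Member("garb", 2, arbitrator=True),
]
print(pick_donor(cluster[0], cluster).name)  # christopher
```

With this topology the local pool is empty once the arbitrator is excluded, so the sketch falls back to a node in the other segment instead of retrying forever.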

          People

            Assignee: Unassigned
            Reporter: Oli Sennhauser