MariaDB Server / MDEV-34998

after cluster vote to evict a node that failed a transaction, current master can't commit anymore

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.26
    • Fix Version/s: 10.5
    • Component/s: Galera
    • Labels: None

    Description

      After a corrupted table on one node triggered a cluster vote and that node was evicted, the current write node became unable to commit and hung, causing an outage.

      Relevant logs below.

      2024-05-29  9:14:30 0 [Note] WSREP: Member 3(MDB-MASTER2) initiates vote on 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409,de3bca72fc9e26b6:  Table DWHTmp/MAJ_EVENEMENTS_RAPPROCHEMENT in tablespace 94453495827864 corrupted., Error_code: 126; Index for table 'MAJ_EVENEMENTS_RAPPROCHEMENT' is corrupt; try to repair it, Error_code: 1034;
      2024-05-29  9:14:30 0 [Note] WSREP: Votes over 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409:
         de3bca72fc9e26b6:   1/4
      Waiting for more votes.
      2024-05-29  9:14:30 62 [Note] WSREP: Got vote request for seqno 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409
      2024-05-29  9:14:30 0 [Note] WSREP: Member 1(MDB-MASTER1) responds to vote on 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409,0000000000000000: Success
      2024-05-29  9:14:30 0 [Warning] WSREP: Received bogus VOTE message: 1102116409.0, from node 466cd722-ccc2-11ee-9a91-3eb4f65fbc4a, expected > 1102116419. Ignoring.
      2024-05-29  9:14:30 0 [Note] WSREP: Votes over 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409:
         0000000000000000:   1/4
         de3bca72fc9e26b6:   1/4
      Waiting for more votes.
      2024-05-29  9:14:30 0 [Note] WSREP: Member 2(MDB-MASTER3) responds to vote on 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409,0000000000000000: Success
      2024-05-29  9:14:30 0 [Note] WSREP: Votes over 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409:
         0000000000000000:   2/4
         de3bca72fc9e26b6:   1/4
      Winner: 0000000000000000
      2024-05-29  9:14:31 0 [Note] WSREP: (466cd722-9a91, 'tcp://0.0.0.0:4567') turning message relay requesting on, nonlive peers: tcp://10.10.1.102:4567
      2024-05-29  9:14:32 0 [Note] WSREP: declaring 3b05300a-b428 at tcp://10.10.1.104:4567 stable
      2024-05-29  9:14:32 0 [Note] WSREP: declaring adc7ace6-99bf at tcp://10.10.1.103:4567 stable
      2024-05-29  9:14:32 0 [Note] WSREP: forgetting bef69cea-ba54 (tcp://10.10.1.102:4567)
      2024-05-29  9:14:32 0 [Note] WSREP: (466cd722-9a91, 'tcp://0.0.0.0:4567') turning message relay requesting off
      2024-05-29  9:14:32 0 [Note] WSREP: Node 3b05300a-b428 state prim
      2024-05-29  9:14:32 0 [Note] WSREP: view(view_id(PRIM,3b05300a-b428,146) memb {
              3b05300a-b428,0
              466cd722-9a91,0
              adc7ace6-99bf,0
      } joined {
      } left {
      } partitioned {
              bef69cea-ba54,0
      })
      2024-05-29  9:14:32 0 [Note] WSREP: save pc into disk
      2024-05-29  9:14:32 0 [Note] WSREP: forgetting bef69cea-ba54 (tcp://10.10.1.102:4567)
      2024-05-29  9:14:32 0 [Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 3
      2024-05-29  9:14:32 0 [Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2024-05-29  9:14:32 0 [Note] WSREP: STATE EXCHANGE: sent state msg: dc98c2db-1d9b-11ef-857f-f37815291fa4
      2024-05-29  9:14:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: dc98c2db-1d9b-11ef-857f-f37815291fa4 from 0 (MDB-MASTER4)
      2024-05-29  9:14:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: dc98c2db-1d9b-11ef-857f-f37815291fa4 from 2 (MDB-MASTER3)
      2024-05-29  9:14:32 0 [Note] WSREP: STATE EXCHANGE: got state msg: dc98c2db-1d9b-11ef-857f-f37815291fa4 from 1 (MDB-MASTER1)
      2024-05-29  9:14:32 0 [Note] WSREP: Quorum results:
              version    = 6,
              component  = PRIMARY,
              conf_id    = 51,
              members    = 3/3 (joined/total),
              act_id     = 1102116457,
              last_appl. = 1102116355,
              protocols  = 2/10/4 (gcs/repl/appl),
              vote policy= 0,
              group UUID = 9b6e1db0-b6f6-11ee-ae6c-325024b331f7
      2024-05-29  9:14:32 0 [Note] WSREP: Flow-control interval: [240, 300]
      2024-05-29  9:14:37 0 [Note] WSREP:  cleaning up bef69cea-ba54 (tcp://10.10.1.102:4567)
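
      To follow how the vote progressed, the "Votes over" tally blocks can be pulled out of the error log. The following is a minimal sketch, not part of the original report: the function name parse_vote_tallies and the regular expressions are illustrative assumptions derived only from the line shapes visible in the excerpt above, and other Galera versions may format these lines differently. In this excerpt the code 0000000000000000 is the one logged alongside "Success".

```python
import re

# Assumed line shapes, taken from the log excerpt in this report only.
# Header of a tally block: "... WSREP: Votes over <group-uuid>:<seqno>:"
TALLY_HEADER = re.compile(r"WSREP: Votes over ([0-9a-f-]+):(\d+):\s*$")
# One tally row: "   de3bca72fc9e26b6:   1/4"
TALLY_ROW = re.compile(r"^\s*([0-9a-f]{16}):\s+(\d+)/(\d+)\s*$")
# Final line of a decided tally: "Winner: 0000000000000000"
WINNER = re.compile(r"^\s*Winner:\s*([0-9a-f]{16})\s*$")


def parse_vote_tallies(log_text):
    """Return one dict per 'Votes over' block, in log order."""
    tallies = []
    current = None
    for line in log_text.splitlines():
        header = TALLY_HEADER.search(line)
        if header:
            current = {
                "seqno": int(header.group(2)),
                "votes": {},      # result code -> number of votes
                "total": None,    # denominator printed in the tally rows
                "winner": None,   # stays None while "Waiting for more votes."
            }
            tallies.append(current)
            continue
        if current is None:
            continue
        row = TALLY_ROW.match(line)
        if row:
            current["votes"][row.group(1)] = int(row.group(2))
            current["total"] = int(row.group(3))
            continue
        winner = WINNER.match(line)
        if winner:
            current["winner"] = winner.group(1)
        # Any other line (including "Waiting for more votes.") ends the block.
        current = None
    return tallies


# Last tally block copied from the excerpt above.
SAMPLE = """\
2024-05-29  9:14:30 0 [Note] WSREP: Votes over 9b6e1db0-b6f6-11ee-ae6c-325024b331f7:1102116409:
   0000000000000000:   2/4
   de3bca72fc9e26b6:   1/4
Winner: 0000000000000000
"""

if __name__ == "__main__":
    for tally in parse_vote_tallies(SAMPLE):
        print(tally)
```

      Running the same function over the full error log of each node makes it easier to compare which result codes each member saw for a given seqno.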
      

          People

            sysprg Julius Goryavsky
            sysprg Julius Goryavsky
            Votes: 1
            Watchers: 2
