  1. MariaDB Server
  2. MDEV-37543

Galera Cluster node marked as inconsistent due to replication deadlock - no cluster transaction rollback

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.6.15
    • Fix Version/s: None
    • Component/s: Galera
    • Labels: None

    Description

      We had an incident last week on our production server where a deadlock on a replicated transaction caused the node that could not apply it to be evicted and marked as inconsistent.

      To fix this I had to do a full restore on the node.

      Why is this deadlock situation not handled so that the node can recover from it automatically?

      The "Failed to report last committed" warning below seems to be unrelated to this issue, as it has been appearing in the logs for a while and still does.

      Aug 28 14:45:20 db1 mariadbd[2312]: 2025-08-28 14:45:20 0 [Warning] WSREP: Failed to report last committed 94a81217-9350-11e9-a666-bae2f92ef610:12317264849, -110 (Connection timed out)
      Aug 28 15:00:22 db1 systemd[1]: session-155694.scope: Consumed 1.679s CPU time.
      Aug 28 15:00:31 db1 mariadbd[2312]: 2025-08-28 15:00:31 0 [Warning] WSREP: Failed to report last committed 94a81217-9350-11e9-a666-bae2f92ef610:12317441203, -110 (Connection timed out)
      Aug 28 15:00:38 db1 systemd[1]: session-155691.scope: Deactivated successfully.
      Aug 28 15:00:38 db1 systemd[1]: session-155691.scope: Consumed 11.894s CPU time.
      Aug 28 15:00:38 db1 systemd[1]: Started Session 155697 of User clustercontrol.
      Aug 28 15:01:58 db1 mariadbd[2312]: 2025-08-28 15:01:58 0 [Warning] WSREP: Failed to report last committed 94a81217-9350-11e9-a666-bae2f92ef610:12317464803, -110 (Connection timed out)
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [ERROR] Slave SQL: Could not execute Write_rows_v1 event on table pmacct.dhcp; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 474, Internal MariaDB error code: 1213
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Warning] WSREP: Event 3 Write_rows_v1 apply failed: 149, seqno 12317465549
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Member 1(db1) initiates vote on 94a81217-9350-11e9-a666-bae2f92ef610:12317465549,ec6096233b442fa2: Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213;
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Votes over 94a81217-9350-11e9-a666-bae2f92ef610:12317465549:
      Aug 28 15:02:04 db1 mariadbd[2312]: ec6096233b442fa2: 1/3
      Aug 28 15:02:04 db1 mariadbd[2312]: Waiting for more votes.
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Member 0(db2) responds to vote on 94a81217-9350-11e9-a666-bae2f92ef610:12317465549,0000000000000000: Success
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Votes over 94a81217-9350-11e9-a666-bae2f92ef610:12317465549:
      Aug 28 15:02:04 db1 mariadbd[2312]: 0000000000000000: 1/3
      Aug 28 15:02:04 db1 mariadbd[2312]: ec6096233b442fa2: 1/3
      Aug 28 15:02:04 db1 mariadbd[2312]: Waiting for more votes.
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Member 2(db3) responds to vote on 94a81217-9350-11e9-a666-bae2f92ef610:12317465549,0000000000000000: Success
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 0 [Note] WSREP: Votes over 94a81217-9350-11e9-a666-bae2f92ef610:12317465549:
      Aug 28 15:02:04 db1 mariadbd[2312]: 0000000000000000: 2/3
      Aug 28 15:02:04 db1 mariadbd[2312]: ec6096233b442fa2: 1/3
      Aug 28 15:02:04 db1 mariadbd[2312]: Winner: 0000000000000000
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [ERROR] WSREP: Inconsistency detected: Inconsistent by consensus on 94a81217-9350-11e9-a666-bae2f92ef610:12317465549
      Aug 28 15:02:04 db1 mariadbd[2312]: #011 at ./galera/src/replicator_smm.cpp:process_apply_error():1357
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Note] WSREP: Closing send monitor...
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Note] WSREP: Closed send monitor.
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Note] WSREP: gcomm: terminating thread
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Note] WSREP: gcomm: joining thread
      Aug 28 15:02:04 db1 mariadbd[2312]: 2025-08-28 15:02:04 12 [Note] WSREP: gcomm: closing backend
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: view(view_id(NON_PRIM,9b78393e-ab0c,17) memb {
      Aug 28 15:02:06 db1 mariadbd[2312]: #011ac666a25-a9db,1
      Aug 28 15:02:06 db1 mariadbd[2312]: } joined {
      Aug 28 15:02:06 db1 mariadbd[2312]: } left {
      Aug 28 15:02:06 db1 mariadbd[2312]: } partitioned {
      Aug 28 15:02:06 db1 mariadbd[2312]: #0119b78393e-ab0c,1
      Aug 28 15:02:06 db1 mariadbd[2312]: #011db6f27b8-9816,2
      Aug 28 15:02:06 db1 mariadbd[2312]: })
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: PC protocol downgrade 1 -> 0
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: view((empty))
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Deferred close timer started for socket with remote endpoint: tcp://x.x.x.x:34588
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: gcomm: closed
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Flow-control interval: [16, 16]
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Received NON-PRIMARY.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 12317465732)
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: New SELF-LEAVE.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Flow-control interval: [0, 0]
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 12317465732)
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Note] WSREP: RECV thread exiting 0: Success
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: recv_thread() joined.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Closing replication queue.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Closing slave action queue.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [ERROR] WSREP: Failed to apply write set: gtid: 94a81217-9350-11e9-a666-bae2f92ef610:12317465549 server_id: db6f27b8-be5b-11ef-9816-465c554bfe64 client_id: 19986695 trx_id: 47845349661 flags: 3 (start_transaction | commit)
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 0 [Warning] WSREP: Failed to report last committed 94a81217-9350-11e9-a666-bae2f92ef610:12317465524, -77 (File descriptor in bad state)
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: ================================================
      Aug 28 15:02:06 db1 mariadbd[2312]: View:
      Aug 28 15:02:06 db1 mariadbd[2312]: id: 94a81217-9350-11e9-a666-bae2f92ef610:12317465732
      Aug 28 15:02:06 db1 mariadbd[2312]: status: non-primary
      Aug 28 15:02:06 db1 mariadbd[2312]: protocol_version: 4
      Aug 28 15:02:06 db1 mariadbd[2312]: capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
      Aug 28 15:02:06 db1 mariadbd[2312]: final: no
      Aug 28 15:02:06 db1 mariadbd[2312]: own_index: 0
      Aug 28 15:02:06 db1 mariadbd[2312]: members(1):
      Aug 28 15:02:06 db1 mariadbd[2312]: #0110: ac666a25-40d0-11f0-a9db-c77b814ee77c, db1
      Aug 28 15:02:06 db1 mariadbd[2312]: =================================================
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Non-primary view
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Server status change synced -> connected
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: ================================================
      Aug 28 15:02:06 db1 mariadbd[2312]: View:
      Aug 28 15:02:06 db1 mariadbd[2312]: id: 94a81217-9350-11e9-a666-bae2f92ef610:12317465732
      Aug 28 15:02:06 db1 mariadbd[2312]: status: non-primary
      Aug 28 15:02:06 db1 mariadbd[2312]: protocol_version: 4
      Aug 28 15:02:06 db1 mariadbd[2312]: capabilities: MULTI-MASTER, CERTIFICATION, PARALLEL_APPLYING, REPLAY, ISOLATION, PAUSE, CAUSAL_READ, INCREMENTAL_WS, UNORDERED, PREORDERED, STREAMING, NBO
      Aug 28 15:02:06 db1 mariadbd[2312]: final: yes
      Aug 28 15:02:06 db1 mariadbd[2312]: own_index: -1
      Aug 28 15:02:06 db1 mariadbd[2312]: members(0):
      Aug 28 15:02:06 db1 mariadbd[2312]: =================================================
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Non-primary view
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: Server status change connected -> disconnected
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 12 [Note] WSREP: wsrep_notify_cmd is not defined, skipping notification.
      Aug 28 15:02:06 db1 mariadbd[2312]: 2025-08-28 15:02:06 10 [Note] WSREP: Applier thread exiting ret: 6 thd: 10
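
To make the vote sequence in the log above easier to follow, here is a small Python sketch that tallies the votes from the three "vote on" lines. It is illustrative only. `VOTE_RE`, `collect_votes` and `outvoted_nodes` are hypothetical names, not Galera APIs. The line format is assumed from the excerpt above. The simple-majority rule is an assumption that happens to match the "Winner" line in this log, not a description of Galera's actual voting algorithm.

```python
import re
from collections import Counter

# Assumed format, taken from the excerpt above:
#   "... WSREP: Member 1(db1) initiates vote on <uuid>:<seqno>,<hash>: <message>"
#   "... WSREP: Member 0(db2) responds to vote on <uuid>:<seqno>,<hash>: Success"
VOTE_RE = re.compile(
    r"Member \d+\((?P<node>[^)]+)\) (?:initiates|responds to) vote on "
    r"(?P<gtid>[0-9a-f-]+:\d+),(?P<hash>[0-9a-f]{16}):"
)

# The three vote lines from the log, with the syslog prefix and message tail trimmed.
LOG = [
    "WSREP: Member 1(db1) initiates vote on "
    "94a81217-9350-11e9-a666-bae2f92ef610:12317465549,ec6096233b442fa2: Deadlock found",
    "WSREP: Member 0(db2) responds to vote on "
    "94a81217-9350-11e9-a666-bae2f92ef610:12317465549,0000000000000000: Success",
    "WSREP: Member 2(db3) responds to vote on "
    "94a81217-9350-11e9-a666-bae2f92ef610:12317465549,0000000000000000: Success",
]


def collect_votes(lines):
    """Map each node name to the result hash it voted for."""
    votes = {}
    for line in lines:
        match = VOTE_RE.search(line)
        if match:
            votes[match.group("node")] = match.group("hash")
    return votes


def outvoted_nodes(votes):
    """Return the most common hash and the nodes that voted for something else.

    Assumption: a plain majority decides the winner.
    """
    winner, _count = Counter(votes.values()).most_common(1)[0]
    losers = sorted(node for node, h in votes.items() if h != winner)
    return winner, losers


if __name__ == "__main__":
    winner, losers = outvoted_nodes(collect_votes(LOG))
    print("winner:", winner)
    print("outvoted:", losers)
```

On the three lines above, this reports db1 as the only node whose result differed from the majority, which is consistent with the "Inconsistent by consensus" error that follows the vote in the log.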

          People

            Assignee: Unassigned
            Reporter: Stephan Vos (stephanvos)