MariaDB Server / MDEV-33761

WSREP: Parallel slave worker failed at wsrep_before_command() hook

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.11.7
    • Fix Version/s: None
    • Component/s: Replication, Server
    • Labels: None

    Description

      Replication stopped at some point on a replica that is itself one node of a Galera cluster. This replica replicates from one node of another cluster.

      I attempted to start it again with "start replica" and got this in the log:

      2024-03-25  8:11:16 5793934 [Note] Slave SQL thread initialized, starting replication in log '****-bin.000032' at position 253202400, relay log './*****-relay-bin.000002' position: 166600085; GTID position '254-25428-710881'
      2024-03-25  8:11:16 5793934 [Note] WSREP: ready state reached
      2024-03-25  8:11:17 5793938 [Warning] WSREP: Parallel slave worker failed at wsrep_before_command() hook
      2024-03-25  8:11:17 5793938 [Warning] Slave: Connection was killed Error_code: 1927
      2024-03-25  8:11:17 5793938 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
      2024-03-25  8:11:17 5793938 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'
      2024-03-25  8:11:17 5793937 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 254-25428-713415, Internal MariaDB error code: 1964
      2024-03-25  8:11:17 5793937 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2024-03-25  8:11:17 5793937 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'
      2024-03-25  8:11:17 5793936 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2024-03-25  8:11:17 5793936 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'
      2024-03-25  8:11:17 5793935 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
      2024-03-25  8:11:17 5793935 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'

      As a workaround, I set slave_parallel_threads to 1 and started replication again, and it ran with no problem. I have since been able to set slave_parallel_threads back to its previous value (4).
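
      The workaround can be sketched as the statements below. This is a minimal sketch rather than the exact commands that were run: it assumes a single default replication connection and a session with the privileges needed to change global variables. MariaDB only allows slave_parallel_threads to be changed while the replica threads are stopped, hence the explicit stop and start around each change. It works around the failure but does not address its cause.

      ```sql
      -- The SQL thread was already aborted by the error; STOP REPLICA makes
      -- sure the IO thread is stopped too before the variable is changed.
      STOP REPLICA;
      SET GLOBAL slave_parallel_threads = 1;
      START REPLICA;

      -- Check that the replica is applying events again.
      SHOW REPLICA STATUS;

      -- Once the replica has moved past the failing position, restore the
      -- previous value (4 in this report).
      STOP REPLICA;
      SET GLOBAL slave_parallel_threads = 4;
      START REPLICA;
      ```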

      Unfortunately, I don't have the binlogs or relay logs available to see which queries it was failing on: they were deleted from the replica shortly after being applied, and from the primary well before I got involved.

            People

              Assignee: Unassigned
              Reporter: Phil Sumner (psumner)
              Votes: 0
              Watchers: 2