[MDEV-33761] WSREP: Parallel slave worker failed at wsrep_before_command() hook

Details

Type: Bug
Status: Open (View Workflow)
Priority: Major
Resolution: Unresolved
Affects Version/s: 10.11.7
Fix Version/s: None
Component/s: Replication, Server
Labels: None

Description

Replication stopped at some point on a replica that is one node of a Galera cluster. The replica replicates from one node of another cluster.

I tried to start it again with "start replica" and got the following in the log:

2024-03-25  8:11:16 5793934 [Note] Slave SQL thread initialized, starting replication in log '****-bin.000032' at position 253202400, relay log './*****-relay-bin.000002' position: 166600085; GTID position '254-25428-710881'

2024-03-25  8:11:16 5793934 [Note] WSREP: ready state reached

2024-03-25  8:11:17 5793938 [Warning] WSREP: Parallel slave worker failed at wsrep_before_command() hook

2024-03-25  8:11:17 5793938 [Warning] Slave: Connection was killed Error_code: 1927

2024-03-25  8:11:17 5793938 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213

2024-03-25  8:11:17 5793938 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'

2024-03-25  8:11:17 5793937 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 254-25428-713415, Internal MariaDB error code: 1964

2024-03-25  8:11:17 5793937 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

2024-03-25  8:11:17 5793937 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'

2024-03-25  8:11:17 5793936 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

2024-03-25  8:11:17 5793936 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'

2024-03-25  8:11:17 5793935 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964

2024-03-25  8:11:17 5793935 [ERROR] Error running query, slave SQL thread aborted. Fix the problem, and restart the slave SQL thread with "SLAVE START". We stopped at log '*****-bin.000032' position 254472756; GTID position '254-25428-713413'

I set slave_parallel_threads to 1 and started replication again, and it ran with no problem. I have since been able to set slave_parallel_threads back to its previous value (4).

Sadly, I don't have the binlogs or relay logs available to see which queries were failing: they were deleted from the replica shortly after being applied, and from the primary well before I got involved.
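
For anyone triaging a similar log, the excerpt above can be summarized per worker thread. The following is a minimal Python sketch, not part of the original report: the function name `error_codes_by_thread` and the regular expressions are illustrative assumptions about the error-log line layout seen here (timestamp, thread id, level, message). It groups the reported error codes by thread id, which makes it easier to see that one worker (5793938) reported 1927 and 1213 while the others reported only 1964.

```python
import re

# Assumed line layout, taken from the excerpt in this report:
#   2024-03-25  8:11:17 5793938 [Warning] Slave: <message> Error_code: 1927
LINE_RE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2})\s+(?P<time>\d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<thread>\d+)\s+\[(?P<level>\w+)\]\s+(?P<msg>.*)$"
)
# Codes appear either as "Error_code: N" or as "error code: N".
CODE_RE = re.compile(r"(?:Error_code|error code):\s*(\d+)")


def error_codes_by_thread(log_text):
    """Return {thread_id: [error codes in order of first appearance]}."""
    codes = {}
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # not an error-log line in the assumed layout
        thread_id = int(match.group("thread"))
        for raw_code in CODE_RE.findall(match.group("msg")):
            code = int(raw_code)
            seen = codes.setdefault(thread_id, [])
            if code not in seen:
                seen.append(code)
    return codes


# Lines copied from the log excerpt in this report.
SAMPLE_LOG = """
2024-03-25  8:11:17 5793938 [Warning] WSREP: Parallel slave worker failed at wsrep_before_command() hook
2024-03-25  8:11:17 5793938 [Warning] Slave: Connection was killed Error_code: 1927
2024-03-25  8:11:17 5793938 [Warning] Slave: Deadlock found when trying to get lock; try restarting transaction Error_code: 1213
2024-03-25  8:11:17 5793937 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid 254-25428-713415, Internal MariaDB error code: 1964
2024-03-25  8:11:17 5793937 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
"""

if __name__ == "__main__":
    print(error_codes_by_thread(SAMPLE_LOG))
    # {5793938: [1927, 1213], 5793937: [1964]}
```

The grouping only reflects what the log lines state; it does not by itself establish the root cause.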

Issue Links

relates to

MDEV-34010 [ERROR] Slave SQL: Commit failed due to failure of an earlier commit on which this one depends, Gtid ..., Internal MariaDB error code: 1964 (Open)

People

Assignee: Unassigned

Reporter: Phil Sumner

Votes: 0

Watchers: 2

Dates

Created: 2024-03-25 08:30

Updated: 2024-04-27 03:33
