Details
Description
We are experiencing a crash of all Galera nodes that receive write sets. The operation is a "last resort" cleanup stored procedure that deletes many rows from the same set of related tables. It generally takes 4-5 minutes to run with our data size, but when it fails, the crash occurs within 10-20 seconds.
We have used this stored procedure reasonably regularly on 10.1 for several years without problems. As suggested by Enterprise support, I have also tried the latest 10.4 build, for which they provided a URL. It exhibits the same problem.
Unfortunately, I have been unable to reproduce the fault either with simplified reproduction steps or on a different system of ours. However, I have been able to take a "mariabackup" (i.e. physical) backup and reproduce the fault on 2 other clusters. The original system and the first reproduction were on VMware machines. The third system is an AWS EC2 setup. All 3 have the same MariaDB configuration. I suspect the problem is exposed by the particular on-disk data.
Attached is the log of one of the nodes receiving the writeset.
In the first round of testing, I found that autocommit needs to be ON.
Because I suspected the data, and knew that our QA team had been trying to delete rows, I started my test again and ran "OPTIMIZE TABLE" on the tables that are touched. This caused the following error to appear in the log, at an unusual point in the crash logging:
[ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table mediator.SEQUENCE; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 276, Internal MariaDB error code: 1213
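For anyone scanning the node logs for the same symptom, the following is a minimal Python sketch that pulls the event type, table, error code, and handler error out of the applier error quoted above. The regular expression is inferred from that single log line only, so the field layout is an assumption rather than a documented format, and `parse_applier_error` is a hypothetical helper name.

```python
import re

# Pattern inferred from the one error line quoted in this report.
# The layout is an assumption, not a documented log format.
APPLIER_ERROR = re.compile(
    r"\[ERROR\] Slave SQL: Could not execute (?P<event>\w+) event on table "
    r"(?P<table>[\w.]+); (?P<message>.*?), Error_code: (?P<code>\d+); "
    r"handler error (?P<handler>\w+);"
)


def parse_applier_error(line):
    """Return a dict of fields from an applier error line, or None if no match."""
    match = APPLIER_ERROR.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["code"] = int(fields["code"])
    return fields


# The error line from this report, used as sample input.
sample = (
    "[ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table "
    "mediator.SEQUENCE; Deadlock found when trying to get lock; try restarting "
    "transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the "
    "event's master log FIRST, end_log_pos 276, Internal MariaDB error code: 1213"
)

info = parse_applier_error(sample)
print(info["event"], info["table"], info["code"], info["handler"])
```

Running each line of an error log through this helper and counting matches per table may show whether the deadlocks are concentrated on one table.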
Based on that finding, I have now set wsrep_slave_threads = 1, and with this setting the procedure completes successfully. Previously the value was 12. I have also tested a value of 4, which also crashed.
With this additional knowledge, I presume that something in Galera assumes it can apply certain writesets in parallel when it cannot.
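To illustrate the hypothesis, here is a toy Python model of the ordering constraint I have in mind. It is not Galera's actual certification or applier logic. The key names are hypothetical, and `parallel_batches` is an illustrative helper. It only shows the general idea that writesets sharing a key must not be applied concurrently.

```python
def parallel_batches(writesets):
    """Group writesets (listed in commit order) into batches safe to apply together.

    Each writeset is modelled as a set of row keys. A writeset is placed one
    batch after the latest earlier writeset that shares a key with it, so no
    two writesets in the same batch touch the same key.
    """
    batch_of = []
    for i, keys in enumerate(writesets):
        batch = 0
        for j in range(i):
            if keys & writesets[j]:  # shared key: must wait for writeset j
                batch = max(batch, batch_of[j] + 1)
        batch_of.append(batch)

    batches = [[] for _ in range(max(batch_of, default=-1) + 1)]
    for i, b in enumerate(batch_of):
        batches[b].append(i)
    return batches


# Hypothetical keys: writesets 0 and 2 both touch the same parent row.
writesets = [
    {"parent:1", "child:10"},
    {"parent:2", "child:20"},
    {"parent:1", "child:11"},
]
print(parallel_batches(writesets))  # [[0, 1], [2]]
```

In this model, a single applier thread trivially respects the ordering. If a dependency between writesets 0 and 2 were missed, several appliers could run them at the same time, which would be consistent with the deadlock seen above.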
Attachments
Issue Links
- relates to
  - MDEV-27115 10.4.22 segfault at SELECT RELEASE_LOCK() in ull_get_key (bad MDL_ticket) (Closed)
  - MDEV-27547 Galera node INCONSISTENT state on DELETE with FKs having wsrep_slave_threads > 1 (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Affects Version/s | 10.4.22 [ 26031 ] |
Attachment | table-structure.sql [ 59718 ] |
Attachment | storedProcedure.sql [ 59719 ] |
Assignee | Ramesh Sivaraman [ JIRAUSER48189 ] |
Labels | need_feedback |
Labels | need_feedback |
Fix Version/s | 10.4 [ 22408 ] |
Labels | need_feedback |
Labels | need_feedback |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Affects Version/s | 10.5 [ 23123 ] | |
Affects Version/s | 10.6 [ 24028 ] |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] |
Assignee | Ramesh Sivaraman [ JIRAUSER48189 ] | Jan Lindström [ jplindst ] |
Labels | need_feedback |
Priority | Major [ 3 ] | Critical [ 2 ] |
Attachment | unable-to-read-page.fatal.log [ 60989 ] |
Attachment | gdb.txt [ 61018 ] |
Assignee | Jan Lindström [ jplindst ] | Seppo Jaakola [ seppo ] |
Attachment | second-of-crash.combined.log [ 61024 ] |
Link | This issue relates to |
Workflow | MariaDB v3 [ 126127 ] | MariaDB v4 [ 144419 ] |
Status | Confirmed [ 10101 ] | Open [ 1 ] |
Status | Open [ 1 ] | Needs Feedback [ 10501 ] |
Labels | need_feedback |
Status | Needs Feedback [ 10501 ] | Open [ 1 ] |
Status | Open [ 1 ] | Confirmed [ 10101 ] |
Status | Confirmed [ 10101 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ jplindst ] |
Status | Stalled [ 10000 ] | In Review [ 10002 ] |
issue.field.resolutiondate | 2021-12-20 13:13:01.0 | 2021-12-20 13:13:01.747 |
Fix Version/s | 10.4.23 [ 26807 ] | |
Fix Version/s | 10.5.14 [ 26809 ] | |
Fix Version/s | 10.6.6 [ 26811 ] | |
Fix Version/s | 10.7.2 [ 26813 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Fix Version/s | 10.5 [ 23123 ] | |
Fix Version/s | 10.6 [ 24028 ] | |
Resolution | Fixed [ 1 ] | |
Status | In Review [ 10002 ] | Closed [ 6 ] |
Link | This issue relates to |
Zendesk Related Tickets | 145774 |