  1. MariaDB Server
  2. MDEV-26803

Galera crash - Assertion. Possible parallel writeset problem

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.4.21, 10.4.22, 10.5, 10.6
    • Fix Version/s: 10.4.23, 10.5.14, 10.6.6, 10.7.2
    • Component/s: Galera
    • Labels: None
    • Environment: Ubuntu Bionic, using community packages from the MariaDB repo.
      Also reproduced with build_43208 of 10.4.22

    Description

      We are experiencing a crash of every Galera node that receives the write sets. The operation is a "last resort" clean-up stored procedure that deletes many rows from the same set of related tables. With our data size it normally takes 4-5 minutes to run, but when it goes wrong the crash happens within 10-20 seconds.

      We have used this stored procedure fairly regularly, without problems, on 10.1 for several years. As suggested by Enterprise support, I have also tried it on the latest 10.4 build, for which they provided a URL. That build also exhibits the problem.

      Unfortunately, I have been unable to reproduce the crash either with simplified steps or on a different system of ours. However, I have been able to take a "mariabackup" (i.e. physical) backup and reproduce the fault on 2 other clusters. The original cluster and the first reproduction were on VMware machines; the third system is an AWS EC2 setup. All 3 have the same MariaDB configuration. I suspect the problem is exposed by the particular on-disk data.

      Attached is the log of one of the nodes receiving the writeset.

      In the first round of testing, I found that autocommit needs to be ON.

      Because I suspected the data, and knew that our QA team had been trying to delete rows, I restarted my test and ran "OPTIMIZE TABLE" on the tables that are touched. This caused

      [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table mediator.SEQUENCE; Deadlock found when trying to get lock; try restarting transaction, Error_code: 1213; handler error HA_ERR_LOCK_DEADLOCK; the event's master log FIRST, end_log_pos 276, Internal MariaDB error code: 1213
      

      to appear in the log, at an unusual point in the crash logging.
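
      The test sequence described above might look roughly like the sketch below. It is only an illustration under stated assumptions: mediator.SEQUENCE is the one table named in the log line, the remaining tables touched by the procedure are not listed here, and the procedure name cleanup_last_resort is hypothetical (the real definition is in the attached storedProcedure.sql).

      ```sql
      -- The report notes that autocommit needs to be ON.
      SET SESSION autocommit = ON;

      -- Rebuild a table touched by the procedure. mediator.SEQUENCE is the
      -- only table named in the log excerpt; repeat for the other tables.
      OPTIMIZE TABLE `mediator`.`SEQUENCE`;

      -- Hypothetical procedure name, used for illustration only.
      CALL `mediator`.`cleanup_last_resort`();
      ```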

      Based on that finding, I have now set wsrep_slave_threads = 1, and with this setting the procedure completes successfully. Previously the value was 12. I have also tested a value of 4, which also crashed.
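
      A minimal sketch of how the single-applier workaround could be set at runtime, assuming a session with sufficient privileges on each node that receives writesets. A value set with SET GLOBAL does not survive a server restart, so it would also need to go into the server configuration file to be permanent.

      ```sql
      -- Check how many applier threads this node currently uses.
      SHOW GLOBAL VARIABLES LIKE 'wsrep_slave_threads';

      -- Apply incoming writesets serially (the value that completed
      -- successfully in the report; 12 and 4 both crashed).
      SET GLOBAL wsrep_slave_threads = 1;
      ```

      Serial applying trades replication throughput for safety, so this is a workaround rather than a fix.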

      With this additional knowledge, I presume that something in Galera assumes it can apply certain writesets in parallel when it cannot.

      Attachments

        1. gdb.txt
          205 kB
        2. mysql-receivingNode.log
          6 kB
        3. second-of-crash.combined.log
          110 kB
        4. storedProcedure.sql
          1.0 kB
        5. table-structure.sql
          2 kB
        6. unable-to-read-page.fatal.log
          4.96 MB

          Activity

            brendon Brendon Abbott created issue -
            valerii Valerii Kravchuk made changes -
            Field Original Value New Value
            Affects Version/s 10.4.22 [ 26031 ]
            brendon Brendon Abbott made changes -
            Attachment table-structure.sql [ 59718 ]
            brendon Brendon Abbott made changes -
            Attachment storedProcedure.sql [ 59719 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ]
            Roel Roel Van de Paar made changes -
            Labels need_feedback
            valerii Valerii Kravchuk made changes -
            Labels need_feedback
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.4 [ 22408 ]
            Roel Roel Van de Paar made changes -
            Labels need_feedback
            valerii Valerii Kravchuk made changes -
            Labels need_feedback
            ramesh Ramesh Sivaraman made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            ramesh Ramesh Sivaraman made changes -
            Affects Version/s 10.5 [ 23123 ]
            Affects Version/s 10.6 [ 24028 ]
            ramesh Ramesh Sivaraman made changes -
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            ramesh Ramesh Sivaraman made changes -
            Assignee Ramesh Sivaraman [ JIRAUSER48189 ] Jan Lindström [ jplindst ]
            jplindst Jan Lindström (Inactive) made changes -
            Labels need_feedback
            jplindst Jan Lindström (Inactive) made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            wdoekes Walter Doekes made changes -
            Attachment unable-to-read-page.fatal.log [ 60989 ]
            jplindst Jan Lindström (Inactive) made changes -
            Attachment gdb.txt [ 61018 ]
            jplindst Jan Lindström (Inactive) made changes -
            Assignee Jan Lindström [ jplindst ] Seppo Jaakola [ seppo ]
            wdoekes Walter Doekes made changes -
            Attachment second-of-crash.combined.log [ 61024 ]
            wdoekes Walter Doekes made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 126127 ] MariaDB v4 [ 144419 ]
            julien.fritsch Julien Fritsch made changes -
            Status Confirmed [ 10101 ] Open [ 1 ]
            julien.fritsch Julien Fritsch made changes -
            Status Open [ 1 ] Needs Feedback [ 10501 ]
            julien.fritsch Julien Fritsch made changes -
            Labels need_feedback
            jplindst Jan Lindström (Inactive) made changes -
            Status Needs Feedback [ 10501 ] Open [ 1 ]
            jplindst Jan Lindström (Inactive) made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            seppo Seppo Jaakola made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]
            seppo Seppo Jaakola made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            seppo Seppo Jaakola made changes -
            Assignee Seppo Jaakola [ seppo ] Jan Lindström [ jplindst ]
            Status Stalled [ 10000 ] In Review [ 10002 ]
            jplindst Jan Lindström (Inactive) made changes -
            issue.field.resolutiondate 2021-12-20 13:13:01.0 2021-12-20 13:13:01.747
            jplindst Jan Lindström (Inactive) made changes -
            Fix Version/s 10.4.23 [ 26807 ]
            Fix Version/s 10.5.14 [ 26809 ]
            Fix Version/s 10.6.6 [ 26811 ]
            Fix Version/s 10.7.2 [ 26813 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]
            elenst Elena Stepanova made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 145774

            People

              jplindst Jan Lindström (Inactive)
              brendon Brendon Abbott
              Votes: 1
              Watchers: 12

