Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-18393

Galera node kills after DELETE statement with foreign keys

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Cannot Reproduce
    • 10.1.37
    • N/A
    • Galera
    • None
    • Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
      MariaDB-server-10.1.37-1.el7.centos.x86_64
      galera-25.3.24-1.rhel7.el7.centos.x86_64

    Description

      Galera node kills itself after two DELETE statements in tables with a FK. This looks like a repetition of bug MDEV-15611.

      Statements run on 'master1':

      DELETE FROM `invoice_data` WHERE `project_id`=<identifier>;
      DELETE FROM `invoice` WHERE `project_id`=<identifier>;
      

      Resulting in shutdown on 'master2':

      2019-01-28  9:37:45 140253035670272 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table redacteddbname.invoice; Cannot delete or update a parent row: a foreign key constraint fails (`redacteddbname`.`invoice_data`, CONSTRAINT `invoice_data_ibfk_6` FOREIGN KEY (`invoice_id`) REFERENCES `invoice` (`id`) ON UPDATE CASCADE), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 1137, Internal MariaDB error code: 1451
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 152, 534700372
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: Failed to apply app buffer: seqno: 534700372, status: 1
      	 at galera/src/trx_handle.cpp:apply():353
      Retrying 2th time
      2019-01-28  9:37:45 140253035670272 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table redacteddbname.invoice; Cannot delete or update a parent row: a foreign key constraint fails (`redacteddbname`.`invoice_data`, CONSTRAINT `invoice_data_ibfk_6` FOREIGN KEY (`invoice_id`) REFERENCES `invoice` (`id`) ON UPDATE CASCADE), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 1137, Internal MariaDB error code: 1451
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 152, 534700372
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: Failed to apply app buffer: seqno: 534700372, status: 1
      	 at galera/src/trx_handle.cpp:apply():353
      Retrying 3th time
      2019-01-28  9:37:45 140253035670272 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table redacteddbname.invoice; Cannot delete or update a parent row: a foreign key constraint fails (`redacteddbname`.`invoice_data`, CONSTRAINT `invoice_data_ibfk_6` FOREIGN KEY (`invoice_id`) REFERENCES `invoice` (`id`) ON UPDATE CASCADE), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 1137, Internal MariaDB error code: 1451
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 152, 534700372
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: Failed to apply app buffer: seqno: 534700372, status: 1
      	 at galera/src/trx_handle.cpp:apply():353
      Retrying 4th time
      2019-01-28  9:37:45 140253035670272 [ERROR] Slave SQL: Could not execute Delete_rows_v1 event on table redacteddbname.invoice; Cannot delete or update a parent row: a foreign key constraint fails (`redacteddbname`.`invoice_data`, CONSTRAINT `invoice_data_ibfk_6` FOREIGN KEY (`invoice_id`) REFERENCES `invoice` (`id`) ON UPDATE CASCADE), Error_code: 1451; handler error HA_ERR_ROW_IS_REFERENCED; the event's master log FIRST, end_log_pos 1137, Internal MariaDB error code: 1451
      2019-01-28  9:37:45 140253035670272 [Warning] WSREP: RBR event 2 Delete_rows_v1 apply warning: 152, 534700372
      2019-01-28  9:37:45 140253035670272 [ERROR] WSREP: Failed to apply trx: source: 3eef3e4c-184f-11e9-b37b-a3373db3c1d5 version: 4 local: 0 state: APPLYING flags: 1 conn_id: 8595856 trx_id: 7949399192 seqnos (l: 5819527, g: 534700372, s: 534700371, d: 534700359, ts: 5129740410281513)
      2019-01-28  9:37:45 140253035670272 [ERROR] WSREP: Failed to apply trx 534700372 4 times
      2019-01-28  9:37:45 140253035670272 [ERROR] WSREP: Node consistency compromised, aborting...
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: Closing send monitor...
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: Closed send monitor.
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: gcomm: terminating thread
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: gcomm: joining thread
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: gcomm: closing backend
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: view(view_id(NON_PRIM,3eef3e4c,275) memb {
      	52331508,0
      } joined {
      } left {
      } partitioned {
      	3eef3e4c,0
      	a64bdc05,0
      })
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: view((empty))
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: gcomm: closed
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Flow-control interval: [16, 16]
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Trying to continue unpaused monitor
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Received NON-PRIMARY.
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Shifting SYNCED -> OPEN (TO: 534700420)
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Received self-leave message.
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Flow-control interval: [0, 0]
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Trying to continue unpaused monitor
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Received SELF-LEAVE. Closing connection.
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: Shifting OPEN -> CLOSED (TO: 534700420)
      2019-01-28  9:37:45 140248671577856 [Note] WSREP: RECV thread exiting 0: Success
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: recv_thread() joined.
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: Closing replication queue.
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: Closing slave action queue.
      2019-01-28  9:37:45 140253035670272 [Note] WSREP: /usr/sbin/mysqld: Terminated...
      

      Attachments

        1. user.test
          6 kB
          Jan Lindström

        Issue Links

          Activity

            menmarc Marco Menzel added a comment -

            10.2.22 still crashing with same problem, only if wsrep_slave_thread > 1

            menmarc Marco Menzel added a comment - 10.2.22 still crashing with same problem, only if wsrep_slave_thread > 1
            prupert Pim Rupert added a comment -

            Hi Marco,

            Do you have an easily reproducible case? We find it incredibly hard to reproduce the problem on our 10.1 cluster.

            > only if wsrep_slave_thread > 1

            You did not have the issue when setting wsrep_slave_threads to 1?

            prupert Pim Rupert added a comment - Hi Marco, Do you have an easily reproducible case? We find it incredibly hard to reproduce the problem on our 10.1 cluster. > only if wsrep_slave_thread > 1 You did not have the issue when setting wsrep_slave_threads to 1?
            menmarc Marco Menzel added a comment - - edited

            Sorry, no reproducible case available. With wsrep_slave_threads = 1 the issue does'nt occur so far. There are some foreign key constructs without 'ON DELETE'.

            menmarc Marco Menzel added a comment - - edited Sorry, no reproducible case available. With wsrep_slave_threads = 1 the issue does'nt occur so far. There are some foreign key constructs without 'ON DELETE'.

            Tried to repeat with different kind of deletes and data (see attached test case) but failed.

            jplindst Jan Lindström (Inactive) added a comment - Tried to repeat with different kind of deletes and data (see attached test case) but failed.
            spikestabber Brendan P added a comment - - edited

            This one isn't easily reproducible at all and occurs completely at random, but is more often to occur when you stress your many wsrep applier threads, such as with a large table alter using an online ddl tool like pt-online-schema-change to alter your table. Something is not sanity checking the execution ordering of the queries with ON QUERYTYPE CASCADE FK statements.

            I've had several cases where a simple table rename gets replicated before all the queries that do triggers to that table even get applied, causing the slave to abort on a FK apply error to a table that no longer exists...

            spikestabber Brendan P added a comment - - edited This one isn't easily reproducible at all and occurs completely at random, but is more often to occur when you stress your many wsrep applier threads, such as with a large table alter using an online ddl tool like pt-online-schema-change to alter your table. Something is not sanity checking the execution ordering of the queries with ON QUERYTYPE CASCADE FK statements. I've had several cases where a simple table rename gets replicated before all the queries that do triggers to that table even get applied, causing the slave to abort on a FK apply error to a table that no longer exists...

            People

              seppo Seppo Jaakola
              prupert Pim Rupert
              Votes:
              4 Vote for this issue
              Watchers:
              12 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.