Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21466

BF-BF X lock conflict, replication fails, cluster unusable

Details

    Description

      The issue looks as follows: on Galera setup with 3 nodes (let's call them A, B, C) all having wsrep_slave_threads = 96, operation on the table without Primary Key (referred here as 'DB31.trigger_event') crashed cluster with logging following lines:

      2020-01-13 15:51:03 32 [C Note] InnoDB: BF-BF X lock conflict,mode: 2563 supremum: 0conflicts states: my executing locked committing
      2020-01-13 15:51:03 32 [C Note] InnoDB:  SQL1: INSERT INTO `trigger_event` (`contactlist_id`, `date`, `id`) VALUES (102, '2020-01-13 15:50:51', 32) SQL2: NULL
      2020-01-13 15:51:47 79 [C ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
      2020-01-13 15:51:47 79 [C Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
      2020-01-13 15:51:47 96 [B ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452
      2020-01-13 15:51:47 96 [B Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319
      2020-01-13 15:51:58 0 [A Note] WSREP: Flow-control interval: [475, 679]
      2020-01-13 15:51:58 0 [A Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2
      2020-01-13 15:51:58 0 [A Note] WSREP: Node 28075c64 state prim
      2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
      2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
      2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
      2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: Waiting for state UUID.
      2020-01-13 15:51:58 0 [A Note] WSREP: view(view_id(PRIM,28075c64,17) memb
      2020-01-13 15:51:58 0 [B Note] WSREP: Flow-control interval: [475, 679]
      2020-01-13 15:51:58 0 [B Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2
      2020-01-13 15:51:58 0 [B Note] WSREP: Node 28075c64 state prim
      2020-01-13 15:51:58 0 [B Note] WSREP: Quorum results:
      2020-01-13 15:51:58 0 [B Note] WSREP: save pc into disk
      2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B)
      2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A)
      2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef
      2020-01-13 15:51:58 0 [B Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fc0...51ef
      2020-01-13 15:51:58 0 [B Note] WSREP: view(view_id(PRIM,28075c64,17) memb
      2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [0, 0]
      2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [336, 480]
      2020-01-13 15:51:58 0 [C Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1
      2020-01-13 15:51:58 0 [C Note] WSREP: New SELF-LEAVE.
      2020-01-13 15:51:58 0 [C Note] WSREP: Received NON-PRIMARY.
      2020-01-13 15:51:58 0 [C Note] WSREP: Received SELF-LEAVE. Closing connection.
      

      For convenience, node number inserted into message type, thus [C Note] was originally '[Note]' logged on node C.

      Main problem is that operation with DB31 that was performed on server A made all the other servers unusable and in fact lead to cluster manual restart and full data resynchronization.
      Also, real error message is printed with priority 'Note' (not even 'Warning') that makes it unnoticeable while reading logs. Also, nothing was logged on the node that performed the operation (referred as A)

      Expected behavior:

      As similar behavior was also mentioned in (for example) MDEV-15227 and MDEV-15611, it would be reasonable to
      a) log warning on the node that is performing operation about altering table without PK or issue with Foreign key while working in cluster.
      b) reject such an operation on the node that requested it and keep cluster up.

      Attachments

        1. DB Variables.pdf
          84 kB
        2. errorlog_vcenturio40.zip
          307 kB
        3. errorlog_vcenturio41.zip
          113 kB
        4. insert.sql
          44 kB
        5. LogFiles.zip
          10 kB
        6. reset.sql
          0.1 kB
        7. tablecreate.sql
          0.5 kB

        Issue Links

          Activity

            philipp.jenni@exanic.ch Philipp Jenni added a comment -

            @Mario Karuza: See DB Variables.pdf

            Important: Currently i ran the application with "wsrep_slave_threads=1". With this config the problem does'nt exists. When i change this value to 4 and restart all cluster nodes, than i can reproduce the problem.

            The steps are:

            1. create an empty db
            2. run tablecreate.sql
            3. run insert.sql
            4. run reset.sql
            5. run insert.sql

            If the exception is not raised, run step 4 and 5 multiple times

            philipp.jenni@exanic.ch Philipp Jenni added a comment - @Mario Karuza: See DB Variables.pdf Important: Currently i ran the application with "wsrep_slave_threads=1". With this config the problem does'nt exists. When i change this value to 4 and restart all cluster nodes, than i can reproduce the problem. The steps are: 1. create an empty db 2. run tablecreate.sql 3. run insert.sql 4. run reset.sql 5. run insert.sql If the exception is not raised, run step 4 and 5 multiple times

            Thank you for info,

            Do you run `sql` files directly from connection or by providing files to client ?

            mkaruza Mario Karuza (Inactive) added a comment - Thank you for info, Do you run `sql` files directly from connection or by providing files to client ?
            philipp.jenni@exanic.ch Philipp Jenni added a comment -

            I run the script from heidisql client.

            philipp.jenni@exanic.ch Philipp Jenni added a comment - I run the script from heidisql client.

            Hi Philipp,

            I could not reproduce this problem with provided scripts locally.
            If you can run this again, please set wsrep_debug=3 so we could have more logging information. If you hit this again attach logs from all 3 nodes.

            mkaruza Mario Karuza (Inactive) added a comment - Hi Philipp, I could not reproduce this problem with provided scripts locally. If you can run this again, please set wsrep_debug=3 so we could have more logging information. If you hit this again attach logs from all 3 nodes.
            nkarpin Nikita Karpin added a comment -

            @Mario Karuza, I have and issue very similar to this one, could you please take a look - https://jira.mariadb.org/browse/MDEV-22766. I've attached all logs with wsrep_debug=3 and reproduce script

            nkarpin Nikita Karpin added a comment - @Mario Karuza, I have and issue very similar to this one, could you please take a look - https://jira.mariadb.org/browse/MDEV-22766 . I've attached all logs with wsrep_debug=3 and reproduce script

            People

              mkaruza Mario Karuza (Inactive)
              euglorg Eugene
              Votes:
              0 Vote for this issue
              Watchers:
              8 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.