Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Incomplete
-
10.4.10
-
Linux 4.19.92-gentoo x86_64 AMD EPYC 7451, galera-26.4.3, mariadb-10.4.10
Description
The issue looks as follows: on Galera setup with 3 nodes (let's call them A, B, C) all having wsrep_slave_threads = 96, operation on the table without Primary Key (referred here as 'DB31.trigger_event') crashed cluster with logging following lines:
2020-01-13 15:51:03 32 [C Note] InnoDB: BF-BF X lock conflict,mode: 2563 supremum: 0conflicts states: my executing locked committing |
2020-01-13 15:51:03 32 [C Note] InnoDB: SQL1: INSERT INTO `trigger_event` (`contactlist_id`, `date`, `id`) VALUES (102, '2020-01-13 15:50:51', 32) SQL2: NULL |
2020-01-13 15:51:47 79 [C ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452 |
2020-01-13 15:51:47 79 [C Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319 |
2020-01-13 15:51:47 96 [B ERROR] Slave SQL: Could not execute Write_rows_v1 event on table DB31.contact_activity_order_items; Cannot add or update a child row: a foreign key constraint fails (`DB31`.`contact_activity_order_items`, CONSTRAINT `contact_activity_order_items_ibfk_1` FOREIGN KEY (`contact_activity_id`) REFERENCES `contact_activity` (`id`) ON DELETE CASCADE ON UP), Error_code: 1452; handler error HA_ERR_NO_REFERENCED_ROW; the event's master log FIRST, end_log_pos 1929, Internal MariaDB error code: 1452 |
2020-01-13 15:51:47 96 [B Warning] WSREP: Event 3 Write_rows_v1 apply failed: 151, seqno 25684319 |
2020-01-13 15:51:58 0 [A Note] WSREP: Flow-control interval: [475, 679] |
2020-01-13 15:51:58 0 [A Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 1, memb_num = 2 |
2020-01-13 15:51:58 0 [A Note] WSREP: Node 28075c64 state prim |
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B) |
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A) |
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef |
2020-01-13 15:51:58 0 [A Note] WSREP: STATE EXCHANGE: Waiting for state UUID. |
2020-01-13 15:51:58 0 [A Note] WSREP: view(view_id(PRIM,28075c64,17) memb |
2020-01-13 15:51:58 0 [B Note] WSREP: Flow-control interval: [475, 679] |
2020-01-13 15:51:58 0 [B Note] WSREP: New COMPONENT: primary = yes, bootstrap = no, my_idx = 0, memb_num = 2 |
2020-01-13 15:51:58 0 [B Note] WSREP: Node 28075c64 state prim |
2020-01-13 15:51:58 0 [B Note] WSREP: Quorum results: |
2020-01-13 15:51:58 0 [B Note] WSREP: save pc into disk |
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 0 (B) |
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: got state msg: 3fc0...51ef from 1 (A) |
2020-01-13 15:51:58 0 [B Note] WSREP: STATE EXCHANGE: sent state msg: 3fc0...51ef |
2020-01-13 15:51:58 0 [B Note] WSREP: STATE_EXCHANGE: sent state UUID: 3fc0...51ef |
2020-01-13 15:51:58 0 [B Note] WSREP: view(view_id(PRIM,28075c64,17) memb |
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [0, 0] |
2020-01-13 15:51:58 0 [C Note] WSREP: Flow-control interval: [336, 480] |
2020-01-13 15:51:58 0 [C Note] WSREP: New COMPONENT: primary = no, bootstrap = no, my_idx = 0, memb_num = 1 |
2020-01-13 15:51:58 0 [C Note] WSREP: New SELF-LEAVE. |
2020-01-13 15:51:58 0 [C Note] WSREP: Received NON-PRIMARY. |
2020-01-13 15:51:58 0 [C Note] WSREP: Received SELF-LEAVE. Closing connection. |
For convenience, node number inserted into message type, thus [C Note] was originally '[Note]' logged on node C.
Main problem is that operation with DB31 that was performed on server A made all the other servers unusable and in fact lead to cluster manual restart and full data resynchronization.
Also, real error message is printed with priority 'Note' (not even 'Warning') that makes it unnoticeable while reading logs. Also, nothing was logged on the node that performed the operation (referred as A)
Expected behavior:
As similar behavior was also mentioned in (for example) MDEV-15227 and MDEV-15611, it would be reasonable to
a) log warning on the node that is performing operation about altering table without PK or issue with Foreign key while working in cluster.
b) reject such an operation on the node that requested it and keep cluster up.
Attachments
Issue Links
- relates to
-
MDEV-15227 Inserted data violates foreign key constraint and breaks (Galera and regular async) replication
- Closed
-
MDEV-18317 Mariadb Crashed due to BF-BF X lock conflict
- Closed