[MDEV-15199] Triangular FKs - Cascade delete causes broken referential integrity Created: 2018-02-03  Updated: 2021-03-09  Resolved: 2018-02-07

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Delete
Affects Version/s: 10.2.2, 10.3.0
Fix Version/s: 10.2.13, 10.3.5

Type: Bug Priority: Blocker
Reporter: Oliver Schonrock Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: delete, foreign-keys, innodb, upstream-fixed
Environment:

FreeBSD ports


Attachments: PNG File maridb_fk.png    
Issue Links:
Blocks
is blocked by MDEV-15219 FOREIGN KEY CASCADE or SET NULL opera... Closed
Duplicate
duplicates MDEV-14222 Unnecessary 'cascade' memory allocati... Closed
Relates
relates to MDEV-13902 Orphan rows despite ON DELETE CASCADE... Closed
relates to MDEV-25087 Stack overflow upon exceeding FOREIGN... Open
relates to MDEV-9663 InnoDB assertion failure: *cursor->in... Closed

 Description   

See SQL test case below for detail. See this thread for problem discovery:
https://lists.launchpad.net/maria-developers/msg11102.html

For a "triangular" table FK structure (see attached png visual) deleting parent record, should delete both children, but the ON DELETE CASCADE does not delete record in 2nd child table when 2nd child has ON DELETE SET NULL constraint to 1st child

Referential integrity is broken after parent is deleted. LEFT JOINs do not show broken FK.

MySQL 5.7.21 and MariaDB 10.1.30 do not exhibit this broken behaviour.

-- Server version: 10.2.12-MariaDB FreeBSD Ports
 
-- setup DB structure
 
SET FOREIGN_KEY_CHECKS=0;
 
CREATE TABLE member (
  id int(11) NOT NULL AUTO_INCREMENT,
  PRIMARY KEY (id)
) ENGINE=InnoDB;
 
INSERT INTO member VALUES (1);
 
CREATE TABLE address (
  id int(11) NOT NULL AUTO_INCREMENT,
  member_id int(11) NOT NULL,
  PRIMARY KEY (id),
  KEY address_FI_1 (member_id),
  CONSTRAINT address_FK_1 FOREIGN KEY (member_id) REFERENCES member (id) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB;
 
INSERT INTO address VALUES (2,1);
 
CREATE TABLE payment_method (
  id int(11) NOT NULL AUTO_INCREMENT,
  member_id int(11) NOT NULL,
  cardholder_address_id int(11) DEFAULT NULL,
  PRIMARY KEY (id),
  KEY payment_method_FI_1 (member_id),
  KEY payment_method_FI_2 (cardholder_address_id),
  CONSTRAINT payment_method_FK_1 FOREIGN KEY (member_id) REFERENCES member (id) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT payment_method_FK_2 FOREIGN KEY (cardholder_address_id) REFERENCES address (id) ON DELETE SET NULL ON UPDATE CASCADE
) ENGINE=InnoDB;
 
INSERT INTO payment_method VALUES (3,1,2);
 
SET FOREIGN_KEY_CHECKS=1;
 
-- show inserted data
 
SELECT * FROM member;
+----+
| id |
+----+
|  1 |
+----+
 
SELECT * FROM address;
+----+-----------+
| id | member_id |
+----+-----------+
|  2 |         1 |
+----+-----------+
 
SELECT * FROM payment_method;
+----+-----------+-----------------------+
| id | member_id | cardholder_address_id |
+----+-----------+-----------------------+
|  3 |         1 |                     2 |
+----+-----------+-----------------------+
 
-- finished setup now delete the central `member` record 
DELETE FROM member WHERE id = 1;
 
-- check the resulting state
 
SELECT * FROM member;
-- empty set, correct
 
SELECT * FROM address;
-- empty set, correct
 
SELECT * FROM payment_method;
+----+-----------+-----------------------+
| id | member_id | cardholder_address_id |
+----+-----------+-----------------------+
|  3 |         1 |                  NULL |
+----+-----------+-----------------------+
-- should be an empty set
-- referential integrity is broken: payment_method.member_id references a non-existent member.id
 
-- inconsistency continues during subsequent left join queries
 
SELECT
  payment_method.id, member_id
FROM
  payment_method
LEFT JOIN
  member ON member.id=payment_method.member_id
WHERE
  payment_method.member_id IS NOT NULL AND member.id IS NULL;
 
--  empty set, when it should not be
-- note that if you dump and insert the broken DB, then above left join DOES show the broken FK, which is how I discovered it



 Comments   
Comment by Elena Stepanova [ 2018-02-04 ]

Thanks for the report and the test case. Reproducible as described on 10.2, including older 10.2 releases (lowest that I tried was 10.2.4). Not reproducible on 10.1 and MySQL 5.7.

Debug builds abort with the assertion failure upon DELETE:

10.2 859fe1f24d499

mysqld: /data/src/10.2/storage/innobase/row/row0upd.cc:2839: dberr_t row_upd_clust_rec(ulint, upd_node_t*, dict_index_t*, ulint*, mem_heap_t**, que_thr_t*, mtr_t*): Assertion `!rec_get_deleted_flag(btr_cur_get_rec(btr_cur), dict_table_is_comp(index->table))' failed.
180204 14:48:46 [ERROR] mysqld got signal 6 ;
 
#7  0x00007f0d9b04cee2 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055687e288dd3 in row_upd_clust_rec (flags=0, node=0x7f0d4003ff78, index=0x7f0d40038688, offsets=0x7f0d944b81f0, offsets_heap=0x7f0d944b8198, thr=0x7f0d40150548, mtr=0x7f0d944b8510) at /data/src/10.2/storage/innobase/row/row0upd.cc:2838
#9  0x000055687e289ee5 in row_upd_clust_step (node=0x7f0d4003ff78, thr=0x7f0d40150548) at /data/src/10.2/storage/innobase/row/row0upd.cc:3215
#10 0x000055687e28a189 in row_upd (node=0x7f0d4003ff78, thr=0x7f0d40150548) at /data/src/10.2/storage/innobase/row/row0upd.cc:3277
#11 0x000055687e28a669 in row_upd_step (thr=0x7f0d40150548) at /data/src/10.2/storage/innobase/row/row0upd.cc:3423
#12 0x000055687e22e129 in row_update_for_mysql (prebuilt=0x7f0d4014f798) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1923
#13 0x000055687e0f28a2 in ha_innobase::delete_row (this=0x7f0d4014dea8, record=0x7f0d40091218 "\377\001") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:9341
#14 0x000055687dde891a in handler::ha_delete_row (this=0x7f0d4014dea8, buf=0x7f0d40091218 "\377\001") at /data/src/10.2/sql/handler.cc:6061
#15 0x000055687df77168 in mysql_delete (thd=0x7f0d40000b00, table_list=0x7f0d400125d0, conds=0x7f0d40012d80, order_list=0x7f0d40005038, limit=18446744073709551615, options=0, result=0x0) at /data/src/10.2/sql/sql_delete.cc:583
#16 0x000055687db6c052 in mysql_execute_command (thd=0x7f0d40000b00) at /data/src/10.2/sql/sql_parse.cc:4616
#17 0x000055687db763c2 in mysql_parse (thd=0x7f0d40000b00, rawbuf=0x7f0d400124e8 "DELETE FROM member WHERE id = 1", length=31, parser_state=0x7f0d944ba200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7897
#18 0x000055687db64335 in dispatch_command (command=COM_QUERY, thd=0x7f0d40000b00, packet=0x7f0d4008d171 "DELETE FROM member WHERE id = 1", packet_length=31, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1806
#19 0x000055687db62c98 in do_command (thd=0x7f0d40000b00) at /data/src/10.2/sql/sql_parse.cc:1360
#20 0x000055687dcb0f28 in do_handle_one_connection (connect=0x55688115ceb0) at /data/src/10.2/sql/sql_connect.cc:1335
#21 0x000055687dcb0cb5 in handle_one_connection (arg=0x55688115ceb0) at /data/src/10.2/sql/sql_connect.cc:1241
#22 0x000055687e0d047e in pfs_spawn_thread (arg=0x5568810c0800) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#23 0x00007f0d9cd23494 in start_thread (arg=0x7f0d944bb700) at pthread_create.c:333
#24 0x00007f0d9b10993f in clone () from /lib/x86_64-linux-gnu/libc.so.6

Comment by Oliver Schonrock [ 2018-02-04 ]

@Elena Stepanova

Thank you for testing and confirming that I am not going insane.

Happy to compile / test any patches if that helps.

Comment by Marko Mäkelä [ 2018-02-06 ]

On a related note, MDEV-13331 introduced a different form of FOREIGN KEY breakage in MariaDB 10.2.8: MDEV-15219

Comment by Marko Mäkelä [ 2018-02-06 ]

I can repeat the debug assertion failure with MySQL 5.7.20 but not MySQL 5.7.21.
I believe that it was fixed as a byproduct of
Bug #26191879 FOREIGN KEY CASCADES USE EXCESSIVE MEMORY.

Comment by Oliver Schonrock [ 2018-02-06 ]

@marko

On a related note, I have the feeling that ON UPDATE CASCADE is broken, starting with MariaDB 10.2.
If you have tables a→b→c and update a, it would cascade to b but not to c.

Tested update for the test case above like this:

-- Server version: 10.2.12-MariaDB FreeBSD Ports
 
-- same CREATE and INSERT as in original report
-- now change the PK on `member`
UPDATE member set id=10 WHERE id = 1;
 
-- resulting state
+----+
| id |
+----+
| 10 |
+----+
+----+-----------+
| id | member_id |
+----+-----------+
|  2 |        10 |
+----+-----------+
+----+-----------+-----------------------+
| id | member_id | cardholder_address_id |
+----+-----------+-----------------------+
|  3 |        10 |                     2 |
+----+-----------+-----------------------+
 
-- looks OK to me for this test case

Comment by Oliver Schonrock [ 2018-02-06 ]

@marko

I can repeat the debug assertion failure with MySQL 5.7.20 but not MySQL 5.7.21.
I believe that it was fixed as a byproduct of
Bug #26191879 FOREIGN KEY CASCADES USE EXCESSIVE MEMORY.

Thanks. Great find of the upstream change from 5.7.20 => 5.7.21

That heap based FK cascade code has been there since MySQL 5.7.2. That might suggest that this broken FK behvaiour has existed in MySQL since 5.7.2:
gurusami committed on 14 May 2013
https://github.com/mysql/mysql-server/commit/377774689bf6a16af74182753fe950d514c2c6dd

We noticed the broken Referential Integrity within 2 weeks of using an affected version. Surely we are not the only MariaDB or MySQL user who had used FKs like this since 2013?

More likely some later commit, closer to 5.7.20, interacted with the heap based, iterative FK cascade code to cause this bug and when they reverted to stack based recursion, that removed this problem also.

As you say "it was a side effect"?

Comment by Marko Mäkelä [ 2018-02-06 ]

oschonrock, I believe that the MySQL 5.7.2 change may have introduced this corruption, but nobody noticed until now.

For what it is worth, if you look at the commit, it is changing the file mysql-test/suite/innodb/r/innodb-index-online-fk.result. This result change looks acceptable to me, as there are multiple FOREIGN KEY constraints between the tables, and the order of evaluating SET NULL and CASCADE is affected based on the code change.

Comment by Oliver Schonrock [ 2018-02-06 ]

marko, you're right. Notwithstanding my limited understanding of that code, that does indeed look like it might have been the trigger for the cascade delete break.

Wow, so that means potentially many users since GA of MySQL 5.7 and MariaDB 10.2, who are using such FK structures have these "failed cascade delete" orphan records lying around. Perhaps what we are doing is more unusual than I thought, or perhaps it is down to "not being noticed".

As stated in the original report, finding the "orphaned" record is not obvious. By FK doesn't work:

MariaDB [test]> select * from payment_method;
+----+-----------+-----------------------+
| id | member_id | cardholder_address_id |
+----+-----------+-----------------------+
|  3 |         1 |                  NULL |
+----+-----------+-----------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select * from payment_method where member_id = 1; 
Empty set (0.00 sec)

and a LEFT JOIN won't find it either as shown in original report. As I explained, the only way I found it, is by mysqldump followed by inserting that dump and then running the LEFT JOIN.

If this gets fixed by merging the upstream fix/revert to stack based recursion for FK cascade, would it make sense to alert MariaDB users (and MySQL?) that they may have these orphaned records silently lying around in their DB?

BTW: I also double confirmed that MySQL 5.7.20 exhibits the broken behaviour (while 5.7.21 does not).

Thanks for your help.

Comment by Oliver Schonrock [ 2018-02-06 ]

marko For what it's worth, once the broken CASCADE DELETE has occurred, a server restart does not restore sanity.

-- server restart performed here
 
Server version: 10.2.12-MariaDB FreeBSD Ports
 
MariaDB [test]> select * from payment_method where member_id = 1;
Empty set (0.00 sec)
 
MariaDB [test]> select * from payment_method;
+----+-----------+-----------------------+
| id | member_id | cardholder_address_id |
+----+-----------+-----------------------+
|  3 |         1 |                  NULL |
+----+-----------+-----------------------+
1 row in set (0.00 sec)
 
MariaDB [test]> select * from member;
Empty set (0.00 sec)

Comment by Oliver Schonrock [ 2018-02-06 ]

marko I had a quick try to see what other methods might detect the problem, so users can see if they have been affected. And then what options exist to fix broken DBs without dump/reload.

CHECK TABLE reports the problem:

MariaDB [test]> check table payment_method;
+---------------------+-------+----------+----------------------------------------------------------------------+
| Table               | Op    | Msg_type | Msg_text                                                             |
+---------------------+-------+----------+----------------------------------------------------------------------+
| test.payment_method | check | Warning  | InnoDB: Index 'payment_method_FI_1' contains 0 entries, should be 1. |
| test.payment_method | check | error    | Corrupt                                                              |
+---------------------+-------+----------+----------------------------------------------------------------------+
2 rows in set (0.00 sec)

and obviously, so does mysqlcheck

$ mysqlcheck -uroot -p test
test.address                                       OK
test.member                                        OK
test.payment_method
Warning  : InnoDB: Index payment_method_FI_1 is marked as corrupted
error    : Corrupt

For InnoDB, the only suggested option for repair is:

If you need to rebuild an InnoDB table because a CHECK TABLE operation indicates that a table upgrade is required, use mysqldump to create a dump file and mysql to reload the file.

What I did on our production DB (not wanting to take it down, dump & insert) is a I manually deleted the orphaned record by its PK (which is hard to know in a real dataset). Having done Some trial and error shows that this manual DELETE is easier if you do a "null alteration" first (which makes CHECK TABLE happy), then run the LEFT JOIN, which now works, and then you can easily clean up the orphans.

 
MariaDB [test]> check table payment_method;
+---------------------+-------+----------+----------------------------------------------------------------------+
| Table               | Op    | Msg_type | Msg_text                                                             |
+---------------------+-------+----------+----------------------------------------------------------------------+
| test.payment_method | check | Warning  | InnoDB: Index 'payment_method_FI_1' contains 0 entries, should be 1. |
| test.payment_method | check | error    | Corrupt                                                              |
+---------------------+-------+----------+----------------------------------------------------------------------+
2 rows in set (0.01 sec)
 
-- "null alteration"
 
MariaDB [test]> ALTER TABLE payment_method ENGINE = InnoDB;
Query OK, 0 rows affected (0.09 sec)
Records: 0  Duplicates: 0  Warnings: 0
 
MariaDB [test]> check table payment_method;
+---------------------+-------+----------+----------+
| Table               | Op    | Msg_type | Msg_text |
+---------------------+-------+----------+----------+
| test.payment_method | check | status   | OK       |
+---------------------+-------+----------+----------+
1 row in set (0.00 sec)
 
-- CHECK TABLE is happy, now we can use LEFT JOIN to find orphans
 
SELECT
  payment_method.id, member_id
FROM
  payment_method
LEFT JOIN
  member ON member.id=payment_method.member_id
WHERE
  payment_method.member_id IS NOT NULL AND member.id IS NULL;
+----+-----------+
| id | member_id |
+----+-----------+
|  3 |         1 |
+----+-----------+
1 row in set (0.00 sec)
 
-- and delete them
 
MariaDB [test]> delete from payment_method where id = 3;
Query OK, 1 row affected (0.02 sec)

Comment by Oliver Schonrock [ 2018-02-23 ]

marko Thank for fixing this so quickly. We have now rolled this out on production via 10.2.13. Works great!

By the time we rolled it out we had dozens of broken FKs on our production DB which we cleaned up by technique I described above.

I still find it astonishing that no-one since 10.2.4 (GA since > 2 years!) had come across this. I suspect there must be quite a few users out there with broken FKs without realising it (since the broken LEFT JOIN hides them).

Generated at Thu Feb 08 08:19:25 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.