Server version: 10.2.6-MariaDB-10.2.6+maria~stretch-log mariadb.org binary distribution
OS: Debian Stretch
Description
While investigating database errors logged in my application, I noticed that some records that reference a different table with an ON DELETE CASCADE constraint had not been deleted. This constraint normally works, this is the first time I've spotted stale rows.
The queries below show stale rows in matchmaking_group_users and matchmaking_group_maps. They both have a matchmaking_group_id that references the column id in the table matchmaking_groups.
MariaDB [esportal]> select * from matchmaking_group_users where matchmaking_group_id notin (select id from matchmaking_groups);
+----------------------+-----------+------------+
| matchmaking_group_id | user_id | microphone |
+----------------------+-----------+------------+
| 278066 | 182715729 | 1 |
+----------------------+-----------+------------+
1 row inset (0.00 sec)
MariaDB [esportal]> select * from matchmaking_group_maps where matchmaking_group_id notin (select id from matchmaking_groups);
+----------------------+--------+----------+
| matchmaking_group_id | map_id | priority |
+----------------------+--------+----------+
| 278066 | 1 | 4 |
| 278066 | 2 | 2 |
| 278066 | 5 | 1 |
| 278066 | 6 | 3 |
| 278066 | 7 | 6 |
| 278066 | 8 | 5 |
+----------------------+--------+----------+
6 rowsinset (0.00 sec)
MariaDB [esportal]> show createtable matchmaking_group_users;
Has SET foreign_key_checks=0 been used at any time?
Has there been any DDL for the two tables close to the UPDATE or DELETE operations that failed to cascade?
Marko Mäkelä
added a comment - Has SET foreign_key_checks=0 been used at any time?
Has there been any DDL for the two tables close to the UPDATE or DELETE operations that failed to cascade?
I do not recall ever using SET foreign_key_checks=0, definitely not anytime between the previous successful deletion and the successful deletion following this event.
The most recent DDL was yesterday, where I dropped a column in a completely unrelated table. I don't recall any DDL related to any of the two tables in the past month, and this is a fairly active database (~60 QPS).
Mark Samman
added a comment - I do not recall ever using SET foreign_key_checks=0, definitely not anytime between the previous successful deletion and the successful deletion following this event.
The most recent DDL was yesterday, where I dropped a column in a completely unrelated table. I don't recall any DDL related to any of the two tables in the past month, and this is a fairly active database (~60 QPS).
Do you have the query cache enabled?
Please provide your cnf file(s) from the instance where the problem occurs.
The error mentions WSREP (Galera), but there is no indication otherwise that you are using a cluster. Are you?
If it's a cluster, do you see the stale records on the same node where DELETE was issued, or on a different one?
Elena Stepanova
added a comment - Do you have the query cache enabled?
Please provide your cnf file(s) from the instance where the problem occurs.
The error mentions WSREP (Galera), but there is no indication otherwise that you are using a cluster. Are you?
If it's a cluster, do you see the stale records on the same node where DELETE was issued, or on a different one?
Yes, query cache is enabled, but I'm not sure if this is caused by the query cache since the reason I detected this was because my application started throwing errors because it couldn't insert a new row for the user id that was already in the table. I suppose "orphan rows" is a more appropriate name than "stale rows" for this issue.
No, I'm not using a cluster.
I've attached my.cnf. The other config files in /etc/mysql/conf.d/ are untouched. I had binlogs disabled when this happened, but was advised to enable it to create a minimal reproducible test case next time this happens. Unfortunately, I haven't spotted the error again since. my.cnf
Mark Samman
added a comment - - edited Yes, query cache is enabled, but I'm not sure if this is caused by the query cache since the reason I detected this was because my application started throwing errors because it couldn't insert a new row for the user id that was already in the table. I suppose "orphan rows" is a more appropriate name than "stale rows" for this issue.
No, I'm not using a cluster.
I've attached my.cnf. The other config files in /etc/mysql/conf.d/ are untouched. I had binlogs disabled when this happened, but was advised to enable it to create a minimal reproducible test case next time this happens. Unfortunately, I haven't spotted the error again since. my.cnf
Thank you. Yes, if you had an actual error on INSERT, it's probably unrelated to the query cache. I was asking because we had a similar problem on SELECTs with query cache.
Could you please also paste the structure for matchmaking_groups table?
Elena Stepanova
added a comment - - edited Thank you. Yes, if you had an actual error on INSERT, it's probably unrelated to the query cache. I was asking because we had a similar problem on SELECTs with query cache.
Could you please also paste the structure for matchmaking_groups table?
SELECTCOUNT(*) FROM information_schema.innodb_trx WHERE trx_state = 'LOCK WAIT'AND trx_query LIKE'%DELETE FROM `matchmaking_groups` WHERE id = 10%'AND trx_mysql_thread_id != CONNECTION_ID();
--source include/wait_condition.inc
COMMIT;
--connection con1
--reap
COMMIT;
--disconnect con1
--connection default
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOTIN (SELECT id FROM matchmaking_groups);
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOTIN (SELECT id FROM matchmaking_groups);
#7 0x00007f8339876312 in __GI___assert_fail (assertion=0x7f833cd4d994 "err == DB_SUCCESS", file=0x7f833cd4d3f8 "/data/src/10.2/storage/innobase/row/row0upd.cc", line=2437, function=0x7f833cd507a0 <row_upd_sec_index_entry(upd_node_t*, que_thr_t*)::__PRETTY_FUNCTION__> "dberr_t row_upd_sec_index_entry(upd_node_t*, que_thr_t*)") at assert.c:101
#8 0x00007f833c784608 in row_upd_sec_index_entry (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:2437
#9 0x00007f833c78493c in row_upd_sec_step (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:2498
#10 0x00007f833c786eed in row_upd (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:3282
#11 0x00007f833c787244 in row_upd_step (thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:3399
#12 0x00007f833c72a56c in row_update_for_mysql_using_upd_graph (mysql_rec=0x7f82e403e368 "\377\n", prebuilt=0x7f82e403e7c8) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1946
#13 0x00007f833c72b08d in row_update_for_mysql (mysql_rec=0x7f82e403e368 "\377\n", prebuilt=0x7f82e403e7c8) at /data/src/10.2/storage/innobase/row/row0mysql.cc:2165
#14 0x00007f833c5ed74f in ha_innobase::delete_row (this=0x7f82e403db78, record=0x7f82e403e368 "\377\n") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:9551
#15 0x00007f833c3a9ab0 in handler::ha_delete_row (this=0x7f82e403db78, buf=0x7f82e403e368 "\377\n") at /data/src/10.2/sql/handler.cc:6015
#16 0x00007f833c535df6 in mysql_delete (thd=0x7f82d8000b00, table_list=0x7f82d8011118, conds=0x7f82d80118c8, order_list=0x7f82d8004f68, limit=18446744073709551615, options=0, result=0x0) at /data/src/10.2/sql/sql_delete.cc:583
#17 0x00007f833c131860 in mysql_execute_command (thd=0x7f82d8000b00) at /data/src/10.2/sql/sql_parse.cc:4613
#18 0x00007f833c13bd88 in mysql_parse (thd=0x7f82d8000b00, rawbuf=0x7f82d8010fb8 "DELETE FROM `matchmaking_groups` WHERE id = 10", length=46, parser_state=0x7f83340db200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7879
#19 0x00007f833c129e34 in dispatch_command (command=COM_QUERY, thd=0x7f82d8000b00, packet=0x7f82d80087d1 "DELETE FROM `matchmaking_groups` WHERE id = 10", packet_length=46, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1817
#20 0x00007f833c128775 in do_command (thd=0x7f82d8000b00) at /data/src/10.2/sql/sql_parse.cc:1362
#21 0x00007f833c2744e7 in do_handle_one_connection (connect=0x7f833fe6f810) at /data/src/10.2/sql/sql_connect.cc:1354
#22 0x00007f833c274274 in handle_one_connection (arg=0x7f833fe6f810) at /data/src/10.2/sql/sql_connect.cc:1260
#23 0x00007f833c5bb922 in pfs_spawn_thread (arg=0x7f833fe92b30) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#24 0x00007f833b74b064 in start_thread (arg=0x7f83340dc700) at pthread_create.c:309
#25 0x00007f833993062d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Not reproducible on 10.1 and MySQL 5.6/5.7.
While fixing, please also get rid of WSREP prefix in this and similar errors, it's confusing and irrelevant.
Elena Stepanova
added a comment - - edited Thanks for the provided info. I was able to reproduce the problem.
Test case
--source include/have_innodb.inc
CREATE TABLE `users` (
`id` int (10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
` name ` varchar (32) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_groups` (
`id` bigint (20) unsigned NOT NULL AUTO_INCREMENT,
`host_user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`id`),
UNIQUE KEY `host_user_id` (`host_user_id`),
CONSTRAINT `matchmaking_groups_ibfk_1` FOREIGN KEY (`host_user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_users` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`user_id`),
UNIQUE KEY `user_id` (`user_id`),
CONSTRAINT `matchmaking_group_users_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE ,
CONSTRAINT `matchmaking_group_users_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_maps` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`map_id` tinyint(2) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`map_id`),
CONSTRAINT `matchmaking_group_maps_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `users` VALUES ( NULL , 'foo' ),( NULL , 'bar' );
INSERT INTO `matchmaking_groups` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_users` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_maps` VALUES (10,55),(11,66);
BEGIN ;
UPDATE users SET name = 'qux' WHERE id = 1;
--connect (con1,localhost,root,,)
BEGIN ;
send
DELETE FROM `matchmaking_groups` WHERE id = 10;
--connection default
let $wait_condition=
SELECT COUNT (*) FROM information_schema.innodb_trx WHERE trx_state = 'LOCK WAIT' AND trx_query LIKE '%DELETE FROM `matchmaking_groups` WHERE id = 10%' AND trx_mysql_thread_id != CONNECTION_ID();
--source include/wait_condition.inc
COMMIT ;
--connection con1
--reap
COMMIT ;
--disconnect con1
--connection default
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
Result on a non-debug build
disconnect con1;
connection default;
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id user_id
10 1
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id map_id
10 55
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
bug.t3 'innodb' [ fail ] Found warnings/errors in server log file!
Test ended at 2017-07-16 19:44:23
line
2017-07-16 19:44:22 140291494237952 [ERROR] InnoDB: WSREP: referenced FK check fail: 15 index host_user_id table test/matchmaking_groups
^ Found warnings in /data/bld/10.2-rel/mysql-test/var/log/mysqld.1.err
Assertion failure on 10.2 debug a76c05bba060f4da5ec7c56e5634eaafb6d2ebfe
2017-07-16 19:45:48 140201490761472 [ERROR] InnoDB: WSREP: referenced FK check fail: 15 index host_user_id table test/matchmaking_groups
mysqld: /data/src/10.2/storage/innobase/row/row0upd.cc:2437: dberr_t row_upd_sec_index_entry(upd_node_t*, que_thr_t*): Assertion `err == DB_SUCCESS' failed.
170716 19:45:48 [ERROR] mysqld got signal 6 ;
#7 0x00007f8339876312 in __GI___assert_fail (assertion=0x7f833cd4d994 "err == DB_SUCCESS", file=0x7f833cd4d3f8 "/data/src/10.2/storage/innobase/row/row0upd.cc", line=2437, function=0x7f833cd507a0 <row_upd_sec_index_entry(upd_node_t*, que_thr_t*)::__PRETTY_FUNCTION__> "dberr_t row_upd_sec_index_entry(upd_node_t*, que_thr_t*)") at assert.c:101
#8 0x00007f833c784608 in row_upd_sec_index_entry (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:2437
#9 0x00007f833c78493c in row_upd_sec_step (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:2498
#10 0x00007f833c786eed in row_upd (node=0x7f82e403f2f8, thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:3282
#11 0x00007f833c787244 in row_upd_step (thr=0x7f82e403f628) at /data/src/10.2/storage/innobase/row/row0upd.cc:3399
#12 0x00007f833c72a56c in row_update_for_mysql_using_upd_graph (mysql_rec=0x7f82e403e368 "\377\n", prebuilt=0x7f82e403e7c8) at /data/src/10.2/storage/innobase/row/row0mysql.cc:1946
#13 0x00007f833c72b08d in row_update_for_mysql (mysql_rec=0x7f82e403e368 "\377\n", prebuilt=0x7f82e403e7c8) at /data/src/10.2/storage/innobase/row/row0mysql.cc:2165
#14 0x00007f833c5ed74f in ha_innobase::delete_row (this=0x7f82e403db78, record=0x7f82e403e368 "\377\n") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:9551
#15 0x00007f833c3a9ab0 in handler::ha_delete_row (this=0x7f82e403db78, buf=0x7f82e403e368 "\377\n") at /data/src/10.2/sql/handler.cc:6015
#16 0x00007f833c535df6 in mysql_delete (thd=0x7f82d8000b00, table_list=0x7f82d8011118, conds=0x7f82d80118c8, order_list=0x7f82d8004f68, limit=18446744073709551615, options=0, result=0x0) at /data/src/10.2/sql/sql_delete.cc:583
#17 0x00007f833c131860 in mysql_execute_command (thd=0x7f82d8000b00) at /data/src/10.2/sql/sql_parse.cc:4613
#18 0x00007f833c13bd88 in mysql_parse (thd=0x7f82d8000b00, rawbuf=0x7f82d8010fb8 "DELETE FROM `matchmaking_groups` WHERE id = 10", length=46, parser_state=0x7f83340db200, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7879
#19 0x00007f833c129e34 in dispatch_command (command=COM_QUERY, thd=0x7f82d8000b00, packet=0x7f82d80087d1 "DELETE FROM `matchmaking_groups` WHERE id = 10", packet_length=46, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1817
#20 0x00007f833c128775 in do_command (thd=0x7f82d8000b00) at /data/src/10.2/sql/sql_parse.cc:1362
#21 0x00007f833c2744e7 in do_handle_one_connection (connect=0x7f833fe6f810) at /data/src/10.2/sql/sql_connect.cc:1354
#22 0x00007f833c274274 in handle_one_connection (arg=0x7f833fe6f810) at /data/src/10.2/sql/sql_connect.cc:1260
#23 0x00007f833c5bb922 in pfs_spawn_thread (arg=0x7f833fe92b30) at /data/src/10.2/storage/perfschema/pfs.cc:1862
#24 0x00007f833b74b064 in start_thread (arg=0x7f83340dc700) at pthread_create.c:309
#25 0x00007f833993062d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
Not reproducible on 10.1 and MySQL 5.6/5.7.
While fixing, please also get rid of WSREP prefix in this and similar errors, it's confusing and irrelevant.
elenst, I run your test against my MDEV-13498 fix, and it no longer crashes the debug build. But the condition timed out like this:
connect con1,localhost,root,,;
BEGIN;
DELETE FROM `matchmaking_groups` WHERE id = 10;
connection default;
Timeout in wait_condition.inc for SELECT COUNT(*) FROM information_schema.innodb_trx WHERE trx_state = 'LOCK WAIT' AND trx_query LIKE '%DELETE FROM `matchmaking_groups` WHERE id = 10%' AND trx_mysql_thread_id != CONNECTION_ID()
Id User Host db Command Time State Info Progress
2 system user NULL Daemon NULL NULL 0.000
1 system user NULL Daemon NULL NULL 0.000
3 system user NULL Daemon NULL NULL 0.000
4 system user NULL Daemon NULL NULL 0.000
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
9 root localhost test Query 0 init show full processlist 0.000
10 root localhost test Sleep 31 NULL 0.000
COMMIT;
connection con1;
COMMIT;
disconnect con1;
connection default;
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id user_id
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id map_id
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
As you can see from the SELECT result, the ON DELETE CASCADE was properly processed.
Without the fix, I am getting the assertion failure:
2017-08-14 16:23:49 140048273094400 [ERROR] InnoDB: WSREP: referenced FK check fail: 15 index host_user_id table test/matchmaking_groups
So, this bug seems to have been fixed by my MDEV-13498 fix.
Can you revise the test so that it uses a condition that will not time out? I’d then include it in the MDEV-13498 fix.
Marko Mäkelä
added a comment - elenst , I run your test against my MDEV-13498 fix, and it no longer crashes the debug build. But the condition timed out like this:
connect con1,localhost,root,,;
BEGIN;
DELETE FROM `matchmaking_groups` WHERE id = 10;
connection default;
Timeout in wait_condition.inc for SELECT COUNT(*) FROM information_schema.innodb_trx WHERE trx_state = 'LOCK WAIT' AND trx_query LIKE '%DELETE FROM `matchmaking_groups` WHERE id = 10%' AND trx_mysql_thread_id != CONNECTION_ID()
Id User Host db Command Time State Info Progress
2 system user NULL Daemon NULL NULL 0.000
1 system user NULL Daemon NULL NULL 0.000
3 system user NULL Daemon NULL NULL 0.000
4 system user NULL Daemon NULL NULL 0.000
5 system user NULL Daemon NULL InnoDB shutdown handler NULL 0.000
9 root localhost test Query 0 init show full processlist 0.000
10 root localhost test Sleep 31 NULL 0.000
COMMIT;
connection con1;
COMMIT;
disconnect con1;
connection default;
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id user_id
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN (SELECT id FROM matchmaking_groups);
matchmaking_group_id map_id
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
As you can see from the SELECT result, the ON DELETE CASCADE was properly processed.
Without the fix, I am getting the assertion failure:
2017-08-14 16:23:49 140048273094400 [ERROR] InnoDB: WSREP: referenced FK check fail: 15 index host_user_id table test/matchmaking_groups
mysqld: /mariadb/10.2/storage/innobase/row/row0upd.cc:2468: dberr_t row_upd_sec_index_entry(upd_node_t *, que_thr_t *): Assertion `err == DB_SUCCESS' failed.
170814 16:23:49 [ERROR] mysqld got signal 6 ;
So, this bug seems to have been fixed by my MDEV-13498 fix.
Can you revise the test so that it uses a condition that will not time out? I’d then include it in the MDEV-13498 fix.
marko, I will amend the test, it's clear why it fails; but first let's make sure that the code actually works as expected now, I'm not sure about that.
The relevant part of the test does the following:
1. Connection 1 starts transaction and updates a row X in table 1;
2. Connection 2 deletes a row from table 2, which references table 1, and the operation should cascade-delete row X from the table 1.
Without the fix (apart from the assertion failure, e.g. on a release build, or on 10.1), the DELETE query in connection 2 starts waiting, the process list says
10 root localhost test Query 1 updating DELETE FROM `matchmaking_groups` WHERE id = 10
and information_schema.innodb_trx says
1299 LOCK WAIT 2017-08-14 17:22:05 1299:4:3:2 2017-08-14 17:22:05 15 10 DELETE FROM `matchmaking_groups` WHERE id = 10 updating or deleting 36 14 1136 8 1 0 REPEATABLE READ 1 1 NULL 0 0 0
which is what the test waits for.
After the fix, the DELETE query succeeds immediately – no waiting, no transaction waiting for a lock.
It seems strange. Is it expected?
Elena Stepanova
added a comment - marko , I will amend the test, it's clear why it fails; but first let's make sure that the code actually works as expected now, I'm not sure about that.
The relevant part of the test does the following:
1. Connection 1 starts transaction and updates a row X in table 1;
2. Connection 2 deletes a row from table 2, which references table 1, and the operation should cascade-delete row X from the table 1.
Without the fix (apart from the assertion failure, e.g. on a release build, or on 10.1), the DELETE query in connection 2 starts waiting, the process list says
10 root localhost test Query 1 updating DELETE FROM `matchmaking_groups` WHERE id = 10
and information_schema.innodb_trx says
1299 LOCK WAIT 2017-08-14 17:22:05 1299:4:3:2 2017-08-14 17:22:05 15 10 DELETE FROM `matchmaking_groups` WHERE id = 10 updating or deleting 36 14 1136 8 1 0 REPEATABLE READ 1 1 NULL 0 0 0
which is what the test waits for.
After the fix, the DELETE query succeeds immediately – no waiting, no transaction waiting for a lock.
It seems strange. Is it expected?
elenst, I changed the SELECT queries at the end of the test, getting the following result:
SELECT * FROM matchmaking_group_users;
matchmaking_group_id user_id
11 2
SELECT * FROM matchmaking_group_maps;
matchmaking_group_id map_id
11 66
SELECT * FROM users;
id name
1 qux
2 bar
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
Now, what do we actually expect from the test?
At the time of the DELETE operation, there is one active transaction in the system, holding a lock on the users record (1,'foo') which it updated to ('1','qux'). Because this UPDATE statement is only changing the non-indexed column users.name, it can perform this operation in-place, while holding an exclusive lock on this record. There is no need to acquire any locks on the FOREIGN KEY child tables either.
What should the DELETE do? It must delete the matchmaking_groups record (10,1), prompting an ON DELETE CASCADE to the matchmaking_group_maps record (10,55) and the matchmaking_group_users record (10,1). According to the SELECT statements, all these records were deleted. The DELETE should not cascade to the users table, because that table does not contain any CONSTRAINT FOREIGN KEY…ON DELETE CASCADE that would instruct so.
The result looks perfectly valid to me.
Maybe earlier MariaDB versions did some unnecessary locking in the FOREIGN KEY processing. For example, the UPDATE could lock non-matching rows. (Not locking non-matching rows was the ‘semi-consistent read’ feature that I implemented in MySQL 5.0 as MySQL Bug #3300.) Perhaps this feature is not enabled by default before MariaDB 10.2? I seem to remember that MySQL 5.7 introduced some change to the innodb_locks_unsafe_for_binlog parameter, or to the related logic. Perhaps the relaxed locking is now enabled in the default TRANSACTION ISOLATION LEVEL REPEATABLE READ?
Can you determine what locking conflict would emerge on 10.0 or 10.1, and if it is possible to get the same result by setting innodb_locks_unsafe_for_binlog=1?
Marko Mäkelä
added a comment - elenst , I changed the SELECT queries at the end of the test, getting the following result:
SELECT * FROM matchmaking_group_users;
matchmaking_group_id user_id
11 2
SELECT * FROM matchmaking_group_maps;
matchmaking_group_id map_id
11 66
SELECT * FROM users;
id name
1 qux
2 bar
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
Now, what do we actually expect from the test?
At the time of the DELETE operation, there is one active transaction in the system, holding a lock on the users record (1,'foo') which it updated to ('1','qux'). Because this UPDATE statement is only changing the non-indexed column users.name, it can perform this operation in-place, while holding an exclusive lock on this record. There is no need to acquire any locks on the FOREIGN KEY child tables either.
What should the DELETE do? It must delete the matchmaking_groups record (10,1), prompting an ON DELETE CASCADE to the matchmaking_group_maps record (10,55) and the matchmaking_group_users record (10,1). According to the SELECT statements, all these records were deleted. The DELETE should not cascade to the users table, because that table does not contain any CONSTRAINT FOREIGN KEY…ON DELETE CASCADE that would instruct so.
The result looks perfectly valid to me.
Maybe earlier MariaDB versions did some unnecessary locking in the FOREIGN KEY processing. For example, the UPDATE could lock non-matching rows. (Not locking non-matching rows was the ‘semi-consistent read’ feature that I implemented in MySQL 5.0 as MySQL Bug #3300 .) Perhaps this feature is not enabled by default before MariaDB 10.2? I seem to remember that MySQL 5.7 introduced some change to the innodb_locks_unsafe_for_binlog parameter, or to the related logic. Perhaps the relaxed locking is now enabled in the default TRANSACTION ISOLATION LEVEL REPEATABLE READ?
Can you determine what locking conflict would emerge on 10.0 or 10.1, and if it is possible to get the same result by setting innodb_locks_unsafe_for_binlog=1?
The DELETE should not cascade to the users table, because that table does not contain any CONSTRAINT FOREIGN KEY…ON DELETE CASCADE that would instruct so.
The result looks perfectly valid to me.
Sorry, you're right of course. I mixed up the semantics of referenced/referencing tables while writing the previous comment; and the test case was written based on the existing behavior rather than on expectations.
Can you determine what locking conflict would emerge on 10.0 or 10.1, and if it is possible to get the same result by setting innodb_locks_unsafe_for_binlog=1?
If you mean the DELETE command waiting on a lock after UPDATE, this locking behavior happens on 10.1, with and without innodb_locks_unsafe_for_binlog=1.
10.0 is not affected.
Elena Stepanova
added a comment - marko ,
The DELETE should not cascade to the users table, because that table does not contain any CONSTRAINT FOREIGN KEY…ON DELETE CASCADE that would instruct so.
The result looks perfectly valid to me.
Sorry, you're right of course. I mixed up the semantics of referenced/referencing tables while writing the previous comment; and the test case was written based on the existing behavior rather than on expectations.
Can you determine what locking conflict would emerge on 10.0 or 10.1, and if it is possible to get the same result by setting innodb_locks_unsafe_for_binlog=1?
If you mean the DELETE command waiting on a lock after UPDATE, this locking behavior happens on 10.1, with and without innodb_locks_unsafe_for_binlog=1 .
10.0 is not affected.
10.0 - passes
10.1 - fails with the timeout at line 44 (because of the locking problem)
10.2 debug - fails with the assertion failure from this bug report
10.2 non-debug - fails with wrong result from this bug report (it does not fail with timeout, possibly related to MDEV-13331)
bb-10.2-marko - passes
Elena Stepanova
added a comment - I suggest to amend the test this way:
Test
--source include/have_innodb.inc
--enable_connect_log
CREATE TABLE `users` (
`id` int (10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
` name ` varchar (32) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_groups` (
`id` bigint (20) unsigned NOT NULL AUTO_INCREMENT,
`host_user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`id`),
UNIQUE KEY `host_user_id` (`host_user_id`),
CONSTRAINT `matchmaking_groups_ibfk_1` FOREIGN KEY (`host_user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_users` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`user_id`),
UNIQUE KEY `user_id` (`user_id`),
CONSTRAINT `matchmaking_group_users_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE ,
CONSTRAINT `matchmaking_group_users_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_maps` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`map_id` tinyint(2) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`map_id`),
CONSTRAINT `matchmaking_group_maps_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `users` VALUES ( NULL , 'foo' ),( NULL , 'bar' );
INSERT INTO `matchmaking_groups` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_users` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_maps` VALUES (10,55),(11,66);
BEGIN ;
UPDATE users SET name = 'qux' WHERE id = 1;
--connect (con1,localhost,root,,)
SET innodb_lock_wait_timeout= 1;
DELETE FROM `matchmaking_groups` WHERE id = 10;
--connection default
COMMIT ;
--disconnect con1
--connection default
--sorted_result
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
--sorted_result
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
--sorted_result
SELECT * FROM users;
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
Result
CREATE TABLE `users` (
`id` int (10) unsigned NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`id`),
` name ` varchar (32) NOT NULL DEFAULT ''
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_groups` (
`id` bigint (20) unsigned NOT NULL AUTO_INCREMENT,
`host_user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`id`),
UNIQUE KEY `host_user_id` (`host_user_id`),
CONSTRAINT `matchmaking_groups_ibfk_1` FOREIGN KEY (`host_user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_users` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`user_id` int (10) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`user_id`),
UNIQUE KEY `user_id` (`user_id`),
CONSTRAINT `matchmaking_group_users_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE ,
CONSTRAINT `matchmaking_group_users_ibfk_2` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TABLE `matchmaking_group_maps` (
`matchmaking_group_id` bigint (20) unsigned NOT NULL ,
`map_id` tinyint(2) unsigned NOT NULL ,
PRIMARY KEY (`matchmaking_group_id`,`map_id`),
CONSTRAINT `matchmaking_group_maps_ibfk_1` FOREIGN KEY (`matchmaking_group_id`) REFERENCES `matchmaking_groups` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `users` VALUES ( NULL , 'foo' ),( NULL , 'bar' );
INSERT INTO `matchmaking_groups` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_users` VALUES (10,1),(11,2);
INSERT INTO `matchmaking_group_maps` VALUES (10,55),(11,66);
BEGIN ;
UPDATE users SET name = 'qux' WHERE id = 1;
connect con1,localhost,root,,;
SET innodb_lock_wait_timeout= 1;
DELETE FROM `matchmaking_groups` WHERE id = 10;
connection default ;
COMMIT ;
disconnect con1;
connection default ;
SELECT * FROM matchmaking_group_users WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
matchmaking_group_id user_id
SELECT * FROM matchmaking_group_maps WHERE matchmaking_group_id NOT IN ( SELECT id FROM matchmaking_groups);
matchmaking_group_id map_id
SELECT * FROM users;
id name
1 qux
2 bar
DROP TABLE `matchmaking_group_maps`, `matchmaking_group_users`, `matchmaking_groups`, `users`;
Currently the test works this way:
10.0 - passes
10.1 - fails with the timeout at line 44 (because of the locking problem)
10.2 debug - fails with the assertion failure from this bug report
10.2 non-debug - fails with wrong result from this bug report (it does not fail with timeout, possibly related to MDEV-13331 )
bb-10.2-marko - passes
Has SET foreign_key_checks=0 been used at any time?
Has there been any DDL for the two tables close to the UPDATE or DELETE operations that failed to cascade?