[MDEV-13498] DELETE with CASCADE constraints takes long time and then crashes the instance with Signal 6 Created: 2017-08-11  Updated: 2020-08-25  Resolved: 2017-08-15

Status: Closed
Project: MariaDB Server
Component/s: Data Manipulation - Delete, Storage Engine - InnoDB
Affects Version/s: 10.2.2, 10.2.7, 10.2
Fix Version/s: 10.2.8

Type: Bug Priority: Major
Reporter: Claudio Nanni Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None
Environment:

linux


Attachments: mdev13498.test.gz, test-case.sql.gz
Issue Links:
Blocks
blocks MDEV-13678 DELETE with CASCADE takes a long time... Closed
Relates
relates to MDEV-13246 Stale rows despite ON DELETE CASCADE ... Closed
relates to MDEV-15611 Due to the failure of foreign key det... Closed

 Description   

Running DELETE on a table with a single row that triggers ON DELETE CASCADE constraints hangs for a long time (about 20 minutes in the reported case) and then crashes the instance with:

InnoDB: ###### Diagnostic info printed to the standard error stream
2017-08-10 19:08:35 140481422710528 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
170810 19:08:35 [ERROR] mysqld got signal 6 ;

To reproduce:

1) Import attached dump

2) Execute:

DELETE FROM BRc273a13dfda91b1add89;

3) Wait ~20 minutes

On 10.1 the same completes successfully in 15 seconds.



 Comments   
Comment by Alice Sherepa [ 2017-08-11 ]

Thanks for the report!
Reproducible on versions >=10.2.2
error log (10.2.7):

============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-08-11 14:50:48 140259762132736 [Warning] InnoDB: A long semaphore wait:
--Thread 140259778918144 has waited at srv0srv.cc line 2144 for 923.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x48f2a10 created in file dict0dict.cc line 1112
a writer (thread id 140259778918144) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0ins.cc line 1469
Last time write locked in file srv0srv.cc line 2144
2017-08-11 14:50:48 140259762132736 [Note] InnoDB: A semaphore wait:
--Thread 140259778918144 has waited at srv0srv.cc line 2144 for 923.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x48f2a10 created in file dict0dict.cc line 1112
a writer (thread id 140259778918144) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0ins.cc line 1469
Last time write locked in file srv0srv.cc line 2144
InnoDB: ###### Starts InnoDB Monitor for 30 secs to print diagnostic info:
InnoDB: Pending reads 0, writes 0
 
=====================================
2017-08-11 14:51:06 0x7f90c4cc4700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 197 srv_active, 0 srv_shutdown, 3187 srv_idle
srv_master_thread log flush and writes: 3383
----------
SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 73807
--Thread 140259787310848 has waited at buf0flu.cc line 1248 for 0.00 seconds the semaphore:
SX-lock on RW-latch at 0x7f90cbf04540 created in file buf0buf.cc line 1471
a writer (thread id 140260423001856) has reserved it in mode  exclusive
number of readers 0, waiters flag 1, lock_word: 0
Last time read locked in file row0ins.cc line 1475
Last time write locked in file row0upd.cc line 3077
--Thread 140259778918144 has waited at srv0srv.cc line 2144 for 941.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0x48f2a10 created in file dict0dict.cc line 1112
a writer (thread id 140259778918144) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0ins.cc line 1469
Last time write locked in file srv0srv.cc line 2144
OS WAIT ARRAY INFO: signal count 172129
RW-shared spins 0, rounds 150450, OS waits 12867
RW-excl spins 0, rounds 777537, OS waits 10007
RW-sx spins 97496, rounds 1930979, OS waits 45357
Spin rounds per wait: 150450.00 RW-shared, 777537.00 RW-excl, 19.81 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 8856
Purge done for trx's n:o < 8854 undo n:o < 0 state: running but idle
History list length 12
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421735400322840, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 8855, ACTIVE 995 sec updating or deleting
mysql tables in use 3, locked 3
27689 lock struct(s), heap size 3317968, 1563936 row lock(s), undo log entries 264206
MySQL thread id 9, OS thread handle 0x7f90ecb06700, query id 620 localhost root updating
DELETE FROM BRc273a13dfda91b1add89
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (read thread)
I/O thread 5 state: waiting for completed aio requests (read thread)
I/O thread 6 state: waiting for completed aio requests (write thread)
I/O thread 7 state: waiting for completed aio requests (write thread)
I/O thread 8 state: waiting for completed aio requests (write thread)
I/O thread 9 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0, 0, 0] , aio writes: [0, 0, 0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
20376 OS file reads, 68064 OS file writes, 15463 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 13.20 writes/s, 7.65 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 546, seg size 548, 723 merges
merged operations:
 insert 0, delete mark 176754, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 34679, node heap has 4 buffer(s)
Hash table size 34679, node heap has 4 buffer(s)
Hash table size 34679, node heap has 9 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 1 buffer(s)
Hash table size 34679, node heap has 208 buffer(s)
Hash table size 34679, node heap has 3 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 4418308559
Log flushed up to   4418298159
Pages flushed up to 4418284524
Last checkpoint at  4384550662
0 pending log flushes, 0 pending chkp writes
5927 log i/o's done, 2.90 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 137560064
Dictionary memory allocated 118116
Buffer pool size   8192
Free buffers       0
Database pages     7759
Old database pages 2844
Modified db pages  2
Percent of dirty pages(LRU & free pages): 0.026
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 2, single page 0
Pages made young 27108, not young 3400054
2.80 youngs/s, 0.00 non-youngs/s
Pages read 20338, created 24551, written 58406
0.00 reads/s, 1.85 creates/s, 8.45 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 4 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 7759, unzip_LRU len: 0
I/O sum[6879]:cur[151], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=8894, Main thread ID=140259778918144, state: enforcing dict cache limit
Number of rows inserted 1678464, updated 0, deleted 988977, read 4
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
InnoDB: ###### Diagnostic info printed to the standard error stream
2017-08-11 14:51:18 140259762132736 [ERROR] [FATAL] InnoDB: Semaphore wait has lasted > 600 seconds. We intentionally crash the server because it appears to be hung.
170811 14:51:18 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.7-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467207 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/10.2.7/bin/mysqld(my_print_stacktrace+0x2e)[0xddc02e]
/10.2.7/bin/mysqld(handle_fatal_signal+0x444)[0x7d5984]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f90f1b2f390]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x38)[0x7f90f08b8428]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x16a)[0x7f90f08ba02a]
/10.2.7/bin/mysqld[0xba63e9]
/10.2.7/bin/mysqld[0xb4f1b0]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f90f1b256ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f90f098a3dd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

Comment by Elena Stepanova [ 2017-08-12 ]

There is no need to wait for 20 minutes: the innodb-fatal-semaphore-wait-threshold=N option (600 seconds by default) controls when the server aborts, so setting it to a lower value makes the failure happen earlier. Conversely, if you set it to a higher value, you won't get the failure at all – the query doesn't actually hang, it's just extremely slow. Given enough time (~20 min on my machine), it eventually deletes everything and succeeds (it still emits intermediate "long semaphore" warnings with increasing wait times; they're just not fatal while the threshold is high). If another connection with READ UNCOMMITTED checks the contents of the referencing tables, it's obvious that the rows keep being deleted. The statement does have to cascade-delete ~800K rows in two tables, but that's no excuse for taking 20 minutes – on unaffected versions it finishes in ~1 minute.

An MTR-friendly test case is attached. It's pretty much the same test as in the initial attachment, only unrelated tables are removed, the contents re-formatted for MTR, and tables are renamed to t1, t2, t3. Run it with --mysqld=--innodb-fatal-semaphore-wait-threshold=60 to get the failure in reasonable time.

The test case can be reduced, but since it takes different time on different builds and machines, it's unclear how much data is safe to remove until the test becomes unreliable.

10.2 is affected, including the latest build. 10.1, 5.5, MySQL 5.6, MySQL 5.7 are not affected.

5.7.18 release

MySQL [test]> DELETE FROM BRc273a13dfda91b1add89;
Query OK, 1 row affected (1 min 10.11 sec)

5.7.19 release

MySQL [test]> DELETE FROM BRc273a13dfda91b1add89;
Query OK, 1 row affected (1 min 11.06 sec)

5.6.37 release

MySQL [test]> DELETE FROM BRc273a13dfda91b1add89;
Query OK, 1 row affected (1 min 15.41 sec)

10.1.26 release

MariaDB [test]> DELETE FROM BRc273a13dfda91b1add89;
Query OK, 1 row affected (32.03 sec)

10.2.8 pre-release

MariaDB [test]> DELETE FROM BRc273a13dfda91b1add89;
Query OK, 1 row affected (18 min 23.64 sec)

Comment by Claudio Nanni [ 2017-08-12 ]

I need to wait around 20 minutes even with default 600 seconds for innodb_fatal_semaphore_wait_threshold.

MariaDB [i17]> select now(); DELETE FROM BRc273a13dfd; select now();
+---------------------+
| now()               |
+---------------------+
| 2017-08-12 23:22:03 |
+---------------------+
1 row in set (0.00 sec)

ERROR 2013 (HY000): Lost connection to MySQL server during query
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
ERROR 2002 (HY000): Can't connect to MySQL server on '127.0.0.1' (115)
ERROR: Can't connect to the server
 
unknown [i17]> 
unknown [i17]> Bye
[claudio@tossanc ~]$ date
Sat Aug 12 23:44:28 CEST 2017

If you look at the SHOW ENGINE INNODB STATUS output just before the crash, you can see that the transaction time is close to the real elapsed time, while the semaphore wait is less than the real time but more than 600 seconds:

SEMAPHORES
----------
OS WAIT ARRAY INFO: reservation count 58240
--Thread 139979397363456 has waited at srv0srv.cc line 2144 for 951.00 seconds the semaphore:
X-lock (wait_ex) on RW-latch at 0xb628ee950 created in file dict0dict.cc line 1112
a writer (thread id 139979397363456) has reserved it in mode  wait exclusive
number of readers 1, waiters flag 0, lock_word: ffffffffffffffff
Last time read locked in file row0ins.cc line 1469
Last time write locked in file srv0srv.cc line 2144
OS WAIT ARRAY INFO: signal count 58121
RW-shared spins 0, rounds 1723, OS waits 684
RW-excl spins 0, rounds 92894, OS waits 3775
RW-sx spins 53576, rounds 1606747, OS waits 53534
Spin rounds per wait: 1723.00 RW-shared, 92894.00 RW-excl, 29.99 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 12284
Purge done for trx's n:o < 12284 undo n:o < 0 state: running but idle
History list length 0
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 421457560813080, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 421457560812184, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 12275, ACTIVE 1322 sec updating or deleting
mysql tables in use 3, locked 3
27677 lock struct(s), heap size 3317968, 1563932 row lock(s), undo log entries 405753
MySQL thread id 11, OS thread handle 0x7f503c04f700, query id 333 localhost 127.0.0.1 root updating
DELETE FROM BRc273a13dfd

Comment by Elena Stepanova [ 2017-08-12 ]

I need to wait around 20 minutes even with default 600 seconds for innodb_fatal_semaphore_wait_threshold.

Right – apparently the problem, and hence the counting, doesn't start right away, and the timing is not absolutely precise (that is, InnoDB doesn't abort immediately when the timeout is exceeded).
Still, if you decrease the threshold, you'll get the problem faster – still not exactly in N seconds if you set the value to N, but in N+something.

Comment by Marko Mäkelä [ 2017-08-14 ]

I did not observe a hang, but only slowness, when running the test case on a debug binary.

sudo perf top -g

pointed me to a very costly std::find() call in a condition that was being evaluated even though the Galera cluster is not enabled!
There is a similar condition in 10.1, but using a simpler predicate. The std::find() call was introduced in MariaDB 10.2.2 by a commit whose purpose was to merge changes between MySQL 5.7.9 and 5.7.14.
I would propose the following patch. It avoids the costly std::find() call when the Galera cluster is not enabled, and it also removes some redundant comparisons.

@@ -2414,29 +2414,24 @@ row_upd_sec_index_entry(
 		row_ins_sec_index_entry() below */
 		if (!rec_get_deleted_flag(
 			    rec, dict_table_is_comp(index->table))) {
-
-#ifdef WITH_WSREP
-			que_node_t *parent = que_node_get_parent(node);
-#endif
 			err = btr_cur_del_mark_set_sec_rec(
 				flags, btr_cur, TRUE, thr, &mtr);
 			if (err != DB_SUCCESS) {
 				break;
 			}
 #ifdef WITH_WSREP
-			if (err == DB_SUCCESS && !referenced                  &&
-			    !(parent && que_node_get_type(parent) ==
-				QUE_NODE_UPDATE                               &&
-			      (std::find(((upd_node_t*)parent)->cascade_upd_nodes->begin(),
-                                         ((upd_node_t*)parent)->cascade_upd_nodes->end(),
-                                         node) ==
-                               ((upd_node_t*)parent)->cascade_upd_nodes->end()))  &&
-                            foreign
-                        ) {
-				ulint*	offsets =
-					rec_get_offsets(
-						rec, index, NULL, ULINT_UNDEFINED,
-						&heap);
+			upd_node_t *parent = static_cast<upd_node_t*>(
+				que_node_get_parent(node));
+			if (!referenced && foreign
+			    && que_node_get_type(parent) == QUE_NODE_UPDATE
+			    && wsrep_on(trx->mysql_thd)
+			    && std::find(parent->cascade_upd_nodes->begin(),
+					 parent->cascade_upd_nodes->end(),
+					 node)
+			    == parent->cascade_upd_nodes->end()) {
+				ulint*	offsets = rec_get_offsets(
+					rec, index, NULL, ULINT_UNDEFINED,
+					&heap);
 
 				err = wsrep_row_upd_check_foreign_constraints(
 					node, &pcur, index->table,
@@ -2450,19 +2445,19 @@ row_upd_sec_index_entry(
 				case DB_DEADLOCK:
 					if (wsrep_debug) {
 						ib::warn() << "WSREP: sec index FK check fail for deadlock"
-							   << " index " << index->name()
-							   << " table " << index->table->name.m_name;
+							   << " index " << index->name
+							   << " table " << index->table->name;
 					}
 					break;
 				default:
-					ib::error() << "WSREP: referenced FK check fail: " << err
-						    << " index " << index->name()
-						    << " table " << index->table->name.m_name;
+					ib::error() << "WSREP: referenced FK check fail: " << ut_strerr(err)
+						    << " index " << index->name
+						    << " table " << index->table->name;
 
 					break;
 				}
-			}
 #endif /* WITH_WSREP */
+			}
 		}
 
 		ut_ad(err == DB_SUCCESS);

Even with this patch applied, the test case takes a very long time (several minutes, and counting) on a non-debug server. So, I wonder if further fixes are needed.

Comment by Marko Mäkelä [ 2017-08-14 ]

bb-10.2-marko
My commit comment describes why I believe that these conditions were not only slow but also wrong.
I do not know if the introduced predicate wsrep_must_process_fk() is correct; I hope seppo can review it and point me to the tests that should exercise this code.

Comment by Marko Mäkelä [ 2017-08-14 ]

My observation of the predicate being not only slow but also possibly wrong could explain
MDEV-13246 Stale rows despite ON DELETE CASCADE constraint

There was a compilation error in my patch when WITH_WSREP was not enabled. Revised patch on bb-10.2-marko

Comment by Jan Lindström (Inactive) [ 2017-08-15 ]

ok to push with change discussed on slack.

Comment by Marko Mäkelä [ 2017-08-15 ]

The change that was discussed: pick up the !wsrep_thd_is_BF() condition from Galera MW-369 FK fixes, which was missing from MariaDB 10.2.
Side note: MariaDB 10.2 may also be missing some other changes from the Galera 5.7 branch.

Comment by Claudio Nanni [ 2017-08-29 ]

Now it's much faster on a single instance, but if Galera is enabled it's exactly like before, lasting so long it will eventually render the cluster unusable.

Comment by Marko Mäkelä [ 2017-08-30 ]

claudio.nanni, you are right, the Galera logic was not fixed yet. I filed MDEV-13678 to track that fix.
