[MDEV-22769] Long semaphore wait upon DROP DATABASE with XA holding locks Created: 2020-06-02  Updated: 2020-06-08  Resolved: 2020-06-05

Status: Closed
Project: MariaDB Server
Component/s: Locking, Storage Engine - InnoDB, XA
Affects Version/s: 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.5.4, 10.2.33, 10.3.24, 10.4.14

Type: Bug Priority: Major
Reporter: Elena Stepanova Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, shutdown

Attachments: Text File threads.txt    
Issue Links:
Problem/Incident
causes MDEV-22827 InnoDB: Failing assertion: purge_sys-... Closed
Relates
relates to MDEV-22733 XA PREPARE breaks MDL in pseudo_slave... Stalled

 Description   

When an uncommitted XA is using a table, DROP DATABASE executed in another connection doesn't respect lock_wait_timeout, it hangs until the server does not get killed by long semaphore wait.

It looks quite similar to MDEV-22733, except that there is no shutdown here, and this one is reproducible on 10.2+, it's not 10.5-specific. So I guess it's somehow different.

--source include/have_innodb.inc
 
CREATE DATABASE db;
USE db;
CREATE TABLE t1 (a INT) ENGINE=InnoDB;
XA BEGIN 'x';
UPDATE t1 SET a = 1;
--error ER_XAER_RMFAIL
ALTER TABLE t2;
 
--connect (con1,localhost,root,,test)
SET lock_wait_timeout= 1;
--error ER_LOCK_WAIT_TIMEOUT
DROP DATABASE db;
 
# Cleanup
--disconnect con1
--connection default
XA END 'x';
XA ROLLBACK 'x';
DROP DATABASE db;

The test case above hangs on DROP DATABASE db instead of aborting it with ER_LOCK_WAIT_TIMEOUT.
innodb_lock_wait_timeout doesn't have any effect either (I've removed it from the test case, as it probably shouldn't affect it anyway).

Stack trace from the hanging server:

10.2 4832b751

Thread 28 (Thread 0x7fab6cba6700 (LWP 12848)):
#0  0x0000563c93a77eb5 in rw_lock_validate (lock=0x563c961d2e50) at /data/src/10.2/storage/innobase/sync/sync0rw.cc:860
#1  0x0000563c93a78278 in rw_lock_own (lock=0x563c961d2e50, lock_type=4) at /data/src/10.2/storage/innobase/sync/sync0rw.cc:998
#2  0x0000563c93b29c93 in buf_pool_watch_is_sentinel (buf_pool=0x563c9617ffb0, bpage=0x7fab6cec1a58) at /data/src/10.2/storage/innobase/buf/buf0buf.cc:3343
#3  0x0000563c93b2c640 in buf_page_get_low (page_id=..., page_size=..., rw_latch=1, guess=0x0, mode=10, file=0x563c94080920 "/data/src/10.2/storage/innobase/row/row0sel.cc", line=1338, mtr=0x7fab6cba2720, err=0x7fab6cba1598) at /data/src/10.2/storage/innobase/buf/buf0buf.cc:4283
#4  0x0000563c93b2df72 in buf_page_get_gen (page_id=..., page_size=..., rw_latch=1, guess=0x7fab6cec1a58, mode=10, file=0x563c94080920 "/data/src/10.2/storage/innobase/row/row0sel.cc", line=1338, mtr=0x7fab6cba2720, err=0x7fab6cba1598) at /data/src/10.2/storage/innobase/buf/buf0buf.cc:4882
#5  0x0000563c93af410a in btr_cur_search_to_nth_level_func (index=0x7fab44013890, level=0, tuple=0x7fab1803dc00, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fab1803d910, has_search_latch=0, file=0x563c94080920 "/data/src/10.2/storage/innobase/row/row0sel.cc", line=1338, mtr=0x7fab6cba2720, autoinc=0) at /data/src/10.2/storage/innobase/btr/btr0cur.cc:1261
#6  0x0000563c93a28fa0 in btr_pcur_open_with_no_init_func (index=0x7fab44013890, tuple=0x7fab1803dc00, mode=PAGE_CUR_GE, latch_mode=1, cursor=0x7fab1803d910, has_search_latch=0, file=0x563c94080920 "/data/src/10.2/storage/innobase/row/row0sel.cc", line=1338, mtr=0x7fab6cba2720) at /data/src/10.2/storage/innobase/include/btr0pcur.ic:521
#7  0x0000563c93a2c973 in row_sel_open_pcur (plan=0x7fab1803d900, has_search_latch=0, mtr=0x7fab6cba2720) at /data/src/10.2/storage/innobase/row/row0sel.cc:1338
#8  0x0000563c93a2d3c8 in row_sel (node=0x7fab1803d810, thr=0x7fab1803dd50) at /data/src/10.2/storage/innobase/row/row0sel.cc:1715
#9  0x0000563c93a2e7ae in row_sel_step (thr=0x7fab1803dd50) at /data/src/10.2/storage/innobase/row/row0sel.cc:2399
#10 0x0000563c939aecd4 in que_thr_step (thr=0x7fab1803dd50) at /data/src/10.2/storage/innobase/que/que0que.cc:1020
#11 0x0000563c939aefee in que_run_threads_low (thr=0x7fab1803dd50) at /data/src/10.2/storage/innobase/que/que0que.cc:1104
#12 0x0000563c939af1a3 in que_run_threads (thr=0x7fab1803dd50) at /data/src/10.2/storage/innobase/que/que0que.cc:1144
#13 0x0000563c939af44c in que_eval_sql (info=0x7fab18041c10, sql=0x563c94108c48 "PROCEDURE DELETE_FROM_TABLE_STATS () IS\nBEGIN\nDELETE FROM \"mysql/innodb_table_stats\" WHERE\ndatabase_name = :database_name AND\ntable_name = :table_name;\nEND;\n", reserve_dict_mutex=0, trx=0x7fab6d7b94b8) at /data/src/10.2/storage/innobase/que/que0que.cc:1221
#14 0x0000563c93b97e99 in dict_stats_exec_sql (pinfo=0x7fab18041c10, sql=0x563c94108c48 "PROCEDURE DELETE_FROM_TABLE_STATS () IS\nBEGIN\nDELETE FROM \"mysql/innodb_table_stats\" WHERE\ndatabase_name = :database_name AND\ntable_name = :table_name;\nEND;\n", trx=0x7fab6d7b94b8) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:305
#15 0x0000563c93b9ebed in dict_stats_delete_from_table_stats (database_name=0x7fab6cba2e30 "db", table_name=0x7fab6cba2f00 "#mysql50##sql-ib19") at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3469
#16 0x0000563c93b9ee0a in dict_stats_drop_table (db_and_table=0x7fab18031dd0 "db/#sql-ib19", errstr=0x7fab6cba33e0 "", errstr_sz=1024) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3554
#17 0x0000563c93a03308 in row_drop_table_for_mysql (name=0x7fab18031dd0 "db/#sql-ib19", trx=0x7fab6d7b8390, sqlcom=SQLCOM_DROP_DB, create_failed=false, nonatomic=true) at /data/src/10.2/storage/innobase/row/row0mysql.cc:3399
#18 0x0000563c93a04b6b in row_drop_database_for_mysql (name=0x7fab18041a80 "db/", trx=0x7fab6d7b8390, found=0x7fab6cba4240) at /data/src/10.2/storage/innobase/row/row0mysql.cc:4008
#19 0x0000563c938c9c24 in innobase_drop_database (hton=0x563c960cc6c0, path=0x7fab6cba47c0 "./db/") at /data/src/10.2/storage/innobase/handler/ha_innodb.cc:13542
#20 0x0000563c936c02f3 in dropdb_handlerton (unused1=0x0, plugin=0x7fab1803ce80, path=0x7fab6cba47c0) at /data/src/10.2/sql/handler.cc:742
#21 0x0000563c9346201d in plugin_foreach_with_mask (thd=0x0, func=0x563c936c029a <dropdb_handlerton(THD*, plugin_ref, void*)>, type=1, state_mask=8, arg=0x7fab6cba47c0) at /data/src/10.2/sql/sql_plugin.cc:2402
#22 0x0000563c936c0328 in ha_drop_database (path=0x7fab6cba47c0 "./db/") at /data/src/10.2/sql/handler.cc:749
#23 0x0000563c93413571 in mysql_rm_db_internal (thd=0x7fab18000af0, db=0x7fab18011120 "db", if_exists=false, silent=false) at /data/src/10.2/sql/sql_db.cc:937
#24 0x0000563c93413c4d in mysql_rm_db (thd=0x7fab18000af0, db=0x7fab18011120 "db", if_exists=false) at /data/src/10.2/sql/sql_db.cc:1071
#25 0x0000563c9344ab65 in mysql_execute_command (thd=0x7fab18000af0) at /data/src/10.2/sql/sql_parse.cc:4736
#26 0x0000563c9345427b in mysql_parse (thd=0x7fab18000af0, rawbuf=0x7fab18011088 "DROP DATABASE db", length=16, parser_state=0x7fab6cba5610, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:7741
#27 0x0000563c934425a8 in dispatch_command (command=COM_QUERY, thd=0x7fab18000af0, packet=0x7fab180088a1 "DROP DATABASE db", packet_length=16, is_com_multi=false, is_next_command=false) at /data/src/10.2/sql/sql_parse.cc:1832
#28 0x0000563c93441023 in do_command (thd=0x7fab18000af0) at /data/src/10.2/sql/sql_parse.cc:1385
#29 0x0000563c93596259 in do_handle_one_connection (connect=0x563c96675a00) at /data/src/10.2/sql/sql_connect.cc:1336
#30 0x0000563c93595fc4 in handle_one_connection (arg=0x563c96675a00) at /data/src/10.2/sql/sql_connect.cc:1241
#31 0x0000563c93dabc64 in pfs_spawn_thread (arg=0x563c965d1c30) at /data/src/10.2/storage/perfschema/pfs.cc:1869
#32 0x00007fab7452c4a4 in start_thread (arg=0x7fab6cba6700) at pthread_create.c:456
#33 0x00007fab72660d0f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97

All threads are attached.

Eventually it ends up with this:

=====================================
2020-06-02 03:12:49 0x7fab5f7fe700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 20 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 1 srv_active, 0 srv_shutdown, 0 srv_idle
srv_master_thread log flush and writes: 0
----------
SEMAPHORES
----------
-------------
RW-LATCH INFO
-------------
RW-LOCK: 0x7fab44012468 
RW-LOCK: 0x563c94808300 
Locked: thread 140374240290560 file row0mysql.cc line 3921  X-LOCK
RW-LOCK: 0x7fab6cea61b0 
Total number of rw-locks 1075
OS WAIT ARRAY INFO: reservation count 26
--Thread 140374009960192 has waited at dict0dict.cc line 997 for 899.00 seconds the semaphore:
Mutex at 0x563c96273b70, Mutex DICT_SYS created dict0dict.cc:1087, lock var 2
Last time reserved in file /data/src/10.2/storage/innobase/row/row0mysql.cc line 2185
OS WAIT ARRAY INFO: signal count 22
RW-shared spins 4, rounds 36, OS waits 1
RW-excl spins 5, rounds 93, OS waits 3
RW-sx spins 0, rounds 0, OS waits 0
Spin rounds per wait: 9.00 RW-shared, 18.60 RW-excl, 0.00 RW-sx
------------
TRANSACTIONS
------------
Trx id counter 23867771
Purge done for trx's n:o < 1289 undo n:o < 0 state: running but idle
History list length 0
Total number of lock structs in row lock hash table 1
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 23867768, ACTIVE 0 sec dropping table
0 lock struct(s), heap size 1152, 0 row lock(s)
MySQL thread id 10, OS thread handle 140374240290560, query id 26 localhost root Unlocking tables
DROP DATABASE db
---TRANSACTION 421849229652584, not started
0 lock struct(s), heap size 1152, 0 row lock(s)
---TRANSACTION 1284, ACTIVE 899 sec
2 lock struct(s), heap size 1152, 1 row lock(s)
MySQL thread id 9, OS thread handle 140374240597760, query id 24 localhost root 
---TRANSACTION 23867770, ACTIVE 0 sec
0 lock struct(s), heap size 1152, 0 row lock(s)
--------
FILE I/O
--------
I/O thread 0 state: waiting for completed aio requests (insert buffer thread)
I/O thread 1 state: waiting for completed aio requests (log thread)
I/O thread 2 state: waiting for completed aio requests (read thread)
I/O thread 3 state: waiting for completed aio requests (read thread)
I/O thread 4 state: waiting for completed aio requests (write thread)
I/O thread 5 state: waiting for completed aio requests (write thread)
Pending normal aio reads: [0, 0] , aio writes: [0, 0] ,
 ibuf aio reads:, log i/o's:, sync i/o's:
Pending flushes (fsync) log: 0; buffer pool: 0
188 OS file reads, 173 OS file writes, 14 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
 insert 0, delete mark 0, delete 0
discarded operations:
 insert 0, delete mark 0, delete 0
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
Hash table size 2267, node heap has 1 buffer(s)
Hash table size 2267, node heap has 0 buffer(s)
8915.60 hash searches/s, 17831.11 non-hash searches/s
---
LOG
---
Log sequence number 2595218
Log flushed up to   1708106
Pages flushed up to 1708106
Last checkpoint at  1628870
0 pending log flushes, 0 pending chkp writes
14 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 8798208
Dictionary memory allocated 29184
Buffer pool size   512
Free buffers       201
Database pages     310
Old database pages 0
Modified db pages  1
Percent of dirty pages(LRU & free pages): 0.195
Max dirty pages percent: 75.000
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 174, created 136, written 162
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 310, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 queries inside InnoDB, 0 queries in queue
0 read views open inside InnoDB
Process ID=12816, Main thread ID=140374009960192, state: doing background drop tables
Number of rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
2020-06-02  3:12:50 140374240597760 [Warning] Aborted connection 9 to db: 'db' user: 'root' host: 'localhost' (Got an error reading communication packets)
200602  3:12:50 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.2.33-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=2
max_threads=153
thread_count=8
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63103 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
mysys/stacktrace.c:267(my_print_stacktrace)[0x563c93e0885b]
sql/signal_handler.cc:209(handle_fatal_signal)[0x563c936bda94]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0)[0x7fab745360e0]
unix/syscall-template.S:84(__poll)[0x7fab726578bd]
2020-06-02  3:12:50 140374240290560 [Warning] Aborted connection 10 to db: 'test' user: 'root' host: 'localhost' (Got an error writing communication packets)
sql/mysqld.cc:6667(handle_connections_sockets())[0x563c93359b50]
sql/mysqld.cc:6138(mysqld_main(int, char**))[0x563c93358e32]
sql/main.cc:26(main)[0x563c9334d6d0]
csu/libc-start.c:325(__libc_start_main)[0x7fab725982e1]
/data/bld/10.2-debug/bin/mysqld(_start+0x2a)[0x563c9334d5aa]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/bld/10.2-debug/mysql-test/var/mysqld.1/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             128123               128123               processes 
Max open files            1024                 1024                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       128123               128123               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core



 Comments   
Comment by Vladislav Vaintroub [ 2020-06-02 ]

at some point, the name of the temp table changes to weird name. I do not think "#mysql50##sql-ib19" is correct in frame #15

#15 0x0000563c93b9ebed in dict_stats_delete_from_table_stats (database_name=0x7fab6cba2e30 "db", table_name=0x7fab6cba2f00 "#mysql50##sql-ib19") at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3469
#16 0x0000563c93b9ee0a in dict_stats_drop_table (db_and_table=0x7fab18031dd0 "db/#sql-ib19", errstr=0x7fab6cba33e0 "", errstr_sz=1024) at /data/src/10.2/storage/innobase/dict/dict0stats.cc:3554

Comment by Marko Mäkelä [ 2020-06-05 ]

I think that the following test change is more future-proof (until recovery will resurrect MDL):

diff --git a/mysql-test/suite/innodb/t/xa_recovery.test b/mysql-test/suite/innodb/t/xa_recovery.test
index 20bb52c22f2..487d3834db5 100644
--- a/mysql-test/suite/innodb/t/xa_recovery.test
+++ b/mysql-test/suite/innodb/t/xa_recovery.test
@@ -13,6 +13,9 @@ CREATE TABLE t1 (a INT) ENGINE=InnoDB;
 INSERT INTO t1 VALUES (1);
 connect (con1,localhost,root);
 XA START 'x'; UPDATE t1 set a=2; XA END 'x'; XA PREPARE 'x';
+connect (con2,localhost,root);
+CREATE TABLE t2 (a INT) ENGINE=InnoDB;
+XA START 'y'; INSERT INTO t2 VALUES (1); XA END 'y'; XA PREPARE 'y';
 connection default;
 
 # innodb_force_recovery=2 prevents the purge and tests that the fix of
@@ -25,6 +28,7 @@ connection default;
 --let $shutdown_timeout=
 
 disconnect con1;
+disconnect con2;
 connect (con1,localhost,root);
 --send SELECT * FROM t1 LOCK IN SHARE MODE
 
@@ -35,6 +39,9 @@ let $wait_condition=
         info = 'SELECT * FROM t1 LOCK IN SHARE MODE';
 --source include/wait_condition.inc
 
+DROP TABLE t2;
+
+SET GLOBAL innodb_fast_shutdown=0;
 --source include/restart_mysqld.inc
 
 disconnect con1;

In 10.2, this will lead to a crash on shutdown:

10.2 29ed04cb6d8e930c2b2bad7baf4cdae02cf712cd

2020-06-05  9:50:44 139813668603648 [Note] InnoDB: Starting shutdown...
2020-06-05  9:50:44 139813441226496 [Note] InnoDB: Dumping buffer pool(s) to /dev/shm/10.2o/mysql-test/var/mysqld.1/data/ib_buffer_pool
2020-06-05  9:50:44 139813441226496 [Note] InnoDB: Instance 0, restricted to 128 pages due to innodb_buf_pool_dump_pct=25
2020-06-05  9:50:44 139813441226496 [Note] InnoDB: Buffer pool(s) dump completed at 200605  9:50:44
2020-06-05 09:50:46 0x7f28e80d4700  InnoDB: Assertion failure in file /mariadb/10.2o/storage/innobase/row/row0mysql.cc line 4820
InnoDB: Failing assertion: UT_LIST_GET_LEN(row_mysql_drop_list) == 0

Some later version (at least 10.5) would hang.

Comment by Marko Mäkelä [ 2020-06-05 ]

It turns out that my test change above is running with innodb_force_recovery=2. In that case, the master thread is not started at all, and nothing will be emptying the background drop table queue. We must relax that failing assertion.

Generated at Thu Feb 08 09:17:19 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.