Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.6, 10.7(EOL), 10.8(EOL), 10.9(EOL), 10.10(EOL), 10.11, 11.0(EOL), 11.1(EOL)
Description
elenst provided an rr replay trace where InnoDB hangs:
10.6 0976afec889d8914326f9e71b15ea215470dadba |
2023-04-25 15:29:40 0 [Note] /data/MDEV-31084/10.6d/sql/mysqld: ready for connections.
|
Version: '10.6.13-MariaDB-debug-log' socket: '/data/MDEV-31084/var3/mysql.sock' port: 13000 Source distribution
|
2023-04-25 15:30:24 17 [Note] Semi-sync replication initialized for transactions.
|
2023-04-25 15:30:24 17 [Note] Semi-sync replication enabled on the master.
|
2023-04-25 15:30:24 0 [Note] Starting ack receiver thread
|
2023-04-25 15:31:17 0 [Note] InnoDB: Dumping buffer pool(s) to /data/MDEV-31084/var3/data/ib_buffer_pool
|
2023-04-25 15:31:17 0 [Note] InnoDB: Restricted to 510 pages due to innodb_buf_pool_dump_pct=25
|
2023-04-25 15:31:17 0 [Note] InnoDB: Buffer pool(s) dump completed at 230425 15:31:17
|
2023-04-25 15:41:43 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/
|
At the final state, one locked block exists in the buf_pool.flush_list, with an oldest_modification_ that requires this page to be written out, so that a log checkpoint can be made and the circular ib_logfile0 is writable again.
The problematic page latch is being held by a purge thread, which is waiting for lock_sys.latch. The exclusive lock_sys.latch is being held by a user thread that is executing the DROP TABLE part of a CREATE OR REPLACE TABLE statement. Here is how I debugged it:
ssh rr
|
rr replay /data/MDEV-31084/var3/rr_profile_1682436579
|
run 1364731
|
break buf_flush_sync_for_checkpoint
|
continue
|
thread apply 18 backtrace
|
thread apply 1 backtrace
|
10.6 0976afec889d8914326f9e71b15ea215470dadba |
Thread 18 (Thread 424366.424399):
|
#0 FreeState (cs=0x604754000ce0, free_state=1) at /data/MDEV-31084/10.6d/dbug/dbug.c:1645
|
#1 0x000055f7328f8a55 in _db_pop_ () at /data/MDEV-31084/10.6d/dbug/dbug.c:935
|
#2 0x000055f7328d4802 in safe_mutex_lock (mp=0x55f733e55a50 <buf_dblwr+16>, my_flags=0, file=0x55f732d8c8f0 "/data/MDEV-31084/10.6d/storage/innobase/include/buf0dblwr.h", line=161) at /data/MDEV-31084/10.6d/mysys/thr_mutex.c:398
|
#3 0x000055f73255bdd1 in inline_mysql_mutex_lock (that=0x55f733e55a50 <buf_dblwr+16>, src_file=0x55f732d8c8f0 "/data/MDEV-31084/10.6d/storage/innobase/include/buf0dblwr.h", src_line=161) at /data/MDEV-31084/10.6d/include/mysql/psi/mysql_thread.h:750
|
#4 0x000055f7325636bc in buf_dblwr_t::wait_flush_buffered_writes (this=0x55f733e55a40 <buf_dblwr>) at /data/MDEV-31084/10.6d/storage/innobase/include/buf0dblwr.h:161
|
#5 0x000055f732561ad2 in os_aio_wait_until_no_pending_writes (declare=false) at /data/MDEV-31084/10.6d/storage/innobase/os/os0file.cc:3665
|
#6 0x000055f732746470 in buf_flush_wait (lsn=21395826) at /data/MDEV-31084/10.6d/storage/innobase/buf/buf0flu.cc:1865
|
#7 0x000055f732746967 in buf_flush_wait_flushed (sync_lsn=21395826) at /data/MDEV-31084/10.6d/storage/innobase/buf/buf0flu.cc:1911
|
#8 0x000055f73251cf09 in log_checkpoint_margin () at /data/MDEV-31084/10.6d/storage/innobase/log/log0log.cc:1046
|
#9 0x000055f73251cf48 in log_check_margins () at /data/MDEV-31084/10.6d/storage/innobase/log/log0log.cc:1062
|
#10 0x000055f7324d8af1 in log_free_check () at /data/MDEV-31084/10.6d/storage/innobase/include/log0log.inl:309
|
#11 0x000055f7324eb915 in ibuf_delete_for_discarded_space (space=484) at /data/MDEV-31084/10.6d/storage/innobase/ibuf/ibuf0ibuf.cc:4461
|
#12 0x000055f7327b3177 in fil_delete_tablespace (id=484) at /data/MDEV-31084/10.6d/storage/innobase/fil/fil0fil.cc:1692
|
#13 0x000055f7327a8522 in trx_t::commit (this=0x153fe3029880, deleted=std::vector of length 0, capacity 0) at /data/MDEV-31084/10.6d/storage/innobase/dict/drop.cc:274
|
#14 0x000055f73245d2ce in ha_innobase::delete_table (this=0x604754316210, name=0x42486683ab50 "./test1/updates_deletes_tbl11") at /data/MDEV-31084/10.6d/storage/innobase/handler/ha_innodb.cc:13757
|
…
|
#27 0x000055f731d7d592 in mysql_parse (thd=0x604754000d48, rawbuf=0x6047540130c0 "CREATE /* QNO 11738 CON_ID 17 */ OR REPLACE TABLE updates_deletes_tbl11 AS SELECT tbl1.* FROM test1.updates_deletes_tbl8 AS tbl1 WHERE tbl1.`col_enum_utf8` < 'primjer' OR 'phục vụ' != tbl1.`col_ye"..., length=253, parser_state=0x42486683d390) at /data/MDEV-31084/10.6d/sql/sql_parse.cc:8036
|
10.6 0976afec889d8914326f9e71b15ea215470dadba |
Thread 1 (Thread 424366.424380):
|
…
|
#16 0x000055f73250f1be in lock_sys_t::rd_lock (this=0x55f733e16440 <lock_sys>) at /data/MDEV-31084/10.6d/storage/innobase/include/lock0lock.h:809
|
#17 0x000055f7324f4e06 in LockGuard::LockGuard (this=0x1657406ddbc0, hash=@0x55f733e16498: {n_cells = 10477, array = 0x55f7358358c0}, id={m_id = 11}) at /data/MDEV-31084/10.6d/storage/innobase/lock/lock0lock.cc:201
|
#18 0x000055f7324ffca4 in lock_update_delete (block=0xd7e2e786340, rec=0xd7e2e8343e4 "") at /data/MDEV-31084/10.6d/storage/innobase/lock/lock0lock.cc:3174
|
#19 0x000055f732701eab in btr_cur_optimistic_delete (cursor=0x55f7358894a0, flags=0, mtr=0x1657406de1c0) at /data/MDEV-31084/10.6d/storage/innobase/btr/btr0cur.cc:4579
|
#20 0x000055f73262662b in row_purge_remove_clust_if_poss_low (node=0x55f735889400, mode=BTR_MODIFY_LEAF) at /data/MDEV-31084/10.6d/storage/innobase/row/row0purge.cc:215
|
I see two possible fixes:
- We must disallow holding lock_sys.latch while log_free_check() is being executed. This should only be violated in DDL operations. The fix might be as simple as executing log_free_check() only once in DDL operations, right before acquiring lock_sys.latch.
- In any operation that needs to acquire lock_sys.latch while holding buffer page latches, we must implement a timed wait, and on timeout we have to restart the whole mini-transaction. This would be very complicated, could seriously hurt DML performance, and increase the amount of "fake hangs" due to starvation when running with rr record.
Attachments
Issue Links
- relates to
-
MDEV-22090 Change buffer is not freed after dropping or rebuilding table
- Closed
-
MDEV-25506 Atomic DDL: .frm file is removed and orphan InnoDB tablespace is left behind upon crash recovery
- Closed