[MDEV-25801] DROP DATABASE very slow after innodb undo log truncate Created: 2021-05-27  Updated: 2021-06-23  Resolved: 2021-06-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.10, 10.6.1
Fix Version/s: 10.5.12, 10.6.3

Type: Bug Priority: Major
Reporter: VAROQUI Stephane Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: performance

Attachments: File stack.dropdatabase    
Issue Links:
PartOf
is part of MDEV-25773 sysbench-TPCC failed to prepare with ... Closed
Problem/Incident
is caused by MDEV-23399 10.5 performance regression with IO-b... Closed
Relates
relates to MDEV-25800 DROP TABLE unnecessarily accesses .ib... Open

 Description   

(gdb) print buf_pool.mutex
bq. $1 = {m_mutex = {__data = {__lock = 2, __count = 0, __owner = 219, __nusers = 1, __kind = 3, __spins = 8, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
bq.     __size = "\002\000\000\000\000\000\000\000\333\000\000\000\001\000\000\000\003\000\000\000\b", '\000' <repeats 18 times>, __align = 2}, m_psi = 0x7fe0094c2bc0}
bq.

(gdb) print buf_pool.flush_list_mutex
$2 = {m_mutex = {__data = {__lock = 2, __count = 0, __owner = 44, __nusers = 2, __kind = 3, __spins = 10, __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, 
    __size = "\002\000\000\000\000\000\000\000,\000\000\000\002\000\000\000\003\000\000\000\n", '\000' <repeats 18 times>, __align = 2}, m_psi = 0x7fe0094c2c40}

(gdb) thread apply all backtrace 

https://pastebin.com/kLZmu3f1



 Comments   
Comment by Marko Mäkelä [ 2021-05-27 ]

Thank you. Is this really a hang, or just very slow execution? I see that the buf_pool.flush_list_mutex is held by the following thread:

Thread 6 (Thread 0x7f7845403700 (LWP 44)):
#0  0x000055e003f5c503 in buf_flush_dirty_pages (id=2) at ./storage/innobase/include/buf0types.h:138
#1  0x000055e0037b758a in trx_purge_truncate_history () at ./storage/innobase/trx/trx0purge.cc:685
#2  0x000055e003eee38c in trx_purge (n_tasks=<optimized out>, truncate=<optimized out>) at ./storage/innobase/trx/trx0purge.cc:1327
#3  0x000055e003ee56b7 in srv_do_purge (n_total_purged=<synthetic pointer>) at ./storage/innobase/srv/srv0srv.cc:1902
#4  purge_coordinator_callback_low () at ./storage/innobase/srv/srv0srv.cc:1992
#5  purge_coordinator_callback () at ./storage/innobase/srv/srv0srv.cc:2021

That thread should not be able to hang. But, as noted in MDEV-25773 (which I think that this case is exactly duplicating), it is not good to keep frequently scanning the entire buf_pool.flush_list while holding buf_pool.flush_list_mutex, because it will block the write completion callbacks from doing useful work.

Comment by Marko Mäkelä [ 2021-05-28 ]

For the record, stephane@skysql.com reported today morning that is indeed was extremely slow and not actually hung:

drop database sbtest;
Query OK, 90 rows affected, 2 warnings (5 hours 58 min 40.079 sec)

Generated at Thu Feb 08 09:40:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.