Details
-
Bug
-
Status: Closed (View Workflow)
-
Major
-
Resolution: Fixed
-
10.4.13
Description
Shutting down al large windows instance (innodb_pool_buffer_size=384GB, > 500K tables) takes much longer with 10.4 than it did with 10.3.
Shutdown takes around 10 minutes even with innodb_fast_shutdown=1 or =3.
innodb_defragment was originally enabled, but even after turning that off shutdown still took several minutes, even though less than with defragmentation enabled.
Nothing special in the error log besides [Note] InnoDB: Waiting for master threadto exit once per minute.
Attachments
Issue Links
- relates to
-
MDEV-29291 Assertion `!table->fts' failed in dict_table_can_be_evicted on SHUTDOWN
-
- Closed
-
Activity
The bulk of time is spent here
It looks like O(n^3) , where n is the number of elements in data dictionary
for every element in data dictionary (O(n))
|
for all its mutexes (O(1))
|
erase latch counter in std::vector(this should be O(n^2))
|
mysqld.exe!std::_Find_unchecked1<LatchCounter::Count * *,LatchCounter::Count *>(LatchCounter::Count * * _First=0x000001537a99f3b0, LatchCounter::Count * * const _Last=0x000001537addb258, LatchCounter::Count * const & _Val=0x00000153a7a3cf68, std::integral_constant<bool,0> __formal={...}) Line 4640 C++
|
mysqld.exe!std::remove<std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>>,LatchCounter::Count *>(std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _First={...}, const std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _Last={...}, LatchCounter::Count * const & _Val=0x00000153a7a3cf68) Line 2005 C++
|
> [Inline Frame] mysqld.exe!LatchCounter::single_deregister(LatchCounter::Count * count=0x00000153a7a3cf68) Line 643 C++
|
[Inline Frame] mysqld.exe!GenericPolicy<OSTrackMutex<GenericPolicy>>::destroy() Line 232 C++
|
[Inline Frame] mysqld.exe!PolicyMutex<OSTrackMutex<GenericPolicy>>::destroy() Line 673 C++
|
mysqld.exe!mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy>>>(PolicyMutex<OSTrackMutex<GenericPolicy>> * mutex) Line 193 C++
|
mysqld.exe!dict_mem_index_free(dict_index_t * index=0x00000153a7a3cde8) Line 1097 C++
|
mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000153a3d88cc8, bool lru, bool keep) Line 1685 C++
|
mysqld.exe!dict_sys_t::close() Line 5893 C++
|
mysqld.exe!innodb_shutdown() Line 2520 C++
|
mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
|
|
Freeing small allocations, some millions of times
ntdll.dll!RtlpHeapFindListLookupEntry() Unknown
|
ntdll.dll!RtlpInsertFreeBlock() Unknown
|
ntdll.dll!RtlpDeCommitFreeBlock() Unknown
|
ntdll.dll!RtlpFreeHeap() Unknown
|
ntdll.dll!RtlFreeHeap() Unknown
|
ntdll.dll!RtlpFreeUserBlockToHeap() Unknown
|
ntdll.dll!RtlpFreeUserBlock() Unknown
|
ntdll.dll!RtlFreeHeap() Unknown
|
ucrtbase.dll!_free_base() Unknown
|
> [Inline Frame] mysqld.exe!mem_heap_free(mem_block_info_t * heap=0x000001506481b5e0) Line 417 C++
|
mysqld.exe!dict_mem_index_free(dict_index_t * index) Line 1107 C++
|
mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000150434f2fe8, bool lru, bool keep) Line 1685 C++
|
mysqld.exe!dict_sys_t::close() Line 5893 C++
|
mysqld.exe!innodb_shutdown() Line 2520 C++
|
mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
|
|
Freeing buffer pool rwlocks
ntdll.dll!RtlpHeapFindListLookupEntry() Unknown
|
ntdll.dll!RtlpInsertFreeBlock() Unknown
|
ntdll.dll!RtlpDeCommitFreeBlock() Unknown
|
ntdll.dll!RtlpFreeHeap() Unknown
|
ntdll.dll!RtlFreeHeap() Unknown
|
ntdll.dll!RtlpFreeUserBlockToHeap() Unknown
|
ntdll.dll!RtlpFreeUserBlock() Unknown
|
ntdll.dll!RtlFreeHeap() Unknown
|
ucrtbase.dll!_free_base() Unknown
|
> mysqld.exe!os_event_destroy(os_event * & event) Line 514 C++
|
mysqld.exe!rw_lock_free_func(rw_lock_t * lock=0x000001114d26ed70) Line 263 C++
|
mysqld.exe!buf_pool_free_instance(buf_pool_t * buf_pool=0x000000e7ff891720) Line 2062 C++
|
mysqld.exe!buf_pool_free(unsigned __int64 n_instances=12) Line 2140 C++
|
mysqld.exe!innodb_shutdown() Line 2539 C++
|
mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
|
|
wlad, thank you for the analysis. Could you please try to repeat this yourself too?
It is unclear to me why the shutdown would take longer in 10.4 compared to 10.3. This code has not been changed much since the 10.2 GA release. MDEV-22523 recently removed rtr_ssn_mutex, which might alleviate this problem a little bit.
Could it be that in 10.3, table_definition_cache ends up being smaller in the reported environment than in 10.4? Since MariaDB Server 10.0 (and MySQL 5.6), InnoDB implements a LRU policy for evicting table definitions. The target size of InnoDB’s internal table definition cache is controlled by the SQL-layer table_definition_cache parameter (tdc_size). If the table cache eviction in InnoDB was more ‘eager’ in 10.3, then there would be fewer dictionary objects to destroy at shutdown.
tdc_size is the same in 10.3 and 10.4 . I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.
Let us ignore the implied claim that this is a regression in 10.4, and instead believe the analysis.
I believe that the reason to call os_event_destroy() on all event-based mutexes and rw-locks in InnoDB is to satisfy the LatchDebug checks.
Furthermore, I also believe that resource leaks can serve as a proxy for even more serious problems, such as failure to properly shut down a critical subsystem. That is why I would rather preserve the current behavior by default.
That being said, I think that with innodb_fast_shutdown=3 (which was introduced in 10.3 by MDEV-15832) and innodb_fast_shutdown=2 it would make sense to skip the ‘fold up the tents’ steps in non-debug builds. That would still give us the opportunity to catch errors in most cases, whie the affected users could set innodb_fast_shutdown=3 to get ultra-fast shutdown starting in MariaDB 10.3 and later.
> I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.
I'm pretty sure I did, but unfortunately I can't find any proof of that ...
I apparently didn't ... but things are getting even more strange. On 10.4, I had shutdown take ~2 minutes with a 500K table definition cache fully used, 30sec with a 200K one.
Now with 10.3 I'm already waiting for 25 minutes for shutdown to complete with the 500K test case ...
I'll retry 10.4 once more next, but it will take some time ...
Ignore last comment, 10.3 shutdown is fast, even with
Open_table_definitions | 500000
it only takes 4 seconds to shut down. Previous shutdown
was blocked by something else ...
Now re-running 10.4 test ...
10.4.13 -> two minutes to shutown, so it is indeed a 10.4 specific problem, and my memory has not failed me.
Attachted dump creates almost 500K empty tables in the "test" database (warning, takes a few hours, I can also share a mariabackup, but that's about 45GB).
The only non-default configuration I use is:
[mysqld]
|
innodb_buffer_pool_size=16G
|
innodb_flush_log_at_trx_commit=2
|
table_definition_cache=500000
|
After restoring the dump is complete I restart mysqld to get a clean state, then run
mysqldump --skip-lock-tables test >/dev/null
to make sure every table is opened and read from, verify this with:
SHOW TABLES LIKE 'Open_table_definitions';
|
+--------------------------+--------+
|
| Variable_name | Value |
|
+--------------------------+--------+
|
| Open_table_definitions | 474293 |
|
+--------------------------+--------+
|
After that I simply do mysqladmin shutdown and check the error log for the time between "Normal shutdown" and "Shutdown complete" log lines.
For faster setup, here's a copy of my 10.3.23 test instances datadir:
I reproduced it, it took just 43 minutes to load, with dummy tables,
DELIMETER //
|
FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int)"); end for; // |
here is what VS profiler tells me, about shutdown after the above create 500000 tables is finished:
Function Name | Total CPU (unit, %) | Self CPU (unit, %) | Module |
---|---|---|---|
LatchCounter::single_deregister | 19860 (45,11 %) | 19838 (45,06 %) | mysqld.exe |
mutex_destroy<PolicyMutex<TTASEventMutex<GenericPolicy> > > | 19688 (44,71 %) | 19642 (44,61 %) | mysqld.exe |
_free_base | 1654 (3,76 %) | 1654 (3,76 %) | ucrtbase.dll |
sync_file_created_deregister | 879 (2,00 %) | 641 (1,46 %) | mysqld.exe |
VirtualFree | 641 (1,46 %) | 641 (1,46 %) | KernelBase.dll |
RtlDeleteCriticalSection | 349 (0,79 %) | 349 (0,79 %) | ntdll.dll |
The whole time of this mutex_destroy is spend in the inlined PolicyMutex<TTASEventMutex<GenericPolicy>>::destroy , according to
And there is yet another mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy> > >
mysqld.exe , that takes another 45% of the whole time, in LatchCounter::single_deregister
If not for these mutex_destroys, shutdown would be 10 times faster.
Note, that I did the experiment with the innodb_file_per_table=0, as the original reporter .
big picture is that 96.58% of the whole shutdown time is taken by the dict_sys.remove(prev_table); inside dict_sys_t::close(), but we knew this all the time
I ran the same test now in 10.3, and it turns out, there is indeed nothing of this kind.
The most expensive individual functions on shutdown (which is indeed much faster) are
Function Name | Total CPU (unit, %) | Self CPU (unit, %) | Module |
---|---|---|---|
RtlFreeHeap | 1938 (31,10 %) | 1938 (31,10 %) | ntdll.dll |
System Call - ntoskrnl.exe | 1527 (24,50 %) | 1527 (24,50 %) | ntoskrnl.exe |
VirtualFree | 672 (10,78 %) | 672 (10,78 %) | KernelBase.dll |
RtlDeleteCriticalSection | 539 (8,65 %) | 539 (8,65 %) | ntdll.dll |
dict_table_remove_from_cache_low | 3429 (55,02 %) | 194 (3,11 %) | mysqld.exe |
I guess, a bloated PolicyMutex (ib_mutex_t?), or several were added somewhere into the dictionary object in 10.4 timeline, and this is the actual cause of the slow shutdown.
I think, OSMutex might suffice, or std::mutex perhaps. They are less bloated, do not spin too much, and are much cheaper to destroy.
The 10.4s culprit, mutex_destroy does exist in 10.3 shutdown profiler, but does not cost nearly as much
mutex_destroy<PolicyMutex<TTASEventMutex<BlockMutexPolicy> > > takes 3,35 % inclusive time and 0.3% exclusive time, and is called solely from within buf_pool_free_instance().
The profiler is actually being helpful to identify the culprit.
So here are the 2 mutexes, which were lazily used prior to 10.4 , if I'm not mistaken.
- mutex_free(&table->autoinc_mutex); is taking 45% CPU on shutdown. this one used to be lazily allocated before https://github.com/mariadb/server/commit/e9a5f288f21c
another one is
- mutex_free(&index->zip_pad.mutex); is taking 46% of CPU on shutdown. this comes from
from https://github.com/mariadb/server/commit/adb640e25ad3
hholzgra and wlad, thank you for confirming this as a 10.4 regression.
I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking to remove write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 (MDEV-22367), to simplify the buffer pool code further. I think that it should be OK to lose some write performance on that obscure format.
It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t>. The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc().
@Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown, for the latch counters and what not, and instead of removing entries from vectors one-by-one, clear the vectors at once , or just let C++ call the static destructors.
Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max. It feels wrong to do that in a GA release.
It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex.
I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.
The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm:
--source include/have_innodb.inc
|
DELIMITER //;
|
FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int) ENGINE=MyISAM;"); end for; // |
DELIMITER ;//
|
--source include/restart_mysqld.inc |
With ENGINE=InnoDB and my fix of using std::mutex, it completed in 62 seconds.
With ENGINE=InnoDB and without my fix, it completed in 69 seconds.
Creating the tables takes the majority of the time (59 seconds).
The shutdown and restart without the fix would take about 9 seconds, and with the fix, about 3 seconds.
I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.
ntdll.dll!RtlFreeHeap() Unknown
ucrtbase.dll!_free_base() Unknown
> [Inline Frame] mysqld.exe!ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>::deallocate(std::_Tree_node<dict_foreign_t *,void *> *) Line 438 C++
[Inline Frame] mysqld.exe!std::_Normal_allocator_traits<ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>>::deallocate(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 526 C++
[Inline Frame] mysqld.exe!std::_Tree_node<dict_foreign_t *,void *>::_Freenode0(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 373 C++
[Inline Frame] mysqld.exe!std::_Tree_val<std::_Tree_simple_types<dict_foreign_t *>>::_Erase_head(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 753 C++
[Inline Frame] mysqld.exe!std::_Tree<std::_Tset_traits<dict_foreign_t *,dict_foreign_compare,ut_allocator<dict_foreign_t *,1>,0>>::{dtor}() Line 1189 C++
mysqld.exe!dict_mem_table_free(dict_table_t * table=0x0000014f5c0d40f8) Line 228 C++
mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x0000014f5c0d40f8, bool lru, bool keep) Line 1731 C++
mysqld.exe!dict_sys_t::close() Line 5893 C++
mysqld.exe!innodb_shutdown() Line 2520 C++
mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++