[MDEV-22778] Slow InnoDB shutdown on large instance Created: 2020-06-02  Updated: 2022-08-11  Resolved: 2020-07-23

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4.13
Fix Version/s: 10.4.14, 10.5.5

Type: Bug Priority: Major
Reporter: Hartmut Holzgraefe Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: Windows, innodb, shutdown

Attachments: PNG File image-2020-07-14-19-18-54-693.png     PNG File image-2020-07-14-19-18-55-706.png     File mdev-11778-workaround.patch     File mdev-22778.sql.gz     PNG File screenshot-1.png     PNG File screenshot-2.png    
Issue Links:
Relates
relates to MDEV-29291 Assertion `!table->fts' failed in dic... Closed

 Description   

Shutting down al large windows instance (innodb_pool_buffer_size=384GB, > 500K tables) takes much longer with 10.4 than it did with 10.3.

Shutdown takes around 10 minutes even with innodb_fast_shutdown=1 or =3.

innodb_defragment was originally enabled, but even after turning that off shutdown still took several minutes, even though less than with defragmentation enabled.

Nothing special in the error log besides [Note] InnoDB: Waiting for master threadto exit once per minute.



 Comments   
Comment by Vladislav Vaintroub [ 2020-06-02 ]

 	ntdll.dll!RtlFreeHeap()	Unknown
 	ucrtbase.dll!_free_base()	Unknown
>	[Inline Frame] mysqld.exe!ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>::deallocate(std::_Tree_node<dict_foreign_t *,void *> *) Line 438	C++
 	[Inline Frame] mysqld.exe!std::_Normal_allocator_traits<ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>>::deallocate(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 526	C++
 	[Inline Frame] mysqld.exe!std::_Tree_node<dict_foreign_t *,void *>::_Freenode0(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 373	C++
 	[Inline Frame] mysqld.exe!std::_Tree_val<std::_Tree_simple_types<dict_foreign_t *>>::_Erase_head(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 753	C++
 	[Inline Frame] mysqld.exe!std::_Tree<std::_Tset_traits<dict_foreign_t *,dict_foreign_compare,ut_allocator<dict_foreign_t *,1>,0>>::{dtor}() Line 1189	C++
 	mysqld.exe!dict_mem_table_free(dict_table_t * table=0x0000014f5c0d40f8) Line 228	C++
 	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x0000014f5c0d40f8, bool lru, bool keep) Line 1731	C++
 	mysqld.exe!dict_sys_t::close() Line 5893	C++
 	mysqld.exe!innodb_shutdown() Line 2520	C++
 	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++

Comment by Vladislav Vaintroub [ 2020-06-02 ]

The bulk of time is spent here
It looks like O(n^3) , where n is the number of elements in data dictionary

   for every element in data dictionary (O(n))
     for all its mutexes (O(1))
     erase latch counter in std::vector(this should be O(n^2))

 	mysqld.exe!std::_Find_unchecked1<LatchCounter::Count * *,LatchCounter::Count *>(LatchCounter::Count * * _First=0x000001537a99f3b0, LatchCounter::Count * * const _Last=0x000001537addb258, LatchCounter::Count * const & _Val=0x00000153a7a3cf68, std::integral_constant<bool,0> __formal={...}) Line 4640	C++
 	mysqld.exe!std::remove<std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>>,LatchCounter::Count *>(std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _First={...}, const std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _Last={...}, LatchCounter::Count * const & _Val=0x00000153a7a3cf68) Line 2005	C++
>	[Inline Frame] mysqld.exe!LatchCounter::single_deregister(LatchCounter::Count * count=0x00000153a7a3cf68) Line 643	C++
 	[Inline Frame] mysqld.exe!GenericPolicy<OSTrackMutex<GenericPolicy>>::destroy() Line 232	C++
 	[Inline Frame] mysqld.exe!PolicyMutex<OSTrackMutex<GenericPolicy>>::destroy() Line 673	C++
 	mysqld.exe!mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy>>>(PolicyMutex<OSTrackMutex<GenericPolicy>> * mutex) Line 193	C++
 	mysqld.exe!dict_mem_index_free(dict_index_t * index=0x00000153a7a3cde8) Line 1097	C++
 	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000153a3d88cc8, bool lru, bool keep) Line 1685	C++
 	mysqld.exe!dict_sys_t::close() Line 5893	C++
 	mysqld.exe!innodb_shutdown() Line 2520	C++
 	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++

Comment by Vladislav Vaintroub [ 2020-06-02 ]

Freeing small allocations, some millions of times

 	ntdll.dll!RtlpHeapFindListLookupEntry()	Unknown
 	ntdll.dll!RtlpInsertFreeBlock()	Unknown
 	ntdll.dll!RtlpDeCommitFreeBlock()	Unknown
 	ntdll.dll!RtlpFreeHeap()	Unknown
 	ntdll.dll!RtlFreeHeap()	Unknown
 	ntdll.dll!RtlpFreeUserBlockToHeap()	Unknown
 	ntdll.dll!RtlpFreeUserBlock()	Unknown
 	ntdll.dll!RtlFreeHeap()	Unknown
 	ucrtbase.dll!_free_base()	Unknown
>	[Inline Frame] mysqld.exe!mem_heap_free(mem_block_info_t * heap=0x000001506481b5e0) Line 417	C++
 	mysqld.exe!dict_mem_index_free(dict_index_t * index) Line 1107	C++
 	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000150434f2fe8, bool lru, bool keep) Line 1685	C++
 	mysqld.exe!dict_sys_t::close() Line 5893	C++
 	mysqld.exe!innodb_shutdown() Line 2520	C++
 	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++

Comment by Vladislav Vaintroub [ 2020-06-02 ]

Freeing buffer pool rwlocks

 	ntdll.dll!RtlpHeapFindListLookupEntry()	Unknown
 	ntdll.dll!RtlpInsertFreeBlock()	Unknown
 	ntdll.dll!RtlpDeCommitFreeBlock()	Unknown
 	ntdll.dll!RtlpFreeHeap()	Unknown
 	ntdll.dll!RtlFreeHeap()	Unknown
 	ntdll.dll!RtlpFreeUserBlockToHeap()	Unknown
 	ntdll.dll!RtlpFreeUserBlock()	Unknown
 	ntdll.dll!RtlFreeHeap()	Unknown
 	ucrtbase.dll!_free_base()	Unknown
>	mysqld.exe!os_event_destroy(os_event * & event) Line 514	C++
 	mysqld.exe!rw_lock_free_func(rw_lock_t * lock=0x000001114d26ed70) Line 263	C++
 	mysqld.exe!buf_pool_free_instance(buf_pool_t * buf_pool=0x000000e7ff891720) Line 2062	C++
 	mysqld.exe!buf_pool_free(unsigned __int64 n_instances=12) Line 2140	C++
 	mysqld.exe!innodb_shutdown() Line 2539	C++
 	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++

Comment by Marko Mäkelä [ 2020-07-13 ]

wlad, thank you for the analysis. Could you please try to repeat this yourself too?

It is unclear to me why the shutdown would take longer in 10.4 compared to 10.3. This code has not been changed much since the 10.2 GA release. MDEV-22523 recently removed rtr_ssn_mutex, which might alleviate this problem a little bit.

Could it be that in 10.3, table_definition_cache ends up being smaller in the reported environment than in 10.4? Since MariaDB Server 10.0 (and MySQL 5.6), InnoDB implements a LRU policy for evicting table definitions. The target size of InnoDB’s internal table definition cache is controlled by the SQL-layer table_definition_cache parameter (tdc_size). If the table cache eviction in InnoDB was more ‘eager’ in 10.3, then there would be fewer dictionary objects to destroy at shutdown.

Comment by Vladislav Vaintroub [ 2020-07-13 ]

tdc_size is the same in 10.3 and 10.4 . I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

Comment by Marko Mäkelä [ 2020-07-13 ]

Let us ignore the implied claim that this is a regression in 10.4, and instead believe the analysis.

I believe that the reason to call os_event_destroy() on all event-based mutexes and rw-locks in InnoDB is to satisfy the LatchDebug checks.

Furthermore, I also believe that resource leaks can serve as a proxy for even more serious problems, such as failure to properly shut down a critical subsystem. That is why I would rather preserve the current behavior by default.

That being said, I think that with innodb_fast_shutdown=3 (which was introduced in 10.3 by MDEV-15832) and innodb_fast_shutdown=2 it would make sense to skip the ‘fold up the tents’ steps in non-debug builds. That would still give us the opportunity to catch errors in most cases, whie the affected users could set innodb_fast_shutdown=3 to get ultra-fast shutdown starting in MariaDB 10.3 and later.

Comment by Hartmut Holzgraefe [ 2020-07-13 ]

> I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

I'm pretty sure I did, but unfortunately I can't find any proof of that ...

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

I apparently didn't ... but things are getting even more strange. On 10.4, I had shutdown take ~2 minutes with a 500K table definition cache fully used, 30sec with a 200K one.

Now with 10.3 I'm already waiting for 25 minutes for shutdown to complete with the 500K test case ...

I'll retry 10.4 once more next, but it will take some time ...

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

Ignore last comment, 10.3 shutdown is fast, even with

Open_table_definitions | 500000

it only takes 4 seconds to shut down. Previous shutdown
was blocked by something else ...

Now re-running 10.4 test ...

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

10.4.13 -> two minutes to shutown, so it is indeed a 10.4 specific problem, and my memory has not failed me.

Comment by Vladislav Vaintroub [ 2020-07-14 ]

hholzgra, can you share your repro script?

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

Attachted dump creates almost 500K empty tables in the "test" database (warning, takes a few hours, I can also share a mariabackup, but that's about 45GB).

The only non-default configuration I use is:

[mysqld]
innodb_buffer_pool_size=16G
innodb_flush_log_at_trx_commit=2
table_definition_cache=500000

After restoring the dump is complete I restart mysqld to get a clean state, then run

mysqldump --skip-lock-tables test >/dev/null

to make sure every table is opened and read from, verify this with:

SHOW TABLES LIKE 'Open_table_definitions';
+--------------------------+--------+
| Variable_name            | Value  |
+--------------------------+--------+
| Open_table_definitions   | 474293 |
+--------------------------+--------+

After that I simply do mysqladmin shutdown and check the error log for the time between "Normal shutdown" and "Shutdown complete" log lines.

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

For faster setup, here's a copy of my 10.3.23 test instances datadir:

http://get-map.org/MDEV-22776-10.3.23.tar.gz

Comment by Hartmut Holzgraefe [ 2020-07-14 ]

and for 10.4 http://get-map.org/MDEV-22776-10.4.13.tar.gz

Comment by Vladislav Vaintroub [ 2020-07-14 ]

I reproduced it, it took just 43 minutes to load, with dummy tables,

DELIMETER //
FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int)"); end for; //

here is what VS profiler tells me, about shutdown after the above create 500000 tables is finished:

Function Name Total CPU (unit, %) Self CPU (unit, %) Module
LatchCounter::single_deregister 19860 (45,11 %) 19838 (45,06 %) mysqld.exe
mutex_destroy<PolicyMutex<TTASEventMutex<GenericPolicy> > > 19688 (44,71 %) 19642 (44,61 %) mysqld.exe
_free_base 1654 (3,76 %) 1654 (3,76 %) ucrtbase.dll
sync_file_created_deregister 879 (2,00 %) 641 (1,46 %) mysqld.exe
VirtualFree 641 (1,46 %) 641 (1,46 %) KernelBase.dll
RtlDeleteCriticalSection 349 (0,79 %) 349 (0,79 %) ntdll.dll
Comment by Vladislav Vaintroub [ 2020-07-14 ]

The whole time of this mutex_destroy is spend in the inlined PolicyMutex<TTASEventMutex<GenericPolicy>>::destroy , according to

Comment by Vladislav Vaintroub [ 2020-07-14 ]

And there is yet another mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy> > >

mysqld.exe , that takes another 45% of the whole time, in LatchCounter::single_deregister

If not for these mutex_destroys, shutdown would be 10 times faster.

Comment by Vladislav Vaintroub [ 2020-07-14 ]

Note, that I did the experiment with the innodb_file_per_table=0, as the original reporter .

Comment by Vladislav Vaintroub [ 2020-07-14 ]

big picture is that 96.58% of the whole shutdown time is taken by the dict_sys.remove(prev_table); inside dict_sys_t::close(), but we knew this all the time

Comment by Vladislav Vaintroub [ 2020-07-14 ]

I ran the same test now in 10.3, and it turns out, there is indeed nothing of this kind.

The most expensive individual functions on shutdown (which is indeed much faster) are

Function Name Total CPU (unit, %) Self CPU (unit, %) Module
RtlFreeHeap 1938 (31,10 %) 1938 (31,10 %) ntdll.dll
System Call - ntoskrnl.exe 1527 (24,50 %) 1527 (24,50 %) ntoskrnl.exe
VirtualFree 672 (10,78 %) 672 (10,78 %) KernelBase.dll
RtlDeleteCriticalSection 539 (8,65 %) 539 (8,65 %) ntdll.dll
dict_table_remove_from_cache_low 3429 (55,02 %) 194 (3,11 %) mysqld.exe

I guess, a bloated PolicyMutex (ib_mutex_t?), or several were added somewhere into the dictionary object in 10.4 timeline, and this is the actual cause of the slow shutdown.

I think, OSMutex might suffice, or std::mutex perhaps. They are less bloated, do not spin too much, and are much cheaper to destroy.

The 10.4s culprit, mutex_destroy does exist in 10.3 shutdown profiler, but does not cost nearly as much

mutex_destroy<PolicyMutex<TTASEventMutex<BlockMutexPolicy> > > takes 3,35 % inclusive time and 0.3% exclusive time, and is called solely from within buf_pool_free_instance().

Comment by Vladislav Vaintroub [ 2020-07-14 ]

The profiler is actually being helpful to identify the culprit.
So here are the 2 mutexes, which were lazily used prior to 10.4 , if I'm not mistaken.

Comment by Marko Mäkelä [ 2020-07-14 ]

hholzgra and wlad, thank you for confirming this as a 10.4 regression.

I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking to remove write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 (MDEV-22367), to simplify the buffer pool code further. I think that it should be OK to lose some write performance on that obscure format.

It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t>. The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc().

Comment by Marko Mäkelä [ 2020-07-22 ]

I will try to see if I can eliminate the two mutexes.

Comment by Vladislav Vaintroub [ 2020-07-22 ]

@Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown, for the latch counters and what not, and instead of removing entries from vectors one-by-one, clear the vectors at once , or just let C++ call the static destructors.

Comment by Marko Mäkelä [ 2020-07-22 ]

Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max. It feels wrong to do that in a GA release.

It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex.

I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.

Comment by Marko Mäkelä [ 2020-07-22 ]

The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm:

--source include/have_innodb.inc
DELIMITER //;
FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int) ENGINE=MyISAM;"); end for; //
DELIMITER ;//
--source include/restart_mysqld.inc

With ENGINE=InnoDB and my fix of using std::mutex, it completed in 62 seconds.
With ENGINE=InnoDB and without my fix, it completed in 69 seconds.
Creating the tables takes the majority of the time (59 seconds).
The shutdown and restart without the fix would take about 9 seconds, and with the fix, about 3 seconds.

I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.

Generated at Thu Feb 08 09:17:24 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.