[MDEV-22778] Slow InnoDB shutdown on large instance Created: 2020-06-02 Updated: 2022-08-11 Resolved: 2020-07-23
|
| Status: | Closed |
| Project: | MariaDB Server |
| Component/s: | Storage Engine - InnoDB |
| Affects Version/s: | 10.4.13 |
| Fix Version/s: | 10.4.14, 10.5.5 |
| Type: | Bug | Priority: | Major |
| Reporter: | Hartmut Holzgraefe | Assignee: | Marko Mäkelä |
| Resolution: | Fixed | Votes: | 0 |
| Labels: | Windows, innodb, shutdown |
| Attachments: | |
| Issue Links: | |
| Description |
|
Shutting down a large Windows instance (innodb_buffer_pool_size=384GB, > 500K tables) takes much longer with 10.4 than it did with 10.3. Shutdown takes around 10 minutes even with innodb_fast_shutdown=1 or =3. innodb_defragment was originally enabled, but even after turning it off, shutdown still took several minutes, although less than with defragmentation enabled. Nothing special in the error log besides [Note] InnoDB: Waiting for master thread to exit once per minute.
| Comments |
| Comment by Vladislav Vaintroub [ 2020-06-02 ] |
| Comment by Vladislav Vaintroub [ 2020-06-02 ] |
The bulk of time is spent here
| Comment by Vladislav Vaintroub [ 2020-06-02 ] |
Freeing small allocations, some millions of times
| Comment by Vladislav Vaintroub [ 2020-06-02 ] |
Freeing buffer pool rwlocks
| Comment by Marko Mäkelä [ 2020-07-13 ] |
wlad, thank you for the analysis. Could you please try to repeat this yourself too? It is unclear to me why the shutdown would take longer in 10.4 compared to 10.3; this code has not been changed much since the 10.2 GA release. Could it be that in 10.3, table_definition_cache ends up being smaller in the reported environment than in 10.4? Since MariaDB Server 10.0 (and MySQL 5.6), InnoDB implements an LRU policy for evicting table definitions. The target size of InnoDB's internal table definition cache is controlled by the SQL-layer table_definition_cache parameter (tdc_size). If the table cache eviction in InnoDB was more 'eager' in 10.3, then there would be fewer dictionary objects to destroy at shutdown.
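The eviction policy described in the previous comment can be sketched roughly as follows. This is an illustrative model only, not InnoDB's actual implementation; the class and method names are invented for the sketch. The point is that a smaller cache target means fewer dictionary objects alive at shutdown, and hence less destruction work.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>

// Hypothetical sketch of an LRU-bounded table-definition cache:
// opening a table moves it to the front of the LRU list, and anything
// beyond the target size (cf. table_definition_cache / tdc_size) is
// evicted from the tail.
class DictCache {
public:
  explicit DictCache(size_t target) : target_(target) {}

  // Mark a table as recently used, loading it on a miss.
  void open_table(const std::string &name) {
    auto it = pos_.find(name);
    if (it != pos_.end())
      lru_.splice(lru_.begin(), lru_, it->second);  // move to front
    else {
      lru_.push_front(name);
      pos_[name] = lru_.begin();
    }
    while (lru_.size() > target_) {  // evict least-recently-used tail
      pos_.erase(lru_.back());
      lru_.pop_back();
    }
  }

  size_t size() const { return lru_.size(); }

private:
  size_t target_;
  std::list<std::string> lru_;  // front = most recently used
  std::unordered_map<std::string, std::list<std::string>::iterator> pos_;
};
```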
| Comment by Vladislav Vaintroub [ 2020-07-13 ] |
tdc_size is the same in 10.3 and 10.4. I'm not sure anyone has checked the claim that 10.3 is faster to shut down yet.
| Comment by Marko Mäkelä [ 2020-07-13 ] |
Let us ignore the implied claim that this is a regression in 10.4, and instead believe the analysis. I believe that the reason to call os_event_destroy() on all event-based mutexes and rw-locks in InnoDB is to satisfy the LatchDebug checks. Furthermore, I also believe that resource leaks can serve as a proxy for even more serious problems, such as failure to properly shut down a critical subsystem. That is why I would rather preserve the current behavior by default. That being said, I think that with innodb_fast_shutdown=3 (which was introduced in 10.3 by
| Comment by Hartmut Holzgraefe [ 2020-07-13 ] |
> I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

I'm pretty sure I did, but unfortunately I can't find any proof of that ...
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
I apparently didn't ... but things are getting even stranger. On 10.4, shutdown took ~2 minutes with a 500K table definition cache fully used, and 30 seconds with a 200K one. Now, on 10.3, I have already been waiting 25 minutes for shutdown to complete with the 500K test case ... I'll retry 10.4 once more next, but it will take some time ...
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
Ignore my last comment: 10.3 shutdown is fast. Even with Open_table_definitions | 500000 it only takes 4 seconds to shut down. Now re-running the 10.4 test ...
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
10.4.13 -> two minutes to shut down, so it is indeed a 10.4-specific problem, and my memory has not failed me.
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
hholzgra, can you share your repro script?
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
The attached dump creates almost 500K empty tables in the "test" database (warning: this takes a few hours; I can also share a mariabackup, but that's about 45GB). The only non-default configuration I use is:

After restoring the dump is complete, I restart mysqld to get a clean state, then run mysqldump --skip-lock-tables test >/dev/null to make sure every table is opened and read from, and verify this with:

After that I simply do mysqladmin shutdown and check the error log for the time between the "Normal shutdown" and "Shutdown complete" log lines.
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
For faster setup, here's a copy of my 10.3.23 test instance's datadir:
| Comment by Hartmut Holzgraefe [ 2020-07-14 ] |
and for 10.4: http://get-map.org/MDEV-22776-10.4.13.tar.gz
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
I reproduced it; it took just 43 minutes to load with dummy tables. Here is what the VS profiler tells me about shutdown, after the above "create 500000 tables" is finished:
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
The whole time of this mutex_destroy is spent in the inlined PolicyMutex<TTASEventMutex<GenericPolicy>>::destroy, according to
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
And there is yet another mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy>>> in mysqld.exe that takes another 45% of the whole time, in LatchCounter::single_deregister. If not for these mutex_destroys, shutdown would be 10 times faster.
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
Note that I did the experiment with innodb_file_per_table=0, the same as the original reporter.
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
The big picture is that 96.58% of the whole shutdown time is taken by the dict_sys.remove(prev_table) inside dict_sys_t::close(), but we knew this all along.
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
I ran the same test now on 10.3, and it turns out there is indeed nothing of this kind. The most expensive individual functions on shutdown (which is indeed much faster) are

I guess a bloated PolicyMutex (ib_mutex_t?), or several of them, were added somewhere into the dictionary object in the 10.4 timeline, and this is the actual cause of the slow shutdown. I think OSMutex might suffice, or perhaps std::mutex. They are less bloated, do not spin too much, and are much cheaper to destroy. The 10.4 culprit, mutex_destroy, does exist in the 10.3 shutdown profile, but does not cost nearly as much: mutex_destroy<PolicyMutex<TTASEventMutex<BlockMutexPolicy>>> takes 3.35% inclusive time and 0.3% exclusive time, and is called solely from within buf_pool_free_instance().
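To illustrate why these instrumented mutexes are so much more expensive to destroy than a plain std::mutex, here is a deliberately simplified sketch. The names and the registry mechanism are hypothetical stand-ins, not InnoDB's actual code; the real cost comes from freeing OS events and deregistering from latch-tracking structures, which this models as a condition variable plus a global vector scan.

```cpp
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

struct EventMutex;
// Global latch-tracking registry (stand-in for LatchCounter bookkeeping).
static std::vector<EventMutex *> latch_registry;

// An "instrumented" mutex: owns a wakeup event and is tracked globally,
// so its destructor does real work. A std::mutex destructor, by
// contrast, is trivially cheap.
struct EventMutex {
  std::mutex m;
  std::condition_variable event;  // wakeup event for waiters
  EventMutex() { latch_registry.push_back(this); }
  ~EventMutex() {
    // Deregister: a linear scan + erase, O(n) per mutex, so destroying
    // n of them is O(n^2) overall.
    latch_registry.erase(
        std::find(latch_registry.begin(), latch_registry.end(), this));
  }
};

// Create and destroy n instrumented mutexes (InnoDB destroys millions
// at shutdown); returns the registry size afterwards, which should be 0.
size_t churn_event_mutexes(size_t n) {
  std::vector<EventMutex *> v;
  for (size_t i = 0; i < n; i++) v.push_back(new EventMutex);
  for (EventMutex *p : v) delete p;
  return latch_registry.size();
}
```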
| Comment by Vladislav Vaintroub [ 2020-07-14 ] |
The profiler is actually being helpful in identifying the culprit.
| Comment by Marko Mäkelä [ 2020-07-14 ] |
hholzgra and wlad, thank you for confirming this as a 10.4 regression. I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking of removing write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 (

It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t>. The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc().
| Comment by Marko Mäkelä [ 2020-07-22 ] |
I will try to see if I can eliminate the two mutexes.
| Comment by Vladislav Vaintroub [ 2020-07-22 ] |
@Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown, for the latch counters and whatnot, and instead of removing entries from vectors one by one, clear the vectors all at once, or just let C++ call the static destructors.
| Comment by Marko Mäkelä [ 2020-07-22 ] |
Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max. It feels wrong to do that in a GA release.

It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex. I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something else that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.
| Comment by Marko Mäkelä [ 2020-07-22 ] |
The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm:

With ENGINE=InnoDB and my fix of using std::mutex, it completed in 62 seconds. I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.