Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-22778

Slow InnoDB shutdown on large instance

Details

    Description

      Shutting down al large windows instance (innodb_pool_buffer_size=384GB, > 500K tables) takes much longer with 10.4 than it did with 10.3.

      Shutdown takes around 10 minutes even with innodb_fast_shutdown=1 or =3.

      innodb_defragment was originally enabled, but even after turning that off shutdown still took several minutes, even though less than with defragmentation enabled.

      Nothing special in the error log besides [Note] InnoDB: Waiting for master threadto exit once per minute.

      Attachments

        1. mdev-22778.sql.gz
          9.38 MB
        2. screenshot-1.png
          screenshot-1.png
          535 kB
        3. screenshot-2.png
          screenshot-2.png
          851 kB
        4. image-2020-07-14-19-18-54-693.png
          image-2020-07-14-19-18-54-693.png
          851 kB
        5. image-2020-07-14-19-18-55-706.png
          image-2020-07-14-19-18-55-706.png
          851 kB
        6. mdev-11778-workaround.patch
          5 kB

        Issue Links

          Activity

            hholzgra Hartmut Holzgraefe created issue -
            hholzgra Hartmut Holzgraefe made changes -
            Field Original Value New Value
            Description Shutting down al large windows instance ({{innodb_pool_buffer_size=384GB}}, > 500K tables) takes much longer with 10.4 than it did with 10.3.

            Shutdown takes around 10 minutes even with {{innodb_fast_shutdown=1}} or {{=3}}.

            {{innodb_defragment}} was originally enabled, but even after turning that off shutdown still took several minutes, even though less than with defragmentation enabled.

            Nothing special in the error log besides {{[Note] InnoDB: Waiting for master threadto exit}} once per minute.

             	ntdll.dll!RtlFreeHeap()	Unknown
             	ucrtbase.dll!_free_base()	Unknown
            >	[Inline Frame] mysqld.exe!ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>::deallocate(std::_Tree_node<dict_foreign_t *,void *> *) Line 438	C++
             	[Inline Frame] mysqld.exe!std::_Normal_allocator_traits<ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>>::deallocate(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 526	C++
             	[Inline Frame] mysqld.exe!std::_Tree_node<dict_foreign_t *,void *>::_Freenode0(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 373	C++
             	[Inline Frame] mysqld.exe!std::_Tree_val<std::_Tree_simple_types<dict_foreign_t *>>::_Erase_head(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 753	C++
             	[Inline Frame] mysqld.exe!std::_Tree<std::_Tset_traits<dict_foreign_t *,dict_foreign_compare,ut_allocator<dict_foreign_t *,1>,0>>::{dtor}() Line 1189	C++
             	mysqld.exe!dict_mem_table_free(dict_table_t * table=0x0000014f5c0d40f8) Line 228	C++
             	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x0000014f5c0d40f8, bool lru, bool keep) Line 1731	C++
             	mysqld.exe!dict_sys_t::close() Line 5893	C++
             	mysqld.exe!innodb_shutdown() Line 2520	C++
             	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++
            

            wlad Vladislav Vaintroub added a comment - ntdll.dll!RtlFreeHeap() Unknown ucrtbase.dll!_free_base() Unknown > [Inline Frame] mysqld.exe!ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>::deallocate(std::_Tree_node<dict_foreign_t *,void *> *) Line 438 C++ [Inline Frame] mysqld.exe!std::_Normal_allocator_traits<ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1>>::deallocate(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 526 C++ [Inline Frame] mysqld.exe!std::_Tree_node<dict_foreign_t *,void *>::_Freenode0(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 373 C++ [Inline Frame] mysqld.exe!std::_Tree_val<std::_Tree_simple_types<dict_foreign_t *>>::_Erase_head(ut_allocator<std::_Tree_node<dict_foreign_t *,void *>,1> &) Line 753 C++ [Inline Frame] mysqld.exe!std::_Tree<std::_Tset_traits<dict_foreign_t *,dict_foreign_compare,ut_allocator<dict_foreign_t *,1>,0>>::{dtor}() Line 1189 C++ mysqld.exe!dict_mem_table_free(dict_table_t * table=0x0000014f5c0d40f8) Line 228 C++ mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x0000014f5c0d40f8, bool lru, bool keep) Line 1731 C++ mysqld.exe!dict_sys_t::close() Line 5893 C++ mysqld.exe!innodb_shutdown() Line 2520 C++ mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
            wlad Vladislav Vaintroub added a comment - - edited

            The bulk of time is spent here
            It looks like O(n^3) , where n is the number of elements in data dictionary

               for every element in data dictionary (O(n))
                 for all its mutexes (O(1))
                 erase latch counter in std::vector(this should be O(n^2))
            

             	mysqld.exe!std::_Find_unchecked1<LatchCounter::Count * *,LatchCounter::Count *>(LatchCounter::Count * * _First=0x000001537a99f3b0, LatchCounter::Count * * const _Last=0x000001537addb258, LatchCounter::Count * const & _Val=0x00000153a7a3cf68, std::integral_constant<bool,0> __formal={...}) Line 4640	C++
             	mysqld.exe!std::remove<std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>>,LatchCounter::Count *>(std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _First={...}, const std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _Last={...}, LatchCounter::Count * const & _Val=0x00000153a7a3cf68) Line 2005	C++
            >	[Inline Frame] mysqld.exe!LatchCounter::single_deregister(LatchCounter::Count * count=0x00000153a7a3cf68) Line 643	C++
             	[Inline Frame] mysqld.exe!GenericPolicy<OSTrackMutex<GenericPolicy>>::destroy() Line 232	C++
             	[Inline Frame] mysqld.exe!PolicyMutex<OSTrackMutex<GenericPolicy>>::destroy() Line 673	C++
             	mysqld.exe!mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy>>>(PolicyMutex<OSTrackMutex<GenericPolicy>> * mutex) Line 193	C++
             	mysqld.exe!dict_mem_index_free(dict_index_t * index=0x00000153a7a3cde8) Line 1097	C++
             	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000153a3d88cc8, bool lru, bool keep) Line 1685	C++
             	mysqld.exe!dict_sys_t::close() Line 5893	C++
             	mysqld.exe!innodb_shutdown() Line 2520	C++
             	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++
            
            

            wlad Vladislav Vaintroub added a comment - - edited The bulk of time is spent here It looks like O(n^3) , where n is the number of elements in data dictionary for every element in data dictionary (O(n)) for all its mutexes (O(1)) erase latch counter in std::vector(this should be O(n^2)) mysqld.exe!std::_Find_unchecked1<LatchCounter::Count * *,LatchCounter::Count *>(LatchCounter::Count * * _First=0x000001537a99f3b0, LatchCounter::Count * * const _Last=0x000001537addb258, LatchCounter::Count * const & _Val=0x00000153a7a3cf68, std::integral_constant<bool,0> __formal={...}) Line 4640 C++ mysqld.exe!std::remove<std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>>,LatchCounter::Count *>(std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _First={...}, const std::_Vector_iterator<std::_Vector_val<std::_Simple_types<LatchCounter::Count *>>> _Last={...}, LatchCounter::Count * const & _Val=0x00000153a7a3cf68) Line 2005 C++ > [Inline Frame] mysqld.exe!LatchCounter::single_deregister(LatchCounter::Count * count=0x00000153a7a3cf68) Line 643 C++ [Inline Frame] mysqld.exe!GenericPolicy<OSTrackMutex<GenericPolicy>>::destroy() Line 232 C++ [Inline Frame] mysqld.exe!PolicyMutex<OSTrackMutex<GenericPolicy>>::destroy() Line 673 C++ mysqld.exe!mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy>>>(PolicyMutex<OSTrackMutex<GenericPolicy>> * mutex) Line 193 C++ mysqld.exe!dict_mem_index_free(dict_index_t * index=0x00000153a7a3cde8) Line 1097 C++ mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000153a3d88cc8, bool lru, bool keep) Line 1685 C++ mysqld.exe!dict_sys_t::close() Line 5893 C++ mysqld.exe!innodb_shutdown() Line 2520 C++ mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
            wlad Vladislav Vaintroub added a comment - - edited

            Freeing small allocations, some millions of times

             	ntdll.dll!RtlpHeapFindListLookupEntry()	Unknown
             	ntdll.dll!RtlpInsertFreeBlock()	Unknown
             	ntdll.dll!RtlpDeCommitFreeBlock()	Unknown
             	ntdll.dll!RtlpFreeHeap()	Unknown
             	ntdll.dll!RtlFreeHeap()	Unknown
             	ntdll.dll!RtlpFreeUserBlockToHeap()	Unknown
             	ntdll.dll!RtlpFreeUserBlock()	Unknown
             	ntdll.dll!RtlFreeHeap()	Unknown
             	ucrtbase.dll!_free_base()	Unknown
            >	[Inline Frame] mysqld.exe!mem_heap_free(mem_block_info_t * heap=0x000001506481b5e0) Line 417	C++
             	mysqld.exe!dict_mem_index_free(dict_index_t * index) Line 1107	C++
             	mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000150434f2fe8, bool lru, bool keep) Line 1685	C++
             	mysqld.exe!dict_sys_t::close() Line 5893	C++
             	mysqld.exe!innodb_shutdown() Line 2520	C++
             	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++
            
            

            wlad Vladislav Vaintroub added a comment - - edited Freeing small allocations, some millions of times ntdll.dll!RtlpHeapFindListLookupEntry() Unknown ntdll.dll!RtlpInsertFreeBlock() Unknown ntdll.dll!RtlpDeCommitFreeBlock() Unknown ntdll.dll!RtlpFreeHeap() Unknown ntdll.dll!RtlFreeHeap() Unknown ntdll.dll!RtlpFreeUserBlockToHeap() Unknown ntdll.dll!RtlpFreeUserBlock() Unknown ntdll.dll!RtlFreeHeap() Unknown ucrtbase.dll!_free_base() Unknown > [Inline Frame] mysqld.exe!mem_heap_free(mem_block_info_t * heap=0x000001506481b5e0) Line 417 C++ mysqld.exe!dict_mem_index_free(dict_index_t * index) Line 1107 C++ mysqld.exe!dict_sys_t::remove(dict_table_t * table=0x00000150434f2fe8, bool lru, bool keep) Line 1685 C++ mysqld.exe!dict_sys_t::close() Line 5893 C++ mysqld.exe!innodb_shutdown() Line 2520 C++ mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++

            Freeing buffer pool rwlocks

             	ntdll.dll!RtlpHeapFindListLookupEntry()	Unknown
             	ntdll.dll!RtlpInsertFreeBlock()	Unknown
             	ntdll.dll!RtlpDeCommitFreeBlock()	Unknown
             	ntdll.dll!RtlpFreeHeap()	Unknown
             	ntdll.dll!RtlFreeHeap()	Unknown
             	ntdll.dll!RtlpFreeUserBlockToHeap()	Unknown
             	ntdll.dll!RtlpFreeUserBlock()	Unknown
             	ntdll.dll!RtlFreeHeap()	Unknown
             	ucrtbase.dll!_free_base()	Unknown
            >	mysqld.exe!os_event_destroy(os_event * & event) Line 514	C++
             	mysqld.exe!rw_lock_free_func(rw_lock_t * lock=0x000001114d26ed70) Line 263	C++
             	mysqld.exe!buf_pool_free_instance(buf_pool_t * buf_pool=0x000000e7ff891720) Line 2062	C++
             	mysqld.exe!buf_pool_free(unsigned __int64 n_instances=12) Line 2140	C++
             	mysqld.exe!innodb_shutdown() Line 2539	C++
             	mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340	C++
            
            

            wlad Vladislav Vaintroub added a comment - Freeing buffer pool rwlocks ntdll.dll!RtlpHeapFindListLookupEntry() Unknown ntdll.dll!RtlpInsertFreeBlock() Unknown ntdll.dll!RtlpDeCommitFreeBlock() Unknown ntdll.dll!RtlpFreeHeap() Unknown ntdll.dll!RtlFreeHeap() Unknown ntdll.dll!RtlpFreeUserBlockToHeap() Unknown ntdll.dll!RtlpFreeUserBlock() Unknown ntdll.dll!RtlFreeHeap() Unknown ucrtbase.dll!_free_base() Unknown > mysqld.exe!os_event_destroy(os_event * & event) Line 514 C++ mysqld.exe!rw_lock_free_func(rw_lock_t * lock=0x000001114d26ed70) Line 263 C++ mysqld.exe!buf_pool_free_instance(buf_pool_t * buf_pool=0x000000e7ff891720) Line 2062 C++ mysqld.exe!buf_pool_free(unsigned __int64 n_instances=12) Line 2140 C++ mysqld.exe!innodb_shutdown() Line 2539 C++ mysqld.exe!innobase_end(handlerton * __formal=0x000000e7fb62cd38, ha_panic_function __formal=HA_PANIC_CLOSE) Line 4340 C++
            elenst Elena Stepanova made changes -
            Fix Version/s 10.4 [ 22408 ]
            Assignee Marko Mäkelä [ marko ]

            wlad, thank you for the analysis. Could you please try to repeat this yourself too?

            It is unclear to me why the shutdown would take longer in 10.4 compared to 10.3. This code has not been changed much since the 10.2 GA release. MDEV-22523 recently removed rtr_ssn_mutex, which might alleviate this problem a little bit.

            Could it be that in 10.3, table_definition_cache ends up being smaller in the reported environment than in 10.4? Since MariaDB Server 10.0 (and MySQL 5.6), InnoDB implements a LRU policy for evicting table definitions. The target size of InnoDB’s internal table definition cache is controlled by the SQL-layer table_definition_cache parameter (tdc_size). If the table cache eviction in InnoDB was more ‘eager’ in 10.3, then there would be fewer dictionary objects to destroy at shutdown.

            marko Marko Mäkelä added a comment - wlad , thank you for the analysis. Could you please try to repeat this yourself too? It is unclear to me why the shutdown would take longer in 10.4 compared to 10.3. This code has not been changed much since the 10.2 GA release. MDEV-22523 recently removed rtr_ssn_mutex , which might alleviate this problem a little bit. Could it be that in 10.3, table_definition_cache ends up being smaller in the reported environment than in 10.4? Since MariaDB Server 10.0 (and MySQL 5.6), InnoDB implements a LRU policy for evicting table definitions. The target size of InnoDB’s internal table definition cache is controlled by the SQL-layer table_definition_cache parameter ( tdc_size ). If the table cache eviction in InnoDB was more ‘eager’ in 10.3, then there would be fewer dictionary objects to destroy at shutdown.

            tdc_size is the same in 10.3 and 10.4 . I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

            wlad Vladislav Vaintroub added a comment - tdc_size is the same in 10.3 and 10.4 . I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

            Let us ignore the implied claim that this is a regression in 10.4, and instead believe the analysis.

            I believe that the reason to call os_event_destroy() on all event-based mutexes and rw-locks in InnoDB is to satisfy the LatchDebug checks.

            Furthermore, I also believe that resource leaks can serve as a proxy for even more serious problems, such as failure to properly shut down a critical subsystem. That is why I would rather preserve the current behavior by default.

            That being said, I think that with innodb_fast_shutdown=3 (which was introduced in 10.3 by MDEV-15832) and innodb_fast_shutdown=2 it would make sense to skip the ‘fold up the tents’ steps in non-debug builds. That would still give us the opportunity to catch errors in most cases, whie the affected users could set innodb_fast_shutdown=3 to get ultra-fast shutdown starting in MariaDB 10.3 and later.

            marko Marko Mäkelä added a comment - Let us ignore the implied claim that this is a regression in 10.4, and instead believe the analysis. I believe that the reason to call os_event_destroy() on all event-based mutexes and rw-locks in InnoDB is to satisfy the LatchDebug checks. Furthermore, I also believe that resource leaks can serve as a proxy for even more serious problems, such as failure to properly shut down a critical subsystem. That is why I would rather preserve the current behavior by default. That being said, I think that with innodb_fast_shutdown=3 (which was introduced in 10.3 by MDEV-15832 ) and innodb_fast_shutdown=2 it would make sense to skip the ‘fold up the tents’ steps in non-debug builds. That would still give us the opportunity to catch errors in most cases, whie the affected users could set innodb_fast_shutdown=3 to get ultra-fast shutdown starting in MariaDB 10.3 and later.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.5 [ 23123 ]
            Assignee Marko Mäkelä [ marko ] Eugene Kosov [ kevg ]
            Summary Slow InnoDB shutdown on large 10.4 instance on Windows Slow InnoDB shutdown on large instance

            > I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet.

            I'm pretty sure I did, but unfortunately I can't find any proof of that ...

            hholzgra Hartmut Holzgraefe added a comment - > I'm not sure anyone checked the claim 10.3 is faster to shutdown, yet. I'm pretty sure I did, but unfortunately I can't find any proof of that ...

            I apparently didn't ... but things are getting even more strange. On 10.4, I had shutdown take ~2 minutes with a 500K table definition cache fully used, 30sec with a 200K one.

            Now with 10.3 I'm already waiting for 25 minutes for shutdown to complete with the 500K test case ...

            I'll retry 10.4 once more next, but it will take some time ...

            hholzgra Hartmut Holzgraefe added a comment - I apparently didn't ... but things are getting even more strange. On 10.4, I had shutdown take ~2 minutes with a 500K table definition cache fully used, 30sec with a 200K one. Now with 10.3 I'm already waiting for 25 minutes for shutdown to complete with the 500K test case ... I'll retry 10.4 once more next, but it will take some time ...

            Ignore last comment, 10.3 shutdown is fast, even with

            Open_table_definitions | 500000

            it only takes 4 seconds to shut down. Previous shutdown
            was blocked by something else ...

            Now re-running 10.4 test ...

            hholzgra Hartmut Holzgraefe added a comment - Ignore last comment, 10.3 shutdown is fast, even with Open_table_definitions | 500000 it only takes 4 seconds to shut down. Previous shutdown was blocked by something else ... Now re-running 10.4 test ...

            10.4.13 -> two minutes to shutown, so it is indeed a 10.4 specific problem, and my memory has not failed me.

            hholzgra Hartmut Holzgraefe added a comment - 10.4.13 -> two minutes to shutown, so it is indeed a 10.4 specific problem, and my memory has not failed me.

            hholzgra, can you share your repro script?

            wlad Vladislav Vaintroub added a comment - hholzgra , can you share your repro script?
            hholzgra Hartmut Holzgraefe made changes -
            Attachment mdev-22778.sql.gz [ 52778 ]
            hholzgra Hartmut Holzgraefe added a comment - - edited

            Attachted dump creates almost 500K empty tables in the "test" database (warning, takes a few hours, I can also share a mariabackup, but that's about 45GB).

            The only non-default configuration I use is:

            [mysqld]
            innodb_buffer_pool_size=16G
            innodb_flush_log_at_trx_commit=2
            table_definition_cache=500000
            

            After restoring the dump is complete I restart mysqld to get a clean state, then run

            mysqldump --skip-lock-tables test >/dev/null

            to make sure every table is opened and read from, verify this with:

            SHOW TABLES LIKE 'Open_table_definitions';
            +--------------------------+--------+
            | Variable_name            | Value  |
            +--------------------------+--------+
            | Open_table_definitions   | 474293 |
            +--------------------------+--------+
            

            After that I simply do mysqladmin shutdown and check the error log for the time between "Normal shutdown" and "Shutdown complete" log lines.

            hholzgra Hartmut Holzgraefe added a comment - - edited Attachted dump creates almost 500K empty tables in the "test" database (warning, takes a few hours, I can also share a mariabackup, but that's about 45GB). The only non-default configuration I use is: [mysqld] innodb_buffer_pool_size=16G innodb_flush_log_at_trx_commit=2 table_definition_cache=500000 After restoring the dump is complete I restart mysqld to get a clean state, then run mysqldump --skip-lock-tables test >/dev/null to make sure every table is opened and read from, verify this with: SHOW TABLES LIKE 'Open_table_definitions'; +--------------------------+--------+ | Variable_name | Value | +--------------------------+--------+ | Open_table_definitions | 474293 | +--------------------------+--------+ After that I simply do mysqladmin shutdown and check the error log for the time between "Normal shutdown" and "Shutdown complete" log lines.

            For faster setup, here's a copy of my 10.3.23 test instances datadir:

            http://get-map.org/MDEV-22776-10.3.23.tar.gz

            hholzgra Hartmut Holzgraefe added a comment - For faster setup, here's a copy of my 10.3.23 test instances datadir: http://get-map.org/MDEV-22776-10.3.23.tar.gz
            hholzgra Hartmut Holzgraefe added a comment - and for 10.4 http://get-map.org/MDEV-22776-10.4.13.tar.gz
            wlad Vladislav Vaintroub added a comment - - edited

            I reproduced it, it took just 43 minutes to load, with dummy tables,

            DELIMETER //
            FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int)"); end for; //
            

            here is what VS profiler tells me, about shutdown after the above create 500000 tables is finished:

            Function Name Total CPU (unit, %) Self CPU (unit, %) Module
            LatchCounter::single_deregister 19860 (45,11 %) 19838 (45,06 %) mysqld.exe
            mutex_destroy<PolicyMutex<TTASEventMutex<GenericPolicy> > > 19688 (44,71 %) 19642 (44,61 %) mysqld.exe
            _free_base 1654 (3,76 %) 1654 (3,76 %) ucrtbase.dll
            sync_file_created_deregister 879 (2,00 %) 641 (1,46 %) mysqld.exe
            VirtualFree 641 (1,46 %) 641 (1,46 %) KernelBase.dll
            RtlDeleteCriticalSection 349 (0,79 %) 349 (0,79 %) ntdll.dll
            wlad Vladislav Vaintroub added a comment - - edited I reproduced it, it took just 43 minutes to load, with dummy tables, DELIMETER // FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat( "create table t" ,i), " (i int)" ); end for ; // here is what VS profiler tells me, about shutdown after the above create 500000 tables is finished: Function Name Total CPU (unit, %) Self CPU (unit, %) Module LatchCounter::single_deregister 19860 (45,11 %) 19838 (45,06 %) mysqld.exe mutex_destroy<PolicyMutex<TTASEventMutex<GenericPolicy> > > 19688 (44,71 %) 19642 (44,61 %) mysqld.exe _free_base 1654 (3,76 %) 1654 (3,76 %) ucrtbase.dll sync_file_created_deregister 879 (2,00 %) 641 (1,46 %) mysqld.exe VirtualFree 641 (1,46 %) 641 (1,46 %) KernelBase.dll RtlDeleteCriticalSection 349 (0,79 %) 349 (0,79 %) ntdll.dll
            wlad Vladislav Vaintroub made changes -
            Attachment screenshot-1.png [ 52807 ]

            The whole time of this mutex_destroy is spend in the inlined PolicyMutex<TTASEventMutex<GenericPolicy>>::destroy , according to

            wlad Vladislav Vaintroub added a comment - The whole time of this mutex_destroy is spend in the inlined PolicyMutex<TTASEventMutex<GenericPolicy>>::destroy , according to
            wlad Vladislav Vaintroub made changes -
            Attachment screenshot-2.png [ 52808 ]
            wlad Vladislav Vaintroub added a comment - - edited

            And there is yet another mutex_destroy<PolicyMutex<OSTrackMutex<GenericPolicy> > >

            mysqld.exe , that takes another 45% of the whole time, in LatchCounter::single_deregister

            If not for these mutex_destroys, shutdown would be 10 times faster.

            wlad Vladislav Vaintroub added a comment - - edited And there is yet another mutex_destroy<PolicyMutex< OSTrackMutex <GenericPolicy> > > mysqld.exe , that takes another 45% of the whole time, in LatchCounter::single_deregister If not for these mutex_destroys, shutdown would be 10 times faster.
            wlad Vladislav Vaintroub made changes -
            Attachment image-2020-07-14-19-18-55-706.png [ 52809 ]
            Attachment image-2020-07-14-19-18-54-693.png [ 52810 ]

            Note, that I did the experiment with the innodb_file_per_table=0, as the original reporter .

            wlad Vladislav Vaintroub added a comment - Note, that I did the experiment with the innodb_file_per_table=0, as the original reporter .

            big picture is that 96.58% of the whole shutdown time is taken by the dict_sys.remove(prev_table); inside dict_sys_t::close(), but we knew this all the time

            wlad Vladislav Vaintroub added a comment - big picture is that 96.58% of the whole shutdown time is taken by the dict_sys.remove(prev_table); inside dict_sys_t::close(), but we knew this all the time
            wlad Vladislav Vaintroub added a comment - - edited

            I ran the same test now in 10.3, and it turns out, there is indeed nothing of this kind.

            The most expensive individual functions on shutdown (which is indeed much faster) are

            Function Name Total CPU (unit, %) Self CPU (unit, %) Module
            RtlFreeHeap 1938 (31,10 %) 1938 (31,10 %) ntdll.dll
            System Call - ntoskrnl.exe 1527 (24,50 %) 1527 (24,50 %) ntoskrnl.exe
            VirtualFree 672 (10,78 %) 672 (10,78 %) KernelBase.dll
            RtlDeleteCriticalSection 539 (8,65 %) 539 (8,65 %) ntdll.dll
            dict_table_remove_from_cache_low 3429 (55,02 %) 194 (3,11 %) mysqld.exe

            I guess, a bloated PolicyMutex (ib_mutex_t?), or several were added somewhere into the dictionary object in 10.4 timeline, and this is the actual cause of the slow shutdown.

            I think, OSMutex might suffice, or std::mutex perhaps. They are less bloated, do not spin too much, and are much cheaper to destroy.

            The 10.4s culprit, mutex_destroy does exist in 10.3 shutdown profiler, but does not cost nearly as much

            mutex_destroy<PolicyMutex<TTASEventMutex<BlockMutexPolicy> > > takes 3,35 % inclusive time and 0.3% exclusive time, and is called solely from within buf_pool_free_instance().

            wlad Vladislav Vaintroub added a comment - - edited I ran the same test now in 10.3, and it turns out, there is indeed nothing of this kind. The most expensive individual functions on shutdown (which is indeed much faster) are Function Name Total CPU (unit, %) Self CPU (unit, %) Module RtlFreeHeap 1938 (31,10 %) 1938 (31,10 %) ntdll.dll System Call - ntoskrnl.exe 1527 (24,50 %) 1527 (24,50 %) ntoskrnl.exe VirtualFree 672 (10,78 %) 672 (10,78 %) KernelBase.dll RtlDeleteCriticalSection 539 (8,65 %) 539 (8,65 %) ntdll.dll dict_table_remove_from_cache_low 3429 (55,02 %) 194 (3,11 %) mysqld.exe I guess, a bloated PolicyMutex (ib_mutex_t?), or several were added somewhere into the dictionary object in 10.4 timeline, and this is the actual cause of the slow shutdown. I think, OSMutex might suffice, or std::mutex perhaps. They are less bloated, do not spin too much, and are much cheaper to destroy. The 10.4s culprit, mutex_destroy does exist in 10.3 shutdown profiler, but does not cost nearly as much mutex_destroy<PolicyMutex<TTASEventMutex<BlockMutexPolicy> > > takes 3,35 % inclusive time and 0.3% exclusive time, and is called solely from within buf_pool_free_instance().
            wlad Vladislav Vaintroub added a comment - - edited

            The profiler is actually being helpful to identify the culprit.
            So here are the 2 mutexes, which were lazily used prior to 10.4 , if I'm not mistaken.

            wlad Vladislav Vaintroub added a comment - - edited The profiler is actually being helpful to identify the culprit. So here are the 2 mutexes, which were lazily used prior to 10.4 , if I'm not mistaken. mutex_free(&table->autoinc_mutex); is taking 45% CPU on shutdown. this one used to be lazily allocated before https://github.com/mariadb/server/commit/e9a5f288f21c another one is mutex_free(&index->zip_pad.mutex); is taking 46% of CPU on shutdown. this comes from from https://github.com/mariadb/server/commit/adb640e25ad3

            hholzgra and wlad, thank you for confirming this as a 10.4 regression.

            I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking to remove write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 (MDEV-22367), to simplify the buffer pool code further. I think that it should be OK to lose some write performance on that obscure format.

            It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t>. The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc().

            marko Marko Mäkelä added a comment - hholzgra and wlad , thank you for confirming this as a 10.4 regression. I think that we could just remove the zip_pad altogether. The code was originally added to improve ROW_FORMAT=COMPRESSED write performance, back when Facebook was an active user, before they invested effort in MyRocks. I have been thinking to remove write support for ROW_FORMAT=COMPRESSED in MariaDB Server 10.6 ( MDEV-22367 ), to simplify the buffer pool code further. I think that it should be OK to lose some write performance on that obscure format. It looks like the dict_table_t::autoinc_mutex could be eliminated by changing the data type of dict_table_t::autoinc to Atomic_relaxed<uint64_t> . The trickiest part to replace seems to be ha_innobase::innobase_lock_autoinc() .
            marko Marko Mäkelä made changes -
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            marko Marko Mäkelä made changes -
            Assignee Eugene Kosov [ kevg ] Marko Mäkelä [ marko ]
            hholzgra Hartmut Holzgraefe made changes -
            Attachment mdev-11778-workaround.patch [ 52898 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] In Progress [ 3 ]

            I will try to see if I can eliminate the two mutexes.

            marko Marko Mäkelä added a comment - I will try to see if I can eliminate the two mutexes.

            @Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown, for the latch counters and what not, and instead of removing entries from vectors one-by-one, clear the vectors at once , or just let C++ call the static destructors.

            wlad Vladislav Vaintroub added a comment - @Marko, maybe you can eliminate the "mutex deregister" stuff on shutdown, for the latch counters and what not, and instead of removing entries from vectors one-by-one, clear the vectors at once , or just let C++ call the static destructors.

            Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max. It feels wrong to do that in a GA release.

            It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex.

            I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.

            marko Marko Mäkelä added a comment - Removing zip_pad would also disable two configuration parameters: innodb_compression_failure_threshold_pct and innodb_compression_pad_pct_max . It feels wrong to do that in a GA release. It does not seem easy to guarantee the correctness of ha_innobase::innobase_lock_autoinc() and its callers if we remove dict_table_t::autoinc_mutex . I think that the least intrusive change would be to replace the two problematic mutexes with std::mutex or something that does not need explicit or time-consuming destruction. Their typical hold time is very short, and the InnoDB instrumentation for deadlocks should not be needed.
            marko Marko Mäkelä added a comment - - edited

            The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm:

            --source include/have_innodb.inc
            DELIMITER //;
            FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat("create table t",i), " (i int) ENGINE=MyISAM;"); end for; //
            DELIMITER ;//
            --source include/restart_mysqld.inc
            

            With ENGINE=InnoDB and my fix of using std::mutex, it completed in 62 seconds.
            With ENGINE=InnoDB and without my fix, it completed in 69 seconds.
            Creating the tables takes the majority of the time (59 seconds).
            The shutdown and restart without the fix would take about 9 seconds, and with the fix, about 3 seconds.

            I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.

            marko Marko Mäkelä added a comment - - edited The RelWithDebInfo build completed the following test case in 16 seconds (not counting bootstrap) when using the MyISAM engine on /dev/shm : --source include/have_innodb.inc DELIMITER //; FOR i IN 1..100000 do EXECUTE IMMEDIATE concat(concat( "create table t" ,i), " (i int) ENGINE=MyISAM;" ); end for ; // DELIMITER ;// --source include/restart_mysqld.inc With ENGINE=InnoDB and my fix of using std::mutex , it completed in 62 seconds. With ENGINE=InnoDB and without my fix, it completed in 69 seconds. Creating the tables takes the majority of the time (59 seconds). The shutdown and restart without the fix would take about 9 seconds, and with the fix, about 3 seconds. I think that implementing this fix is sufficient for now. We might look at speeding up innodb_fast_shutdown=2 and innodb_fast_shutdown=3 in non-debug builds later if it is truly needed.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.4.14 [ 24305 ]
            Fix Version/s 10.5.5 [ 24423 ]
            Fix Version/s 10.4 [ 22408 ]
            Fix Version/s 10.5 [ 23123 ]
            Resolution Fixed [ 1 ]
            Status In Progress [ 3 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 109417 ] MariaDB v4 [ 157894 ]
            Roel Roel Van de Paar made changes -
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 170968

            People

              marko Marko Mäkelä
              hholzgra Hartmut Holzgraefe
              Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.