MariaDB Server / MDEV-24258

Merge dict_sys.mutex into dict_sys.latch

Description

The InnoDB data dictionary cache is protected by both dict_sys.latch (an rw-lock) and dict_sys.mutex. One reason for the redundant synchronization primitive would be eliminated by MDEV-24167: with that change, the rw-lock should be no slower than a mutex.

Another reason for keeping a separate mutex is that the mutex sometimes provides a mechanism to 'upgrade' the rw-lock: a thread that holds the latch in shared mode can take the mutex to gain exclusive access to the mutex-protected state. The most prominent case of that would be removed by MDEV-23484, if we could guarantee that transaction rollback is always protected by MDL. Currently, the rollback of recovered transactions is not protected by MDL.
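
To illustrate the two designs, here is a minimal sketch using standard C++ primitives in place of InnoDB's own latches; the type and member names (dict_sys_legacy_t, dict_sys_merged_t, find_while_s_latched) are invented for this example and do not appear in the source tree.

    #include <mutex>
    #include <shared_mutex>
    #include <string>
    #include <unordered_map>

    struct dict_table_t { std::string name; };

    // Legacy layout: an rw-lock (dict_sys.latch) plus a separate mutex
    // (dict_sys.mutex) guarding overlapping parts of the cache.
    struct dict_sys_legacy_t {
      std::shared_mutex latch;   // dict_sys.latch
      std::mutex        mutex;   // dict_sys.mutex
      std::unordered_map<std::string, dict_table_t*> tables;

      // The 'upgrade' idiom: a thread that holds the latch in shared mode
      // takes the mutex on top of it to get exclusive access to the
      // mutex-protected fields, without ever requesting the X latch.
      dict_table_t* find_while_s_latched(const std::string& name) {
        std::lock_guard<std::mutex> g(mutex);
        auto it = tables.find(name);
        return it == tables.end() ? nullptr : it->second;
      }
    };

    // Merged layout: a single rw-lock; exclusive access always means
    // acquiring the latch in exclusive (X) mode.
    struct dict_sys_merged_t {
      std::shared_mutex latch;
      std::unordered_map<std::string, dict_table_t*> tables;

      dict_table_t* find(const std::string& name) {
        std::shared_lock<std::shared_mutex> s(latch);  // S mode suffices for a lookup
        auto it = tables.find(name);
        return it == tables.end() ? nullptr : it->second;
      }
    };

In the merged layout, any code path that previously relied on taking dict_sys.mutex while S-latched must instead acquire the latch in exclusive mode.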


Activity

marko Marko Mäkelä added a comment:

MDEV-24258.patch applies to a development branch of MDEV-24167. Many tests would hang. In every case that I checked, the reason was that transaction rollback would attempt to acquire an exclusive latch while already holding it in shared mode. That would be fixed by MDEV-23484.
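
The hang pattern is the classic self-deadlock of a non-recursive rw-lock. As a minimal illustration (not InnoDB code; debug_latch and its members are invented for this sketch), a debug wrapper over std::shared_mutex can turn the mistake into an assertion failure instead of an infinite wait:

    #include <cassert>
    #include <shared_mutex>

    class debug_latch {
      std::shared_mutex            lock_;
      static thread_local unsigned s_count_;   // shared acquisitions by this thread
    public:
      void rd_lock()   { lock_.lock_shared(); ++s_count_; }
      void rd_unlock() { --s_count_; lock_.unlock_shared(); }

      void wr_lock() {
        // Requesting X while this thread already holds S would wait for
        // itself forever; this is what the hanging tests were doing during
        // transaction rollback.
        assert(s_count_ == 0);
        lock_.lock();
      }
      void wr_unlock() { lock_.unlock(); }
    };

    thread_local unsigned debug_latch::s_count_ = 0;

With such a check, the rollback paths that re-enter dict_sys.latch would show up immediately in debug builds rather than as test timeouts.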

marko Marko Mäkelä added a comment:

I believe that this can be achieved in the 10.6 release series after all. It seems that all use of dict_sys.freeze() can be replaced with MDL. In most cases, the MDL will already have been acquired by the SQL layer.
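
Conceptually, the substitution would look as follows. This is only a sketch of the idea, not MariaDB's MDL API: session_t and holds_mdl() are stand-ins for the real SQL-layer objects, and the point is merely that a dict_sys.freeze()/unfreeze() pair around a read of cached metadata becomes a debug assertion about a metadata lock the caller already owns.

    #include <cassert>
    #include <string>
    #include <unordered_set>

    // Hypothetical stand-ins: a session object and the set of table names
    // on which it currently holds a metadata lock (MDL).
    struct session_t { std::unordered_set<std::string> mdl_tables; };

    static bool holds_mdl(const session_t& s, const std::string& table) {
      return s.mdl_tables.count(table) != 0;
    }

    // Before: dict_sys.freeze(); read the cached definition; dict_sys.unfreeze();
    // After: no dict_sys latch at all; the MDL already acquired by the SQL layer
    // keeps the definition from being dropped or rewritten underneath us, and a
    // debug assertion documents that expectation.
    void read_cached_definition(const session_t& session, const std::string& table_name) {
      assert(holds_mdl(session, table_name));
      // ... read the cached dict_table_t for table_name here ...
    }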

marko Marko Mäkelä added a comment:

thiru, please review.

marko Marko Mäkelä added a comment:

Unfortunately, we hit the following problem once when running under rr record (never without it):

    2021-07-30 12:57:47 0 [ERROR] [FATAL] InnoDB: innodb_fatal_semaphore_wait_threshold was exceeded for dict_sys.latch. Please refer to https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/

In the trace that I analyzed, the first thread that started the wait would starve while later threads kept acquiring and releasing the latch. Nothing is really stuck, but either the scheduling is extremely unfair, or we may really have to think about making the lock waits fairer. For example, when a wait time start has been set by some thread, threads that are trying to acquire dict_sys.latch would yield.

Given this potential regression, this change is too risky to be included in the upcoming 10.6.4 release. We will need more testing.
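
The suggested back-off could look roughly like the sketch below. It uses std::shared_mutex and std::atomic as stand-ins for InnoDB's latch and its wait-start bookkeeping; the names wait_started, x_lock_announced(), and s_lock_polite() are invented for illustration, and a real implementation would need to handle multiple concurrent waiters correctly.

    #include <atomic>
    #include <shared_mutex>
    #include <thread>

    std::shared_mutex dict_sys_latch;        // stand-in for dict_sys.latch
    std::atomic<bool> wait_started{false};   // "a wait time start has been set"

    // An exclusive acquisition announces that it has started waiting.
    void x_lock_announced() {
      wait_started.store(true, std::memory_order_relaxed);
      dict_sys_latch.lock();
      wait_started.store(false, std::memory_order_relaxed);
    }

    // Later arrivals yield while some thread has announced a wait, instead of
    // repeatedly overtaking and starving it.
    void s_lock_polite() {
      while (wait_started.load(std::memory_order_relaxed))
        std::this_thread::yield();
      dict_sys_latch.lock_shared();
    }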

marko Marko Mäkelä added a comment:

During the testing of MDEV-25919, which depends on this change, a simple reason for excessive waits around dict_sys.latch when running with rr record was identified: table lookups would acquire dict_sys.latch in exclusive mode even when the table definition is present in the cache. We really only have to acquire an exclusive latch when a table definition needs to be loaded into the cache. The eviction policy for the InnoDB internal table definition cache will have to be changed from LRU to FIFO, because reordering elements in dict_sys.table_LRU on every lookup would require an exclusive latch.
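
The intended lookup protocol, sketched here with standard C++ primitives (dict_table_open(), table_hash, and load_from_dictionary() are illustrative names, not the actual functions), would be:

    #include <shared_mutex>
    #include <string>
    #include <unordered_map>

    struct dict_table_t { std::string name; };

    std::shared_mutex dict_latch;                                // stand-in for dict_sys.latch
    std::unordered_map<std::string, dict_table_t*> table_hash;   // cached table definitions

    // Hypothetical stand-in for reading a table definition from the data dictionary.
    static dict_table_t* load_from_dictionary(const std::string& name) {
      return new dict_table_t{name};
    }

    dict_table_t* dict_table_open(const std::string& name) {
      {
        std::shared_lock<std::shared_mutex> s(dict_latch);  // S latch: enough for a cache hit
        auto it = table_hash.find(name);
        if (it != table_hash.end())
          return it->second;       // no dict_sys.table_LRU reordering on a hit
      }
      // Cache miss: only now is the exclusive latch needed, to load and insert the table.
      std::unique_lock<std::shared_mutex> x(dict_latch);
      auto it = table_hash.find(name);                      // re-check: another thread may have loaded it
      if (it != table_hash.end())
        return it->second;
      dict_table_t* table = load_from_dictionary(name);
      table_hash.emplace(name, table);
      return table;
    }

Because a cache hit no longer takes the exclusive latch, it also cannot move the table within dict_sys.table_LRU, which is why the eviction order degrades to FIFO.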

marko Marko Mäkelä added a comment:

The PERFORMANCE_SCHEMA instrumentation for dict_sys_mutex was removed along with dict_sys.mutex. The dict_sys.latch will continue to be instrumented as dict_operation_lock.

Because dict_sys.mutex will no longer 'throttle' the threads that purge InnoDB transaction history, a performance degradation may be observed unless innodb_purge_threads=1.

The table cache eviction policy will become FIFO-like, because table lookup will be protected by a shared dict_sys.latch, instead of being protected by the exclusive dict_sys.mutex. Note: tables can never be evicted as long as locks exist on them or the tables are in use by some thread.
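
A minimal sketch of that eviction rule follows; evict_oldest(), n_ref_count, and n_locks are illustrative names rather than the actual dict_table_t members.

    #include <cstdint>
    #include <list>
    #include <string>

    struct dict_table_t {
      std::string name;
      uint32_t    n_ref_count = 0;   // threads currently using the table
      uint32_t    n_locks     = 0;   // locks currently held on the table
      bool can_be_evicted() const { return n_ref_count == 0 && n_locks == 0; }
    };

    // Caller is assumed to hold dict_sys.latch in exclusive mode.
    void evict_oldest(std::list<dict_table_t*>& table_fifo, size_t target_size) {
      for (auto it = table_fifo.begin();
           table_fifo.size() > target_size && it != table_fifo.end(); ) {
        if ((*it)->can_be_evicted()) {
          delete *it;                // drop the cached definition
          it = table_fifo.erase(it);
        } else {
          ++it;                      // in-use or locked tables are never evicted
        }
      }
    }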

marko Marko Mäkelä added a comment:

The http://www.brendangregg.com/offcpuanalysis.html graphs provided by axel showed that purge tasks would end up waiting much more elsewhere if their table lookups are no longer serialized by the exclusive dict_sys.latch. Possibly this would lead to purge lag, which in turn would lead to degraded throughput.

I managed to reproduce the phenomenon on my system today. Adding dummy synchronization fixed the regression for me. This is of course only a work-around, and deeper investigation of the purge subsystem will be needed.

marko Marko Mäkelä added a comment:

I tested a more complex purge throttle, and it resulted in much worse overall throughput. The previous dummy synchronization did consistently help in my tests.

I checked the graphs again, and I see that the purge coordinator is waiting for undo pages to be read into the buffer pool, which in turn can wait for a to-be-evicted page to be written. Purge workers are waiting for index page latches (theoretically conflicting with the workload, but perhaps more likely just waiting for the page read). Before the removal of dict_sys.mutex, both of these threads were spending 2/3 to 3/4 of their waiting time on dict_sys.mutex.

I think that we must throttle purge based on buffer pool contention. That could be done in MDEV-26356.

People

marko Marko Mäkelä

Votes: 0
Watchers: 3
