MariaDB Server / MDEV-25404

read-only performance regression in 10.6

Details

    Description

      I see a heavy performance regression in 10.6 that did not exist ~4 weeks ago. It affects all workloads, even read-only:

      --------------------------------------------------------------------------------
      Test 't_1K-reads-innodb-multi' - sysbench OLTP readonly
      1000 point selects per iteration, no range queries
      20 tables, 1 mio rows total, engine InnoDB/XtraDB (builtin)
      numbers are queries per second
       
      #thread count           1       8       16      32      64      128     256
      mariadb-10.5.6          17829   121710  198138  323747  322578  325941  320516
      mariadb-10.5.7          17909   123676  196655  322730  319521  321345  317861
      mariadb-10.5.8          17323   122421  194577  323129  321011  322180  318691
      mariadb-10.5.9          17908   121776  195502  319654  316815  321072  315318
      mariadb-10.6.0          16571   114360  187503  309040  306141  308083  304082
      --------------------------------------------------------------------------------
      Test 't_collate_distinct_range_utf8_unicode' - sysbench OLTP readonly
      selecting distinct rows from short range, collation utf8_unicode_ci
      1 table, 1 mio rows, engine InnoDB/XtraDB (builtin)
      numbers are queries per second
       
      #thread count           1       8       16      32      64      128     256
      mariadb-10.5.6          7802.2  52344   90565   143215  143131  143469  142597
      mariadb-10.5.7          7661.6  51889   89530   141981  141824  142293  141383
      mariadb-10.5.8          7606.1  52009   90159   141161  142194  142386  141560
      mariadb-10.5.9          7561.6  51927   90081   142035  142127  142333  141701
      mariadb-10.6.0          7121.8  48368   84864   136162  135595  135270  134618
      

          Activity

            We can only compose the ssux_lock from a writer mutex on systems where the mutex is guaranteed not to be re-entrant. POSIX makes no such guarantee for pthread_mutex_t.

            Hence, on systems where only a generic mutex is available, we must retain the old SRW_LOCK_DUMMY implementation, which consists of a std::atomic<uint32_t>, a pthread_mutex_t and two pthread_cond_t. This ensures that when ownership of a buf_block_t::lock is transferred to a write-completion callback thread, the thread that submitted the write cannot wrongly acquire the writer mutex of the buf_block_t::lock while the previously submitted write is still in progress. This problem was caught on Microsoft Windows, on a system where the tests were run on a relatively slow hard disk.

            Using a futex-based srw_mutex writer works correctly, because re-entrant acquisition is not allowed and the mutex does not keep track of the holding thread.

            I successfully tested the fix on Microsoft Windows both with and without the following patch:

            diff --git a/storage/innobase/include/rw_lock.h b/storage/innobase/include/rw_lock.h
            index cf02fe26c2c..7bfce1b62f7 100644
            --- a/storage/innobase/include/rw_lock.h
            +++ b/storage/innobase/include/rw_lock.h
            @@ -22,7 +22,7 @@ this program; if not, write to the Free Software Foundation, Inc.,
             
             #if !(defined __linux__ || defined __OpenBSD__ || defined _WIN32)
             # define SRW_LOCK_DUMMY
            -#elif 0 // defined SAFE_MUTEX
            +#elif 1 // defined SAFE_MUTEX
             # define SRW_LOCK_DUMMY /* Use dummy implementation for debugging purposes */
             #endif
            

            marko Marko Mäkelä added a comment

            Performance seems OK, but the code has grown somewhat complicated. Maybe it makes sense to document the relationship between the different rw-locks in InnoDB.

            wlad Vladislav Vaintroub added a comment

            If there were a way to have non-recursive mutexes on all platforms, the fallback implementation for futex-less systems would be simpler. On GNU/Linux (with GNU libc), pthread_mutex_t is non-recursive by default and "just works". On Microsoft Windows and on some proprietary UNIX systems, mutexes are recursive by default; there is a way to explicitly request a recursive mutex, but no way to request a non-recursive one. Recursive mutexes are inherently incompatible with "ownership passing", which is a requirement for the asynchronous writes of pages that are protected by buf_block_t::lock.

            marko Marko Mäkelä added a comment

            SRW_LOCK_DUMMY was renamed to SUX_LOCK_GENERIC, because on Microsoft Windows, srw_lock will always wrap SRWLOCK even if that alternative implementation were enabled.

            marko Marko Mäkelä added a comment

            According to https://shift.click/blog/futex-like-apis/, documented futex equivalents do exist on some operating systems beyond Linux, OpenBSD and Microsoft Windows:

            FreeBSD: _umtx_op() (UMTX_OP_WAIT_UINT_PRIVATE, UMTX_OP_WAKE_PRIVATE)
            DragonflyBSD: umtx_sleep(), umtx_wakeup()

            Furthermore, C++20 defines std::atomic_wait and std::atomic_notify_one.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              axel Axel Schwenke