MariaDB Server / MDEV-25404

read-only performance regression in 10.6

Details

    Description

      I see a heavy performance regression in 10.6 that did not exist ~4 weeks ago. It affects all workloads, even read-only:

      --------------------------------------------------------------------------------
      Test 't_1K-reads-innodb-multi' - sysbench OLTP readonly
      1000 point selects per iteration, no range queries
      20 tables, 1 mio rows total, engine InnoDB/XtraDB (builtin)
      numbers are queries per second
       
      #thread count           1       8       16      32      64      128     256
      mariadb-10.5.6          17829   121710  198138  323747  322578  325941  320516
      mariadb-10.5.7          17909   123676  196655  322730  319521  321345  317861
      mariadb-10.5.8          17323   122421  194577  323129  321011  322180  318691
      mariadb-10.5.9          17908   121776  195502  319654  316815  321072  315318
      mariadb-10.6.0          16571   114360  187503  309040  306141  308083  304082
      --------------------------------------------------------------------------------
      Test 't_collate_distinct_range_utf8_unicode' - sysbench OLTP readonly
      selecting distinct rows from short range, collation utf8_unicode_ci
      1 table, 1 mio rows, engine InnoDB/XtraDB (builtin)
      numbers are queries per second
       
      #thread count           1       8       16      32      64      128     256
      mariadb-10.5.6          7802.2  52344   90565   143215  143131  143469  142597
      mariadb-10.5.7          7661.6  51889   89530   141981  141824  142293  141383
      mariadb-10.5.8          7606.1  52009   90159   141161  142194  142386  141560
      mariadb-10.5.9          7561.6  51927   90081   142035  142127  142333  141701
      mariadb-10.6.0          7121.8  48368   84864   136162  135595  135270  134618
      

          Activity

            We can only compose the ssux_lock from a writer mutex on systems where the mutex is guaranteed not to be re-entrant. POSIX makes no such guarantee for pthread_mutex_t.

            Hence, on systems where only a generic mutex is available, we must retain the old SRW_LOCK_DUMMY implementation, which consists of a std::atomic<uint32_t>, a pthread_mutex_t and two pthread_cond_t. This ensures that when ownership of a buf_block_t::lock is transferred to a write-completion callback thread, the thread that submitted the write cannot wrongly acquire the writer mutex of the buf_block_t::lock while the previously submitted write is still in progress. This problem was caught on Microsoft Windows, on a system where the tests were run on a relatively slow hard disk.

            Using a futex-based srw_mutex writer works correctly, because re-entrant acquisition is not allowed and the mutex does not keep track of the holding thread.

            I successfully tested the fix on Microsoft Windows both with and without the following patch:

            diff --git a/storage/innobase/include/rw_lock.h b/storage/innobase/include/rw_lock.h
            index cf02fe26c2c..7bfce1b62f7 100644
            --- a/storage/innobase/include/rw_lock.h
            +++ b/storage/innobase/include/rw_lock.h
            @@ -22,7 +22,7 @@ this program; if not, write to the Free Software Foundation, Inc.,
             
             #if !(defined __linux__ || defined __OpenBSD__ || defined _WIN32)
             # define SRW_LOCK_DUMMY
            -#elif 0 // defined SAFE_MUTEX
            +#elif 1 // defined SAFE_MUTEX
             # define SRW_LOCK_DUMMY /* Use dummy implementation for debugging purposes */
             #endif
            

            marko Marko Mäkelä added a comment

            Performance seems OK, but the code has grown somewhat complicated. Maybe it makes sense to document the relationship between the different rw-locks in InnoDB.

            wlad Vladislav Vaintroub added a comment

            If there were a way to have non-recursive mutexes on all platforms, the fallback implementation for futex-less systems would be simpler. On GNU/Linux (with GNU libc), pthread_mutex_t is non-recursive by default and "just works". On Microsoft Windows and on some proprietary UNIX systems, mutexes are recursive by default; there is a way to explicitly request a recursive mutex, but no way to request a non-recursive one. Recursive mutexes are inherently incompatible with "ownership passing", which is a requirement for the asynchronous writes of pages that are protected by buf_block_t::lock.

            marko Marko Mäkelä added a comment

            SRW_LOCK_DUMMY was renamed to SUX_LOCK_GENERIC, because on Microsoft Windows, srw_lock will always wrap SRWLOCK even if that alternative implementation were enabled.

            marko Marko Mäkelä added a comment

            According to https://shift.click/blog/futex-like-apis/, documented futex equivalents do exist on some operating systems beyond Linux, OpenBSD and Microsoft Windows:

            FreeBSD: _umtx_op() (UMTX_OP_WAIT_UINT_PRIVATE, UMTX_OP_WAKE_PRIVATE)
            DragonflyBSD: umtx_sleep(), umtx_wakeup()

            Furthermore, C++20 defines std::atomic_wait and std::atomic_notify_one.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              axel Axel Schwenke