MariaDB Server
MDEV-23169

Optimize InnoDB code around mutexes assuming InnoDB locks are uncontended

Details

    Description

      The InnoDB engine locks show low contention during OLTP_PS/RO sysbench tests. Based on this assumption, we used compiler-specific attributes to annotate the code around mutexes, making the lock-acquiring code path cheaper. Specifically, branches that lead to the ut_delay() function were marked `unlikely`, and functions that call ut_delay() were marked `cold`. We got a consistent +2% improvement in sysbench OLTP_PS at 128 threads, while the sysbench OLTP_RO/RW results were almost unchanged.

      Our environment: Kunpeng 920 (64 cores), Ubuntu 20.04, kernel 5.6.0 with 64K pages, gcc-9.3.0.

      Look at the TTASEventMutex implementation, for instance. Here, we can mark the `while (!try_lock())` branch as unlikely:

      void enter(
      		uint32_t	max_spins,
      		uint32_t	max_delay,
      		const char*	filename,
      		uint32_t	line)
      		UNIV_NOTHROW
      	{
      		uint32_t	n_spins = 0;
      		uint32_t	n_waits = 0;
      		const uint32_t	step = max_spins;

      		while (UNIV_UNLIKELY(!try_lock())) {
      			...
      

      Activity

            wlad Vladislav Vaintroub added a comment - Is contention always unlikely? Try OLTP write-only, a small dataset, many users.

            marko Marko Mäkelä added a comment - I think that this is a good idea, but I would like to see a patch and benchmarks. Some contended mutexes include buf_pool.mutex, lock_sys.mutex and log_sys.mutex. trx_sys.mutex should not be that contended since 10.3, and it saw some minor improvement in 10.5.

            wlad Vladislav Vaintroub added a comment - Ideally, such things are handled by good CPU branch prediction; the second choice is profile-guided optimization done on well-chosen, representative benchmarks. Programmer-guided optimization, as here, is often wrong, or works for one case and not others, and can do more damage than good.

            dmitriy.philimonov Dmitriy Philimonov added a comment -
            1. Actually, not all CPUs have good branch prediction.
            2. You're right, PGO is better; however, it has its own pitfalls: it optimizes the whole program for one particular workload, so it is hard to use in practice. Here we do a 'handmade PGO' for small, isolated parts of the code. The idea is to optimize the fast path and pessimize the slow path. If the load changes (e.g. OLTP_RW, small dataset, many users), the negative effect of the patch will be much smaller than the overall degradation due to high contention.

            Finally, from our team's point of view, it makes sense.

            marko Marko Mäkelä added a comment - In MDEV-21452, I would like to replace the custom InnoDB mutex and event implementations with a normal mutex and condition variable. If that effort fails to improve performance, we may have to look at this.

            marko Marko Mäkelä added a comment - dmitriy.philimonov, did you test MariaDB 10.6? It includes completely rewritten mutex and rw-lock code.

            dmitriy.philimonov Dmitriy Philimonov added a comment - No, I didn't test this patch with MariaDB 10.6.

            marko Marko Mäkelä added a comment -
            dmitriy.philimonov, the described change would only be applicable to older MariaDB Server releases (10.5 or earlier). I would not try to improve the performance of older releases, because it could introduce surprise regressions that are hard to reason about. See MDEV-20638 and MDEV-23475 for an example.

            During the development of simpler latching for MariaDB Server 10.6, I found that the usefulness of tweaks like "use a spinloop" (MYSQL_MUTEX_INIT_FAST) depends on the actual mutex as well as on the workload. I would expect that for log_sys.mutex (which is already a normal mutex starting with MDEV-23855) or buf_pool.mutex the answer is "yes, this mutex is contended". For most dict_table_t::lock_mutex instances the answer might be "no, this is not contended", while for some it could be "yes, this is contended", depending entirely on the workload.

            In general, I think that instead of tweaking the mutex or rw-lock implementation, we should try to reduce the contention in the first place. I would like to note that std::atomic is not a silver bullet. A combination of std::atomic and appropriate partitioning of data could work.

            That said, I welcome a review and any improvements to the InnoDB synchronization primitives in MariaDB Server 10.6 or later.

            People

              marko Marko Mäkelä
              dmitriy.philimonov Dmitriy Philimonov
              Votes: 0
              Watchers: 7
