[MDEV-26769] InnoDB internal latches do not support hardware lock elision Created: 2021-10-05  Updated: 2023-08-31  Resolved: 2021-10-22

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.6.5

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: performance

Attachments: PDF File MDEV-26769-1.pdf    
Issue Links:
Blocks
is blocked by MDEV-26826 Duplicated computations of buf_pool.p... Closed
is blocked by MDEV-26828 Spinning on buf_pool.page_hash is was... Closed
Problem/Incident
causes MDEV-27936 hardware lock elision on ppc64{,le} ... Closed
causes MDEV-30986 Slow full index scan in 10.6 vs 10.5 ... Closed
Relates
relates to MDEV-28137 Some memory transactions are unnecess... Closed
relates to MDEV-27222 FreeBSD 13.0-p4 build error mariadb-1... Closed

 Description   

The synchronization primitives implemented in operating system libraries such as GNU libc may support lock elision via transactional memory when supported by the underlying hardware. The native primitives in InnoDB are currently missing that.

Lock elision could avoid unnecessary memory cache pollution in case a mutex is not contended. For example, when reading or sometimes even writing buf_pool.page_hash or lock_sys.rec_hash, we could avoid modifying cache lines only for the purpose of acquiring and releasing latches.



 Comments   
Comment by Marko Mäkelä [ 2021-10-14 ]

The throughput and latency are fluctuating a lot in our performance testing. We’d better complete MDEV-26779, MDEV-26827, MDEV-26828 first.

Comment by Axel Schwenke [ 2021-10-21 ]

MDEV-26769-1.pdf shows no regression between mariadb-10.6-d6a3f425ee2 and
mariadb-10.6-MDEV-26769-4d33865d9eb

Comment by Marko Mäkelä [ 2021-10-22 ]

Note: MDEV-26769-1.pdf does show a small regression for write workloads. That branch included also MDEV-26826, MDEV-26827, MDEV-26828. Further tests indicated that MDEV-26827 in its current form is not improving performance, but slightly reducing it. So, we postponed that improvement.

Comment by Marko Mäkelä [ 2021-10-28 ]

If rr replay fails to work, and

grep -lw rtm /proc/cpuinfo

returns a nonempty result on your IA-32 or AMD64 processor, you should disable the RTM detection. In rr 5.5.0 that seems to happen automatically, and you should not see a startup message like this in the server error log that was written during rr record:

2021-11-01  8:22:55 0 [Note] InnoDB: Using transactional memory

In rr 5.4.0 you may have to explicitly enable the CPUID faulting:

./mtr --rr=--disable-cpuid-features-ext=0x800

or

rr record --disable-cpuid-features-ext=0x800 sql/mariadbd

In a virtual machine that runs Ubuntu 20.04, this caused an error:

[FATAL /home/roc/rr/rr/src/RecordSession.cc:2257:RecordSession() errno: EOPNOTSUPP] CPUID faulting required to disable CPUID features

To work around this, you may disable the CPUID check in the source code instead:

diff --git a/storage/innobase/sync/srw_lock.cc b/storage/innobase/sync/srw_lock.cc
index b54191d91b0..d3da62f67ab 100644
--- a/storage/innobase/sync/srw_lock.cc
+++ b/storage/innobase/sync/srw_lock.cc
@@ -41,12 +41,14 @@ bool transactional_lock_enabled()
 bool have_transactional_memory;
 bool transactional_lock_enabled()
 {
+#if 0
   if (__get_cpuid_max(0, nullptr) < 7)
     return false;
   unsigned eax, ebx, ecx, edx;
   __cpuid_count(7, 0, eax, ebx, ecx, edx);
   /* Restricted Transactional Memory (RTM) */
   have_transactional_memory= ebx & 1U << 11;
+#endif
   return have_transactional_memory;
 }
 

Comment by Dmitry Kalinkin [ 2021-11-17 ]

We observe a build regression on macOS:

[ 29%] Building CXX object sql/CMakeFiles/wsrep.dir/wsrep_xid.cc.o
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/btr/btr0btr.cc:28:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/btr0btr.h:31:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/dict0dict.h:32:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/dict0mem.h:45:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/buf0buf.h:43:
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/transactional_lock_guard.h:145:14: error: 'is_write_locked' is a private member of 'rw_lock'
      if (!m.is_write_locked())
             ^
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/buf0buf.h:1540:54: note: in instantiation of member function 'transactional_shared_lock_guard<page_hash_latch>::transactional_shared_lock_guard' requested here
    transactional_shared_lock_guard<page_hash_latch> g
                                                     ^
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/buf0types.h:182:25: note: constrained by private inheritance here
class page_hash_latch : private rw_lock
                        ^~~~~~~~~~~~~~~
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/rw_lock.h:225:8: note: member is declared here
  bool is_write_locked() const { return !!(value() & WRITER); }
       ^
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/btr/btr0btr.cc:28:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/btr0btr.h:31:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/dict0dict.h:32:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/dict0mem.h:45:
In file included from /tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/buf0buf.h:43:
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/transactional_lock_guard.h:145:12: error: cannot cast 'page_hash_latch' to its private base class 'const rw_lock'
      if (!m.is_write_locked())
           ^
/tmp/nix-build-mariadb-server-10.6.5.drv-0/source/storage/innobase/include/buf0types.h:182:25: note: declared private here
class page_hash_latch : private rw_lock
                        ^~~~~~~~~~~~~~~
[ 29%] Building C object storage/heap/CMake

It seems to come from
https://github.com/MariaDB/server/commit/1f02280904fcfbb2bd86404d1c85c025634f8c9d

Partially reverting it helps:

diff -ru a/storage/innobase/include/buf0types.h b/storage/innobase/include/buf0types.h
--- a/storage/innobase/include/buf0types.h	2021-11-17 02:37:32.000000000 -0500
+++ b/storage/innobase/include/buf0types.h	2021-11-17 02:38:50.000000000 -0500
@@ -179,7 +179,7 @@
 #include "sux_lock.h"
 
 #ifdef SUX_LOCK_GENERIC
-class page_hash_latch : private rw_lock
+class page_hash_latch : public rw_lock
 {
   /** Wait for a shared lock */
   void read_lock_wait();

Comment by Marko Mäkelä [ 2021-11-17 ]

veprbl, thank you for the observation. I can repeat the compilation failure with the following patch and clang 13:

diff --git a/storage/innobase/include/rw_lock.h b/storage/innobase/include/rw_lock.h
index 0ae052fabe2..4a4eba7df0b 100644
--- a/storage/innobase/include/rw_lock.h
+++ b/storage/innobase/include/rw_lock.h
@@ -22,7 +22,7 @@ this program; if not, write to the Free Software Foundation, Inc.,
 
 #if !(defined __linux__ || defined __OpenBSD__ || defined _WIN32)
 # define SUX_LOCK_GENERIC
-#elif 0 // defined SAFE_MUTEX
+#elif 1 // defined SAFE_MUTEX
 # define SUX_LOCK_GENERIC /* Use dummy implementation for debugging purposes */
 #endif
 

Would the following patch work for you?

diff --git a/storage/innobase/include/srw_lock.h b/storage/innobase/include/srw_lock.h
index 54d042419ca..23d8e35aa7b 100644
--- a/storage/innobase/include/srw_lock.h
+++ b/storage/innobase/include/srw_lock.h
@@ -137,6 +137,7 @@ class ssux_lock_impl final
   void destroy();
   /** @return whether any writer is waiting */
   bool is_waiting() const { return (value() & WRITER_WAITING) != 0; }
+  bool is_write_locked() const { return rw_lock::is_write_locked(); }
 
   bool rd_lock_try() { uint32_t l; return read_trylock(l); }
   bool wr_lock_try() { return write_trylock(); }

For what it is worth, https://shift.click/blog/futex-like-apis/ claims that Mac OS X supports a futex-like API. Maybe you would like to contribute a partial fix of MDEV-26476?

Comment by Dmitry Kalinkin [ 2021-11-17 ]

@marko , your patch doesn't fix the error for me, and I don't quite get why you are patching a different header file. At this time I can't contribute futex support for macOS, but I appreciate your kind suggestion.

Comment by Marko Mäkelä [ 2021-11-17 ]

veprbl, sorry, indeed it is a different header file that you had to patch. The problem that you reported is repeatable with CMAKE_BUILD_TYPE=RelWithDebInfo but not CMAKE_BUILD_TYPE=Debug, which I used. Would this work for you?

diff --git a/storage/innobase/include/buf0types.h b/storage/innobase/include/buf0types.h
index 2cb92a5f1df..d3ee0b42169 100644
--- a/storage/innobase/include/buf0types.h
+++ b/storage/innobase/include/buf0types.h
@@ -191,10 +191,8 @@ class page_hash_latch : private rw_lock
   /** Acquire an exclusive lock */
   inline void lock();
 
-#ifdef UNIV_DEBUG
   /** @return whether an exclusive lock is being held by any thread */
   bool is_write_locked() const { return rw_lock::is_write_locked(); }
-#endif
 
   /** @return whether any lock is being held by any thread */
   bool is_locked() const { return rw_lock::is_locked(); }

Generated at Thu Feb 08 09:47:48 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.