MDEV-22850: Reduce buf_pool.page_hash latch contention


    Description

      The changes that were introduced in MDEV-15053 are sometimes causing severe performance loss when there is contention on buf_pool.page_hash. At the root is the invocation of rw_lock_lock_word_decr(lock, 1, 0) in rw_lock_s_lock_low(). That function loops:

      UNIV_INLINE
      bool
      rw_lock_lock_word_decr(
      /*===================*/
      	rw_lock_t*	lock,		/*!< in/out: rw-lock */
      	int32_t		amount,		/*!< in: amount to decrement */
      	int32_t		threshold)	/*!< in: threshold of judgement */
      {
      	int32_t lock_copy = lock->lock_word;
       
      	while (lock_copy > threshold) {
      		if (lock->lock_word.compare_exchange_strong(
      			lock_copy,
      			lock_copy - amount,
      			std::memory_order_acquire,
      			std::memory_order_relaxed)) {
       
      			return(true);
      		}
      		(void) LF_BACKOFF();
      	}
      	return(false);
      }
      

      Note: the LF_BACKOFF() call was added in an earlier attempt to mitigate the problem.

      It feels strange that we are first loading lock_copy and then insisting that we will get an exact match on that lock_copy value. It seems that this could cause unnecessary contention between non-conflicting S-latch requests. The following naïve patch should alleviate that:

      diff --git a/storage/innobase/include/sync0rw.ic b/storage/innobase/include/sync0rw.ic
      index 18a6c69ece4..cec796c8892 100644
      --- a/storage/innobase/include/sync0rw.ic
      +++ b/storage/innobase/include/sync0rw.ic
      @@ -209,20 +209,18 @@ rw_lock_lock_word_decr(
       	int32_t		amount,		/*!< in: amount to decrement */
       	int32_t		threshold)	/*!< in: threshold of judgement */
       {
      -	int32_t lock_copy = lock->lock_word;
      -
      -	while (lock_copy > threshold) {
      -		if (lock->lock_word.compare_exchange_strong(
      -			lock_copy,
      -			lock_copy - amount,
      -			std::memory_order_acquire,
      -			std::memory_order_relaxed)) {
      +  for (;;)
      +  {
      +    int32_t lock_copy= lock->lock_word;
       
      -			return(true);
      -		}
      -		(void) LF_BACKOFF();
      -	}
      -	return(false);
      +    if (lock_copy <= threshold)
      +      return false;
      +    if (lock->lock_word.compare_exchange_strong(lock_copy, lock_copy - amount,
      +						std::memory_order_acquire,
      +						std::memory_order_relaxed))
      +      return true;
      +    (void) LF_BACKOFF();
      +  }
       }
       
       /******************************************************************//**
      

      For even better results, we might want to rewrite the S-latch acquisition, as in the following incomplete patch. At least rw_lock_x_unlock_func() would have to be adjusted as well:

      diff --git a/storage/innobase/include/sync0rw.ic b/storage/innobase/include/sync0rw.ic
      index 18a6c69ece4..32ad6497a35 100644
      --- a/storage/innobase/include/sync0rw.ic
      +++ b/storage/innobase/include/sync0rw.ic
      @@ -239,8 +239,9 @@ rw_lock_s_lock_low(
       	const char*	file_name, /*!< in: file name where lock requested */
       	unsigned	line)	/*!< in: line where requested */
       {
      -	if (!rw_lock_lock_word_decr(lock, 1, 0)) {
      +	if (lock->lock_word.fetch_sub(1, std::memory_order_acquire) <= 0) {
       		/* Locking did not succeed */
      +		lock->lock_word.fetch_add(1, std::memory_order_release);
       		return(FALSE);
       	}
       
      
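      In isolation, the idea can be sketched as follows (a hedged illustration, not the final patch): acquire optimistically with fetch_sub() and roll back on failure. The transient negative excursion of lock_word on a failed attempt is exactly why other code that inspects lock_word, such as rw_lock_x_unlock_func(), would need adjusting:

      #include <atomic>
      #include <cstdint>
      
      /* Sketch of the optimistic S-lock acquisition from the patch above,
      shown in isolation. A failed attempt briefly drives lock_word below
      the value a pure reader-writer protocol would allow; any code that
      inspects lock_word concurrently must tolerate such transient blips. */
      static bool try_s_lock(std::atomic<int32_t>& lock_word)
      {
        if (lock_word.fetch_sub(1, std::memory_order_acquire) <= 0)
        {
          /* Locking did not succeed: undo the optimistic decrement. */
          lock_word.fetch_add(1, std::memory_order_release);
          return false;
        }
        return true;
      }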

      Also, somewhat related to this, we might want to replace the rw_lock_s_lock(l) call in buf_pool_t::page_hash_lock() with a non-inlined function call, because that code is extremely unlikely to be executed: it only runs while the buffer pool is being resized.
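      The shape of that change could look roughly like this (an illustrative sketch only; the struct and function names are hypothetical, not the actual MariaDB declarations, and the GCC/Clang attributes are an assumption about the toolchain):

      #include <atomic>
      #include <cstdint>
      
      struct example_page_hash_latch
      {
        std::atomic<int32_t> lock_word{0};
      
        /* Cold path, deliberately not inlined: it is only reached while
        the buffer pool is being resized, so moving it out of line keeps
        the inlined fast path of every caller small. */
        __attribute__((noinline, cold)) void s_lock_resize_path();
      
        void s_lock(bool resizing)
        {
          if (__builtin_expect(resizing, 0))
            s_lock_resize_path();     /* rare: resize in progress */
          else
            lock_word.fetch_add(1, std::memory_order_acquire); /* common case */
        }
      };
      
      void example_page_hash_latch::s_lock_resize_path()
      {
        /* Stand-in for the full rw_lock_s_lock() protocol. */
        lock_word.fetch_add(1, std::memory_order_acquire);
      }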


          Activity

            Marko Mäkelä added a comment:

            Related to this, measurements on two Intel Cascade Lake based systems suggest that the PAUSE instruction latency was reverted from the Skylake microarchitecture’s roughly 120 clock cycles to pre-Skylake levels (about 12 clock cycles). To compensate, a larger value of my_cpu_relax_multiplier would seem to be useful. We should also finally implement a startup message about it (MDEV-19929).
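            For context, a PAUSE-based backoff in the spirit of LF_BACKOFF() looks roughly like this (a generic sketch, not MariaDB’s actual definition; _mm_pause() assumes an x86 compilation target). The per-iteration cost is exactly the PAUSE latency discussed above, which is why a microarchitecture-dependent multiplier matters:

            #include <immintrin.h>  /* _mm_pause(); assumes an x86 target */
            
            /* Generic sketch of a tunable PAUSE-based backoff. One PAUSE
            costs roughly 12 cycles on pre-Skylake and Cascade Lake cores
            but about 120 cycles on Skylake, so the multiplier must be
            calibrated per CPU (cf. my_cpu_relax_multiplier). */
            static inline void cpu_backoff(unsigned multiplier)
            {
              for (unsigned i= 0; i < multiplier; i++)
                _mm_pause(); /* yield pipeline resources to the sibling thread */
            }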

            Marko Mäkelä added a comment:

            I filed and rejected MDEV-22859 for the suspected hang due to failing to reload lock_copy.

            My suggested patch cannot be helpful after all: lock_copy is actually being reloaded in the loop, by the compare_exchange_strong() operation. Adding a redundant load of lock_copy to the loop would hurt performance, not improve it.

            The performance regression on Intel Cascade Lake remains unresolved, and we should still review how we could reduce the buf_pool.page_hash X-latch hold time, which should be the only factor contributing to this contention.
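            That reload is standard std::atomic behaviour: when compare_exchange_strong() fails, it stores the freshly observed value into its expected argument, so a retry loop never needs a separate load. A minimal standalone demonstration:

            #include <atomic>
            #include <cassert>
            #include <cstdint>
            
            int main()
            {
              std::atomic<int32_t> word{5};
              int32_t expected= 7;             /* deliberately stale */
              bool ok= word.compare_exchange_strong(expected, 6);
              assert(!ok);                     /* the exchange failed... */
              assert(expected == 5);           /* ...but expected was reloaded */
              return 0;
            }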

            Marko Mäkelä added a comment:

            I believe that we can rely solely on buf_pool.mutex on some code paths where we are currently acquiring the rw-lock on buf_pool.page_hash. Currently, my fix is causing hangs, especially in encryption tests.

            I now believe that there is no need for an LF_BACKOFF() call in rw_lock_lock_word_decr(): after all, we terminate the spin loop as soon as a conflict is detected.

            Marko Mäkelä added a comment:

            I was able to reduce the buf_pool.page_hash X-latch hold time in buf_page_init_for_read() and buf_page_create(), which I hope addresses the root cause of this.

            That change includes the following:

            diff --git a/storage/innobase/include/sync0rw.ic b/storage/innobase/include/sync0rw.ic
            index 18a6c69ece4..c881c2fcdf1 100644
            --- a/storage/innobase/include/sync0rw.ic
            +++ b/storage/innobase/include/sync0rw.ic
            @@ -220,8 +220,14 @@ rw_lock_lock_word_decr(
             
             			return(true);
             		}
            -		(void) LF_BACKOFF();
            +
            +		/* Note that lock_copy was reloaded above. We will
            +		keep trying if a spurious conflict occurred, typically
            +		caused by concurrent executions of
            +		rw_lock_s_lock(). */
             	}
            +
            +	/* A real conflict was detected. */
             	return(false);
             }
             

            It remains to be seen whether keeping the LF_BACKOFF() could still be beneficial; after all, we could still have many concurrent S-latch requests on buf_pool.page_hash.

            A proper fix for the spurious S-latch contention (if it still remains an issue) might be to implement a simpler variant of rw_lock_t that supports neither SX-latches (we only need those for dict_index_t::lock, buf_block_t::lock and possibly fil_space_t::latch) nor recursive latches, and implements S-latch acquisition by fetch_add(1).
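            A hypothetical sketch of such a simplified latch (names, the writer-bit encoding, and the spin policy are all illustrative, not the eventual implementation): S-latch acquisition is an unconditional fetch_add(1), so concurrent readers can never fail against each other, while only writers need a compare-and-swap:

            #include <atomic>
            #include <cstdint>
            
            /* Illustrative only: a minimal shared/exclusive latch with no
            SX mode and no recursion, acquiring the S-latch by fetch_add(1). */
            class simple_rw_lock
            {
              static constexpr uint32_t WRITER= 1U << 31;
              std::atomic<uint32_t> word{0};
            public:
              void s_lock()
              {
                /* Unconditional increment: concurrent S-latch requests can
                never conflict with each other, unlike a CAS retry loop. */
                while (word.fetch_add(1, std::memory_order_acquire) & WRITER)
                {
                  /* A writer is active: undo the increment and wait. */
                  word.fetch_sub(1, std::memory_order_relaxed);
                  while (word.load(std::memory_order_relaxed) & WRITER) {}
                }
              }
              void s_unlock() { word.fetch_sub(1, std::memory_order_release); }
            
              void x_lock()
              {
                uint32_t expected= 0;
                /* Only the writer needs a compare-and-swap: the whole
                word must be 0 (no readers, no writer). */
                while (!word.compare_exchange_weak(expected, WRITER,
                                                   std::memory_order_acquire,
                                                   std::memory_order_relaxed))
                  expected= 0;
              }
              void x_unlock() { word.fetch_sub(WRITER, std::memory_order_release); }
            };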

            Marko Mäkelä added a comment:

            To be on the safe side with regard to performance when the number of concurrently executing threads exceeds the number of available CPU cores, I decided to retain the LF_BACKOFF() call after all, even though krunalbauskar found it to reduce performance on one system. That call will be removed in MDEV-22871 once we have migrated buf_pool.page_hash and possibly other hot rw-locks to a more efficient S-latch acquisition protocol.

            In buf_page_init_for_read(), we must acquire the buf_pool.page_hash latch early to prevent a race condition with buf_pool_t::watch_remove(). So this fix basically only reduces contention in buf_page_create(), which should mostly benefit read-write workloads.

            People

              Assignee: Marko Mäkelä
              Reporter: Marko Mäkelä