[MDEV-22859] Possible hang in InnoDB rw_lock_lock_word_decr() Created: 2020-06-10  Updated: 2020-06-10  Resolved: 2020-06-10

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.2.3, 10.3.0, 10.4.0, 10.5.0
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Not a Bug Votes: 1
Labels: hang, race

Issue Links:
Problem/Incident
is caused by MDEV-10813 Clean-up InnoDB atomics, memory barri... Closed
Relates
relates to MDEV-22850 Reduce buf_pool.page_hash latch conte... Closed

 Description   

In MDEV-10813, the function rw_lock_lock_word_decr() was simplified in an incorrect way:

@@ -275,12 +275,11 @@ rw_lock_lock_word_decr(
 	os_rmb;
 	local_lock_word = lock->lock_word;
 	while (local_lock_word > threshold) {
-		if (os_compare_and_swap_lint(&lock->lock_word,
-					     local_lock_word,
-					     local_lock_word - amount)) {
+		if (my_atomic_caslint(&lock->lock_word,
+				      &local_lock_word,
+				      local_lock_word - amount)) {
 			return(true);
 		}
-		local_lock_word = lock->lock_word;
 	}
 	return(false);
 #else /* INNODB_RW_LOCKS_USE_ATOMICS */

Because we sample the local_lock_word only once at the start of the loop, it is possible that the following happens during rw_lock_s_lock():

  1. Another thread is holding an S-latch, so that local_lock_word will be X_LOCK_DECR-1.
  2. The other thread will release its S-latch and never come back.
  3. An infinite loop will occur, because lock->lock_word will be X_LOCK_DECR from that point onwards.
  4. Only a subsequent S-latch acquisition from another thread could resolve the hang.


 Comments   
Comment by Marko Mäkelä [ 2020-06-10 ]

Sorry, I missed the fact that my_atomic_caslint() is reloading the local_lock_word. On GCC-compatible compilers, the macro maps to the following code:

#define make_atomic_cas_body64                                    \
  asm volatile ("push %%ebx;"                                     \
                "movl (%%ecx), %%ebx;"                            \
                "movl 4(%%ecx), %%ecx;"                           \
                LOCK_prefix "; cmpxchg8b (%%esi);"                \
                "setz %2; pop %%ebx"                              \
                : "+S" (a), "+A" (*cmp), "=c" (ret)               \
                : "c" (&set)                                      \
                : "memory", "esp")

The "+S" means that the parameter a is both read and written by the assembler snippet via the esi (or rsi) register. The + feels unnecessary to me, because we are changing the data that esi (or rsi) points to, not esi itself. The "+A" means that the register pair eax:edx would be both read and written on IA-32. So, it says that cmp will be assigned to.

The cmpxchg8b instruction is documented to operate on eax:edx and ecx:ebx and its operand. So, everything looks correct after all.

The previous primitive os_compare_and_swap_lint() failed to return the value that was read.

Generated at Thu Feb 08 09:18:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.