[MDEV-28043] Race condition between mtr_t::commit() and checkpoint

Details

Type: Bug
Status: Closed
Priority: Blocker
Resolution: Fixed
Affects Version/s: N/A
Fix Version/s: 10.9.0, 10.8.3
Component/s: Storage Engine - InnoDB
Labels:
Environment: GNU/Linux with mmap() based redo log on /dev/shm

Description

When MDEV-27774 replaced log_sys.mutex with log_sys.latch, it introduced a race condition in mtr_t::do_write():

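    if (!ex)
    {
      /* Upgrade from shared to exclusive latch for the log write */
      log_sys.latch.rd_unlock();
      log_sys.latch.wr_lock(SRW_LOCK_CALL);
      if (UNIV_LIKELY(!m_user_space->max_lsn))
        name_write();
      std::pair<lsn_t,mtr_t::page_flush_ahead> p{finish_write(len, true)};
      /* The exclusive latch is released here, before ReleaseBlocks runs */
      log_sys.latch.wr_unlock();
      log_sys.latch.rd_lock(SRW_LOCK_CALL);
      return p;
    }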

It is not safe to release the exclusive log_sys.latch between finish_write() and ReleaseBlocks. Because we have no portable operation that would downgrade the latch from exclusive to shared, we must retain that exclusive latch until the end of the critical section in mtr_t::commit().
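The rule can be sketched outside InnoDB with std::shared_mutex, which likewise has no operation for downgrading exclusive ownership to shared. This is only a minimal sketch under stated assumptions, not the actual fix: ToyLog, toy_commit_exclusive() and the vector standing in for buf_pool.flush_list are hypothetical names. It shows the log append and the publication of the change sharing one exclusive critical section.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical stand-in for the redo log and the flush list; not InnoDB code.
struct ToyLog
{
  std::shared_mutex latch;        // plays the role of log_sys.latch
  lsn_t lsn = 0;                  // current end of the log
  std::vector<lsn_t> flush_list;  // start LSN of each published change
};

// Append len bytes and publish the change in ONE exclusive critical section.
// The latch is never released between the append and the publication, so no
// other latch holder can observe the new end of the log without also seeing
// the published change.
inline lsn_t toy_commit_exclusive(ToyLog &log, lsn_t len)
{
  assert(len > 0);                  // an empty append would publish nothing
  std::unique_lock<std::shared_mutex> guard(log.latch);
  const lsn_t start = log.lsn;      // analogous to finish_write()
  log.lsn += len;
  log.flush_list.push_back(start);  // analogous to ReleaseBlocks
  return start;                     // guard releases the latch on return
}
```

Holding the exclusive latch for longer costs some concurrency, but without a downgrade operation any release-and-reacquire sequence leaves a window in which another thread can take the latch.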

I debugged an rr replay trace of this:

    ssh pluto
    rr replay /data/results/1647008467/TBR-1420/dev/shm/rqg/1647008467/53/1/rr/latest-trace
    continue
    watch -l log_sys.last_checkpoint_lsn.m._M_i
    watch -l buf_pool.flush_list.count
    reverse-continue
    reverse-continue
    reverse-continue
    thread apply 24 backtrace

Working backwards from the end of the trace, we have Thread 3 hitting an assertion failure:

mysqld: /data/Server/bb-10.9-MDEV-26603-async-redo-writeB/storage/innobase/buf/buf0flu.cc:1877: bool log_checkpoint_low(lsn_t, lsn_t): Assertion `oldest_lsn > log_sys.last_checkpoint_lsn' failed.

Before that, we had Thread 24 inserting an unexpectedly old block into buf_pool.flush_list, and before that, Thread 3 advancing the checkpoint LSN to a value that was too new.
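The sequence of events can be replayed deterministically in a single-threaded toy model. This is a hedged sketch, not InnoDB code: ToyState and the toy_* functions are hypothetical, and the model assumes that a checkpoint uses the current end of the log as the oldest LSN when the flush list is empty. A checkpoint that would have to move backwards is reported as a failure instead of tripping the assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = std::uint64_t;

// Hypothetical model state; the numbers are arbitrary.
struct ToyState
{
  lsn_t lsn = 100;                  // current end of the log
  lsn_t last_checkpoint_lsn = 100;  // latest checkpoint
  std::vector<lsn_t> flush_list;    // oldest modification LSN per dirty block
};

// Reserve len bytes of log and return the start LSN of the record.
inline lsn_t toy_append(ToyState &s, lsn_t len)
{
  const lsn_t start = s.lsn;
  s.lsn += len;
  return start;
}

// Assumption: with no dirty blocks, the oldest LSN is the end of the log.
inline lsn_t toy_oldest_lsn(const ToyState &s)
{
  return s.flush_list.empty()
    ? s.lsn
    : *std::min_element(s.flush_list.begin(), s.flush_list.end());
}

// Returns false if the checkpoint would have to move backwards.
inline bool toy_checkpoint(ToyState &s)
{
  const lsn_t oldest = toy_oldest_lsn(s);
  if (oldest < s.last_checkpoint_lsn)
    return false;
  s.last_checkpoint_lsn = oldest;
  return true;
}

// Racy order: the checkpoint runs between the append and the insertion.
inline bool toy_racy_interleaving()
{
  ToyState s;
  const lsn_t start = toy_append(s, 10);  // "Thread 24": log written
  toy_checkpoint(s);                      // "Thread 3": checkpoint too new
  s.flush_list.push_back(start);          // "Thread 24": old block inserted
  return toy_checkpoint(s);               // "Thread 3": invariant broken
}

// Fixed order: append and insertion are not separated by a checkpoint.
inline bool toy_fixed_interleaving()
{
  ToyState s;
  const lsn_t start = toy_append(s, 10);
  s.flush_list.push_back(start);
  return toy_checkpoint(s) && toy_checkpoint(s);
}
```

In this model, toy_racy_interleaving() ends with a flush list entry that is older than the checkpoint, which corresponds to the failed assertion above, while toy_fixed_interleaving() keeps the checkpoint at or below every flush list entry.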

Issue Links

is caused by: MDEV-27774 Reduce scalability bottlenecks in mtr_t::commit() (Closed)

Activity

Matthias Leich added a comment - 2022-03-14 19:11

origin/bb-10.8-MDEV-28043 d8dd388f5b549000fcd2af0b576bb24154914368 2022-03-14T14:26:09+02:00
performed well in RQG testing.


People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 2

Dates

Created: 2022-03-11 16:54

Updated: 2022-03-17 07:40

Resolved: 2022-03-15 10:53

MariaDB Server