[MDEV-16989] InnoDB hang on crash recovery: Waited for 10 seconds for 256 pending reads - Jira

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Fix Version/s: 10.5.0
Component/s: Backup, Storage Engine - InnoDB, Storage Engine - XtraDB
Labels:
- hang
- not-10.5
- recovery

Description

wlad made me aware of the PBX-1467 fix for Percona XtraBackup. I believe that the hang scenario described there is also possible in InnoDB and XtraDB crash recovery.

Quoting sergei-gl’s commit message:

Here is an example deadlock scenario:

Thread 1 in `recv_apply_hashed_log_recs' is waiting when
`buf_pool->n_pend_reads' become not too high to make a progress. It is
`apply_batch_on=TRUE' and will change it to be `FALSE' once apply batch
completed. Note that `buf_pool->n_pend_reads' is already high.

Now, one of the pending reads completes and `buf_page_io_complete'
invoked. It should decrement `buf_pool->n_pend_reads' and let current
apply batch to make progress.

But before decrementing `buf_pool->n_pend_reads', `buf_page_io_complete'
invoked `ibuf_merge_or_delete_for_page' which in turn triggered one more
`recv_apply_hashed_log_recs'. This new `recv_apply_hashed_log_recs'
cannot make progress because `apply_batch_on' is `TRUE', it is waiting
for thread 1. We are in the deadlock now.

Lets imagine that all IO handlers (`buf_page_io_complete') stuck in the
`recv_apply_hashed_log_recs', here is what we see in this case.

Proposed fix is to decrement `buf_pool->n_pend_reads' before invoking
`ibuf_merge_or_delete_for_page'.

This hang should only be possible if there were buffered changes to secondary index leaf pages that were read during redo log processing, possibly by read-ahead.
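The ordering problem in the quoted scenario can be illustrated with a small deterministic model. This is only a sketch, not InnoDB code: the function `batch_completes`, its parameters, and the numbers used with it are hypothetical, and it assumes the worst case from the commit message, where every completed read belongs to a page with buffered changes, so every I/O handler enters the change buffer merge.

```python
def batch_completes(pending_reads, limit, io_handlers, decrement_first):
    """Toy model: return True if thread 1's apply batch can finish.

    All names and values here are illustrative assumptions, not InnoDB's.
    """
    n_pend_reads = pending_reads

    # Each I/O handler picks up one completed read. Assumption: every such
    # page has buffered changes, so every handler enters the merge.
    for _ in range(min(io_handlers, pending_reads)):
        if decrement_first:
            # Proposed order: the count drops before the merge is invoked.
            n_pend_reads -= 1
        # The merge re-enters log application and waits for the running
        # batch. In the original order the decrement comes after this wait,
        # so it never happens while the batch is still active.

    # Thread 1 proceeds only once the pending-read count is below the limit.
    return n_pend_reads < limit
```

With 8 pending reads, a limit of 8, and 4 I/O handlers, `batch_completes(8, 8, 4, decrement_first=False)` returns `False` (all handlers wait on the batch and the batch waits on the count), while `decrement_first=True` returns `True` because the count falls to 4 before any handler starts waiting.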

Attachments

localtranprepare18604.zip
2020-03-27 22:10
306 kB
YURII KANTONISTOV

Issue Links

relates to

MDEV-19514 Defer change buffer merge until pages are requested

Closed

Activity

YURII KANTONISTOV added a comment - 2020-03-27 22:14 - edited

Reproducible in stress tests when preparing an incremental backup; see the attached mariabackup log, localtranprepare18604.zip.

Marko Mäkelä added a comment - 2020-10-19 18:10

MDEV-19514 in MariaDB Server 10.5 should have fixed this.

Marko Mäkelä added a comment - 2021-05-11 16:09

ykantoni, is this reproducible with MariaDB 10.5.10? The crash recovery was heavily refactored in 10.5 (among other things, in MDEV-12353, MDEV-21351, MDEV-23855).

Marko Mäkelä added a comment - 2021-05-11 16:11

Restoring an incremental backup should never invoke ibuf_merge_or_delete_for_page(), and in 10.5 that is not invoked at all during log-based recovery, thanks to MDEV-19514.

Marko Mäkelä added a comment - 2022-01-14 15:50

MDEV-19514 appeared in MariaDB Server 10.5.0.

People

Assignee: Marko Mäkelä

Reporter: Marko Mäkelä

Votes: 0

Watchers: 7

Dates

Created: 2018-08-15 14:55

Updated: 2024-07-07 23:41

Resolved: 2022-01-14 15:50
