Quoting Sergei Glushchenko’s commit message:
Here is an example deadlock scenario:
Thread 1 in `recv_apply_hashed_log_recs' is waiting for
`buf_pool->n_pend_reads' to drop low enough to make progress. It has set
`apply_batch_on=TRUE' and will change it to `FALSE' once the apply batch
completes. Note that `buf_pool->n_pend_reads' is already high.
Now, one of the pending reads completes and `buf_page_io_complete' is
invoked. It should decrement `buf_pool->n_pend_reads' and let the current
apply batch make progress.
But before decrementing `buf_pool->n_pend_reads', `buf_page_io_complete'
invoked `ibuf_merge_or_delete_for_page', which in turn triggered one more
`recv_apply_hashed_log_recs'. This new `recv_apply_hashed_log_recs'
cannot make progress because `apply_batch_on' is `TRUE'; it is waiting
for thread 1. We now have a deadlock.
Let's imagine that all IO handlers (`buf_page_io_complete') are stuck in
`recv_apply_hashed_log_recs'; here is what we see in this case.
The proposed fix is to decrement `buf_pool->n_pend_reads' before invoking
`ibuf_merge_or_delete_for_page'.
This hang should only be possible if there were buffered changes to secondary index leaf pages, i.e. to pages that were read during the redo log processing, possibly by read-ahead.
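The essence of the fix is a reordering: the counter the batch thread waits on must be decremented before the IO-completion path re-enters code that can block on the batch itself. The following is a minimal Python model of that ordering, not the real InnoDB code; the class, function names, and return strings are hypothetical, and the real server blocks on condition variables rather than returning a status.

```python
# Hypothetical model of the deadlock ordering described above.
# `BufPool`, `io_complete_buggy`, and `io_complete_fixed` are illustrative
# stand-ins, not InnoDB functions; only the field names mirror the source.

class BufPool:
    def __init__(self, n_pend_reads):
        self.n_pend_reads = n_pend_reads   # pending page reads
        self.apply_batch_on = True         # thread 1 is mid-batch

def io_complete_buggy(pool):
    # Original order: the ibuf merge (which re-enters
    # recv_apply_hashed_log_recs and blocks while apply_batch_on is set)
    # runs BEFORE the decrement, so thread 1 never sees progress.
    if pool.apply_batch_on:
        return "blocked"                   # would wait on thread 1 forever
    pool.n_pend_reads -= 1
    return "done"

def io_complete_fixed(pool):
    # Proposed fix: decrement first, so thread 1's wait on n_pend_reads
    # can complete even if the ibuf merge then has to wait for the batch.
    pool.n_pend_reads -= 1
    if pool.apply_batch_on:
        return "decremented, then blocked"
    return "done"

pool = BufPool(n_pend_reads=2)
print(io_complete_buggy(pool), pool.n_pend_reads)   # counter stays at 2
print(io_complete_fixed(pool), pool.n_pend_reads)   # counter drops to 1
```

In the buggy ordering the counter never moves while the handler is blocked, so both threads wait on each other; in the fixed ordering the counter drops first, which is exactly what breaks the cycle.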