Details
- Bug
- Status: Closed
- Major
- Resolution: Fixed
- 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL)
Description
wlad made me aware of the PBX-1467 fix for Percona XtraBackup. I believe that the hang scenario described there is also possible in InnoDB and XtraDB crash recovery.
Quoting sergei-gl’s commit message:
Here is an example deadlock scenario:

Thread 1 in `recv_apply_hashed_log_recs' is waiting when
`buf_pool->n_pend_reads' become not too high to make a progress. It is
`apply_batch_on=TRUE' and will change it to be `FALSE' once apply batch
completed. Note that `buf_pool->n_pend_reads' is already high.

Now, one of the pending reads completes and `buf_page_io_complete'
invoked. It should decrement `buf_pool->n_pend_reads' and let current
apply batch to make progress.

But before decrementing `buf_pool->n_pend_reads', `buf_page_io_complete'
invoked `ibuf_merge_or_delete_for_page' which in turn triggered one more
`recv_apply_hashed_log_recs'. This new `recv_apply_hashed_log_recs'
cannot make progress because `apply_batch_on' is `TRUE', it is waiting
for thread 1. We are in the deadlock now.

Lets imagine that all IO handlers (`buf_page_io_complete') stuck in the
`recv_apply_hashed_log_recs', here is what we see in this case.

Proposed fix is to decrement `buf_pool->n_pend_reads' before invoking
`ibuf_merge_or_delete_for_page'.
This hang should only be possible if there were buffered changes to secondary index leaf pages, and specifically to pages that were read during redo log processing, possibly by read-ahead.
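To make the ordering problem concrete, here is a minimal single-threaded toy model of the scenario in the quoted commit message. It is only a sketch, not the InnoDB code: `ToyRecovery`, `pend_limit`, `io_complete` and `batch_finishes` are hypothetical names, and the model assumes the worst case in which every completed read needs a change buffer merge. A handler that would block in the nested `recv_apply_hashed_log_recs' is modelled by returning early.

```cpp
#include <cassert>
#include <cstddef>

// Toy model only: the names mirror the commit message, but these are NOT the
// real InnoDB structures or functions.
struct ToyRecovery {
    std::size_t n_pend_reads;  // stands in for buf_pool->n_pend_reads
    std::size_t pend_limit;    // hypothetical threshold the batch thread waits for
    bool apply_batch_on;       // stands in for the apply_batch_on flag
};

// Thread 1 can finish its batch only once pending reads fall below the limit.
bool batch_can_finish(const ToyRecovery& r) {
    return r.n_pend_reads < r.pend_limit;
}

// Models one completed read whose page has buffered changes to merge.
void io_complete(ToyRecovery& r, bool decrement_first) {
    if (decrement_first) {
        --r.n_pend_reads;  // proposed order: decrement before the merge
    }
    // The merge triggers a nested apply, which waits for the running batch;
    // the running batch in turn waits for n_pend_reads to drop.
    if (r.apply_batch_on && !batch_can_finish(r)) {
        return;  // handler is parked here; code below is not reached
    }
    if (!decrement_first) {
        --r.n_pend_reads;  // original order: decrement only after the merge
    }
}

// Lets every pending read complete once and reports whether thread 1 can
// finish its batch afterwards.
bool batch_finishes(std::size_t pending, std::size_t limit,
                    bool decrement_first) {
    assert(limit > 0);
    ToyRecovery r{pending, limit, true};
    for (std::size_t i = 0; i < pending; ++i) {
        io_complete(r, decrement_first);
    }
    return batch_can_finish(r);
}
```

In this model, `batch_finishes(8, 4, false)` never lets the counter drop, because every handler parks before its decrement, while `batch_finishes(8, 4, true)` drains the counter even though some handlers still wait for the batch to end.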
Issue Links
- relates to
  - MDEV-19514 Defer change buffer merge until pages are requested (Closed)