  1. MariaDB Server
  2. MDEV-16989

InnoDB hang on crash recovery: Waited for 10 seconds for 256 pending reads



      Vladislav Vaintroub made me aware of the fix for PXB-1467 in Percona XtraBackup. I believe that the hang scenario described there is also possible in InnoDB and XtraDB crash recovery.

      Quoting Sergei Glushchenko’s commit message:

      Here is an example deadlock scenario:

      Thread 1 in `recv_apply_hashed_log_recs' is waiting when
      `buf_pool->n_pend_reads' become not too high to make a progress. It is
      `apply_batch_on=TRUE' and will change it to be `FALSE' once apply batch
      completed. Note that `buf_pool->n_pend_reads' is already high.

      Now, one of the pending reads completes and `buf_page_io_complete'
      invoked. It should decrement `buf_pool->n_pend_reads' and let current
      apply batch to make progress.

      But before decrementing `buf_pool->n_pend_reads', `buf_page_io_complete'
      invoked `ibuf_merge_or_delete_for_page' which in turn triggered one more
      `recv_apply_hashed_log_recs'. This new `recv_apply_hashed_log_recs'
      cannot make progress because `apply_batch_on' is `TRUE', it is waiting
      for thread 1. We are in the deadlock now.

      Lets imagine that all IO handlers (`buf_page_io_complete') stuck in the
      `recv_apply_hashed_log_recs', here is what we see in this case.

      Proposed fix is to decrement `buf_pool->n_pend_reads' before invoking

      This hang should only be possible if there were buffered changes to secondary index leaf pages, and those pages were read during redo log processing, possibly by read-ahead.
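
      The ordering problem in the quoted scenario can be illustrated with a small single-threaded model. This is only a sketch under simplifying assumptions, not InnoDB code: `batch_completes`, `Model` and `HandlerState` are hypothetical names, every I/O handler is assumed to re-enter the apply function (the worst case from the commit message), and thread 1 is assumed to finish its batch as soon as the pending-read counter drops below the limit.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of the scenario; not InnoDB code.
enum class HandlerState { kPending, kBlockedInNestedApply, kDone };

struct Model {
    std::size_t n_pend_reads;  // models buf_pool->n_pend_reads
    std::size_t limit;         // thread 1 proceeds only if n_pend_reads < limit
    bool apply_batch_on;       // models recv_sys->apply_batch_on
    std::vector<HandlerState> handlers;  // one per buf_page_io_complete() call
};

// Returns true if the batch finishes and all handlers return,
// false if a round passes in which nobody can make progress (a hang).
bool batch_completes(std::size_t pending, std::size_t limit,
                     bool decrement_first)
{
    Model m{pending, limit, true,
            std::vector<HandlerState>(pending, HandlerState::kPending)};

    for (;;) {
        bool progress = false;
        bool all_done = true;

        for (auto& h : m.handlers) {
            if (h == HandlerState::kPending) {
                if (decrement_first) {
                    --m.n_pend_reads;  // proposed order: decrement first
                }
                // The merge step re-enters the apply function, which
                // must wait while another batch is still active.
                if (m.apply_batch_on) {
                    h = HandlerState::kBlockedInNestedApply;
                } else {
                    if (!decrement_first) --m.n_pend_reads;
                    h = HandlerState::kDone;
                }
                progress = true;
            } else if (h == HandlerState::kBlockedInNestedApply
                       && !m.apply_batch_on) {
                // Nested apply returned; original order decrements only now.
                if (!decrement_first) --m.n_pend_reads;
                h = HandlerState::kDone;
                progress = true;
            }
            if (h != HandlerState::kDone) all_done = false;
        }

        // Thread 1: waits until the pending-read count is low enough,
        // then completes the batch and clears the flag.
        if (m.apply_batch_on && m.n_pend_reads < m.limit) {
            m.apply_batch_on = false;
            progress = true;
        }

        if (all_done && !m.apply_batch_on) return true;
        if (!progress) return false;  // everyone is waiting on someone else
    }
}
```

      In this model, `batch_completes(4, 4, false)` reports a hang: the handlers wait for `apply_batch_on` to clear while thread 1 waits for the counter to drop. With the decrement moved first, `batch_completes(4, 4, true)` finishes.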




            • Assignee:
              vlad.lesin Vladislav Lesin
              marko Marko Mäkelä