MariaDB Server / MDEV-16989

InnoDB hang on crash recovery: Waited for 10 seconds for 256 pending reads

Details

    Description

      wlad made me aware of the PXB-1467 fix for Percona XtraBackup. I believe that the hang scenario described there is possible in InnoDB and XtraDB crash recovery.

      Quoting sergei-gl’s commit message:

      Here is an example deadlock scenario:

      Thread 1 in `recv_apply_hashed_log_recs' is waiting for
      `buf_pool->n_pend_reads' to drop low enough to make progress. It has set
      `apply_batch_on=TRUE' and will change it to `FALSE' once the apply batch
      has completed. Note that `buf_pool->n_pend_reads' is already high.

      Now, one of the pending reads completes and `buf_page_io_complete' is
      invoked. It should decrement `buf_pool->n_pend_reads' and let the current
      apply batch make progress.

      But before decrementing `buf_pool->n_pend_reads', `buf_page_io_complete'
      invokes `ibuf_merge_or_delete_for_page', which in turn triggers one more
      `recv_apply_hashed_log_recs'. This new `recv_apply_hashed_log_recs'
      cannot make progress because `apply_batch_on' is `TRUE'; it is waiting
      for thread 1. We are now in a deadlock.

      Let's imagine that all I/O handlers (`buf_page_io_complete') are stuck in
      `recv_apply_hashed_log_recs'; that is what we see in this case.

      The proposed fix is to decrement `buf_pool->n_pend_reads' before invoking
      `ibuf_merge_or_delete_for_page'.

      This hang should only be possible if there were buffered changes to secondary index leaf pages that were read during redo log processing, possibly by read-ahead.
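
The ordering problem in the quoted commit message can be illustrated with a small deterministic model. This is only a sketch under stated assumptions, not InnoDB code: the function `simulate`, its parameters, and the `limit` threshold are hypothetical stand-ins, and the real wait condition in `recv_apply_hashed_log_recs` may differ. The model replays one I/O completion per pending read and checks whether thread 1 ever gets to finish its apply batch.

```python
def simulate(pending_reads: int, limit: int, decrement_first: bool) -> bool:
    """Return True if the apply batch completes, False if every handler is stuck.

    A deterministic, single-threaded model; all names and the `limit`
    threshold are hypothetical stand-ins, not InnoDB code.
    """
    n_pend_reads = pending_reads    # stands in for buf_pool->n_pend_reads
    apply_batch_on = True           # thread 1 is inside its apply batch

    for _ in range(pending_reads):  # one I/O completion per pending read
        if decrement_first:
            n_pend_reads -= 1       # proposed order: decrement before the merge

        # Thread 1 re-checks its wait condition and finishes the batch if it can.
        if apply_batch_on and n_pend_reads < limit:
            apply_batch_on = False

        # The change buffer merge triggers a nested apply, which waits
        # for as long as thread 1's batch is still on.
        if apply_batch_on:
            continue                # this handler is blocked; it gets no further

        if not decrement_first:
            n_pend_reads -= 1       # original order: decrement after the merge

    return not apply_batch_on


print(simulate(4, 4, decrement_first=False))  # False: all handlers are stuck
print(simulate(4, 4, decrement_first=True))   # True: the batch completes
```

With the original order, no handler ever reaches its decrement, so the counter stays at the limit and the batch never ends. With the proposed order, the first completion already lowers the counter enough for thread 1 to proceed.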

          Activity

            marko Marko Mäkelä created issue -
            serg Sergei Golubchik made changes -
            Field | Original Value | New Value
            Labels | hang recovery | hang need_feedback recovery
            elenst Elena Stepanova made changes -
            Labels | hang need_feedback recovery | hang recovery
            serg Sergei Golubchik made changes -
            Fix Version/s 10.4 [ 22408 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Assignee | Marko Mäkelä [ marko ] | Vladislav Lesin [ vlad.lesin ]
            Priority | Critical [ 2 ] | Major [ 3 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Component/s Backup [ 13902 ]
            ykantoni YURII KANTONISTOV made changes -
            Attachment localtranprepare18604.zip [ 50917 ]
            ykantoni YURII KANTONISTOV added a comment (edited):

            Reproducible in stress tests with prepare of an incremental backup; see the attached mariabackup log, localtranprepare18604.zip.
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 5.5 [ 15800 ]
            Fix Version/s 10.0 [ 16000 ]

            marko Marko Mäkelä added a comment:

            MDEV-19514 in MariaDB Server 10.5 should have fixed this.
            marko Marko Mäkelä made changes -
            marko Marko Mäkelä made changes -
            Labels | hang recovery | hang not-10.5 recovery
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.1 [ 16100 ]
            julien.fritsch Julien Fritsch made changes -
            Assignee | Vladislav Lesin [ vlad.lesin ] | Marko Mäkelä [ marko ]

            marko Marko Mäkelä added a comment:

            ykantoni, is this reproducible with MariaDB 10.5.10? Crash recovery was heavily refactored in 10.5 (among other things, in MDEV-12353, MDEV-21351, and MDEV-23855).

            marko Marko Mäkelä added a comment:

            Restoring an incremental backup should never invoke ibuf_merge_or_delete_for_page(), and in 10.5 that is not invoked at all during log-based recovery, thanks to MDEV-19514.
            serg Sergei Golubchik made changes -
            Workflow | MariaDB v3 [ 88931 ] | MariaDB v4 [ 140893 ]

            marko Marko Mäkelä added a comment:

            MDEV-19514 appeared in MariaDB Server 10.5.0.
            marko Marko Mäkelä made changes -
            Fix Version/s 10.5.0 [ 23709 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.3 [ 22126 ]
            Fix Version/s 10.4 [ 22408 ]
            Resolution Fixed [ 1 ]
            Status | Open [ 1 ] | Closed [ 6 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 121612

            People

              Assignee: marko Marko Mäkelä
              Reporter: marko Marko Mäkelä