Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34212

InnoDB transaction recovery is incorrect

Details

    Description

      Starting with MDEV-32042, we assume that the function buf_page_get_gen() is not being invoked while log is being applied during crash recovery. The idea is that any access to buffer pool pages during crash recovery must look up pages via recv_sys_t::recover(). This was being violated by the function trx_undo_mem_create_at_db_start() during startup.

      As far as I can tell, there are two possible outcomes of this:

      • (the lucky case): InnoDB refuses to start up and reports Data structure corruption.
        This causes occasional failures of the test innodb.table_flags and some other tests, including some mariabackup tests.
      • (the scary case): InnoDB recovers incorrect state of some transactions, and may cause permanent corruption.
        I have not reproduced this; this is my hypothesis.

      The fix is twofold:

      1. Add an assertion to buf_page_get_gen() to catch incorrect calls, and implement a special case for recv_sys.recover()
      2. Make recv_sys.recover() only buffer-fix pages, instead of acquiring shared latches. This is needed because there will be some recursive access to undo pages, and the shared latches are not recursive. A buffer-fix is sufficient, because at the early phase of startup there are no page modifications, other than by the application of the write-ahead log (ib_logfile0).

      Attachments

        Issue Links

          Activity

            Related to this (exposed by the same test case), MDEV-34216 is another bug that affects MariaDB Server releases starting from 11.2.

            marko Marko Mäkelä added a comment - Related to this (exposed by the same test case), MDEV-34216 is another bug that affects MariaDB Server releases starting from 11.2.

            I don't think I can contribute much to this review. It is more related to the base refactoring done in MDEV-32042.

            Looking at MDEV-32042, the change looks a bit risky and not very good for future. If the decision is based on state, it looks much cleaner and less error prone to have a single interface and make the decision there.

            If we don't have other dependencies my suggestion would be to revert MDEV-32042 and re-evaluate the refactoring.

            However, I don't think the patch should block on my review if both @dr-m and @Thirunarayanan agree on it.

            debarun Debarun Banerjee added a comment - I don't think I can contribute much to this review. It is more related to the base refactoring done in MDEV-32042 . Looking at MDEV-32042 , the change looks a bit risky and not very good for future. If the decision is based on state, it looks much cleaner and less error prone to have a single interface and make the decision there. If we don't have other dependencies my suggestion would be to revert MDEV-32042 and re-evaluate the refactoring. However, I don't think the patch should block on my review if both @dr-m and @Thirunarayanan agree on it.

            debarun, thank you for your comment. I thought that it could be useful to explain some wider context that is related to this.

            There was a plan to run the log-based recovery concurrently with workload. This was tested also internally at Oracle, I think in the MySQL 5.0 or MySQL 5.1 time frame, but testing was encountering too many bugs and the idea was abandoned. I remember that there were bugs related to BLOBs, possibly also something else. I had filed MDEV-14481 to implement this in MariaDB, but ultimately I abandoned the idea.

            However, there always was a case where the last batch of log application runs concurrently with some buffer pool access. That is when we traverse the undo logs to resurrect the locks of incomplete transactions and to initialize the queue for purging committed transaction history. MDEV-21572 fixed a race condition in this area, by adding extra handling of crash recovery to buf_page_get_gen(). I don’t remember that this problem was fixed in MySQL, definitely not in 5.7, but maybe you can comment on MySQL 8.0. Possibly this was somewhat less of an issue with MySQL, because MySQL is missing MDEV-12699 as well. That is, MariaDB recovery will avoid reading pages that can be recovered solely based on log records.

            I am reluctant to revert MDEV-32042, because it could introduce a performance regression. Reverting it might also cause a conflict with MDEV-14795, because after reverting, we would not only access undo log pages concurrently with the last recovery batch, but also the data dictionary tables in the system tablespace.

            The debug assertion that my proposed patch is adding to the start of buf_page_get_gen() should catch any misuse of the API. We should be wiser in a few hours, once the logs from the overnight stress test have been evaluated.

            marko Marko Mäkelä added a comment - debarun , thank you for your comment. I thought that it could be useful to explain some wider context that is related to this. There was a plan to run the log-based recovery concurrently with workload. This was tested also internally at Oracle, I think in the MySQL 5.0 or MySQL 5.1 time frame, but testing was encountering too many bugs and the idea was abandoned. I remember that there were bugs related to BLOBs, possibly also something else. I had filed MDEV-14481 to implement this in MariaDB, but ultimately I abandoned the idea. However, there always was a case where the last batch of log application runs concurrently with some buffer pool access. That is when we traverse the undo logs to resurrect the locks of incomplete transactions and to initialize the queue for purging committed transaction history. MDEV-21572 fixed a race condition in this area, by adding extra handling of crash recovery to buf_page_get_gen() . I don’t remember that this problem was fixed in MySQL, definitely not in 5.7, but maybe you can comment on MySQL 8.0. Possibly this was somewhat less of an issue with MySQL, because MySQL is missing MDEV-12699 as well. That is, MariaDB recovery will avoid reading pages that can be recovered solely based on log records. I am reluctant to revert MDEV-32042 , because it could introduce a performance regression. Reverting it might also cause a conflict with MDEV-14795 , because after reverting, we would not only access undo log pages concurrently with the last recovery batch, but also the data dictionary tables in the system tablespace. The debug assertion that my proposed patch is adding to the start of buf_page_get_gen() should catch any misuse of the API. We should be wiser in a few hours, once the logs from the overnight stress test have been evaluated.

            People

              marko Marko Mäkelä
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.