MariaDB Server / MDEV-34283

A misplaced btr_cur_need_opposite_intention() check may fail to prevent hangs

Details

    Description

      In MDEV-30400, the function btr_cur_t::search_leaf() replaced the function btr_cur_search_to_nth_level() for the case level=0. That code was revised in MDEV-29835 with regard to the function btr_cur_need_opposite_intention().

      Upon reaching the leaf level, one call to btr_cur_need_opposite_intention() is misplaced. Before these changes, btr_cur_search_to_nth_level() would invoke btr_cur_need_opposite_intention() after positioning page_cur_t::rec on the current page. As a result of the misplaced call, the conditions page_rec_is_last() and page_rec_is_first() would apparently never hold, because the page and rec would be in different buffer pool blocks.
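
      To illustrate why such a check silently stops firing, here is a minimal sketch in plain C++. It is a toy model, not InnoDB code: Page, rec_is_first(), rec_is_last(), at_page_boundary() and the two demo functions are hypothetical stand-ins, and the model only assumes that "first" and "last" are decided by comparing the record's address against records stored in the given page.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a B-tree page; NOT InnoDB's page_t.
struct Page { std::vector<int> recs; };

// Toy analogue of page_rec_is_first(): true only if rec points at the
// first record stored in this very page.
static bool rec_is_first(const int *rec, const Page &page)
{ return !page.recs.empty() && rec == &page.recs.front(); }

// Toy analogue of page_rec_is_last().
static bool rec_is_last(const int *rec, const Page &page)
{ return !page.recs.empty() && rec == &page.recs.back(); }

// Toy analogue of the page-boundary part of the check (assumption:
// the real function tests more conditions than this).
static bool at_page_boundary(const int *rec, const Page &page)
{ return rec_is_first(rec, page) || rec_is_last(rec, page); }

// Misplaced order: the check runs on the leaf page while the cursor
// record still points into the parent page.
static bool misplaced_check_fires()
{
  Page parent{{10, 20, 30}};
  Page leaf{{21, 22, 23}};
  const int *stale_rec = &parent.recs.back();
  return at_page_boundary(stale_rec, leaf);
}

// Intended order: position the cursor on the leaf first, then check.
static bool positioned_check_fires()
{
  Page leaf{{21, 22, 23}};
  const int *leaf_rec = &leaf.recs.back();
  return at_page_boundary(leaf_rec, leaf);
}
```

      In this model, misplaced_check_fires() returns false even though the stale record is the last one on its own page, while positioned_check_fires() returns true. A record that lives in a different page can never compare equal to the first or last record of the page being tested, so the boundary condition is never reported.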

      The purpose of the function btr_cur_need_opposite_intention() is to detect when a page split could occur. As far as I can tell, this bug could cause a hang similar to the ones that the fix of MDEV-29835 attempted to address. Possibly, this bug could explain MDEV-31815.

          Activity

            debarun Debarun Banerjee added a comment - The patch looks straightforward. Can you please see if a test case or scenario can be created that validates the patch? Beyond validation, it would also help us map any customer issue that is possibly caused by this bug.

            marko Marko Mäkelä added a comment - I am afraid that other than the debug assertion that I added (and which would fail in a number of regression tests) and retesting for MDEV-31815 (which is a more direct sign of a potential hang), there is not much that can be done. The suspected scenario is a WL#6326 violation. It took us years after the MySQL 5.7 release to find and address those hangs, mostly thanks to rr replay.

            debarun Debarun Banerjee added a comment - I understand it could be hard to create a scenario. It is indeed critical to adhere to these B-tree latching rules, and we don't have enough validation mechanisms today. Thanks for identifying and fixing the issue. The patch looks good to me.

            People

              marko Marko Mäkelä