  MariaDB Server / MDEV-34283

A misplaced btr_cur_need_opposite_intention() check may fail to prevent hangs

Details

    Description

      In MDEV-30400, the function btr_cur_t::search_leaf() replaced the function btr_cur_search_to_nth_level() for the case level=0. That code was revised in MDEV-29835 with regard to the function btr_cur_need_opposite_intention().

      Upon reaching the leaf level, one call to btr_cur_need_opposite_intention() is misplaced. Before these changes, btr_cur_search_to_nth_level() would invoke btr_cur_need_opposite_intention() only after positioning page_cur_t::rec on the current page. As a result of the misplaced call, the conditions page_rec_is_last() and page_rec_is_first() would apparently never hold, because the page and the record would be in different buffer pool blocks.

      The purpose of the function btr_cur_need_opposite_intention() is to detect when a page split could occur. As far as I can tell, this bug could cause a hang similar to the ones that the fix of MDEV-29835 attempted to address. Possibly, this bug could explain MDEV-31815.

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            Status Open [ 1 ] In Progress [ 3 ]
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Debarun Banerjee [ JIRAUSER54513 ]
            Status In Progress [ 3 ] In Review [ 10002 ]

            debarun Debarun Banerjee added a comment - The patch looks straightforward. Can you please see whether a test case or scenario can be created to validate the patch? Besides validation, it would also help us map any customer issue that is possibly caused by this bug.

            marko Marko Mäkelä added a comment - I am afraid that, other than the debug assertion that I added (which would fail in a number of regression tests) and retesting for MDEV-31815 (which is a more direct sign of a potential hang), there is not much that can be done. The suspected scenario is a WL#6326 violation. It took us years after the MySQL 5.7 release to find and address those hangs, mostly thanks to rr replay.

            debarun Debarun Banerjee added a comment - I understand that it could be hard to create a scenario. It is indeed critical to adhere to these B-tree latching rules, and we don't have enough validation mechanisms today. Thanks for identifying and fixing the issue. The patch looks good to me.
            debarun Debarun Banerjee made changes -
            Assignee Debarun Banerjee [ JIRAUSER54513 ] Marko Mäkelä [ marko ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6.19 [ 29833 ]
            Fix Version/s 10.11.9 [ 29834 ]
            Fix Version/s 11.1.6 [ 29835 ]
            Fix Version/s 11.2.5 [ 29836 ]
            Fix Version/s 11.4.3 [ 29837 ]
            Fix Version/s 11.5.2 [ 29838 ]
            Fix Version/s 10.6 [ 24028 ]
            Fix Version/s 10.11 [ 27614 ]
            Fix Version/s 11.4 [ 29301 ]
            Fix Version/s 11.5 [ 29506 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]

            People

              marko Marko Mäkelä
              marko Marko Mäkelä