MDEV-30134 (MariaDB Server)

buf_page_t::unfix(): Assertion `!((f ^ (f - 1)) & LRU_MASK)' failed

Details

    Description

      The attached test case provided by mleich rather easily reproduces an assertion failure related to the change buffer, on all server versions 10.6 through 10.11.

      The involved code was refactored as part of MDEV-27058, but it is not yet clear whether this failure is a regression starting with 10.6.

      The change buffer was disabled by default in MDEV-27734 and deprecated in MDEV-27735, and it is scheduled for removal in MDEV-29694. With MDEV-29694 present, the test will not crash.

      The assertion fails with various stack traces, related to buffering purge operations (not inserts or delete-mark operations).

      Attachments

        1. TBR-1258.opt
          0.6 kB
        2. TBR-1258.result
          0.2 kB
        3. TBR-1258.test
          823 kB

          Activity

            This can sometimes be reproduced with

            ./mtr --mysqld=--loose-innodb-change-buffering{=all,-debug=1} innodb.innodb_defragment
            

            marko Marko Mäkelä added a comment

            This is also occasionally reproducible with

            ./mtr --mysqld=--loose-innodb-change-buffering{=all,-debug=1} innodb.ibuf_delete
            

            marko Marko Mäkelä added a comment

            I just tried to reproduce this on 10.6 39f46745995939e17678d3c2f030f625d5bc41c2 (one commit before MDEV-30400), but I have not succeeded so far. The only thing that I reproduced was a server hang due to a bug in innodb_change_buffering_debug=1 that was fixed in MDEV-30400.

            marko Marko Mäkelä added a comment

            Because mleich informed me that he last reproduced this on a development branch of MDEV-30148 on December 2, I created a fix based on its 10.6 parent commit for testing. I was not able to reproduce the failure myself today, either with that supposed fix or its 10.6 parent commit.

            As far as I can tell, this is a bug specific to the debug-only parameter innodb_change_buffering_debug=1. However, there is some room for simplifying the code around this. To my understanding, a similar bug affects 10.5 as well, but the assertion expression would be count != 0 before this data structure was refactored in MDEV-27058.

            It might be much harder to reproduce the failure in 10.5 or older versions. While I developed a 10.5 version of the fix, I don’t think that it is feasible to apply it, if this really only affects a debug parameter, and if we are unable to reproduce the failure in the first place. Starting with 10.6, MDEV-21452, MDEV-24142 and similar changes could dramatically change some timing around this.

            marko Marko Mäkelä added a comment
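
The comment above relates the failing expression !((f ^ (f - 1)) & LRU_MASK) to the older check count != 0. The following is a minimal sketch of why the two could be equivalent. It is not taken from the server source. It assumes that a single 32-bit word keeps state flags in its three highest bits and a buffer-fix count in the remaining low bits. That layout, and the names STATE_BITS_MASK, FIX_COUNT_MASK, unfix_keeps_state, fix_count_nonzero and check_samples, are hypothetical and chosen only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// ASSUMPTION (not stated in this report): the top 3 bits of the word hold
// state flags and the low 29 bits hold the buffer-fix count.
// STATE_BITS_MASK stands in for LRU_MASK.
constexpr uint32_t STATE_BITS_MASK = 7U << 29;
constexpr uint32_t FIX_COUNT_MASK = ~STATE_BITS_MASK;

// Same shape as the failing assertion: f is the value before the decrement.
// f ^ (f - 1) has exactly the bits set that the decrement flips, so the
// result is true when no state bit is changed by subtracting 1.
constexpr bool unfix_keeps_state(uint32_t f)
{
  return !((f ^ (f - 1)) & STATE_BITS_MASK);
}

// Shape of the older check: the fix count must be nonzero before unfixing.
constexpr bool fix_count_nonzero(uint32_t f)
{
  return (f & FIX_COUNT_MASK) != 0;
}

// One fix held: the decrement only touches the count bits.
static_assert(unfix_keeps_state((1U << 29) | 1), "count 1 may be unfixed");
// No fix held: the borrow propagates into the state bits.
static_assert(!unfix_keeps_state(1U << 29), "count 0 must trip the check");

// Runtime spot check that both predicates agree on a few sample words.
inline void check_samples()
{
  const uint32_t samples[] = {
    0U, 1U, 8U, (1U << 29), (1U << 29) | 1, (2U << 29) | 5,
    (5U << 29), (7U << 29) | FIX_COUNT_MASK
  };
  for (uint32_t f : samples)
    assert(unfix_keeps_state(f) == fix_count_nonzero(f));
}
```

Under the assumed layout, decrementing a word whose count bits are all zero borrows from the state bits, which is what the XOR-and-mask test detects. Calling check_samples() from a main() function exercises the same comparison at run time.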

            People

              marko Marko Mäkelä