Details

    Description

      We have started experiencing random table index corruption on our busier (read- and write-heavy) databases. We have run these databases for over 8 years and have gone through multiple upgrades in that time. However, nothing has changed recently, and these randomly corrupted indexes have only just started appearing.

      We have not yet found a proper way to troubleshoot what is causing this corruption. The log files look clean leading up to the corruption, followed only by a random error such as:
      2024-05-01 23:56:02 172 [ERROR] mariadbd: Index for table 'BusyTableHere' is corrupt; try to repair it

          Activity

            Sergei Golubchik (serg) added a comment: No feedback, so closing. Will reopen if more feedback is added.

            Marko Mäkelä (marko) added a comment: This may share a root cause with MDEV-34453. MDEV-27058 introduced a race condition between page creation and page eviction. Although we were only able to reproduce it on undo log pages (between read-ahead and the creation of an undo page), based on my discussion with debarun yesterday, it could happen on any persistent InnoDB data page.

            One possible scenario is that a large INSERT is being rolled back for whatever reason, and some index pages are freed because the rollback shrinks the B-tree. Some time later, one of these recently freed pages is reallocated by a subsequent INSERT or UPDATE. Concurrently, the buf_flush_page_cleaner() thread removes this page, which is still being created, from the buffer pool. We did not think through what the corruption could look like, and we did not reproduce such corruption. Reproducing MDEV-34453 alone was a significant effort; it never reproduced during our regular stress tests.
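            The create-versus-evict race described in the comment above can be illustrated with a deliberately simplified toy model. This is a hypothetical sketch, not InnoDB code: `ToyBufferPool`, `Frame`, `pin_count`, `begin_create()`, `finish_create()` and `evict_unpinned()` are invented names, the eviction step only loosely stands in for buf_flush_page_cleaner(), and the pinning variant is a generic technique, not the actual fix for MDEV-34453. In the toy, if a frame is evicted while a freed page is being re-created, the creator's write lands on a frame the pool no longer tracks, and a later read returns the stale on-disk image. This does not claim to show what the corruption looks like in InnoDB.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A buffer-pool frame in a toy model (hypothetical, not InnoDB's buf_page_t)."""
    page_no: int
    data: bytes = b""
    pin_count: int = 0  # hypothetical: the toy evictor skips pinned frames


class ToyBufferPool:
    def __init__(self, pin_while_creating: bool):
        self.frames: dict[int, Frame] = {}
        self.disk: dict[int, bytes] = {}
        self.pin_while_creating = pin_while_creating

    def begin_create(self, page_no: int) -> Frame:
        # Reallocating a freed page: no disk read, just a fresh empty frame.
        frame = Frame(page_no)
        if self.pin_while_creating:
            frame.pin_count += 1
        self.frames[page_no] = frame
        return frame

    def finish_create(self, frame: Frame, data: bytes) -> None:
        # The creator writes through its own reference to the frame.
        frame.data = data
        if self.pin_while_creating:
            frame.pin_count -= 1

    def evict_unpinned(self) -> None:
        # Toy "page cleaner": discards every unpinned frame without writing it
        # back, treating a frame that has not been modified yet as clean.
        for page_no, frame in list(self.frames.items()):
            if frame.pin_count == 0:
                del self.frames[page_no]

    def read(self, page_no: int) -> bytes:
        frame = self.frames.get(page_no)
        if frame is not None:
            return frame.data
        return self.disk.get(page_no, b"")


def run_interleaving(pin_while_creating: bool) -> bytes:
    pool = ToyBufferPool(pin_while_creating)
    pool.disk[7] = b"stale freed page"            # on-disk image of the freed page
    frame = pool.begin_create(7)                  # thread A starts re-creating page 7
    pool.evict_unpinned()                         # thread B evicts it mid-creation
    pool.finish_create(frame, b"new index page")  # A finishes on its own reference
    return pool.read(7)                           # what a later reader sees


if __name__ == "__main__":
    print(run_interleaving(pin_while_creating=False))  # b'stale freed page'
    print(run_interleaving(pin_while_creating=True))   # b'new index page'
```

            The interleaving is fixed by hand rather than driven by threads, so the outcome is deterministic; a real race would only show up under rare timing, which matches the difficulty of reproducing MDEV-34453 described above.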

            Manuel Arostegui (marostegui) added a comment: We are tracking these corruptions internally, and so far we have not seen the same table corrupted twice on the same host after it was rebuilt the first time.

            Marko Mäkelä (marko) added a comment: Thank you, marostegui. I think it is very plausible that this shares a root cause with MDEV-34453.

            Manuel Arostegui (marostegui) added a comment: Thank you, marko. I will probably work on 10.6.20 by the end of the week.

            People

              Marko Mäkelä (marko)
              Daniel Bray (dbray_sd)
              Votes: 0
              Watchers: 9
