  1. MariaDB Server
  2. MDEV-28349

Provide "crash safe" options for CHECK TABLE and ALTER TABLE ... CHECK PARTITION ...

Details

    Description

      We need a safe way to run table checks on InnoDB tables in production, with statements like CHECK TABLE or ALTER TABLE ... CHECK PARTITION, that will NOT cause any deliberate assertion failures.

      We need something like the innodb_corrupt_table_action option (deprecated, and removed since 10.3; the name may be different) for these statements (or for all access), with values like "assert" (current behaviour), "warn" (add the details about the corruption found and continue if possible, or stop and state that the table/partition is corrupted), etc. This should apply NOT only to page checksums, but to all other kinds of assertions we may hit in InnoDB in the process.
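
      To illustrate the requested semantics, here is a minimal sketch in Python. It is not InnoDB code: the option values mirror the "assert" and "warn" values proposed above, and every name in it (CorruptAction, ServerAbort, CheckResult, check_pages) is hypothetical. Pages are modelled as (page number, is-valid) pairs purely for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class CorruptAction(Enum):
    """Hypothetical values for the proposed option."""
    ASSERT = "assert"  # current behaviour: deliberate assertion failure
    WARN = "warn"      # report the corruption and keep the server running


class ServerAbort(Exception):
    """Stands in for an assertion failure that would kill the server."""


@dataclass
class CheckResult:
    corrupted: bool = False
    messages: list = field(default_factory=list)


def check_pages(pages, action):
    """Check each (page_no, ok) pair according to the chosen action."""
    result = CheckResult()
    for page_no, ok in pages:
        if ok:
            continue
        if action is CorruptAction.ASSERT:
            # Models today's behaviour: the first bad page aborts everything.
            raise ServerAbort(f"assertion failure on page {page_no}")
        # WARN: record the details and continue with the next page.
        result.corrupted = True
        result.messages.append(f"page {page_no} is corrupted")
    return result
```

      With CorruptAction.WARN the check runs to completion and returns the list of problems found; with CorruptAction.ASSERT the first bad page raises ServerAbort, which models the crash this request wants to avoid.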

          Activity

            Marko Mäkelä (marko) added a comment:

            The CHECK TABLE record-counting code was rewritten in MDEV-24402. It will share less code with DML statements, such as SELECT. Because the implementation is simpler, it could be even less prone to crashing.

            Marko Mäkelä (marko) added a comment:

            valerii, have any crashes been observed with MariaDB Server 10.6.9 or later? (MDEV-24402 was implemented in 10.6.11.)

            Valerii Kravchuk (valerii) added a comment:

            Are we sure that there is a version (which one, 10.6.9?) where the CHECK TABLE and ALTER TABLE ... CHECK PARTITION ... statements are entirely safe, in the sense that when the statement finds any corruption or problem, it reports it, maybe does something else, but lets the server (other threads) continue working? If so, the task can be closed IMHO. I doubt we are at this stage already, though.
            Marko Mäkelä (marko) added a comment (edited):

            valerii, I agree that it is better to keep this ticket open for a few more months, to find practical examples where CHECK TABLE would crash.

            Just today, related to MDEV-28797, I became aware of MDEV-29976, which is a possible crash when a particular form of corruption is encountered in a ROW_FORMAT=COMPRESSED page.
            The reported crash is on the "write" side, but there are some intentional crashes in code invoked by page_zip_decompress() as well. That code can still be invoked by CHECK TABLE after the MDEV-24402 rewrite.

            Marko Mäkelä (marko) added a comment:

            valerii, a few months have passed. MDEV-29976 might be a duplicate of MDEV-28797.

            Since CHECK TABLE shares quite a lot of code with the rest of InnoDB even after MDEV-24402, it is impossible to guarantee that no crashes are possible. Fixing any remaining crashes (such as MDEV-30787, affecting only ROW_FORMAT=REDUNDANT tables) is only possible if we can get copies of the corrupted pages.

            People

              Unassigned Unassigned
              valerii Valerii Kravchuk
