Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-21109

Table corruption not detected with CHECK TABLE or innochecksum, only with mariabackup

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Incomplete
    • 10.3.17, 10.3.21, 10.3.23
    • N/A
    • mariabackup

    Description

      It is observed that, while taking the backup using mariabackup(10.3.17) it's keep failing with following error:

      Backup command used:


      /var/lib/mysql/bin/mariabackup --defaults-file=/etc/my.cnf --user=**** --password=******* --backup --skip-encrypted-backup --compress --ftwrl-wait-timeout=5 --ftwrl-wait-threshold=300 --ftwrl-wait-query-type=all --target-dir=/tmp/backup/full_backup_2019
      
      


      mariabackup output:


      [00] 2019-11-16 06:29:24 Connecting to MySQL server host: localhost, user: xxx, password: xxx, port: 3306, socket: /var/lib/mysql/mysql.sock
      [00] 2019-11-16 06:29:24 Using server version 10.3.17-MariaDB-log
      /var/lib/mysql/bin/mariabackup based on MariaDB server 10.3.17-MariaDB Linux (x86_64)
      ....
      ....
      [01] 2019-11-16 06:29:27 Compressing ./foo/bar.ibd to /tmp/backup/full_backup_2019/foo/bar.ibd.qp
      [01] 2019-11-16 06:29:27         ...done
      [01] 2019-11-16 06:29:27 Compressing ./foo/foobar.ibd to /tmp/backup/full_backup_2019/foo/foobar.ibd.qp
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:27 Database page corruption detected at page 5, retrying...
      [01] 2019-11-16 06:29:28 Database page corruption detected at page 5, retrying...
      [00] 2019-11-16 06:29:28 >> log scanned up to (30851334774)
      [01] 2019-11-16 06:29:28 Error: failed to read page after 10 retries. File ./foo/foobar.ibd seems to be corrupted.
      2019-11-16 6:29:28 0 [Note] InnoDB: Page dump in ascii and hex (16384 bytes):
      .......
      .......
      InnoDB: End of page dump
      2019-11-16 6:29:28 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 3454859770, calculated checksums for field1: crc32 3454859770, innodb 3654618756, page type 17855 == INDEX.none 3735928559, stored checksum in field2 3454859770, calculated checksums for field2: crc32 3454859770, innodb 4252287317, none 3735928559, page LSN 7 450897511, low 4 bytes of LSN at page end 450897511, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 1962
      2019-11-16 6:29:28 0 [Note] InnoDB: Page may be an index page where index id is 5185
      [01] 2019-11-16 06:29:28 mariabackup: xtrabackup_copy_datafile() failed.
      [00] FATAL ERROR: 2019-11-16 06:29:28 failed to copy datafile.
      
      


      Did tried, fixing the table (pointed as corrupted) with following, but the same issue is occurring.


      set OLD_ALTER_TABLE=1
      Alter table table_name engine=InnoDB
      Alter table table_name FORCE
      Take mysqldump of table and restore it to Database


      Attachments

        1. break_down_rate.ddl
          0.8 kB
        2. break_down_rate.ibd
          112 kB
        3. corrupt20200921.tgz.enc
          10 kB
        4. cs0093460_20200807_core.21139.gza
          10.00 MB
        5. cs0093460_20200807_core.21139.gzb
          10.00 MB
        6. cs0093460_20200807_core.21139.gzc
          10.00 MB
        7. cs0093460_20200807_core.21139.gzd
          10.00 MB
        8. cs0093460_20200807_core.21139.gze
          10.00 MB
        9. cs0093460_20200807_core.21139.gzf
          10.00 MB
        10. cs0093460_20200807_core.21139.gzg
          10.00 MB
        11. cs0093460_20200807_core.21139.gzh
          10.00 MB
        12. cs0093460_20200807_core.21139.gzi
          10.00 MB
        13. cs0093460_20200807_core.21139.gzj
          10.00 MB
        14. cs0093460_20200807_core.21139.gzk
          10.00 MB
        15. cs0093460_20200807_core.21139.gzl
          10.00 MB
        16. cs0093460_20200807_core.21139.gzm
          8.14 MB
        17. cs0093460_20200807_error_mariadb_x01bofiddb1a.log
          78 kB
        18. cs0093460_20200807_lib.tar.gza
          10.00 MB
        19. cs0093460_20200807_lib.tar.gzb
          4.03 MB
        20. cs0093460_20200917_error_mariadb_x01bstredb1a.log
          14 kB
        21. cs0093460_20200917_full_backup_20200916_054536.log
          30 kB
        22. cs0093460_20200917_full_backup_20200917_120913.log
          29 kB
        23. mdev-21109-rr.sh
          4 kB
        24. show_global_status_x07gisiddb3a.log
          62 kB
        25. show_global_variables_x07gisiddb3a.log
          475 kB
        26. t33914-202006252311.log
          104 kB
        27. t33914-202006252313.tgz.enc.a
          9.00 MB
        28. t33914-202006252313.tgz.enc.b
          9.00 MB
        29. t33914-202006252313.tgz.enc.c
          9.00 MB
        30. t33914-202006252313.tgz.enc.d
          8.09 MB

        Issue Links

          Activity

            In break_down_rate.ddl, the PAGE_LEVEL of both pages 3 (the PRIMARY KEY root page) and 4 (the first secondary index root page, if all indexes were created before inserting data) is 0. These are the 2 bytes at byte offsets 0xc040 and 0x10040.

            This suggests that page 5 coudd be unused garbage, and CHECK TABLE does not read such pages. It is only reading pages that are reachable from the index tree root pages.

            Furthermore, CHECK TABLE (more specifically, btr_validate_index()) does not seem to fetch any BLOB pages. They would be reachable from clustered index (PRIMARY KEY) leaf pages.

            Because page 5 (starting at 0x14000) does contain the B-tree page identifiers "infimum" and "supremum", and because FIL_PAGE_TYPE is FIL_PAGE_INDEX, it should not be a BLOB page. So, CHECK TABLE seems to be working as designed.

            Both index root pages 3 and 4 are empty: the bytes 0x000d right before the "infimum" point straight to the "supremum". So, there cannot be any references to BLOB pages.

            The file appears to contain two ‘extra’ pages: the allegedly corrupted page 5 and an all-zero page 6.

            One last thing that we can check is the page allocation bitmap in page 0, which should describe pages 0 to 16383. The FSP_SIZE at offset 0x2e is 7, which does match the file size (page 6 is the last page). It is normal to have a few extra empty pages at the end of the data file.

            The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free.

            This is a border case, but apart from failing to read and check BLOB pages, I do not see any wrongdoing by CHECK TABLE.

            I would tend to suspect a problem in Mariabackup. The page at byte offset 0x14000 (page 5) looks unrelated to the rest of the data file. vlad.lesin, did you notice that the tablespace ID at byte offset 0x14022 is 0x431 instead of 0x4ec?

            I remember diagnosing a similar issue with the same support customer about a year ago, but we failed to reach a conclusion then. On July 1, 2019, we added a debug check to InnoDB in MDEV-19871 to specifically catch such tablespace ID mismatch. (The commit message uses the wrong MDEV number and is misleadingly mentioning the innochecksum tool, which was not changed.). That check has never failed in our internal testing since then. Based on that, I firmly believe that a bug must exist in the interaction between mariabackup --backup and the server process. It remains to be seen whether that is a design bug (which could be fixed by implementing server-side backup, as noted in MDEV-14992) or something that could be improved in the code.

            I would request that a repeatable test case for this be created. Because the failure is likely to be nondeterministic, I would suggest the following approach with https://rr-project.org/:

            1. Start the server as rr record /usr/sbin/mysqld …
            2. Start the workload on the server.
            3. Run rr record mariabackup --backup …
            4. Shut down the server.
            5. Make a copy of the backup.
            6. Run mariabackup --prepare … on the backup (or the copy of it).
            7. If it does not fail, delete the backup and the copy, and start from the beginning.
            8. Else, preserve all files for subsequent analysis with rr replay.

            I am proposing frequent server restarts, so that the rr replay traces would be smaller. Note that rr replay traces are not portable between computers in practice. For analysis, remote access to the system would be needed.

            This approach was successfully used when fixing crash recovery bugs in code that was rewritten in MariaDB 10.5, for example MDEV-22139.

            marko Marko Mäkelä added a comment - In break_down_rate.ddl , the PAGE_LEVEL of both pages 3 (the PRIMARY KEY root page) and 4 (the first secondary index root page, if all indexes were created before inserting data) is 0. These are the 2 bytes at byte offsets 0xc040 and 0x10040. This suggests that page 5 coudd be unused garbage, and CHECK TABLE does not read such pages. It is only reading pages that are reachable from the index tree root pages. Furthermore, CHECK TABLE (more specifically, btr_validate_index() ) does not seem to fetch any BLOB pages. They would be reachable from clustered index ( PRIMARY KEY ) leaf pages. Because page 5 (starting at 0x14000) does contain the B-tree page identifiers "infimum" and "supremum", and because FIL_PAGE_TYPE is FIL_PAGE_INDEX , it should not be a BLOB page. So, CHECK TABLE seems to be working as designed. Both index root pages 3 and 4 are empty: the bytes 0x000d right before the "infimum" point straight to the "supremum". So, there cannot be any references to BLOB pages. The file appears to contain two ‘extra’ pages: the allegedly corrupted page 5 and an all-zero page 6. One last thing that we can check is the page allocation bitmap in page 0, which should describe pages 0 to 16383. The FSP_SIZE at offset 0x2e is 7, which does match the file size (page 6 is the last page). It is normal to have a few extra empty pages at the end of the data file. The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free. This is a border case, but apart from failing to read and check BLOB pages, I do not see any wrongdoing by CHECK TABLE . I would tend to suspect a problem in Mariabackup. The page at byte offset 0x14000 (page 5) looks unrelated to the rest of the data file. vlad.lesin , did you notice that the tablespace ID at byte offset 0x14022 is 0x431 instead of 0x4ec? I remember diagnosing a similar issue with the same support customer about a year ago, but we failed to reach a conclusion then. On July 1, 2019, we added a debug check to InnoDB in MDEV-19871 to specifically catch such tablespace ID mismatch. (The commit message uses the wrong MDEV number and is misleadingly mentioning the innochecksum tool, which was not changed.). That check has never failed in our internal testing since then. Based on that, I firmly believe that a bug must exist in the interaction between mariabackup --backup and the server process. It remains to be seen whether that is a design bug (which could be fixed by implementing server-side backup, as noted in MDEV-14992 ) or something that could be improved in the code. I would request that a repeatable test case for this be created. Because the failure is likely to be nondeterministic, I would suggest the following approach with https://rr-project.org/: Start the server as rr record /usr/sbin/mysqld … Start the workload on the server. Run rr record mariabackup --backup … Shut down the server. Make a copy of the backup. Run mariabackup --prepare … on the backup (or the copy of it). If it does not fail, delete the backup and the copy, and start from the beginning. Else, preserve all files for subsequent analysis with rr replay . I am proposing frequent server restarts, so that the rr replay traces would be smaller. Note that rr replay traces are not portable between computers in practice. For analysis, remote access to the system would be needed. This approach was successfully used when fixing crash recovery bugs in code that was rewritten in MariaDB 10.5, for example MDEV-22139 .

            vlad.lesin, I think that you must reproduce the scenario that leads to mariabackup --prepare creating a backup that contains a page for the wrong tablespace.

            I do not want to add non-debug code to the server in main branches. I also think that we should check that the page number in the page frame matches the block, when we read pages into the buffer pool.

            We did provide a custom non-debug build (with an if statement) in the support case that was associated with MDEV-19871, hoping it to catch the mismatch. And it never did. You can provide another custom build with more extensive check. Maybe you should also add such a check to the mariabackup --backup code.

            marko Marko Mäkelä added a comment - vlad.lesin , I think that you must reproduce the scenario that leads to mariabackup --prepare creating a backup that contains a page for the wrong tablespace. I do not want to add non-debug code to the server in main branches. I also think that we should check that the page number in the page frame matches the block, when we read pages into the buffer pool. We did provide a custom non-debug build (with an if statement) in the support case that was associated with MDEV-19871 , hoping it to catch the mismatch. And it never did. You can provide another custom build with more extensive check. Maybe you should also add such a check to the mariabackup --backup code.
            vlad.lesin Vladislav Lesin added a comment - - edited

            marko > You can provide another custom build with more extensive check. Maybe you should also add such a check to the mariabackup --backup code.

            I already did it, and created several custom builds for different server versions(they can be found by "git branch -a | grep 21109") and instructions similar to yours instruction above, how to use such builds, but, as I know, non of those builds were tried following the instructions.

            > Vladislav Lesin, I think that you must reproduce the scenario that leads to mariabackup --prepare creating a backup that contains a page for the wrong tablespace.

            "mariabackup --prepare" does not take part in the bug reproduction on the customer's side.

            I have some assumption I am working on. The assumption is based on your analysis above and our discussion in slack. Mariabackup fails on freed page. Innodb recovery does not read "freed" pages. Read-ahead algorithm reads pages which are in the chain of B-tree leaf pages. So the assumption is that mariabackup must not check freed pages, but it does. The weakness of this assumption is that our tests would more likely catch such error.

            vlad.lesin Vladislav Lesin added a comment - - edited marko > You can provide another custom build with more extensive check. Maybe you should also add such a check to the mariabackup --backup code. I already did it, and created several custom builds for different server versions(they can be found by "git branch -a | grep 21109") and instructions similar to yours instruction above, how to use such builds, but, as I know, non of those builds were tried following the instructions. > Vladislav Lesin, I think that you must reproduce the scenario that leads to mariabackup --prepare creating a backup that contains a page for the wrong tablespace. "mariabackup --prepare" does not take part in the bug reproduction on the customer's side. I have some assumption I am working on. The assumption is based on your analysis above and our discussion in slack. Mariabackup fails on freed page. Innodb recovery does not read "freed" pages. Read-ahead algorithm reads pages which are in the chain of B-tree leaf pages. So the assumption is that mariabackup must not check freed pages, but it does. The weakness of this assumption is that our tests would more likely catch such error.

            According to os_file_set_size(), when tablespace is extended, all extra pages must be filled with zero's. But we have two pages at the end of "break_down_rate.ibd", the last page 6 is zeroed, while allegedly corrupted page 5 is not zeroed.

            vlad.lesin Vladislav Lesin added a comment - According to os_file_set_size(), when tablespace is extended, all extra pages must be filled with zero's. But we have two pages at the end of "break_down_rate.ibd", the last page 6 is zeroed, while allegedly corrupted page 5 is not zeroed.

            BTW, there might also be the sense to implement something like online check mode from innochecksum, to have ability to detect such errors with this tool, but not only with mariabackup. What do you think, marko?

            vlad.lesin Vladislav Lesin added a comment - BTW, there might also be the sense to implement something like online check mode from innochecksum, to have ability to detect such errors with this tool, but not only with mariabackup. What do you think, marko ?

            I think that it could be easier to extend CHECK TABLE in the server than to duplicate the functionality in an external tool. If there is a concern that CHECK TABLE would crash the server due to MDEV-13542, that could be worked around by importing the table-to-check into a non-production server instance and then executing the command.

            marko Marko Mäkelä added a comment - I think that it could be easier to extend CHECK TABLE in the server than to duplicate the functionality in an external tool. If there is a concern that CHECK TABLE would crash the server due to MDEV-13542 , that could be worked around by importing the table-to-check into a non-production server instance and then executing the command.

            vlad.lesin, as part of MDEV-19871 we already extended the innochecksum tool with validation of the page identifier. Is the corruption of the provided data files detected by that tool? If yes, could this tool be integrated as part of backup validation?

            I have a further idea: What if we extend the innochecksum tool with an option that will zero out (or optionally punch holes) over any pages that are marked as freed in the page allocation bitmap pages?

            An even further idea (if such an extended innochecksum tool is proved to help) would be to improve mariabackup --prepare in such a way that it will remember which pages were corrupted (maybe getting this information from mariabackup --backup via some configuration file) and then check if all those pages happen to be marked as freed. If so, zero out the pages and issue diagnostic messages (NOTE). If some corrupted pages are not marked as freed in the allocation bitmap, then we would issue WARNING or ERROR messages.

            marko Marko Mäkelä added a comment - vlad.lesin , as part of MDEV-19871 we already extended the innochecksum tool with validation of the page identifier . Is the corruption of the provided data files detected by that tool? If yes, could this tool be integrated as part of backup validation? I have a further idea: What if we extend the innochecksum tool with an option that will zero out (or optionally punch holes) over any pages that are marked as freed in the page allocation bitmap pages? An even further idea (if such an extended innochecksum tool is proved to help) would be to improve mariabackup --prepare in such a way that it will remember which pages were corrupted (maybe getting this information from mariabackup --backup via some configuration file) and then check if all those pages happen to be marked as freed. If so, zero out the pages and issue diagnostic messages (NOTE). If some corrupted pages are not marked as freed in the allocation bitmap, then we would issue WARNING or ERROR messages.

            marko Yes, the "corruption" can be detected by innochecksum with MDEV-19871 fix. And yes, we can zero out non-allocated pages both with innochecksum and "mariabackup --prepare". But I have the following concerns:

            1. innochecksum is the tool which is supposed to check tablespaces and does not modify them. We have option to rewrite checksum algorithm. But it modifies only checksums. Should we turn the tool for checksums check to the tool for corruptions fixing?

            2. "mariabackup --prepare" could fix such pages. But, as I understood, there must not be non-zero non-allocated pages. And we need to understand what is the source of such pages. And when we understand it, the code to fix such pages will be useless. There will be one more option which is rarely used, and which we have to maintain until some major version.

            vlad.lesin Vladislav Lesin added a comment - marko Yes, the "corruption" can be detected by innochecksum with MDEV-19871 fix. And yes, we can zero out non-allocated pages both with innochecksum and "mariabackup --prepare". But I have the following concerns: 1. innochecksum is the tool which is supposed to check tablespaces and does not modify them. We have option to rewrite checksum algorithm. But it modifies only checksums. Should we turn the tool for checksums check to the tool for corruptions fixing? 2. "mariabackup --prepare" could fix such pages. But, as I understood, there must not be non-zero non-allocated pages. And we need to understand what is the source of such pages. And when we understand it, the code to fix such pages will be useless. There will be one more option which is rarely used, and which we have to maintain until some major version.

            One change that we should be able to implement rather easily in mariabackup --prepare if it has not been already done, is what MDEV-25361 did to innochecksum: Do not care about corrupted pages that are marked as freed in the data files. Is that fix still missing, now that MDEV-22929 has been implemented in backup? Does someone get bogus page corruption alarms for pages that are actually marked as free?

            I think that we need an executable test case that does not involve any pre-corrupted .ibd files but only depends on a newly reinitialized server, to demonstrate what actually is not working as expected.

            While waiting for a test case, I think that we should fix reproducible backup bugs, such as MDEV-18200 and MDEV-26326.

            The MariaDB Server 10.6 release fixed the last known InnoDB violations of the write-ahead logging protocol that I was aware of: MDEV-24626 and MDEV-25506. With that release, most remaining backup problems should be bugs in the backup code, not in the server.

            marko Marko Mäkelä added a comment - One change that we should be able to implement rather easily in mariabackup --prepare if it has not been already done, is what MDEV-25361 did to innochecksum : Do not care about corrupted pages that are marked as freed in the data files. Is that fix still missing, now that MDEV-22929 has been implemented in backup? Does someone get bogus page corruption alarms for pages that are actually marked as free? I think that we need an executable test case that does not involve any pre-corrupted .ibd files but only depends on a newly reinitialized server, to demonstrate what actually is not working as expected. While waiting for a test case, I think that we should fix reproducible backup bugs, such as MDEV-18200 and MDEV-26326 . The MariaDB Server 10.6 release fixed the last known InnoDB violations of the write-ahead logging protocol that I was aware of: MDEV-24626 and MDEV-25506 . With that release, most remaining backup problems should be bugs in the backup code, not in the server.

            Isn't this bug report about CHECK TABLE (and innochecksum) behavior, a request to find problems that affect mariabackup at earlier stages? OK, in new version 10.6 mariabackup should work in a more robust way, but what about 10.2 - 10.5?

            valerii Valerii Kravchuk added a comment - Isn't this bug report about CHECK TABLE (and innochecksum) behavior, a request to find problems that affect mariabackup at earlier stages? OK, in new version 10.6 mariabackup should work in a more robust way, but what about 10.2 - 10.5?

            In the original customer case that prompted us to file MDEV-19871 (and later this ticket), we found that a data file had an incorrect page written to it. This was never reproducible outside that customer’s environment. We added some internal checks to detect incorrect page ID when writing to a file, and those checks have never failed. I can only suspect that something was wrong in the underlying layer (corrupted file system causing a block to be mapped to two files, or incorrectly working block device).

            valerii, it is not clear to me whether there are any "problems that affect mariabackup at earlier stages" anymore. I believe that my comment in this ticket from 2020-06-10 was actually addressed in MDEV-22929 and MDEV-25361:

            The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free.

            What is the exact problem that we want to detect and where?

            marko Marko Mäkelä added a comment - In the original customer case that prompted us to file MDEV-19871 (and later this ticket), we found that a data file had an incorrect page written to it. This was never reproducible outside that customer’s environment. We added some internal checks to detect incorrect page ID when writing to a file, and those checks have never failed. I can only suspect that something was wrong in the underlying layer (corrupted file system causing a block to be mapped to two files, or incorrectly working block device). valerii , it is not clear to me whether there are any "problems that affect mariabackup at earlier stages" anymore. I believe that my comment in this ticket from 2020-06-10 was actually addressed in MDEV-22929 and MDEV-25361 : The allocation bitmap seems to start at byte offset 0xae, and according to it, only pages 0,1,2,3,4 are marked as allocated. The corrupted page 5 is marked as free. What is the exact problem that we want to detect and where?

            I was hoping that this could be closed as a duplicate of MDEV-22929. But, valerii stated that the problem was repeatable with 10.3.28.

            However, we were not able to repeat this. Hence, I am closing this as "Cannot Reproduce". We can reopen and fix this if someone provides something for repeating this bug.

            marko Marko Mäkelä added a comment - I was hoping that this could be closed as a duplicate of MDEV-22929 . But, valerii stated that the problem was repeatable with 10.3.28. However, we were not able to repeat this. Hence, I am closing this as "Cannot Reproduce". We can reopen and fix this if someone provides something for repeating this bug.

            People

              marko Marko Mäkelä
              suresh.ramagiri@mariadb.com suresh ramagiri
              Votes:
              4 Vote for this issue
              Watchers:
              19 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.