MariaDB Server / MDEV-31502

mariabackup: Failed to validate first page of the file error 39


    Description

      In a Galera environment with wsrep_sst_method=mariabackup, the backup failed with

      [00] FATAL ERROR: 2023-06-07 22:01:34 Failed to validate first page of the file dbname/tablename, error 39

      on all 3 nodes of the cluster, after the database was upgraded from 10.4.24 to 10.4.27.


        Activity

          Error 39 is simply DB_CORRUPTION. The message could simply mean that the checksum of the first page of the file is incorrect. What would innochecksum report on the file?

          I would expect mariadb-backup to fail if the checksum of the first page of any data file is incorrect. Could it be the case that the SST script ignored the error?

          marko Marko Mäkelä added a comment

          We know that the file is corrupted, but we’d want to know how it might have become corrupted in the first place. Was this the first time that SST was being attempted? Could the file have been produced by prior use of wsrep_sst_method=rsync?

          marko Marko Mäkelä added a comment

          One more message that was displayed comes from this check in Datafile::validate_first_page():

                  if (page_get_page_no(m_first_page) != 0) {
                          /* First page must be number 0 */
                          error_txt = "Header page contains inconsistent data";
                          goto err_exit;
                  }
          

          The first page of the 128KiB .ibd file, which contains some metadata and a page allocation bitmap for the first innodb_page_size pages, differs between two copies of the file as follows:

          --- /dev/fd/63	2023-06-26 12:42:39.653103434 +0300
          +++ /dev/fd/62	2023-06-26 12:42:39.649103383 +0300
          @@ -1,5 +1,5 @@
          -000000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
          -000010 00 00 00 07 a3 e6 84 b9 00 08 00 00 00 00 00 00
          +000000 15 03 03 00 1a b2 47 d7 7c fe a6 d5 55 2f 46 90
          +000010 74 c6 f6 79 51 ca b3 f8 a7 2d 90 58 5f 8c 03 00
           000020 00 00 00 00 02 3a 00 00 02 3a 00 00 00 00 00 00
           000030 00 08 00 00 00 40 00 00 00 15 00 00 00 06 00 00
           000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
          @@ -12,15 +12,15 @@
           0000b0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff 00 00
           0000c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
           *
          -003ff0 00 00 00 00 00 00 00 00 a3 e6 84 b9 95 e2 81 f4
          +003ff0 00 00 00 00 00 00 00 00 ab 7f 1c 69 67 9d 74 a2
          

          For some reason, the first 32 bytes of the file were overwritten by something that looks like garbage. The 32-bit page number 0 would be stored at offset 4. The tablespace identifier at offsets 34 and 38 is 0x023a in both files.

          At offset FIL_PAGE_LSN (0x10) we have the 64-bit log sequence number of the page. It is 0x07a3e684b9 in the correct file, and some garbage in the corrupted file. At the end of the correct file, before the 32-bit checksum we have the 32 least significant bits of FIL_PAGE_LSN, that is, 0xa3e684b9. In the corrupted file, those bytes are 0xab7f1c69. Assuming that the corrupted file is newer, its correct LSN must be 0x07ab7f1c69 or more. In any case, the FIL_PAGE_LSN at the start of the corrupted file, 0x74c6f67951cab3f8, does not match the LSN at the end of the page.
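          The consistency checks applied in this analysis can be expressed mechanically. A minimal sketch (assuming a 16KiB page and the field offsets named above; constant names mirror InnoDB's but this is not InnoDB's actual validation code) that applies the two checks: page number 0 at offset 4, and the low 32 bits of FIL_PAGE_LSN repeated 8 bytes before the end of the page:

          ```cpp
          #include <cassert>
          #include <cstdint>
          #include <cstdio>
          #include <cstring>
          #include <vector>

          // Field offsets as described above (names mirror fil0fil.h).
          constexpr size_t FIL_PAGE_OFFSET = 4;   // 32-bit page number
          constexpr size_t FIL_PAGE_LSN    = 16;  // 64-bit log sequence number
          constexpr size_t PAGE_SIZE       = 16384;

          static uint32_t read_be32(const uint8_t* p) {
              return uint32_t(p[0]) << 24 | uint32_t(p[1]) << 16
                   | uint32_t(p[2]) << 8  | p[3];
          }

          // The two checks from the analysis: the first page must be number 0,
          // and the low 32 bits of FIL_PAGE_LSN must be repeated in the trailer.
          static bool first_page_looks_valid(const uint8_t* page) {
              if (read_be32(page + FIL_PAGE_OFFSET) != 0)
                  return false;  // "Header page contains inconsistent data"
              uint32_t header_lsn_low  = read_be32(page + FIL_PAGE_LSN + 4);
              uint32_t trailer_lsn_low = read_be32(page + PAGE_SIZE - 8);
              return header_lsn_low == trailer_lsn_low;
          }

          int main() {
              std::vector<uint8_t> page(PAGE_SIZE, 0);
              // Reproduce the correct file's values: LSN 0x07a3e684b9.
              const uint8_t lsn[8] = {0, 0, 0, 0x07, 0xa3, 0xe6, 0x84, 0xb9};
              memcpy(page.data() + FIL_PAGE_LSN, lsn, 8);
              memcpy(page.data() + PAGE_SIZE - 8, lsn + 4, 4);  // trailer copy
              assert(first_page_looks_valid(page.data()));

              // Clobber the first bytes as seen in the corrupted file.
              const uint8_t garbage[] = {0x15, 0x03, 0x03, 0x00,
                                         0x1a, 0xb2, 0x47, 0xd7};
              memcpy(page.data(), garbage, sizeof garbage);
              assert(!first_page_looks_valid(page.data()));
              std::puts("first-page checks: ok");
              return 0;
          }
          ```

          With the quoted garbage in place, the page number field at offset 4 reads 0x1ab247d7, so the check fails exactly as in the reported error.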

          Something has corrupted the file. Theoretically, it could be anything that has write access to the file system or the block device. I think that it is unlikely that the page would have been corrupted in RAM when it was in the buffer pool of an InnoDB server or mariadb-backup, because right before when writing a page to disk, InnoDB would copy the least significant bits of the LSN and compute the page checksum. I would tend to blame the hardware on this. But, which hardware would use a 32-byte buffer size? In many processor caches and I suppose SDRAM transactions, the block size is 64 bytes.
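          The write-time stamping referred to above can be sketched as follows (a simplified model, not InnoDB's actual buf_flush code; the checksum step is elided since its algorithm depends on innodb_checksum_algorithm):

          ```cpp
          #include <cassert>
          #include <cstdint>
          #include <cstdio>
          #include <cstring>
          #include <vector>

          constexpr size_t FIL_PAGE_LSN = 16;     // 64-bit LSN in the page header
          constexpr size_t PAGE_SIZE    = 16384;

          static void write_be64(uint8_t* p, uint64_t v) {
              for (int i = 7; i >= 0; i--) { p[i] = uint8_t(v); v >>= 8; }
          }

          // Simplified model of what happens right before a page goes to disk:
          // the full LSN goes into the header, its low 32 bits are copied into
          // the trailer, and only then is the page checksum computed.
          static void stamp_page_for_write(uint8_t* page, uint64_t lsn) {
              write_be64(page + FIL_PAGE_LSN, lsn);
              memcpy(page + PAGE_SIZE - 8, page + FIL_PAGE_LSN + 4, 4);
              // checksum computation over the stamped page would happen here
          }

          int main() {
              std::vector<uint8_t> page(PAGE_SIZE, 0);
              stamp_page_for_write(page.data(), 0x07a3e684b9ULL);
              // Header holds the full LSN, trailer its low 32 bits - so a page
              // corrupted in RAM before this point would still be stamped and
              // checksummed consistently on its way out.
              assert(memcmp(page.data() + FIL_PAGE_LSN + 4,
                            page.data() + PAGE_SIZE - 8, 4) == 0);
              assert(page[FIL_PAGE_LSN + 3] == 0x07);
              std::puts("stamping: ok");
              return 0;
          }
          ```

          This is why an in-buffer-pool corruption would not produce a mismatched header/trailer LSN pair: the stamping happens after any such corruption, immediately before the write.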

          marko Marko Mäkelä added a comment
          garnera Andrew Garner added a comment -

          We observed a similarly weird data-corruption bug that manifested slightly differently in our environment - the first 24 bytes of an .ibd were clobbered. We could reproduce this by running SST in a loop. Hitting the right timing to expose this can be tricky - some environments would never fail this way.

          I think the hint here is the initial byte sequence "15 03 03 00 1a ... 26 more bytes ...". This looks like a TLS (v1.2) alert record - probably the close_notify message for a connection. If TLSv1.3 were in use there, we might see "17 03 03 00 13 .. 19 more bytes .." (24 bytes) written instead (which is what we observed in our environment).
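          The 5-byte TLS record header makes these numbers line up: a record starts with a one-byte content type (0x15 = alert, 0x17 = application data), a two-byte version (0x03 0x03 on the wire for TLS 1.2, and also for TLS 1.3 records), and a two-byte big-endian payload length. A small sketch decoding the two byte sequences quoted above:

          ```cpp
          #include <cassert>
          #include <cstdint>
          #include <cstdio>

          // A TLS record header is 5 bytes: type, version (2), length (2, BE).
          struct TlsRecordHeader {
              uint8_t  type;      // 0x15 = alert, 0x17 = application data
              uint16_t version;   // 0x0303 for TLS 1.2 (and TLS 1.3 records)
              uint16_t length;    // payload bytes following the 5-byte header
          };

          static TlsRecordHeader parse_record_header(const uint8_t* p) {
              return {p[0], uint16_t(p[1] << 8 | p[2]),
                            uint16_t(p[3] << 8 | p[4])};
          }

          int main() {
              // "15 03 03 00 1a": the bytes at the start of the corrupted file.
              const uint8_t v12_alert[] = {0x15, 0x03, 0x03, 0x00, 0x1a};
              TlsRecordHeader h = parse_record_header(v12_alert);
              assert(h.type == 0x15 && h.version == 0x0303);
              // A 31-byte write - consistent with the clobbered region in the
              // hex diff above (byte 0x1f was 0x00 in both copies).
              assert(5 + h.length == 31);

              // "17 03 03 00 13": the TLSv1.3 variant (encrypted close_notify);
              // 5 + 19 = 24 bytes, matching the 24 clobbered bytes we saw.
              const uint8_t v13_rec[] = {0x17, 0x03, 0x03, 0x00, 0x13};
              h = parse_record_header(v13_rec);
              assert(h.type == 0x17 && 5 + h.length == 24);
              std::puts("record lengths: 31 and 24");
              return 0;
          }
          ```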

          This sequence of events may be occurring:

          • MariaDB starts a node joining an existing cluster
          • Galera eventually runs the IST receiver
          • The IST receiver accepts a connection from the donor, say fd=16
          • The IST receiver finishes and closes this socket, which just closes the underlying file descriptor (fd=16 is closed)
          • Due to unfortunate timing, InnoDB opens a file right around this time, and reuses this file descriptor, fd=16, for some ./database/table.ibd
          • The IST socket goes out of scope. The destructor gets called. This destructor calls "shutdown()" and "close()" again
          • shutdown() calls SSL_shutdown(), which writes to the old (previously closed) fd=16; usually this is a write to a closed descriptor that fails and is relatively harmless, but InnoDB had reused this file descriptor
          • SSL writes a close notify record to its file descriptor (fd=16), which was previously closed
          • ./database/table.ibd now has garbage in the initial bytes of its header that originated from some SSL record
          • No one notices until mariabackup is run later - possibly when this node acts as a donor in the future, or when MariaDB crashes, or when someone runs "CHECK TABLE" or similar.

          Seems like this bug may have been around for a while, but this unfortunate sequence of events is rather rare.
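          The core hazard in the sequence above - a stale descriptor number writing into an unrelated, newly opened file - can be demonstrated without TLS or Galera at all. A minimal POSIX sketch, where a pipe's write end stands in for the IST socket (it relies on the POSIX rule that open() returns the lowest free descriptor number, so the equality assertion assumes nothing else opens a descriptor in between):

          ```cpp
          #include <cassert>
          #include <cstdio>
          #include <cstdlib>
          #include <cstring>
          #include <unistd.h>

          int main() {
              // Stand-in for the IST/TLS socket: the write end of a pipe.
              int fds[2];
              assert(pipe(fds) == 0);
              int ist_fd = fds[1];
              close(ist_fd);            // first close(): the number is freed

              // "InnoDB" now opens a data file. POSIX hands out the lowest
              // free descriptor number, so it reuses the one just released.
              char path[] = "/tmp/mdev31502_demo_XXXXXX";
              int ibd_fd = mkstemp(path);
              assert(ibd_fd == ist_fd); // same number, different file now

              // InnoDB writes a page header with pwrite(), offset stays at 0.
              unsigned char header[16] = {0};
              assert(pwrite(ibd_fd, header, sizeof header, 0)
                     == (ssize_t)sizeof header);

              // The stale destructor fires and "sends close_notify" through
              // the old descriptor number - landing at offset 0 of the file.
              const unsigned char tls_alert[] = {0x15, 0x03, 0x03, 0x00, 0x1a};
              (void)write(ist_fd, tls_alert, sizeof tls_alert);

              unsigned char got[5];
              assert(pread(ibd_fd, got, sizeof got, 0) == (ssize_t)sizeof got);
              assert(memcmp(got, tls_alert, sizeof got) == 0);  // clobbered
              printf("clobbered: %02x %02x %02x\n", got[0], got[1], got[2]);
              unlink(path);
              return 0;
          }
          ```

          The double-close is harmless on its own; it is the combination with descriptor reuse by an unrelated open() that turns the late SSL_shutdown() into silent file corruption.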


          People

            sysprg Julius Goryavsky
            YK Yakov Kushnirsky
            Votes: 0
            Watchers: 5
