MDEV-29438

Recovery or backup of instant ALTER TABLE is incorrect

Details

    Description

      mleich provided rr replay traces where mariadb-backup --prepare fails like this:

      10.6 92032499874259bae7455130958ea7f38c4d53a3

      2022-08-31  9:06:55 0 [Note] InnoDB: Starting final batch to recover 241 pages from redo log.
      2022-08-31  9:06:58 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=0, page number=514]
      2022-08-31  9:06:58 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.
      2022-08-31  9:06:58 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=0, page number=514]; set innodb_force_recovery to ignore
      2022-08-31  9:06:58 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1490] with error Data structure corruption
      

      I extracted a copy of the page both from the backup and from the server at the logical point in time when the OPT_PAGE_CHECKSUM record was written. Apart from FIL_PAGE_LSN, which is excluded from the checksum, the pages differ as follows:

      @@ -38,7 +38,7 @@
       0004e0 20 20 20 20 20 20 20 20 00 00 00 00 00 00 00 00
       0004f0 01 01 00 00 00 00 00 00 00 00 00 f0 3f 00 00 00
       000500 00 00 00 f0 3f 03 01 30 00 3c fb 73 00 00 00 00
      -000510 00 00 00 00 07 0a 8b 00 00 01 46 0d 17 00 00 00
      +000510 00 00 00 00 06 0a 8b 00 00 01 46 0d 17 00 00 00
       000520 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       000530 00 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
       000540 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20
      

      The difference is in a clustered index leaf page record that starts at offset 0x50c. Its DB_TRX_ID=0x70a would be incorrectly recovered as 0x60a. This difference was caught thanks to MDEV-18976.

      The columns of the index seem to be something like the following:
      (id INT UNSIGNED NOT NULL, DB_TRX_ID, DB_ROLL_PTR, pad CHAR(60), c CHAR(120), k INT, geocol2 GEOMETRY, tcol CHAR(3)).
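
To make the byte-level reading of the dump concrete, here is a minimal sketch that decodes the first three fields of the record at 0x50c from both variants of the 0x510 row. It assumes the usual InnoDB widths (4-byte big-endian id, 6-byte DB_TRX_ID, 7-byte DB_ROLL_PTR); the helper name decode_system_columns is made up for illustration.

```python
# Page bytes 0x500..0x51f copied from the hex dump in this report.
ROW_500 = "00 00 00 f0 3f 03 01 30 00 3c fb 73 00 00 00 00"
ROW_510_CORRECT = "00 00 00 00 07 0a 8b 00 00 01 46 0d 17 00 00 00"
ROW_510_RECOVERED = "00 00 00 00 06 0a 8b 00 00 01 46 0d 17 00 00 00"

DUMP_BASE = 0x500   # page offset of the first dumped byte
REC_START = 0x50C   # start of the first column, per the report


def decode_system_columns(row_510):
    """Decode id, DB_TRX_ID and DB_ROLL_PTR (assumed widths 4, 6, 7 bytes)."""
    data = bytes.fromhex(ROW_500 + " " + row_510)
    pos = REC_START - DUMP_BASE
    fields = {}
    for name, width in (("id", 4), ("DB_TRX_ID", 6), ("DB_ROLL_PTR", 7)):
        fields[name] = int.from_bytes(data[pos:pos + width], "big")
        pos += width
    return fields


correct = decode_system_columns(ROW_510_CORRECT)
recovered = decode_system_columns(ROW_510_RECOVERED)
print(hex(correct["DB_TRX_ID"]), hex(recovered["DB_TRX_ID"]))  # 0x70a 0x60a
```

Only DB_TRX_ID differs between the two copies; id and DB_ROLL_PTR decode to the same values.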

      On recovery, the incorrect byte was recovered by the following:

      10.6 92032499874259bae7455130958ea7f38c4d53a3

      (rr) frame 2
      #2  0x000055f5b96c8359 in page_apply_insert_dynamic (block=..., reuse=false, 
          prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807, 
          data_len=212)
          at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2838
      2838	  memcpy(buf, prev_rec, data_c);
      

      The buf starts at 0x50c (the start of the first column), and the prev_rec at offset 0x63 contains the following:

      (rr) p/x *prev_rec@data_c
      $12 = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x6}
      

      It is worth noting that right after the incorrect data had been copied, the last of those prev_rec bytes (the n_owned field at prev_rec+8, page offset 0x6b) would have been adjusted to the correct value:

      #0  mach_write_to_1 (b=0x4e0c5298806b "\a", n=7)
          at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/mach0data.inl:45
      #1  0x000055f5b96b4204 in rec_set_bit_field_1 (rec=0x4e0c52988070 "", val=7, 
          offs=5, mask=15, shift=0)
          at /data/Server/bb-10.6-MDEV-29374/storage/innobase/include/rem0rec.inl:159
      #2  0x000055f5b96c83ab in page_apply_insert_dynamic (block=..., reuse=false, 
          prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807, 
          data_len=212)
          at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2842
      (rr) frame 2
      #2  0x000055f5b96c83ab in page_apply_insert_dynamic (block=..., reuse=false, 
          prev=0, shift=0, enc_hdr_l=23, hdr_c=0, data_c=9, data=0x4e0c52e6e807, 
          data_len=212)
          at /data/Server/bb-10.6-MDEV-29374/storage/innobase/page/page0cur.cc:2842
      2842	  rec_set_bit_field_1(owner_rec, n_owned + 1, REC_NEW_N_OWNED,
      

      Offset 0x63 is where the page infimum pseudo-record is stored. Because an instant ADD/DROP COLUMN has been executed on this table, the record will not contain the string infimum but something else (NUL bytes followed by the header of the supremum record).
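
The ordering problem can be reproduced on a toy page image. The sketch below is not InnoDB code: it assumes (as the traces suggest, but do not prove) that the server computed the shared prefix after n_owned had already been raised to 7, while recovery performs the copy before that update. All function names are hypothetical, and everything on the toy page other than offsets 0x63..0x6b is left as zero.

```python
PAGE_NEW_INFIMUM = 0x63  # infimum pseudo-record offset, per this report
N_OWNED_BYTE = 0x6B      # PAGE_NEW_INFIMUM + 8, the byte updated in the trace
REC_START = 0x50C        # where buf starts, per this report

# First 17 bytes of the correct record: id, DB_TRX_ID=0x70a, DB_ROLL_PTR.
RECORD = bytes.fromhex("00000000" "00000000070a" "8b000001460d17")


def set_bit_field_1(page, offset, val, mask, shift):
    """Toy equivalent of a masked one-byte field update."""
    page[offset] = (page[offset] & ~mask & 0xFF) | ((val << shift) & mask)


def make_page(n_owned):
    """Toy page: after instant ALTER the infimum holds NUL bytes, not 'infimum'."""
    page = bytearray(0x600)
    page[N_OWNED_BYTE] = n_owned
    return page


def common_prefix_len(a, b):
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def server_data_c(page_before):
    """Assumed server order: bump n_owned first, then compare with prev_rec."""
    page = bytearray(page_before)
    set_bit_field_1(page, N_OWNED_BYTE, 7, 0x0F, 0)
    return common_prefix_len(RECORD, page[PAGE_NEW_INFIMUM:])


def recover(page_before, data_c):
    """Recovery order from the trace: memcpy from prev_rec, then bump n_owned."""
    page = bytearray(page_before)
    page[REC_START:REC_START + data_c] = \
        page[PAGE_NEW_INFIMUM:PAGE_NEW_INFIMUM + data_c]
    page[REC_START + data_c:REC_START + len(RECORD)] = RECORD[data_c:]
    set_bit_field_1(page, N_OWNED_BYTE, 7, 0x0F, 0)
    return page


before = make_page(n_owned=6)
data_c = server_data_c(before)
after = recover(before, data_c)
trx_id = int.from_bytes(after[REC_START + 4:REC_START + 10], "big")
print(data_c, hex(trx_id))  # 9 0x60a
```

With these assumptions the toy model arrives at data_c=9 and at the wrong DB_TRX_ID of 0x60a, matching the values in the traces.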

      I think that we must ‘pessimize’ the implementation of MDEV-21724 and never write log that would copy something from the infimum record to the first actual user record in the table.
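
One possible shape for such a rule, as a hedged sketch only: when choosing how many leading bytes to log as "copied from the previous record", report zero whenever the predecessor is the infimum. The function names and the prev_is_infimum flag are hypothetical and do not correspond to the actual InnoDB code or to any eventual patch.

```python
def common_prefix_len(a, b):
    """Number of leading bytes that a and b share."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def copied_prefix_len(rec, prev_bytes, prev_is_infimum):
    """How many leading bytes of rec may be logged as copied from prev_bytes.

    Pessimized rule: never reuse bytes from the infimum pseudo-record,
    because its contents (and the supremum header right after it) may
    differ between log-write time and log-apply time.
    """
    if prev_is_infimum:
        return 0
    return common_prefix_len(rec, prev_bytes)


rec = bytes.fromhex("00000000" "00000000070a")
infimum_area = bytes(8) + b"\x07"
user_rec = bytes.fromhex("00000000" "000000000709")

print(copied_prefix_len(rec, infimum_area, prev_is_infimum=True))   # 0
print(copied_prefix_len(rec, user_rec, prev_is_infimum=False))      # 9
```

The cost of this rule is limited to inserts whose predecessor is the infimum, where the full record payload would be written to the log instead of a shared prefix.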

            People

              marko Marko Mäkelä