  1. MariaDB Server
  2. MDEV-33980

mariadb-backup --backup is missing retry logic for undo tablespaces

Details

    Description

      mleich provided two rr replay traces where mariadb-backup --backup would fail as follows:

      10.6 ec7db2bdf849fc1a5bad906764920edda4121bd6

      2024-04-22 12:33:25 0 [ERROR] InnoDB: Checksum mismatch in the first page of file .//undo001
      2024-04-22 12:33:25 0 [ERROR] InnoDB: Unable to read first page of file .//undo001
      [00] 2024-04-22 12:33:25 merror: xb_load_tablespaces() failed with error Data structure corruption.
      

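      The copy of the page that was read is evidently a mix of two versions, because the least significant 32 bits of the log sequence number differ between the start of the page (0x0129b319, the last four bytes of read_buf[16]@8) and the end of the page (0x0101032c, the first four bytes of read_buf[srv_page_size-8]@8). The final four bytes, 0xece2f286, are the stored checksum seen in r13d, which does not match the computed value 0x3dc71f83 in eax: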

      0x000055cee647bbe4	595			if (crc32 != ut_crc32(read_buf,
      (rr) display/i $pc
      1: x/i $pc
      => 0x55cee647bbe4 <_Z21buf_page_is_corruptedbPKhm+620>:	cmp    %eax,%r13d
      (rr) i reg eax
      eax            0x3dc71f83          1036459907
      (rr) i reg r13d
      r13d           0xece2f286          -320671098
      (rr) p/x read_buf[16]@8
      $1 = {0x0, 0x0, 0x0, 0x0, 0x1, 0x29, 0xb3, 0x19}
      (rr) p/x read_buf[srv_page_size-8]@8
      $2 = {0x1, 0x1, 0x3, 0x2c, 0xec, 0xe2, 0xf2, 0x86}
      (rr) bt
      #0  0x000055cee647bbe4 in buf_page_is_corrupted (check_lsn=check_lsn@entry=false, read_buf=read_buf@entry=0x5c352ffe0000 "", fsp_flags=fsp_flags@entry=23)
          at /data/Server/10.6B/storage/innobase/buf/buf0buf.cc:595
      #1  0x000055cee672e0a4 in srv_undo_tablespace_open (create=create@entry=false, name=<optimized out>, name@entry=0x7fff9fd92ea0 ".//undo001", i=i@entry=0)
          at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:537
      #2  0x000055cee6730322 in srv_all_undo_tablespaces_open (create_new_db=create_new_db@entry=false, n_undo=16) at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:654
      #3  0x000055cee6730c7e in srv_undo_tablespaces_init (create_new_db=create_new_db@entry=false) at /data/Server/10.6B/storage/innobase/srv/srv0start.cc:739
      #4  0x000055cee5cf6f7f in xb_load_tablespaces () at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:4081
      #5  0x000055cee5d030a7 in xtrabackup_backup_func () at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:4861
      #6  0x000055cee5d03df5 in main_low (argv=0x55cee86d7650) at /data/Server/10.6B/extra/mariabackup/xtrabackup.cc:7156
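
The comparison made by hand in the trace above can be sketched as a small check. This is only an illustration, not code from the server: `read_be32` and `lsn_fields_agree` are hypothetical names, and the offsets assume the layout visible in the trace (an 8-byte big-endian LSN at byte 16, and the low 32 bits of the LSN repeated 8 bytes before the end of the page, followed by the 4-byte checksum).

```cpp
#include <cstddef>
#include <cstdint>

// Read a big-endian 32-bit integer (the byte order seen in the trace).
static uint32_t read_be32(const unsigned char *p)
{
  return uint32_t{p[0]} << 24 | uint32_t{p[1]} << 16 |
         uint32_t{p[2]} << 8 | uint32_t{p[3]};
}

// Hypothetical helper, not a MariaDB function: returns true if the low
// 32 bits of the LSN at the start of the page equal the copy stored
// near the end of the page. A mismatch suggests that the page was read
// while it was being overwritten (a torn read).
// Assumed layout: 8-byte LSN at offset 16; low 32 bits of the LSN at
// page_size - 8; checksum in the last 4 bytes.
static bool lsn_fields_agree(const unsigned char *page, size_t page_size)
{
  const uint32_t head = read_be32(page + 16 + 4); // low half of the 8-byte LSN
  const uint32_t tail = read_be32(page + page_size - 8);
  return head == tail;
}
```

A mismatch here does not distinguish a torn read from real corruption; it only indicates that reading the page again is worth trying before reporting an error.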
      

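      We do have retry logic for most other page reads; see the calls to buf_page_is_corrupted in fil_cur.cc. For the TRX_SYS page in xb_assign_undo_space_start() there is special handling with 5 reread attempts. The first page of an undo tablespace, which is read via srv_undo_tablespace_open() as shown in the stack trace, has no such retry.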
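
The general shape of such retry logic can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual fix: `read_page_with_retry` and its callback parameters are hypothetical, the limit of 5 attempts mirrors the TRX_SYS handling mentioned above, and the delay between attempts is an arbitrary choice.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical sketch, not MariaDB code: re-read a page until it
// validates or the attempts are exhausted. read_page fills buf and
// returns false on a real I/O error; is_valid stands in for a check
// such as !buf_page_is_corrupted(...).
static bool read_page_with_retry(
    const std::function<bool(unsigned char *)> &read_page,
    const std::function<bool(const unsigned char *)> &is_valid,
    unsigned char *buf,
    unsigned max_attempts = 5, // mirrors the 5 rereads for TRX_SYS
    std::chrono::milliseconds delay = std::chrono::milliseconds(100))
{
  for (unsigned attempt = 1; attempt <= max_attempts; attempt++) {
    if (!read_page(buf))
      return false; // genuine I/O failure: retrying would not help
    if (is_valid(buf))
      return true;  // got a consistent copy of the page
    if (attempt < max_attempts)
      std::this_thread::sleep_for(delay); // let the server finish its write
  }
  return false; // still inconsistent after all attempts
}
```

Only a validation failure triggers another read, because that is the case that a concurrent write by the server can explain; an I/O error is reported immediately.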

      The current design requires re-reading pages in case they were concurrently written by the server that is being backed up. A better design would make the server itself responsible for making backups (MDEV-14992). However, we need to fix this bug in GA releases, especially given that MDEV-29986 made multiple undo tablespaces the default.

      Attachments

        1. MB-1435.cfg
          46 kB
        2. MB-1435.yy
          0.3 kB

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä