MariaDB Server / MDEV-26326

MDEV-24626 (remove synchronous page 0 write) seems to cause mariabackup to skip a valid .ibd file.

    Description

      The log at http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/26370/steps/test/logs/stdio shows that the file t.ibd is not copied by copy-back, which means it is not in the backup.

      A theory: an empty .ibd file is not copied into the backup because it is not loaded by xb_load_single_table_tablespace. The test small_ibd.test checks exactly this case.
      Perhaps that handling no longer works as intended, and the file should be loaded, copied, or both.
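To make the theory concrete, here is a minimal, hypothetical sketch of how a backup-side scan could drop a valid but empty .ibd file. It assumes (this report does not confirm it) that identifying a tablespace requires reading page 0, and it treats "empty" as "shorter than one page". The names ScanResult, scan_strict and scan_lenient are illustrative only and are not mariabackup functions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative outcomes of scanning one .ibd file (not a mariabackup API).
enum class ScanResult { Loaded, Skipped, CopiedAsIs };

// Hypothetical strict policy: the tablespace header lives in page 0, so a
// file shorter than one page cannot be identified and is silently skipped.
ScanResult scan_strict(std::uint64_t file_size, std::uint32_t page_size) {
  assert(page_size > 0);
  return file_size < page_size ? ScanResult::Skipped : ScanResult::Loaded;
}

// Hypothetical lenient policy: a short file is still copied verbatim, so
// that applying the redo log during --prepare can fill in its pages.
ScanResult scan_lenient(std::uint64_t file_size, std::uint32_t page_size) {
  assert(page_size > 0);
  return file_size < page_size ? ScanResult::CopiedAsIs : ScanResult::Loaded;
}
```

With a 16 KiB page size, scan_strict(0, 16384) yields Skipped while scan_lenient(0, 16384) yields CopiedAsIs; a file holding at least one full page is Loaded by either policy.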

          Activity

            If an empty .ibd file is not being copied into a backup, then the backup will have to be adjusted somehow. Maybe during mariabackup --prepare, we should actually create data files in recv_sys_t::recover_deferred() if that is not already the case.

            Somewhat related to this, MDEV-25909 has been filed for avoiding unnecessary repeated reads of the first page of a data file. That could affect mariabackup --prepare as well.

            (Comment above by Marko Mäkelä.)
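The suggestion to create data files during --prepare can be sketched as follows. This is a simplified model, not the real recv_sys_t::recover_deferred(): it assumes each deferred entry records the file name seen in the redo log plus a "deleted" flag, and it models the backup directory as a set of file names. DeferredSpace and create_missing_files are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one entry of recovery's deferred-tablespace map.
struct DeferredSpace {
  std::string file_name;  // file name seen in the redo log
  bool deleted;           // was the table dropped later in the log?
};

// When converting deferred entries into tablespaces, create any data file
// that the backup did not contain instead of giving up on it.
// existing_files models the backup directory; returns the created names.
std::vector<std::string> create_missing_files(
    const std::map<std::uint32_t, DeferredSpace>& deferred,
    std::set<std::string>& existing_files) {
  std::vector<std::string> created;
  for (const auto& entry : deferred) {
    const DeferredSpace& item = entry.second;  // entry.first is the space id
    assert(!item.file_name.empty());
    if (item.deleted) continue;  // a dropped table needs no file
    if (existing_files.insert(item.file_name).second)
      created.push_back(item.file_name);  // file was absent: "create" it
  }
  return created;
}
```

In this model, a deferred entry for t.ibd whose file is missing from the backup gets created, an entry marked as deleted is ignored, and a file already present is left untouched.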

            I just analyzed an rr replay trace of a failed mariabackup --prepare. An assertion failed in my development branch because buf_flush_discard_page() was invoked while deferred_spaces had not yet been converted to tablespaces.

            I was unable to reproduce the race even after restoring the same backup many times. I think that a simple fix is the following (possibly to be revised in MDEV-14481):

            diff --git a/storage/innobase/buf/buf0flu.cc b/storage/innobase/buf/buf0flu.cc
            index 1f720989f6f..30f46e6024f 100644
            --- a/storage/innobase/buf/buf0flu.cc
            +++ b/storage/innobase/buf/buf0flu.cc
            @@ -1301,7 +1301,7 @@ static void buf_flush_LRU_list_batch(ulint max, flush_counters_t *n)
                     space= nullptr;
                   }
             
            -      if (!space)
            +      if (!space && !recv_recovery_is_on())
                     buf_flush_discard_page(bpage);
                   else if (neighbors && space->is_rotational())
                   {
             

            A cleaner fix could be to retain an X-latch on the page in addition to the buffer-fix. Then no change to the page cleaner should be needed. However, this requires the page latch to always exist also for ROW_FORMAT=COMPRESSED blocks, which is possible in my development branch.

            (Comment above by Marko Mäkelä.)
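The two proposed fixes can be contrasted in a toy model. This is not InnoDB code: Page, Action, try_x_latch and the cleaner_* functions are hypothetical, an atomic flag stands in for the page X-latch, and it assumes (as the comment implies) that the page cleaner leaves alone a page whose latch it cannot acquire without waiting. Variant A only mirrors the condition added by the diff; what the real buf_flush_LRU_list_batch() then does with such a page is not modeled (here it is simply skipped).

```cpp
#include <atomic>
#include <cassert>

// Simplified stand-in for a buffer pool page; not InnoDB's buf_page_t.
struct Page {
  bool space_known = false;            // has the tablespace been created?
  std::atomic<bool> x_latched{false};  // stands in for the page X-latch
};

enum class Action { Discard, Skip, Flush };

// Non-blocking acquisition: succeeds only if nobody holds the latch.
bool try_x_latch(Page& p) {
  bool expected = false;
  return p.x_latched.compare_exchange_strong(expected, true);
}

void x_unlatch(Page& p) {
  assert(p.x_latched.load());  // caller must hold the latch
  p.x_latched.store(false);
}

// Variant A, modeled on the diff: a page whose tablespace is unknown is
// discarded only when recovery is not running.
Action cleaner_with_guard(const Page& p, bool recovery_on) {
  if (!p.space_known)
    return recovery_on ? Action::Skip : Action::Discard;
  return Action::Flush;
}

// Variant B, the latch-based alternative: recovery keeps the page
// X-latched until the deferred tablespace exists, so the cleaner skips it.
Action cleaner_with_latch(Page& p) {
  if (!try_x_latch(p)) return Action::Skip;
  Action a = p.space_known ? Action::Flush : Action::Discard;
  x_unlatch(p);
  return a;
}
```

Note that Variant B needs no knowledge of whether recovery is running; its safety rests entirely on recovery holding the latch for as long as the tablespace is deferred.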

            The link in the description is stale, and no summary of the failure is available. In a quick search of the cross-reference, I found some failures of the test mariabackup.xb_compressed_encrypted that look like this:

            10.6 48bbc447335a0b3ec698e975d48ad491

             
            mariabackup.xb_compressed_encrypted '16k,innodb' w3 [ fail ]
                    Test ended at 2021-09-17 07:00:03
             
            CURRENT_TEST: mariabackup.xb_compressed_encrypted
            /mnt/buildbot/build/mariadb-10.6.5/extra/mariabackup/mariabackup based on MariaDB server 10.6.5-MariaDB Linux (x86_64)
            …
            [00] 2021-09-17 07:00:02 completed OK!
            mysqltest: At line 29: query 'select sum(c1) from t1' failed: ER_GET_ERRNO (1030): Got error 194 "Tablespace is missing for a table" from storage engine InnoDB
            

            In fact, mariabackup.xb_compressed_encrypted is the only mariabackup.* test with "Tablespace is missing for a table" in the "Failure Output" section. A couple more tests failed like this on bb-10.6-MDEV-24626_1.

            In MDEV-27424, I posted a rough mtr test case that could explain this scenario; that failure is indeed caused by MDEV-24626.

            (Comment above by Marko Mäkelä.)

            This issue should be fixed by the patch for MDEV-27424 (bb-10.6-thiru).

            (Comment above by Thirunarayanan Balathandayuthapani.)

            The fix looks OK to push. I would suggest referring to this bug in the commit message and leaving the later report MDEV-27424 open, because it is unclear which version that report was filed for.

            (Comment above by Marko Mäkelä.)

            People

              Thirunarayanan Balathandayuthapani (thiru)
              Vladislav Vaintroub (wlad)
              Votes: 0
              Watchers: 5
