MariaDB Server / MDEV-31080

InnoDB: Failing assertion: space->size == check.size in fil_validate() during deferred tablespace recovery

    Description

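      While repeatedly running the test innodb.alter_copy to test MDEV-29911, I reproduced an assertion failure in fil_validate(). To make the failure more likely, apply this patch, which makes debug builds run fil_validate() on every call to fil_validate_skip() instead of on every 17th: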

      diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
      index 365bb11c38f..a00709dab01 100644
      --- a/storage/innobase/fil/fil0fil.cc
      +++ b/storage/innobase/fil/fil0fil.cc
      @@ -200,6 +200,8 @@ fil_system_t	fil_system;
       extern uint srv_fil_crypt_rotate_key_age;
       
       #ifdef UNIV_DEBUG
      +#define fil_validate_skip fil_validate
      +#if 0
       /** Try fil_validate() every this many times */
       # define FIL_VALIDATE_SKIP	17
       
      @@ -218,6 +220,7 @@ fil_validate_skip(void)
       	check in debug builds. */
       	return (fil_validate_count++ % FIL_VALIDATE_SKIP) || fil_validate();
       }
      +#endif
       #endif /* UNIV_DEBUG */
       
       /** Look up a tablespace.
      
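To illustrate what the patch changes, here is a minimal standalone sketch of the throttling expression. The names Checker, validate_skip() and validations_after() are hypothetical stand-ins, not InnoDB code; only the expression `(count++ % SKIP) || validate()` and the value 17 are taken from the diff above.

```cpp
#include <cassert>

// Hypothetical stand-in for the debug-only check; not the InnoDB code.
struct Checker
{
  static constexpr unsigned SKIP = 17; // mirrors FIL_VALIDATE_SKIP
  unsigned count = 0;                  // calls to validate_skip() so far
  unsigned validations = 0;            // how often the full check really ran

  // Stand-in for the expensive consistency check.
  bool validate()
  {
    ++validations;
    return true;
  }

  // Same shape as the original fil_validate_skip(): a non-zero remainder
  // short-circuits the ||, so validate() runs only when the counter is a
  // multiple of SKIP.
  bool validate_skip()
  {
    return (count++ % SKIP) || validate();
  }
};

// Count how many full validations happen in `calls` invocations, either
// throttled (original code) or unthrottled (what the patch amounts to).
unsigned validations_after(unsigned calls, bool throttled)
{
  Checker c;
  for (unsigned i = 0; i < calls; i++)
  {
    if (throttled)
      c.validate_skip();
    else
      c.validate();
  }
  return c.validations;
}
```

In this model, 34 throttled calls perform the full check twice (on the 1st and the 18th call), while 34 unthrottled calls perform it 34 times. That is why the patch widens the window in which a transient inconsistency can be caught.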

      I captured two rr replay traces of failures that the following change would fix:

      @@ -1140,8 +1144,10 @@ fil_space_t *recv_sys_t::recover_deferred(const recv_sys_t::map::iterator &p,
                 uint32_t(file_size / fil_space_t::physical_size(flags));
               if (n_pages > size)
               {
      +          mysql_mutex_lock(&fil_system.mutex);
                 space->size= node->size= n_pages;
                 space->set_committed_size();
      +          mysql_mutex_unlock(&fil_system.mutex);
                 goto size_set;
               }
             }
      

      But there is more to it. I think it is best to hold fil_system.mutex continuously around the deferred tablespace creation or, more generally, around creating a tablespace and adding its first (and only) data file. With these fixes, the test innodb.alter_copy is stable in the 10.8 branch, even after applying the MDEV-29911 changes.
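The locking idea can be sketched with a small standalone model. It assumes that the validator compares each tablespace's size with the sum of its data file sizes, as the assertion space->size == check.size suggests. System, Space, Node, create_with_file() and stress() are hypothetical names, and std::mutex merely plays the role of fil_system.mutex; this is not the actual fix.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical model; not the InnoDB data structures.
struct Node { uint32_t size; };

struct Space
{
  uint32_t size = 0;        // total size in pages
  std::vector<Node> nodes;  // data files belonging to the tablespace
};

struct System
{
  std::mutex mutex;         // plays the role of fil_system.mutex
  std::vector<Space> spaces;

  // Create a tablespace and attach its only data file inside a single
  // critical section, so that validate() never observes a tablespace
  // whose size is set but whose file is not attached yet.
  void create_with_file(uint32_t n_pages)
  {
    std::lock_guard<std::mutex> lock(mutex);
    spaces.emplace_back();
    Space &s = spaces.back();
    s.size = n_pages;
    s.nodes.push_back(Node{n_pages});
  }

  // Modeled on the invariant in the failing assertion: the size of each
  // tablespace must equal the sum of the sizes of its files.
  bool validate()
  {
    std::lock_guard<std::mutex> lock(mutex);
    for (const Space &s : spaces)
    {
      uint32_t total = 0;
      for (const Node &n : s.nodes)
        total += n.size;
      if (s.size != total)
        return false;
    }
    return true;
  }
};

// Run a creator thread concurrently with repeated validation and report
// whether the invariant held at every check.
bool stress(uint32_t rounds)
{
  System sys;
  std::thread creator([&sys, rounds] {
    for (uint32_t i = 1; i <= rounds; i++)
      sys.create_with_file(i);
  });
  bool ok = true;
  for (uint32_t i = 0; i < rounds; i++)
    ok = sys.validate() && ok;
  creator.join();
  return ok;
}
```

Because both writes in create_with_file() happen under the same lock that validate() takes, a call such as stress(1000) should never see the invariant broken. If the lock were released between setting the size and attaching the file, a concurrent validate() could run in that gap and fail.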

      I suspect that these changes should also fix the ut_a(fil_system.n_open == n_open) assertion failures in fil_validate() that have occasionally been observed since 10.6 when running the test innodb.table_definition_cache_debug.

            People

              marko Marko Mäkelä