[MDEV-31080] InnoDB: Failing assertion: space->size == check.size in fil_validate() during deferred tablespace recovery
Created: 2023-04-19  Updated: 2023-04-19  Resolved: 2023-04-19

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 11.1.1, 10.11.3, 11.0.2, 10.6.13, 10.8.8, 10.9.6, 10.10.4

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: debug, race

Issue Links:
Blocks
blocks MDEV-29911 InnoDB recovery and mariadb-backup --... Closed

 Description   

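While repeatedly running the test innodb.alter_copy to test MDEV-29911, I reproduced an assertion failure in fil_validate(). To make the failure more likely, apply the following patch. It makes every fil_validate_skip() call run fil_validate(), instead of only every FIL_VALIDATE_SKIP (17th) call: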

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index 365bb11c38f..a00709dab01 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -200,6 +200,8 @@ fil_system_t	fil_system;
 extern uint srv_fil_crypt_rotate_key_age;
 
 #ifdef UNIV_DEBUG
+#define fil_validate_skip fil_validate
+#if 0
 /** Try fil_validate() every this many times */
 # define FIL_VALIDATE_SKIP	17
 
@@ -218,6 +220,7 @@ fil_validate_skip(void)
 	check in debug builds. */
 	return (fil_validate_count++ % FIL_VALIDATE_SKIP) || fil_validate();
 }
+#endif
 #endif /* UNIV_DEBUG */
 
 /** Look up a tablespace.

I recorded two rr replay traces of this failure. Both would be fixed by the following change:

@@ -1140,8 +1144,10 @@ fil_space_t *recv_sys_t::recover_deferred(const recv_sys_t::map::iterator &p,
           uint32_t(file_size / fil_space_t::physical_size(flags));
         if (n_pages > size)
         {
+          mysql_mutex_lock(&fil_system.mutex);
           space->size= node->size= n_pages;
           space->set_committed_size();
+          mysql_mutex_unlock(&fil_system.mutex);
           goto size_set;
         }
       }
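The change above makes recover_deferred() update space->size and node->size inside one fil_system.mutex critical section, so that fil_validate() cannot observe one field updated and the other not. A minimal sketch of that invariant follows, assuming heavily simplified, hypothetical types (DataFile, Tablespace, FileSystem, extend()) that only stand in for fil_node_t, fil_space_t and fil_system_t; they are not InnoDB's real definitions.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical, heavily simplified stand-ins for fil_node_t, fil_space_t
// and fil_system_t. These are not InnoDB's real definitions.
struct DataFile {
  uint32_t size = 0;  // pages in this file (cf. node->size)
};

struct Tablespace {
  uint32_t size = 0;  // must equal the sum of the file sizes (cf. space->size)
  std::vector<DataFile> files;
};

struct FileSystem {
  std::mutex mutex;  // plays the role of fil_system.mutex
  std::vector<Tablespace> spaces;

  // Debug consistency check in the spirit of fil_validate(): while holding
  // the mutex, each tablespace size must equal the sum of its file sizes.
  bool validate() {
    std::lock_guard<std::mutex> lock(mutex);
    for (const Tablespace &space : spaces) {
      uint32_t sum = 0;
      for (const DataFile &file : space.files) sum += file.size;
      if (space.size != sum) return false;
    }
    return true;
  }

  // Grow a single-file tablespace to n_pages. Both fields are written in one
  // critical section, as in the patch above, so validate() never sees
  // space.size already updated while the file size still holds the old value.
  void extend(Tablespace &space, uint32_t n_pages) {
    std::lock_guard<std::mutex> lock(mutex);
    assert(!space.files.empty());  // this sketch models single-file tablespaces
    space.size = space.files.front().size = n_pages;
  }
};
```

If a writer assigned both fields without taking the mutex, as the code before the patch did, it would race with validate(): formally a data race in C++, and in practice a window in which the checker sees a half-finished update.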

But there is more to it. I think it is best to hold fil_system.mutex continuously across the deferred tablespace creation or, more generally, across the creation of a tablespace and the addition of its first (only) data file. With these fixes, the test innodb.alter_copy is stable in the 10.8 branch, also after applying the MDEV-29911 changes.

I suspect that these changes will also fix the ut_a(fil_system.n_open == n_open) assertion failures in fil_validate() that have occasionally been observed since 10.6 when running the test innodb.table_definition_cache_debug.
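To illustrate why the mutex should be held continuously, here is a minimal sketch assuming a hypothetical model (the names FileSystem, create_with_file(), create_space(), add_file() and attach_file() are illustrative, not InnoDB's API). create_with_file() creates a tablespace and attaches its data file in one critical section. The two-step alternative releases the lock in between, which lets a checker observe a tablespace whose size does not match its files. validate() also compares an open-file counter against a recount, loosely mirroring ut_a(fil_system.n_open == n_open). This only illustrates the locking discipline; it is not the actual MDEV-31080 fix.

```cpp
#include <cassert>
#include <cstdint>
#include <list>
#include <mutex>

// Hypothetical model; names are illustrative, not InnoDB's API.
struct DataFile {
  uint32_t size;
  bool is_open;
};

struct Tablespace {
  uint32_t id;
  uint32_t size;  // should equal the sum of the file sizes
  std::list<DataFile> files;
};

class FileSystem {
 public:
  // Create a tablespace and attach its first (only) data file while holding
  // the mutex continuously: no other thread can observe the tablespace
  // without its file, or an open file not yet counted in n_open_.
  void create_with_file(uint32_t id, uint32_t n_pages, bool open) {
    std::lock_guard<std::mutex> lock(mutex_);
    spaces_.push_back(Tablespace{id, n_pages, {}});
    attach_file(spaces_.back(), n_pages, open);
  }

  // Two-step alternative: each step is locked, but the lock is released in
  // between, so a concurrent validate() can see size != sum of file sizes.
  void create_space(uint32_t id, uint32_t n_pages) {
    std::lock_guard<std::mutex> lock(mutex_);
    spaces_.push_back(Tablespace{id, n_pages, {}});
  }
  void add_file(uint32_t id, uint32_t n_pages, bool open) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (Tablespace &space : spaces_)
      if (space.id == id) {
        attach_file(space, n_pages, open);
        return;
      }
  }

  // Consistency check loosely modeled on fil_validate(): sizes must add up,
  // and the open-file counter must match the number of open files.
  bool validate() {
    std::lock_guard<std::mutex> lock(mutex_);
    uint32_t n_open = 0;
    for (const Tablespace &space : spaces_) {
      uint32_t sum = 0;
      for (const DataFile &file : space.files) {
        sum += file.size;
        if (file.is_open) n_open++;
      }
      if (space.size != sum) return false;
    }
    return n_open == n_open_;
  }

 private:
  // Caller must hold mutex_.
  void attach_file(Tablespace &space, uint32_t n_pages, bool open) {
    assert(space.files.empty());  // this sketch models single-file tablespaces
    space.files.push_back(DataFile{n_pages, open});
    if (open) n_open_++;
  }

  std::mutex mutex_;              // stands in for fil_system.mutex
  std::list<Tablespace> spaces_;  // std::list keeps references stable on insert
  uint32_t n_open_ = 0;           // stands in for fil_system.n_open
};
```

Calling validate() between create_space() and add_file() returns false, which is exactly the intermediate state that the single critical section in create_with_file() hides from other threads.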



 Comments   
Comment by Matthias Leich [ 2023-04-19 ]

origin/bb-10.6-MDEV-31080 2a61d6a701dc847f24267b2695a42475f2603c03 2023-04-19T12:12:54+03:00
+ debug.patch.diff + rollover.patch.diff
performed well in RQG testing.
