[MDEV-26811] Assertion "log_sys.n_pending_flushes == 1" fails in undo_truncate test, on shutdown Created: 2021-10-12  Updated: 2021-10-13  Resolved: 2021-10-13

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Tests
Affects Version/s: 10.2, 10.3, 10.4
Fix Version/s: 10.2.41, 10.3.32, 10.4.22

Type: Bug Priority: Blocker
Reporter: Vladislav Vaintroub Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: not-10.5

Issue Links:
Problem/Incident
is caused by MDEV-26450 Corruption due to innodb_undo_log_tru... Closed

 Description   

http://buildbot.askmonty.org/buildbot/builders/winx64-packages/builds/27562/steps/test/logs/stdio

Looks like one thread is doing log_write_and_flush(), while another is doing log_write_flush_to_disk_low(). Also, a third one is doing close_connections(), so perhaps this could be related to the recent sudden surge of crashes on shutdown in this specific test.



 Comments   
Comment by Marko Mäkelä [ 2021-10-13 ]

Thank you. In the stack traces, I see that we have log_checkpoint() executing log_write_flush_to_disk_low() where we have a similar assertion (which did not fail):

	ut_a(log_sys.n_pending_flushes == 1); /* No other threads here */

	bool	do_flush = srv_file_flush_method != SRV_O_DSYNC;

	if (do_flush) {
		fil_flush(SRV_LOG_SPACE_FIRST_ID);
	}

	log_mutex_enter();

The thread where the assertion failed in log_write_and_flush() is holding the log mutex and thus blocking this thread at log_mutex_enter().

It seems to me that the == 1 must be removed from both assertions. Invoking fil_flush() concurrently from multiple threads is safe, because it is protected by fil_system.mutex.
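
To illustrate the reasoning, here is a minimal standalone C++ sketch, not InnoDB code. All names in it (FlushStats, run_concurrent_flushes, flush_mutex) are hypothetical. The std::mutex stands in for fil_system.mutex, and the two spin barriers exist only to force the overlap that happens by timing in the real server. It shows that when several threads have a flush pending at once, each of them can see a counter value greater than 1, while the relaxed "nonzero" check still holds and the flush itself stays serialized.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-ins for illustration; these are not the InnoDB definitions.
struct FlushStats {
    int max_pending;    // largest counter value any thread observed
    int flushes;        // number of serialized "flush" calls performed
    int final_pending;  // counter value after all threads have finished
};

FlushStats run_concurrent_flushes(int n_threads) {
    std::atomic<int> n_pending_flushes{0};
    std::atomic<int> arrived{0};
    std::atomic<int> observed{0};
    std::atomic<int> max_pending{0};
    std::mutex flush_mutex;  // plays the role of fil_system.mutex
    int flushes = 0;         // only modified while holding flush_mutex

    auto worker = [&] {
        n_pending_flushes.fetch_add(1);

        // Artificial barrier: wait until every thread has registered its
        // pending flush, so that the overlap is guaranteed in this sketch.
        arrived.fetch_add(1);
        while (arrived.load() < n_threads) std::this_thread::yield();

        int seen = n_pending_flushes.load();
        assert(seen >= 1);  // the relaxed check holds; "seen == 1" would not

        // Record the largest value seen by any thread.
        int prev = max_pending.load();
        while (prev < seen && !max_pending.compare_exchange_weak(prev, seen)) {
        }

        // Second barrier: nobody decrements before everyone has looked.
        observed.fetch_add(1);
        while (observed.load() < n_threads) std::this_thread::yield();

        {
            // The flush itself is serialized by the mutex, so concurrent
            // callers are safe even though several flushes are pending.
            std::lock_guard<std::mutex> guard(flush_mutex);
            ++flushes;
        }
        n_pending_flushes.fetch_sub(1);
    };

    std::vector<std::thread> threads;
    for (int i = 0; i < n_threads; ++i) threads.emplace_back(worker);
    for (auto& t : threads) t.join();

    return {max_pending.load(), flushes, n_pending_flushes.load()};
}
```

Calling run_concurrent_flushes(2) models the two threads from the stack traces: both register a pending flush before either completes, so an "== 1" assertion placed where the counter is read would fail, while the mutex-protected section still runs once per thread.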
