[MDEV-20934] Infinite loop on innodb_fast_shutdown=0 with inconsistent change buffer Created: 2019-10-31  Updated: 2023-09-08  Resolved: 2019-11-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4
Fix Version/s: 10.2.29, 10.3.20, 10.4.10

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: corruption, shutdown

Attachments: Text File stack.txt    
Issue Links:
Problem/Incident
causes MDEV-23839 innodb_fast_shutdown=0 hang on change... Closed
causes MDEV-25783 CHECK TABLE harvests InnoDB: Index 'a... Closed
Relates
relates to MDEV-21069 Crash on DROP TABLE if the data file ... Closed
relates to MDEV-21152 Bogus debug assertion btr_pcur_is_aft... Closed
relates to MDEV-24449 Corruption of system tablespace or la... Closed
relates to MDEV-30009 InnoDB shutdown hangs when the change... Closed
relates to MDEV-32132 DROP INDEX followed by CREATE INDEX m... Closed
relates to MDEV-20864 Introduce debug option innodb_change_... Closed

 Description   

Due to a data corruption bug in the past (such as MySQL Bug #69122 InnoDB doesn't redo-log insert buffer merge operation if it is done in-place) it seems possible that the InnoDB change buffer ends up containing entries, while no buffered changes exist according to the change buffer bitmap pages in the .ibd files.

The logic on slow shutdown would proceed as follows:

  • ibuf_merge_pages() calls btr_pcur_open_at_rnd_pos(), which will find a change buffer leaf page
  • page numbers are read from the change buffer records
  • page reads requests will be posted
  • on read completion, ibuf_merge_or_delete_for_page() will be invoked
  • Alas, the bitmap page in the .ibd says that there are no buffered changes, and nothing will be done.
  • Because the ‘orphan’ records for the page were not deleted from the change buffer, this will keep looping.

To fix this, I think that we should change the following code in ibuf_merge_or_delete_for_page():

			if (!bitmap_bits) {
				/* No inserts buffered for this page */
 
				fil_space_release(space);
				return;
			}

Before returning, we should check if slow shutdown is in progress. If yes, we should attempt to delete any change buffer entries for page_id. We should not try this during normal operation, because it would cause a lot of unnecessary work.



 Comments   
Comment by Marko Mäkelä [ 2019-11-01 ]

I can repeat the infinite loop with the following test case change:

diff --git a/mysql-test/suite/innodb/r/ibuf_not_empty.result b/mysql-test/suite/innodb/r/ibuf_not_empty.result
index 2c898b8916d..7c61e74850b 100644
--- a/mysql-test/suite/innodb/r/ibuf_not_empty.result
+++ b/mysql-test/suite/innodb/r/ibuf_not_empty.result
@@ -22,4 +22,5 @@ check table t1;
 Table	Op	Msg_type	Msg_text
 test.t1	check	Warning	InnoDB: Index 'b' contains #### entries, should be 4096.
 test.t1	check	error	Corrupt
+SET GLOBAL innodb_fast_shutdown=0;
 DROP TABLE t1;
diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test
index 9ee0b180f44..8b16d197e03 100644
--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test
+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test
@@ -40,15 +40,43 @@ INSERT INTO t1 SELECT 0,b,c FROM t1;
 INSERT INTO t1 SELECT 0,b,c FROM t1;
 INSERT INTO t1 SELECT 0,b,c FROM t1;
 INSERT INTO t1 SELECT 0,b,c FROM t1;
+let MYSQLD_DATADIR=`select @@datadir`;
+let PAGE_SIZE=`select @@innodb_page_size`;
+
+--source include/shutdown_mysqld.inc
+
+# Corrupt the change buffer bitmap, to claim that pages are clean
+perl;
+do "$ENV{MTR_SUITE_DIR}/include/crc32.pl";
+my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd";
+open(FILE, "+<$file") || die "Unable to open $file";
+binmode FILE;
+my $ps= $ENV{PAGE_SIZE};
+my $page;
+sysseek(FILE, $ps, 0) || die "Unable to seek $file\n";
+die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps;
+# Clean the change buffer bitmap.
+substr($page,38,$ps - 38 - 8) = chr(0) x ($ps - 38 - 8);
+my $polynomial = 0x82f63b78; # CRC-32C
+my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^
+		 mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial));
+substr($page,0,4)=$ck;
+substr($page,$ps-8,4)=$ck;
+sysseek(FILE, $ps, 0) || die "Unable to rewind $file\n";
+syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n";
+close(FILE) || die "Unable to close $file";
+EOF
 
 --let $restart_parameters= --innodb-force-recovery=6 --innodb-change-buffer-dump
---source include/restart_mysqld.inc
+--source include/start_mysqld.inc
 
 --replace_regex /contains \d+ entries/contains #### entries/
 check table t1;
 
 --let $restart_parameters=
 --source include/restart_mysqld.inc
+SET GLOBAL innodb_fast_shutdown=0;
+--source include/restart_mysqld.inc
 
 # Cleanup
 DROP TABLE t1;

The added slow shutdown right before the DROP TABLE would get into the infinite loop.

Comment by Marko Mäkelä [ 2019-11-16 ]

This fix was motivated by a support customer whose database hung on slow shutdown, because there were orphan entries in the change buffer. The reason for those orphan entries was not found, but I would tend to suspect that copying data files directly was involved.

In the end, this fix did not work in their environment. Dropping and rebuilding the affected tables did help them to shut down. I believe that a possible reason is that the change buffer contained buffered changes for pages that were beyond the current end of the user tablespace. I plan to test this and file a follow-up ticket if I can repeat the scenario.

Comment by Marko Mäkelä [ 2019-11-17 ]

I attempted to guess and reproduce the customer’s failure scenario (see MDEV-21069). While the slow shutdown did not hang for me, I got an assertion failure for an out-of-bounds page access during the change buffer merge. That is addressed in my MDEV-21069 patch.

Comment by Bernardo Perez [ 2020-04-22 ]

Hello Marko.

I seem to have a case that matches this behavior on one of the versions that in theory fixes this (10.3.20).

I've attached the stack.

Here there is some extra info:

Thread 4 and 5 seem to be both on ibuf_merge_or_delete_for_page.

Thread 4 seems to be a buffer dumping the buffer pool.
Thread 5 seems to be in the state you describe on this JIRA.

They seem to be working on a cluster index from change buffer.

(gdb) p cursor->index->table->space->name
$25 = 0x2b29f7036230 "innodb_system"

(gdb) p index->table->name
$26 =

{m_name = 0x2b2a367ff340 "innodb_change_buffer", static part_suffix = "#P#"}

(gdb) p index->name
$27 =

{m_name = 0x2b2a087db270 "CLUST_IND"}

I've printed dictionary operation locks and sys mutexes:

(gdb) print dict_operation_lock
$36 = {lock_word = 536870912, waiters = 0, sx_recursive = 0, writer_is_wait_ex = false, writer_thread = 0, event = 0x2b2c5deec8d0, wait_ex_event = 0x2b2c5deec940,
cfile_name = 0x55f9dbc4d608 "/local/p4clients/pkgbuild-9zvR5/workspace/src/RDSMariaDB/storage/innobase/dict/dict0dict.cc",
last_x_file_name = 0x55f9dbc41fe0 "/local/p4clients/pkgbuild-9zvR5/workspace/src/RDSMariaDB/storage/innobase/srv/srv0srv.cc", cline = 1097, is_block_lock = 0,
last_x_line = 2025, count_os_wait = 0, list =

{prev = 0x2b29f7027a78, next = 0x2b2a2c3f4120}

, pfs_psi = 0x2b2a0800d680}

(gdb) print dict_sys.mutex
$37 = {m_impl = {m_lock_word = 0, m_event = 0x2b2c5deec860, m_policy = {m_count =

{m_spins = 0, m_waits = 0, m_calls = 0, m_enabled = false}

, m_id = LATCH_ID_DICT_SYS}},
m_ptr = 0x2b29f70bca80}

Locks of the two threads:

thread 4
(gdb) p lock->waiters
$59 = 1

(gdb) p lock->event
$56 = (os_event_t) 0x2b2ac905a9b0

(gdb) p lock->wait_ex_event
$57 = (os_event_t) 0x2b2ac905aa20

thread 5
(gdb) p lock->waiters
$59 = 1

(gdb) p lock->event
$54 = (os_event_t) 0x2b2ac905a9b0

(gdb) p lock->wait_ex_event
$55 = (os_event_t) 0x2b2ac905aa20

Cam you confirm that this looks like the a similar issue?

Comment by Marko Mäkelä [ 2020-04-22 ]

In stack.txt, there could be a latch conflict between Thread 4 and Thread 5. Both are requesting a page of the change buffer from the system tablespace. Thread 5 is the InnoDB master thread doing change buffer merge and Thread 4 is loading a buffer pool dump. The hang (if it really is one, and not simply a case of the server being I/O bound) looks completely unrelated to this one. Also, it is no longer be possible in 10.5, because MDEV-19514 removed the background change buffer merge.

Bernardo Perez, can you please file a separate MDEV for your problem? I would need more information (including p page_id and p *block) in order to debug this.

Side note: dict_operation_lock.writer_thread=0 indicates that no thread is waiting for nor holding that latch in either X or SX mode. Only debug builds would keep track of S latch holders. Similarly, for mutexes there is not much information in non-debug builds. The os_event_t is actually a pointer; you could have done p *lock->event and so on.

Comment by Bernardo Perez [ 2020-04-22 ]

Sure I will. I have that information. I will come back to you and file a separate report.

Comment by Bernardo Perez [ 2020-04-22 ]

Created MDEV-22340

Comment by Marko Mäkelä [ 2020-05-07 ]

I found a likely cause of the scenario that caused change buffer merge to hang. In ibuf_insert_low() we update the change buffer bitmap in a separate mini-transaction, ahead of writing the data to the change buffer:

	/* Set the bitmap bit denoting that the insert buffer contains
	buffered entries for this index page, if the bit is not set yet */
 
	old_bit_value = ibuf_bitmap_page_get_bits(bitmap_page, page_no,
					IBUF_BITMAP_BUFFERED, &bitmap_mtr);
	if (!old_bit_value) {
		ibuf_bitmap_page_set_bits(bitmap_page, page_no,
				IBUF_BITMAP_BUFFERED, TRUE, &bitmap_mtr);
	}
 
	mtr_commit(&bitmap_mtr);

The above was introduced with the initial commit of InnoDB into MySQL 3.23.34. If the server is killed or a backup is finished between the logical time of the commit of bitmap_mtr and the subsequent mini-transaction commit that inserts the record into the change buffer, then we will have the bitmap page indicating that there exist unbuffered changes for a page, although none might actually exist.

I do not think that this non-atomicity can be fixed, so the change buffer merge will have to deal with this situation.

Comment by Bernardo Perez [ 2020-05-07 ]

Thanks a lot for the update Marko.

Does that update also refer to https://jira.mariadb.org/browse/MDEV-22340 ?

Thanks

Comment by Marko Mäkelä [ 2021-03-05 ]

I think that MDEV-24449 is a rather likely cause of corrupting not only the change buffer, but also the system tablespace and any secondary index leaf page in user tables.

Comment by Marko Mäkelä [ 2022-11-10 ]

Unfortunately, an attempt to fix this corruption caused further corruption related to the change buffer in MariaDB Server 10.5 or later; see MDEV-25783.

We recently reproduced this type of scenario in house, and we are working on a better fix.

Comment by Marko Mäkelä [ 2022-11-14 ]

I filed MDEV-30009 for a 10.5 regression where the slow shutdown hangs when this type of corruption is present.

Generated at Thu Feb 08 09:03:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.