[MDEV-24449] Corruption of system tablespace or last recovered page - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL)
Fix Version/s: 10.2.37, 10.3.28, 10.4.18
Component/s: Backup, mariabackup, Storage Engine - InnoDB
Labels:
- corruption
- not-10.5

Description

On Buildbot, we have a few cases of assertion failures in mariabackup --prepare, because buf_pool.n_pend_reads is not zero. mleich was able to reproduce the error under rr record, and I analyzed a case which is reported in ~~MDEV-24448~~.

In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, srv_start() would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages.

While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones (~~MDEV-21452~~), I think that it is possible in 10.5, and probably in older versions as well, even though the code was heavily refactored in 10.5, starting with ~~MDEV-16264~~. The earliest known failure for the 10.6 main branch was as follows:

10.6 cf2480dd77a45cf56389015a1eabf567
mariabackup.xb_fulltext_encrypted w2 [ fail ]
Test ended at 2020-12-16 04:19:07

CURRENT_TEST: mariabackup.xb_fulltext_encrypted
mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11
…
2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log.
mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.

I think that there are two causes to this bug in 10.5 and 10.6:

io_callback() was invoking read_slots->release() before invoking the callback function (for write callbacks the existing order is OK)
recv_sys_t::apply() is missing a call to read_slots->wait() before flushing the buffer pool. The recv_sys.apply(true) was called by srv_start(), so this should fix the observed race condition (~~MDEV-9663~~).

Attachments

Issue Links

relates to

MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4

Closed

MDEV-25198 mariadb backup fails with signal 6 and signal 11 during prepare

Closed

MDEV-25344 Two hosts crashed at the same time with: Failing assertion: !cursor->index->is_committed()

Closed

MDEV-27734 Set innodb_change_buffering=none by default

Closed

MDEV-29694 Remove the InnoDB change buffer

Closed

MDEV-9663 InnoDB assertion failure: *cursor->index->name == TEMP_INDEX_PREFIX, or !cursor->index->is_committed()

Closed

MDEV-14643 InnoDB: Failing assertion: !cursor->index->is_committed()

Closed

MDEV-15608 Crash during transaction rollback when using optimistic parallel replication, few threads, non-durable configuration.

Closed

MDEV-16264 Implement a common work queue for InnoDB background tasks

Closed

MDEV-20934 Infinite loop on innodb_fast_shutdown=0 with inconsistent change buffer

Closed

MDEV-21452 Use condition variables and normal mutexes instead of InnoDB os_event and mutex

Closed

MDEV-21811 MariaDB not starting

Closed

MDEV-23192 Crash in row_search_mvcc() probably related to secondary index corruption

Closed

MDEV-24141 Failing assertion

Closed

MDEV-24448 mariabackup: storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed.

Closed

MDEV-24709 Assertion !recv_no_ibuf_operations failed in ibuf_page_low()

Closed

MDEV-30422 Merge new release of InnoDB 5.7.41 to 10.3

Closed

mentioned in: Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...; Page Loading...

(12 relates to, 7 mentioned in)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä created issue - 2020-12-18 17:53

Marko Mäkelä made changes - 2020-12-18 17:53

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-21452~~ [ ~~MDEV-21452~~ ]

Marko Mäkelä made changes - 2020-12-18 17:53

Link

This issue relates to ~~MDEV-16264~~ [ ~~MDEV-16264~~ ]

Marko Mäkelä made changes - 2020-12-18 17:54

Component/s		Storage Engine - InnoDB [ 10129 ]
Summary	mariabackup --prepare may fails to apply some log	Crash recovery or mariabackup --prepare may fail to apply some log

Marko Mäkelä made changes - 2020-12-18 17:59

Link

This issue is duplicated by ~~MDEV-24448~~ [ ~~MDEV-24448~~ ]

Marko Mäkelä added a comment - 2020-12-18 18:36

Before ~~MDEV-19586~~ was implemented in MariaDB Server 10.5.0, there was a counter recv_sys->n_addrs. This counter would be decremented at the end of recv_recover_page(), which is invoked by the read completion callback routine. In recv_apply_hashed_log_recs() (which was replaced by recv_sys_t::apply() in 10.5) there is no wait for pending reads to complete, but we do wait for recv_sys->n_addrs to reach 0, similar to how in recv_sys_t::apply() we waited for recv_sys.pages.empty() to hold.

It is possible that this is a false alarm (merely a debug assertion failure) with no actual potential for data corruption. In 10.2, recv_recover_page() will add the page to the flush list if any log was applied, before decrementing the recv_sys->n_addrs. In 10.5, recv_recover_page() will do the same before any of its caller invoke recv_sys.pages.erase().

Marko Mäkelä added a comment - 2020-12-18 18:36 Before MDEV-19586 was implemented in MariaDB Server 10.5.0, there was a counter recv_sys->n_addrs . This counter would be decremented at the end of recv_recover_page() , which is invoked by the read completion callback routine. In recv_apply_hashed_log_recs() (which was replaced by recv_sys_t::apply() in 10.5) there is no wait for pending reads to complete, but we do wait for recv_sys->n_addrs to reach 0, similar to how in recv_sys_t::apply() we waited for recv_sys.pages.empty() to hold. It is possible that this is a false alarm (merely a debug assertion failure) with no actual potential for data corruption. In 10.2, recv_recover_page() will add the page to the flush list if any log was applied, before decrementing the recv_sys->n_addrs . In 10.5, recv_recover_page() will do the same before any of its caller invoke recv_sys.pages.erase() .

Marko Mäkelä made changes - 2020-12-18 18:36

Labels

corruption rr-profile-analyzed

rr-profile-analyzed

Marko Mäkelä made changes - 2020-12-18 18:38

Status

Open [ 1 ]

In Progress [ 3 ]

Marko Mäkelä added a comment - 2020-12-19 13:09

I posted some more analysis to ~~MDEV-24448~~, where we will address the debug assertion failure that should be possible in 10.5 and 10.6, but on buildbot.askmonty.org has only occurred on 10.6 (after ~~MDEV-21452~~). I do not think that any corruption is possible in the 10.5 or 10.6 series, thanks to ~~MDEV-19514~~.

However, I do think that in earlier series, this race condition could cause the very last recovered page to be corrupted, if that page was a secondary index leaf page for which some changes exist in the change buffer. I hope that introducing some delay would repeat the problem:

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc

index 9f3d743ada1..b6ad559f079 100644

--- a/storage/innobase/buf/buf0buf.cc

+++ b/storage/innobase/buf/buf0buf.cc

@@ -6133,6 +6133,7 @@ buf_page_io_complete(buf_page_t* bpage, bool dblwr, bool evict)

 		    && fil_page_get_type(frame) == FIL_PAGE_INDEX

 		    && page_is_leaf(frame)) {

+			/* This may be executed too late! */

 			ibuf_merge_or_delete_for_page(

 				(buf_block_t*) bpage, bpage->id,

 				bpage->size);

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 4c3886caeaf..3bafd3f874c 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage)

 	mtr.commit();

 func_exit:

+	const ulint n_addrs = recv_sys->n_addrs;

 	mutex_exit(&recv_sys->mutex);

 	ut_ad(mtr.has_committed());

+	if (!n_addrs) os_thread_sleep(100000);

 /** Reads in pages which have hashed log records, from an area around a given

Concurrently while the ibuf_merge_or_delete_for_page() is executing, recv_apply_hashed_log_recs(true) may complete. It seems that mariabackup --prepare would be able to exit without having properly written out the change buffer merge, in innobase_start_or_create_for_mysql():

			buf_flush_sync_all_buf_pools();

			err = fil_write_flushed_lsn(log_get_lsn());

			ut_ad(!buf_pool_check_no_pending_io());

			fil_close_log_files(true);

This might explain ~~MDEV-20934~~ as well as the ~~MDEV-9663~~ family of bugs.

I may of course be mistaken, but I hope that adding some sleep will improve the chances of reproducing the mystery corruption.

Marko Mäkelä added a comment - 2020-12-19 13:09 I posted some more analysis to MDEV-24448 , where we will address the debug assertion failure that should be possible in 10.5 and 10.6, but on buildbot.askmonty.org has only occurred on 10.6 (after MDEV-21452 ). I do not think that any corruption is possible in the 10.5 or 10.6 series, thanks to MDEV-19514 . However, I do think that in earlier series, this race condition could cause the very last recovered page to be corrupted, if that page was a secondary index leaf page for which some changes exist in the change buffer. I hope that introducing some delay would repeat the problem: diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc index 9f3d743ada1..b6ad559f079 100644 --- a/storage/innobase/buf/buf0buf.cc +++ b/storage/innobase/buf/buf0buf.cc @@ -6133,6 +6133,7 @@ buf_page_io_complete(buf_page_t* bpage, bool dblwr, bool evict) && fil_page_get_type(frame) == FIL_PAGE_INDEX && page_is_leaf(frame)) { + /* This may be executed too late! */ ibuf_merge_or_delete_for_page( (buf_block_t*) bpage, bpage->id, bpage->size); diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 4c3886caeaf..3bafd3f874c 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage) mtr.commit(); func_exit: + const ulint n_addrs = recv_sys->n_addrs; mutex_exit(&recv_sys->mutex); ut_ad(mtr.has_committed()); + if (!n_addrs) os_thread_sleep(100000); } /** Reads in pages which have hashed log records, from an area around a given Concurrently while the ibuf_merge_or_delete_for_page() is executing, recv_apply_hashed_log_recs(true) may complete. It seems that mariabackup --prepare would be able to exit without having properly written out the change buffer merge, in innobase_start_or_create_for_mysql() : buf_flush_sync_all_buf_pools(); err = fil_write_flushed_lsn(log_get_lsn()); ut_ad(!buf_pool_check_no_pending_io()); fil_close_log_files( true ); This might explain MDEV-20934 as well as the MDEV-9663 family of bugs. I may of course be mistaken, but I hope that adding some sleep will improve the chances of reproducing the mystery corruption.

Marko Mäkelä made changes - 2020-12-19 13:09

Link

This issue relates to ~~MDEV-9663~~ [ ~~MDEV-9663~~ ]

Marko Mäkelä made changes - 2020-12-19 13:09

Link

This issue relates to ~~MDEV-20934~~ [ ~~MDEV-20934~~ ]

Marko Mäkelä made changes - 2020-12-19 13:09

Fix Version/s		10.2 [ 14601 ]
Fix Version/s		10.3 [ 22126 ]
Fix Version/s		10.4 [ 22408 ]
Fix Version/s	10.5 [ 23123 ]
Fix Version/s	10.6 [ 24028 ]
Affects Version/s		5.5 [ 15800 ]
Affects Version/s		10.1 [ 16100 ]
Affects Version/s		10.2 [ 14601 ]
Affects Version/s		10.3 [ 22126 ]
Affects Version/s		10.4 [ 22408 ]
Affects Version/s		10.0 [ 16000 ]
Affects Version/s	10.5 [ 23123 ]
Affects Version/s	10.6 [ 24028 ]
Assignee	Marko Mäkelä [ marko ]	Thirunarayanan Balathandayuthapani [ thiru ]
Description	Note: I did not analyze this in older versions than 10.5. On Buildbot, we have a few cases of assertion failures in {{mariabackup --prepare}}, because {{buf_pool.n_pend_reads}} is not zero. [~mleich] was able to reproduce the error under {{rr record}}, and I analyzed a case. In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, {{srv_start()}} would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages. While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones (~~MDEV-21452~~), I think that it is possible in 10.5, and probably in older versions as well, even though the code was heavily refactored in 10.5, starting with ~~MDEV-16264~~. The [earliest known failure for the 10.6 main branch\|http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/26092] was as follows: {noformat:title=10.6 cf2480dd77a45cf56389015a1eabf567} mariabackup.xb_fulltext_encrypted w2 [ fail ] Test ended at 2020-12-16 04:19:07 CURRENT_TEST: mariabackup.xb_fulltext_encrypted mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11 … 2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log. mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed. {noformat} I think that there are two causes to this bug in 10.5 and 10.6: * {{io_callback()}} was invoking {{read_slots->release()}} before invoking the callback function (for write callbacks the existing order is OK) * {{recv_sys_t::apply()}} is missing a call to {{read_slots->wait()}} before flushing the buffer pool. The {{recv_sys.apply(true)}} was called by {{srv_start()}}, so this should fix the observed race condition.	On Buildbot, we have a few cases of assertion failures in {{mariabackup --prepare}}, because {{buf_pool.n_pend_reads}} is not zero. [~mleich] was able to reproduce the error under {{rr record}}, and I analyzed a case which is reported in ~~MDEV-24448~~. In the analyzed trace, we are recovering 4 page reads. The read callback for 3 of the pages was finished (with no log to be applied, because the pages were up to date), but for the last page, the read callback was still in progress. Yet, {{srv_start()}} would concurrently initiate shutdown. We failed to wait for all asynchronous reads to complete. This omission could cause the failure to apply log to some data pages. While this error started to occur on Buildbot after we replaced the InnoDB mutexes with native ones (~~MDEV-21452~~), I think that it is possible in 10.5, and probably in older versions as well, even though the code was heavily refactored in 10.5, starting with ~~MDEV-16264~~. The [earliest known failure for the 10.6 main branch\|http://buildbot.askmonty.org/buildbot/builders/kvm-fulltest2/builds/26092] was as follows: {noformat:title=10.6 cf2480dd77a45cf56389015a1eabf567} mariabackup.xb_fulltext_encrypted w2 [ fail ] Test ended at 2020-12-16 04:19:07 CURRENT_TEST: mariabackup.xb_fulltext_encrypted mysqltest: At line 19: exec of '/mnt/buildbot/build/mariadb-10.6.0/extra/mariabackup/mariabackup --innobackupex --apply-log /mnt/buildbot/build/mariadb-10.6.0/mysql-test/var/2/tmp/backup 2>&1' failed, error: 34304, status: 134, errno: 11 … 2020-12-16 4:19:05 0 [Note] InnoDB: Starting final batch to recover 7 pages from redo log. mariabackup: /home/buildbot/buildbot/build/mariadb-10.6.0/storage/innobase/srv/srv0start.cc:1582: dberr_t srv_start(bool): Assertion `!buf_pool.any_io_pending()' failed. {noformat} I think that there are two causes to this bug in 10.5 and 10.6: * {{io_callback()}} was invoking {{read_slots->release()}} before invoking the callback function (for write callbacks the existing order is OK) * {{recv_sys_t::apply()}} is missing a call to {{read_slots->wait()}} before flushing the buffer pool. The {{recv_sys.apply(true)}} was called by {{srv_start()}}, so this should fix the observed race condition (~~MDEV-9663~~).
Labels	rr-profile-analyzed	corruption not-10.5
Summary	Crash recovery or mariabackup --prepare may fail to apply some log	Potential corruption of last recovered page

Marko Mäkelä made changes - 2020-12-19 13:17

Link

This issue relates to ~~MDEV-22373~~ [ ~~MDEV-22373~~ ]

Marko Mäkelä made changes - 2020-12-21 09:14

Link

This issue relates to ~~MDEV-24448~~ [ ~~MDEV-24448~~ ]

Marko Mäkelä made changes - 2020-12-21 09:15

Link

This issue is duplicated by ~~MDEV-24448~~ [ ~~MDEV-24448~~ ]

Marko Mäkelä added a comment - 2020-12-21 13:32

I hope that something like this would reproduce the ~~MDEV-9663~~ family of corruption, in a DML-only workload with Mariabackup, using tables that have many secondary indexes. The patch is for 10.2, and I think it should be similar or identical on 10.3 or 10.4 as well:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage)

 	mtr.commit();

 func_exit:

+	const ulint n_addrs = recv_sys->n_addrs;

 	mutex_exit(&recv_sys->mutex);

 	ut_ad(mtr.has_committed());

+	if (!n_addrs) os_thread_sleep(1000);

 /** Reads in pages which have hashed log records, from an area around a given

@@ -2407,7 +2409,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 			return;

-		os_thread_sleep(500000);

+		os_thread_sleep(1000);

 		mutex_enter(&recv_sys->mutex);

@@ -2516,7 +2518,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 			return;

-		os_thread_sleep(500000);

+		os_thread_sleep(1000);

 		mutex_enter(&(recv_sys->mutex));

I believe that the sleeps have to be of similar duration so that there is more chance of interleaving. It might also be that some or all of these should be changed to os_thread_yield() instead.

Marko Mäkelä added a comment - 2020-12-21 13:32 I hope that something like this would reproduce the MDEV-9663 family of corruption, in a DML-only workload with Mariabackup, using tables that have many secondary indexes. The patch is for 10.2, and I think it should be similar or identical on 10.3 or 10.4 as well: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage) mtr.commit(); func_exit: + const ulint n_addrs = recv_sys->n_addrs; mutex_exit(&recv_sys->mutex); ut_ad(mtr.has_committed()); + if (!n_addrs) os_thread_sleep(1000); } /** Reads in pages which have hashed log records, from an area around a given @@ -2407,7 +2409,7 @@ void recv_apply_hashed_log_recs(bool last_batch) return; } - os_thread_sleep(500000); + os_thread_sleep(1000); mutex_enter(&recv_sys->mutex); } @@ -2516,7 +2518,7 @@ void recv_apply_hashed_log_recs(bool last_batch) return; } - os_thread_sleep(500000); + os_thread_sleep(1000); mutex_enter(&(recv_sys->mutex)); } I believe that the sleeps have to be of similar duration so that there is more chance of interleaving. It might also be that some or all of these should be changed to os_thread_yield() instead.

Marko Mäkelä added a comment - 2020-12-21 13:44

I think that the following should fix the suspected bug. This patch is basically a 10.2 (or pre-10.5) version of the simplest thinkable fix of ~~MDEV-24448~~:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 4c3886caeaf..95179ec2271 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2501,7 +2501,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 	/* Wait until all the pages have been processed */

-	while (recv_sys->n_addrs != 0) {

+	while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) {

 		const bool abort = recv_sys->found_corrupt_log

 			|| recv_sys->found_corrupt_fs;

Marko Mäkelä added a comment - 2020-12-21 13:44 I think that the following should fix the suspected bug. This patch is basically a 10.2 (or pre-10.5) version of the simplest thinkable fix of MDEV-24448 : diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 4c3886caeaf..95179ec2271 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2501,7 +2501,7 @@ void recv_apply_hashed_log_recs(bool last_batch) /* Wait until all the pages have been processed */ - while (recv_sys->n_addrs != 0) { + while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) { const bool abort = recv_sys->found_corrupt_log || recv_sys->found_corrupt_fs;

Marko Mäkelä added a comment - 2020-12-21 18:20

One more note: I think that the test to reproduce this must involve

SET GLOBAL innodb_change_buffering_debug=1;

SET GLOBAL innodb_limit_optimistic_insert_debug=1;

and an INSERT in to a table with many secondary indexes while mariabackup --backup is running. After the restore, we must execute CHECK TABLE. It could be useful to have a loop inside the test, with a backup, prepare, restore steps inside the loop, so that the error has a chance of accumulating. Either the secondary index or the change buffer should hopefully become corrupted.

Instead of an INSERT, it could be even better to have something like UPDATE t SET indexed_column=indexed_column+1, so that the number of rows in the table will remain constant, while the secondary index will make extensive use of the change buffer.

Marko Mäkelä added a comment - 2020-12-21 18:20 One more note: I think that the test to reproduce this must involve SET GLOBAL innodb_change_buffering_debug=1; SET GLOBAL innodb_limit_optimistic_insert_debug=1; and an INSERT in to a table with many secondary indexes while mariabackup --backup is running. After the restore, we must execute CHECK TABLE . It could be useful to have a loop inside the test, with a backup, prepare, restore steps inside the loop, so that the error has a chance of accumulating. Either the secondary index or the change buffer should hopefully become corrupted. Instead of an INSERT , it could be even better to have something like UPDATE t SET indexed_column=indexed_column+1 , so that the number of rows in the table will remain constant, while the secondary index will make extensive use of the change buffer.

Marko Mäkelä made changes - 2020-12-21 19:59

Status

In Progress [ 3 ]

Stalled [ 10000 ]

Marko Mäkelä added a comment - 2020-12-21 20:01

I will try to repeat this after returning to work in January. I am confident that the one-liner fix will work. The only challenge should be to repeat the corruption.

Marko Mäkelä added a comment - 2020-12-21 20:01 I will try to repeat this after returning to work in January. I am confident that the one-liner fix will work. The only challenge should be to repeat the corruption.

Marko Mäkelä made changes - 2020-12-21 20:01

Assignee

Thirunarayanan Balathandayuthapani [ thiru ]

Marko Mäkelä [ marko ]

Marko Mäkelä made changes - 2020-12-28 09:41

Link

This issue relates to ~~MDEV-15608~~ [ ~~MDEV-15608~~ ]

Marko Mäkelä added a comment - 2020-12-28 09:45

I made some success with a modified test and the above patch that adds os_thread_sleep():

10.2 8e3e87d2fc1e63d287f203d441dcb9360775c6b7
2020-12-28 11:05:32 140463661520832 [Note] InnoDB: Starting a batch to recover 30 pages from redo log.
2020-12-28 11:05:32 140463661520832 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=61] still fixed or dirty
…
#4 0x00007fc03ecb9537 in __GI_abort () at abort.c:79
#5 0x000056215f5bf1aa in ib::fatal::~fatal (this=0x7ffdff0614e0) at /mariadb/10.2o/storage/innobase/ut/ut0ut.cc:649
#6 0x000056215f62f162 in buf_all_freed_instance (buf_pool=buf_pool@entry=0x56216119d820) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6238
#7 0x000056215f62c29b in buf_pool_invalidate_instance (buf_pool=0x56216119d820) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6295
#8 buf_pool_invalidate () at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6326
#9 0x000056215f46f2da in recv_apply_hashed_log_recs (last_batch=<optimized out>) at /mariadb/10.2o/storage/innobase/log/log0recv.cc:2545
#10 0x000056215f473691 in recv_group_scan_log_recs (group=group@entry=0x56216123a450, checkpoint_lsn=3976126, contiguous_lsn=contiguous_lsn@entry=0x7ffdff062160, last_phase=true) at /mariadb/10.2o/storage/innobase/log/log0recv.cc:3492

This is not directly proving ~~MDEV-9663~~, but I would consider it as indirect proof. It occurred on the 298th run of the test, which was modified as follows:

diff --git a/mysql-test/suite/innodb/r/ibuf_not_empty.result b/mysql-test/suite/innodb/r/ibuf_not_empty.result

index 7c61e74850b..6c6ad3bf7d5 100644

--- a/mysql-test/suite/innodb/r/ibuf_not_empty.result

+++ b/mysql-test/suite/innodb/r/ibuf_not_empty.result

@@ -20,7 +20,5 @@ INSERT INTO t1 SELECT 0,b,c FROM t1;

 INSERT INTO t1 SELECT 0,b,c FROM t1;

 check table t1;

 Table	Op	Msg_type	Msg_text

-test.t1	check	Warning	InnoDB: Index 'b' contains #### entries, should be 4096.

-test.t1	check	error	Corrupt

-SET GLOBAL innodb_fast_shutdown=0;

+test.t1	check	status	OK

 DROP TABLE t1;

diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test

index a5093892b35..5f7c2b24b7d 100644

--- a/mysql-test/suite/innodb/t/ibuf_not_empty.test

+++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test

@@ -43,64 +43,11 @@ INSERT INTO t1 SELECT 0,b,c FROM t1;

 let MYSQLD_DATADIR=`select @@datadir`;

 let PAGE_SIZE=`select @@innodb_page_size`;

---source include/shutdown_mysqld.inc

-# Corrupt the change buffer bitmap, to claim that pages are clean

-perl;

-do "$ENV{MTR_SUITE_DIR}/include/crc32.pl";

-my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd";

-open(FILE, "+<$file") || die "Unable to open $file";

-binmode FILE;

-my $ps= $ENV{PAGE_SIZE};

-my $page;

-sysseek(FILE, $ps, 0) || die "Unable to seek $file\n";

-die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps;

-# Clean the change buffer bitmap.

-substr($page,38,$ps - 38 - 8) = chr(0) x ($ps - 38 - 8);

-my $polynomial = 0x82f63b78; # CRC-32C

-my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^

-		 mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial));

-substr($page,0,4)=$ck;

-substr($page,$ps-8,4)=$ck;

-sysseek(FILE, $ps, 0) || die "Unable to rewind $file\n";

-syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n";

-close(FILE) || die "Unable to close $file";

-EOF

---let $restart_parameters= --innodb-force-recovery=6 --innodb-change-buffer-dump

---source include/start_mysqld.inc

+--let $shutdown_timeout=0

+--source include/restart_mysqld.inc

+--let $shutdown_timeout=

---replace_regex /contains \d+ entries/contains #### entries/

 check table t1;

---source include/shutdown_mysqld.inc

-# Truncate the file to 5 pages, as if it were empty

-perl;

-do "$ENV{MTR_SUITE_DIR}/include/crc32.pl";

-my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd";

-open(FILE, "+<$file") || die "Unable to open $file";

-binmode FILE;

-my $ps= $ENV{PAGE_SIZE};

-my $pages=5;

-my $page;

-die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps;

-substr($page,46,4)=pack("N", $pages);

-my $polynomial = 0x82f63b78; # CRC-32C

-my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^

-		 mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial));

-substr($page,0,4)=$ck;

-substr($page,$ps-8,4)=$ck;

-sysseek(FILE, 0, 0) || die "Unable to rewind $file\n";

-syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n";

-truncate(FILE, $ps * $pages);

-close(FILE) || die "Unable to close $file";

-EOF

---let $restart_parameters=

---source include/start_mysqld.inc

-SET GLOBAL innodb_fast_shutdown=0;

---source include/restart_mysqld.inc

 # Cleanup

 DROP TABLE t1;

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 4c3886caeaf..fde96575c59 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage)

 	mtr.commit();

 func_exit:

+	const ulint n_addrs = recv_sys->n_addrs;

 	mutex_exit(&recv_sys->mutex);

 	ut_ad(mtr.has_committed());

+	if (!n_addrs) os_thread_sleep(1000);

 /** Reads in pages which have hashed log records, from an area around a given

@@ -2407,7 +2409,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 			return;

-		os_thread_sleep(500000);

+		os_thread_sleep(1000);

 		mutex_enter(&recv_sys->mutex);

@@ -2516,7 +2518,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 			return;

-		os_thread_sleep(500000);

+		os_thread_sleep(1000);

 		mutex_enter(&(recv_sys->mutex));

On a second try, it occurred on the 362nd attempt:

2020-12-28 11:09:12 140184457193408 [Note] InnoDB: Starting a batch to recover 30 pages from redo log.

2020-12-28 11:09:12 140184457193408 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=319] still fixed or dirty

On a third try, it failed on the 371st round:

2020-12-28 11:11:29 140563794130880 [Note] InnoDB: Starting a batch to recover 35 pages from redo log.

2020-12-28 11:11:29 140563794130880 [Note] InnoDB: Starting a batch to recover 38 pages from redo log.

2020-12-28 11:11:29 140563794130880 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=331] still fixed or dirty

201228 11:11:29 [ERROR] mysqld got signal 6 ;

With the 1-line fix additionally applied, 2,000 rounds of the test would pass:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc

index 4c3886caeaf..95179ec2271 100644

--- a/storage/innobase/log/log0recv.cc

+++ b/storage/innobase/log/log0recv.cc

@@ -2501,7 +2501,7 @@ void recv_apply_hashed_log_recs(bool last_batch)

 	/* Wait until all the pages have been processed */

-	while (recv_sys->n_addrs != 0) {

+	while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) {

 		const bool abort = recv_sys->found_corrupt_log

 			|| recv_sys->found_corrupt_fs;

A search for "still fixed or dirty" turns up ~~MDEV-20481~~, where the message refers to page number 3 (clustered index root page), which cannot possibly be related to this bug, because the change buffer is only ever applied to secondary index pages (and never a root page).

I made one more try to reproduce a different failure, by reverting the above fix and applying one more patch to disable two assertions:

diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc

index 9f3d743ada1..a86a9b3da51 100644

--- a/storage/innobase/buf/buf0buf.cc

+++ b/storage/innobase/buf/buf0buf.cc

@@ -1698,6 +1698,7 @@ buf_pool_contains_zip(

 #endif /* UNIV_DEBUG */

+#if 0

 /*********************************************************************//**

 Checks that all file pages in the buffer chunk are in a replaceable state.

 @return address of a non-free block, or NULL if all freed */

@@ -1761,6 +1762,7 @@ buf_chunk_not_freed(

 	return(NULL);

+#endif

 /********************************************************************//**

 Set buffer pool size variables after resizing it */

@@ -6223,6 +6225,7 @@ buf_all_freed_instance(

 /*===================*/

 	buf_pool_t*	buf_pool)	/*!< in: buffer pool instancce */

+#if 0

 	ulint		i;

 	buf_chunk_t*	chunk;

@@ -6241,6 +6244,7 @@ buf_all_freed_instance(

 	buf_pool_mutex_exit(buf_pool);

+#endif

 	return(TRUE);

This would cause an assertion failure a little later:

2020-12-28 11:23:12 140594869794752 [Note] InnoDB: Starting a batch to recover 35 pages from redo log.

mysqld: /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6306: void buf_pool_invalidate_instance(buf_pool_t *): Assertion `(buf_pool->LRU).count == 0' failed.

Let me ”fix” that one as well:

@@ -6299,6 +6303,7 @@ buf_pool_invalidate_instance(

 	while (buf_LRU_scan_and_free_block(buf_pool, true)) {

+	UT_LIST_INIT(buf_pool->LRU, &buf_page_t::LRU);

 	ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0);

 	ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0);

And it fails differently again:

#5  0x0000558f6dd73fa2 in ut_dbg_assertion_failed (expr=0x558f6e1f1cb5 "UT_LIST_GET_LEN(buf_pool->LRU) == n_lru", file=<optimized out>, line=<optimized out>, line@entry=6542) at /mariadb/10.2o/storage/innobase/ut/ut0dbg.cc:60

#6  0x0000558f6dde4044 in buf_pool_validate_instance (buf_pool=<optimized out>) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6542

#7  0x0000558f6dddbc59 in buf_validate () at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6577

#8  buf_page_optimistic_get (rw_latch=<optimized out>, block=block@entry=0x7fcd55abd580, modify_clock=modify_clock@entry=1, file=0x558f6e1ec05b "/mariadb/10.2o/storage/innobase/btr/btr0pcur.cc", line=line@entry=217, mtr=<optimized out>) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:4926

“Fix”:

@@ -6562,6 +6567,7 @@ ibool

 buf_validate(void)

 /*==============*/

+#if 0

 	ulint	i;

 	for (i = 0; i < srv_buf_pool_instances; i++) {

@@ -6571,6 +6577,7 @@ buf_validate(void)

 		buf_pool_validate_instance(buf_pool);

+#endif

 	return(TRUE);

With those debug checks disabled, we are finally getting something that looks like ‘random’ corruption:

2020-12-28 11:29:07 139847306946496 [Note] InnoDB: Starting final batch to recover 28 pages from redo log.

2020-12-28 11:29:07 139847306946496 [Note] InnoDB: Cleaning up trx with id 0x291f

2020-12-28 11:29:07 0x7f30bd0da7c0  InnoDB: Assertion failure in file /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc line 3197

InnoDB: Failing assertion: xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE

…

#5  0x000055ee3dd9efa2 in ut_dbg_assertion_failed (expr=0x55ee3e22faf6 "xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE", file=<optimized out>, line=<optimized out>, line@entry=3197) at /mariadb/10.2o/storage/innobase/ut/ut0dbg.cc:60

#6  0x000055ee3dea7caa in fseg_free_step (header=0x7f30b3ad403c "", mtr=mtr@entry=0x7ffdf24b56b8) at /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc:3196

#7  0x000055ee3dd9cd5e in trx_undo_seg_free (undo=<optimized out>, noredo=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0undo.cc:1095

#8  trx_undo_commit_cleanup (undo=<optimized out>, is_temp=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0undo.cc:1789

#9  0x000055ee3dd8e7b5 in trx_cleanup_at_db_startup (trx=trx@entry=0x7f30b8171120) at /mariadb/10.2o/storage/innobase/trx/trx0trx.cc:1999

#10 0x000055ee3dd7f0ce in trx_rollback_resurrected (trx=<optimized out>, all=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0roll.cc:703

#11 trx_rollback_or_clean_recovered (all=<optimized out>, all@entry=0) at /mariadb/10.2o/storage/innobase/trx/trx0roll.cc:838

#12 0x000055ee3dd4b7f9 in innobase_start_or_create_for_mysql () at /mariadb/10.2o/storage/innobase/srv/srv0start.cc:2468

#13 0x000055ee3dbd90da in innobase_init (p=<optimized out>) at /mariadb/10.2o/storage/innobase/handler/ha_innodb.cc:4290

This corruption, which occurred on the 410th run of the test, appears to be related to an undo log page, not a secondary index or change buffer page. But, we must keep in mind that both undo log and change buffer pages are being allocated from the system tablespace in this test.

With the same settings, I repeated this corruption on the 460th run of the test:

2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Starting a batch to recover 31 pages from redo log.

2020-12-28 11:35:06 139865104271296 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Trx id counter is 13056

2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Starting final batch to recover 28 pages from redo log.

2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Cleaning up trx with id 0x311f

2020-12-28 11:35:06 0x7f34e1db47c0  InnoDB: Assertion failure in file /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc line 3197

InnoDB: Failing assertion: xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE

I am afraid that this is the closest I can get to repeating a ~~MDEV-9663~~ style corruption. Again, if I apply the 1-line fix, 2,000 rounds of the tests would succeed without any failures.

Results might be different if I had used real storage instead of a RAM disk, or if I enabled encryption slow down page I/O. I believe that the corruption that I repeated (a data page that is expected to be allocated is marked free for reuse) could explain ~~MDEV-15608~~: a page that is supposed to be an undo log page might also be used for the change buffer, and corruption would ensue. Without having access to page dumps we of course can only make educated guesses.

Marko Mäkelä added a comment - 2020-12-28 09:45 I made some success with a modified test and the above patch that adds os_thread_sleep() : 10.2 8e3e87d2fc1e63d287f203d441dcb9360775c6b7 2020-12-28 11:05:32 140463661520832 [Note] InnoDB: Starting a batch to recover 30 pages from redo log. 2020-12-28 11:05:32 140463661520832 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=61] still fixed or dirty … #4 0x00007fc03ecb9537 in __GI_abort () at abort.c:79 #5 0x000056215f5bf1aa in ib::fatal::~fatal (this=0x7ffdff0614e0) at /mariadb/10.2o/storage/innobase/ut/ut0ut.cc:649 #6 0x000056215f62f162 in buf_all_freed_instance (buf_pool=buf_pool@entry=0x56216119d820) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6238 #7 0x000056215f62c29b in buf_pool_invalidate_instance (buf_pool=0x56216119d820) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6295 #8 buf_pool_invalidate () at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6326 #9 0x000056215f46f2da in recv_apply_hashed_log_recs (last_batch=<optimized out>) at /mariadb/10.2o/storage/innobase/log/log0recv.cc:2545 #10 0x000056215f473691 in recv_group_scan_log_recs (group=group@entry=0x56216123a450, checkpoint_lsn=3976126, contiguous_lsn=contiguous_lsn@entry=0x7ffdff062160, last_phase=true) at /mariadb/10.2o/storage/innobase/log/log0recv.cc:3492 This is not directly proving MDEV-9663 , but I would consider it as indirect proof. It occurred on the 298th run of the test, which was modified as follows: diff --git a/mysql-test/suite/innodb/r/ibuf_not_empty.result b/mysql-test/suite/innodb/r/ibuf_not_empty.result index 7c61e74850b..6c6ad3bf7d5 100644 --- a/mysql-test/suite/innodb/r/ibuf_not_empty.result +++ b/mysql-test/suite/innodb/r/ibuf_not_empty.result @@ -20,7 +20,5 @@ INSERT INTO t1 SELECT 0,b,c FROM t1; INSERT INTO t1 SELECT 0,b,c FROM t1; check table t1; Table Op Msg_type Msg_text -test.t1 check Warning InnoDB: Index 'b' contains #### entries, should be 4096. -test.t1 check error Corrupt -SET GLOBAL innodb_fast_shutdown=0; +test.t1 check status OK DROP TABLE t1; diff --git a/mysql-test/suite/innodb/t/ibuf_not_empty.test b/mysql-test/suite/innodb/t/ibuf_not_empty.test index a5093892b35..5f7c2b24b7d 100644 --- a/mysql-test/suite/innodb/t/ibuf_not_empty.test +++ b/mysql-test/suite/innodb/t/ibuf_not_empty.test @@ -43,64 +43,11 @@ INSERT INTO t1 SELECT 0,b,c FROM t1; let MYSQLD_DATADIR=`select @@datadir`; let PAGE_SIZE=`select @@innodb_page_size`; ---source include/shutdown_mysqld.inc - -# Corrupt the change buffer bitmap, to claim that pages are clean -perl; -do "$ENV{MTR_SUITE_DIR}/include/crc32.pl"; -my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd"; -open(FILE, "+<$file") || die "Unable to open $file"; -binmode FILE; -my $ps= $ENV{PAGE_SIZE}; -my $page; -sysseek(FILE, $ps, 0) || die "Unable to seek $file\n"; -die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps; -# Clean the change buffer bitmap. -substr($page,38,$ps - 38 - 8) = chr(0) x ($ps - 38 - 8); -my $polynomial = 0x82f63b78; # CRC-32C -my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^ - mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial)); -substr($page,0,4)=$ck; -substr($page,$ps-8,4)=$ck; -sysseek(FILE, $ps, 0) || die "Unable to rewind $file\n"; -syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n"; -close(FILE) || die "Unable to close $file"; -EOF - ---let $restart_parameters= --innodb-force-recovery=6 --innodb-change-buffer-dump ---source include/start_mysqld.inc +--let $shutdown_timeout=0 +--source include/restart_mysqld.inc +--let $shutdown_timeout= ---replace_regex /contains \d+ entries/contains #### entries/ check table t1; ---source include/shutdown_mysqld.inc - -# Truncate the file to 5 pages, as if it were empty -perl; -do "$ENV{MTR_SUITE_DIR}/include/crc32.pl"; -my $file = "$ENV{MYSQLD_DATADIR}/test/t1.ibd"; -open(FILE, "+<$file") || die "Unable to open $file"; -binmode FILE; -my $ps= $ENV{PAGE_SIZE}; -my $pages=5; -my $page; -die "Unable to read $file" unless sysread(FILE, $page, $ps) == $ps; -substr($page,46,4)=pack("N", $pages); -my $polynomial = 0x82f63b78; # CRC-32C -my $ck= pack("N",mycrc32(substr($page, 4, 22), 0, $polynomial) ^ - mycrc32(substr($page, 38, $ps - 38 - 8), 0, $polynomial)); -substr($page,0,4)=$ck; -substr($page,$ps-8,4)=$ck; -sysseek(FILE, 0, 0) || die "Unable to rewind $file\n"; -syswrite(FILE, $page, $ps)==$ps || die "Unable to write $file\n"; -truncate(FILE, $ps * $pages); -close(FILE) || die "Unable to close $file"; -EOF - ---let $restart_parameters= ---source include/start_mysqld.inc -SET GLOBAL innodb_fast_shutdown=0; ---source include/restart_mysqld.inc - # Cleanup DROP TABLE t1; diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 4c3886caeaf..fde96575c59 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2265,8 +2265,10 @@ void recv_recover_page(buf_page_t* bpage) mtr.commit(); func_exit: + const ulint n_addrs = recv_sys->n_addrs; mutex_exit(&recv_sys->mutex); ut_ad(mtr.has_committed()); + if (!n_addrs) os_thread_sleep(1000); } /** Reads in pages which have hashed log records, from an area around a given @@ -2407,7 +2409,7 @@ void recv_apply_hashed_log_recs(bool last_batch) return; } - os_thread_sleep(500000); + os_thread_sleep(1000); mutex_enter(&recv_sys->mutex); } @@ -2516,7 +2518,7 @@ void recv_apply_hashed_log_recs(bool last_batch) return; } - os_thread_sleep(500000); + os_thread_sleep(1000); mutex_enter(&(recv_sys->mutex)); } On a second try, it occurred on the 362nd attempt: 2020-12-28 11:09:12 140184457193408 [Note] InnoDB: Starting a batch to recover 30 pages from redo log. 2020-12-28 11:09:12 140184457193408 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=319] still fixed or dirty On a third try, it failed on the 371st round: 2020-12-28 11:11:29 140563794130880 [Note] InnoDB: Starting a batch to recover 35 pages from redo log. 2020-12-28 11:11:29 140563794130880 [Note] InnoDB: Starting a batch to recover 38 pages from redo log. 2020-12-28 11:11:29 140563794130880 [ERROR] [FATAL] InnoDB: Page [page id: space=0, page number=331] still fixed or dirty 201228 11:11:29 [ERROR] mysqld got signal 6 ; With the 1-line fix additionally applied, 2,000 rounds of the test would pass: diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 4c3886caeaf..95179ec2271 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -2501,7 +2501,7 @@ void recv_apply_hashed_log_recs(bool last_batch) /* Wait until all the pages have been processed */ - while (recv_sys->n_addrs != 0) { + while (recv_sys->n_addrs || buf_get_n_pending_read_ios()) { const bool abort = recv_sys->found_corrupt_log || recv_sys->found_corrupt_fs; A search for "still fixed or dirty" turns up MDEV-20481 , where the message refers to page number 3 (clustered index root page), which cannot possibly be related to this bug, because the change buffer is only ever applied to secondary index pages (and never a root page). I made one more try to reproduce a different failure, by reverting the above fix and applying one more patch to disable two assertions: diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc index 9f3d743ada1..a86a9b3da51 100644 --- a/storage/innobase/buf/buf0buf.cc +++ b/storage/innobase/buf/buf0buf.cc @@ -1698,6 +1698,7 @@ buf_pool_contains_zip( } #endif /* UNIV_DEBUG */ +#if 0 /*********************************************************************//** Checks that all file pages in the buffer chunk are in a replaceable state. @return address of a non-free block, or NULL if all freed */ @@ -1761,6 +1762,7 @@ buf_chunk_not_freed( return(NULL); } +#endif /********************************************************************//** Set buffer pool size variables after resizing it */ @@ -6223,6 +6225,7 @@ buf_all_freed_instance( /*===================*/ buf_pool_t* buf_pool) /*!< in: buffer pool instancce */ { +#if 0 ulint i; buf_chunk_t* chunk; @@ -6241,6 +6244,7 @@ buf_all_freed_instance( } buf_pool_mutex_exit(buf_pool); +#endif return(TRUE); } This would cause an assertion failure a little later: 2020-12-28 11:23:12 140594869794752 [Note] InnoDB: Starting a batch to recover 35 pages from redo log. mysqld: /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6306: void buf_pool_invalidate_instance(buf_pool_t *): Assertion `(buf_pool->LRU).count == 0' failed. Let me ”fix” that one as well: @@ -6299,6 +6303,7 @@ buf_pool_invalidate_instance( while (buf_LRU_scan_and_free_block(buf_pool, true)) { } + UT_LIST_INIT(buf_pool->LRU, &buf_page_t::LRU); ut_ad(UT_LIST_GET_LEN(buf_pool->LRU) == 0); ut_ad(UT_LIST_GET_LEN(buf_pool->unzip_LRU) == 0); And it fails differently again: #5 0x0000558f6dd73fa2 in ut_dbg_assertion_failed (expr=0x558f6e1f1cb5 "UT_LIST_GET_LEN(buf_pool->LRU) == n_lru", file=<optimized out>, line=<optimized out>, line@entry=6542) at /mariadb/10.2o/storage/innobase/ut/ut0dbg.cc:60 #6 0x0000558f6dde4044 in buf_pool_validate_instance (buf_pool=<optimized out>) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6542 #7 0x0000558f6dddbc59 in buf_validate () at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:6577 #8 buf_page_optimistic_get (rw_latch=<optimized out>, block=block@entry=0x7fcd55abd580, modify_clock=modify_clock@entry=1, file=0x558f6e1ec05b "/mariadb/10.2o/storage/innobase/btr/btr0pcur.cc", line=line@entry=217, mtr=<optimized out>) at /mariadb/10.2o/storage/innobase/buf/buf0buf.cc:4926 “Fix”: @@ -6562,6 +6567,7 @@ ibool buf_validate(void) /*==============*/ { +#if 0 ulint i; for (i = 0; i < srv_buf_pool_instances; i++) { @@ -6571,6 +6577,7 @@ buf_validate(void) buf_pool_validate_instance(buf_pool); } +#endif return(TRUE); } With those debug checks disabled, we are finally getting something that looks like ‘random’ corruption: 2020-12-28 11:29:07 139847306946496 [Note] InnoDB: Starting final batch to recover 28 pages from redo log. 2020-12-28 11:29:07 139847306946496 [Note] InnoDB: Cleaning up trx with id 0x291f 2020-12-28 11:29:07 0x7f30bd0da7c0 InnoDB: Assertion failure in file /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc line 3197 InnoDB: Failing assertion: xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE … #5 0x000055ee3dd9efa2 in ut_dbg_assertion_failed (expr=0x55ee3e22faf6 "xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE", file=<optimized out>, line=<optimized out>, line@entry=3197) at /mariadb/10.2o/storage/innobase/ut/ut0dbg.cc:60 #6 0x000055ee3dea7caa in fseg_free_step (header=0x7f30b3ad403c "", mtr=mtr@entry=0x7ffdf24b56b8) at /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc:3196 #7 0x000055ee3dd9cd5e in trx_undo_seg_free (undo=<optimized out>, noredo=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0undo.cc:1095 #8 trx_undo_commit_cleanup (undo=<optimized out>, is_temp=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0undo.cc:1789 #9 0x000055ee3dd8e7b5 in trx_cleanup_at_db_startup (trx=trx@entry=0x7f30b8171120) at /mariadb/10.2o/storage/innobase/trx/trx0trx.cc:1999 #10 0x000055ee3dd7f0ce in trx_rollback_resurrected (trx=<optimized out>, all=<optimized out>) at /mariadb/10.2o/storage/innobase/trx/trx0roll.cc:703 #11 trx_rollback_or_clean_recovered (all=<optimized out>, all@entry=0) at /mariadb/10.2o/storage/innobase/trx/trx0roll.cc:838 #12 0x000055ee3dd4b7f9 in innobase_start_or_create_for_mysql () at /mariadb/10.2o/storage/innobase/srv/srv0start.cc:2468 #13 0x000055ee3dbd90da in innobase_init (p=<optimized out>) at /mariadb/10.2o/storage/innobase/handler/ha_innodb.cc:4290 This corruption, which occurred on the 410th run of the test, appears to be related to an undo log page, not a secondary index or change buffer page. But, we must keep in mind that both undo log and change buffer pages are being allocated from the system tablespace in this test. With the same settings, I repeated this corruption on the 460th run of the test: 2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Starting a batch to recover 31 pages from redo log. 2020-12-28 11:35:06 139865104271296 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo 2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Trx id counter is 13056 2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Starting final batch to recover 28 pages from redo log. 2020-12-28 11:35:06 139865104271296 [Note] InnoDB: Cleaning up trx with id 0x311f 2020-12-28 11:35:06 0x7f34e1db47c0 InnoDB: Assertion failure in file /mariadb/10.2o/storage/innobase/fsp/fsp0fsp.cc line 3197 InnoDB: Failing assertion: xdes_mtr_get_bit(descr, XDES_FREE_BIT, header_page % FSP_EXTENT_SIZE, mtr) == FALSE I am afraid that this is the closest I can get to repeating a MDEV-9663 style corruption. Again, if I apply the 1-line fix, 2,000 rounds of the tests would succeed without any failures. Results might be different if I had used real storage instead of a RAM disk, or if I enabled encryption slow down page I/O. I believe that the corruption that I repeated (a data page that is expected to be allocated is marked free for reuse) could explain MDEV-15608 : a page that is supposed to be an undo log page might also be used for the change buffer, and corruption would ensue. Without having access to page dumps we of course can only make educated guesses.

Marko Mäkelä made changes - 2020-12-28 09:51

Summary

Potential corruption of last recovered page

Corruption of system tablespace or last recovered page

Marko Mäkelä added a comment - 2020-12-28 10:36

The race condition with the one-line fix could explain various types of hard-to-reproduce corruption. That race condition is present in the very first InnoDB commit. The probability of encountering the race condition was significantly increased by the widespread use of hot backup tools (at least Percona XtraBackup or Mariabackup).

Based on a quick read of the MySQL 5.6 source code and my distant memory of the closed-source ibbackup program (the predecessor of MySQL Enterprise Backup), it might be that some versions of those tools are not affected by the bug. But, InnoDB crash recovery for sure always was affected.

Marko Mäkelä added a comment - 2020-12-28 10:36 The race condition with the one-line fix could explain various types of hard-to-reproduce corruption. That race condition is present in the very first InnoDB commit . The probability of encountering the race condition was significantly increased by the widespread use of hot backup tools (at least Percona XtraBackup or Mariabackup). Based on a quick read of the MySQL 5.6 source code and my distant memory of the closed-source ibbackup program (the predecessor of MySQL Enterprise Backup), it might be that some versions of those tools are not affected by the bug. But, InnoDB crash recovery for sure always was affected.

Marko Mäkelä made changes - 2020-12-28 10:36

issue.field.resolutiondate

2020-12-28 10:36:40.0

2020-12-28 10:36:40.204

Marko Mäkelä made changes - 2020-12-28 10:36

Fix Version/s		10.2.37 [ 25112 ]
Fix Version/s		10.3.28 [ 25111 ]
Fix Version/s		10.4.18 [ 25110 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.3 [ 22126 ]
Fix Version/s	10.4 [ 22408 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Marko Mäkelä made changes - 2020-12-28 10:40

Link

This issue relates to ~~MDEV-24141~~ [ ~~MDEV-24141~~ ]

Marko Mäkelä made changes - 2021-01-04 10:43

Link

This issue relates to ~~MDEV-14643~~ [ ~~MDEV-14643~~ ]

Chris Calender (Inactive) made changes - 2021-01-08 17:49

Labels

corruption not-10.5

CS0180575 corruption not-10.5

Julien Fritsch made changes - 2021-01-19 17:51

Link

This issue blocks TODO-2696 [ TODO-2696 ]

Marko Mäkelä made changes - 2021-01-28 16:26

Link

This issue relates to ~~MDEV-24709~~ [ ~~MDEV-24709~~ ]

Maria M Pflaum (Inactive) made changes - 2021-02-02 22:54

Labels

CS0180575 corruption not-10.5

CS0156770 CS0180575 corruption not-10.5

Marko Mäkelä made changes - 2021-03-09 12:08

Link

This issue relates to ~~MDEV-23192~~ [ ~~MDEV-23192~~ ]

Marko Mäkelä made changes - 2021-03-19 06:57

Link

This issue relates to ~~MDEV-25198~~ [ ~~MDEV-25198~~ ]

Marko Mäkelä made changes - 2021-03-22 07:12

Link

This issue relates to ~~MDEV-19544~~ [ ~~MDEV-19544~~ ]

Marko Mäkelä added a comment - 2021-03-22 07:12 - edited

Edit: This comment was meant to be posted to ~~MDEV-20605~~.

Marko Mäkelä added a comment - 2021-03-22 07:12 - edited Edit: This comment was meant to be posted to MDEV-20605 .

Marko Mäkelä made changes - 2021-03-22 07:12

Link

This issue relates to ~~MDEV-24224~~ [ ~~MDEV-24224~~ ]

Marko Mäkelä made changes - 2021-03-22 07:14

Link

This issue relates to ~~MDEV-19544~~ [ ~~MDEV-19544~~ ]

Marko Mäkelä made changes - 2021-03-22 07:14

Link

This issue relates to ~~MDEV-24224~~ [ ~~MDEV-24224~~ ]

Marko Mäkelä made changes - 2021-04-06 16:59

Link

This issue relates to ~~MDEV-25344~~ [ ~~MDEV-25344~~ ]

Marko Mäkelä made changes - 2021-04-15 08:21

Link

This issue relates to ~~MDEV-21811~~ [ ~~MDEV-21811~~ ]

Marko Mäkelä made changes - 2021-04-30 10:32

Link

This issue relates to MENT-1004 [ MENT-1004 ]

Rob Schwyzer (Inactive) made changes - 2021-05-04 22:40

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31117 ]

Rob Schwyzer (Inactive) made changes - 2021-05-11 19:10

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 31165 ]

Rob Schwyzer (Inactive) made changes - 2021-06-23 21:09

Labels

CS0156770 CS0180575 corruption not-10.5

CS0156770 CS0180575 ServiceNow corruption not-10.5

Rob Schwyzer (Inactive) made changes - 2021-07-02 19:46

Labels

CS0156770 CS0180575 ServiceNow corruption not-10.5

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z CS0156770 CS0180575 corruption not-10.5

Sergei Golubchik made changes - 2021-08-13 22:25

Labels

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z CS0156770 CS0180575 corruption not-10.5

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption not-10.5

Sergei Golubchik made changes - 2021-08-13 22:33

Labels

76qDvLB8Gju6Hs7nk3VY3EX42G795W5z corruption not-10.5

corruption not-10.5

Rob Schwyzer (Inactive) made changes - 2021-11-04 19:42

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 32331 ]

Sergei Golubchik made changes - 2021-12-06 21:52

Workflow

MariaDB v3 [ 117227 ]

MariaDB v4 [ 158711 ]

Marko Mäkelä made changes - 2022-02-03 06:53

Link

This issue relates to ~~MDEV-27734~~ [ ~~MDEV-27734~~ ]

Rob Schwyzer (Inactive) made changes - 2022-03-11 20:07

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 33285 ]

Rob Schwyzer (Inactive) made changes - 2022-07-25 22:25

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 34202 ]

Marko Mäkelä made changes - 2022-11-28 12:38

Link

This issue relates to ~~MDEV-29694~~ [ ~~MDEV-29694~~ ]

Rob Schwyzer (Inactive) made changes - 2022-12-08 21:02

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 34687 ]

Rob Schwyzer (Inactive) made changes - 2023-01-11 22:54

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 34706 ]

Rob Schwyzer (Inactive) made changes - 2023-01-11 23:23

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 34706 ]

Marko Mäkelä made changes - 2023-01-17 13:30

Link

This issue relates to ~~MDEV-30422~~ [ ~~MDEV-30422~~ ]

Rob Schwyzer (Inactive) made changes - 2023-09-20 20:01

Remote Link

This issue links to "Page (MariaDB Confluence)" [ 35966 ]

Jira Automation (IT) made changes - 2024-07-04 02:35

Zendesk Related Tickets		201658 171635
Zendesk active tickets		201658

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Dates

Created:: 2020-12-18 17:53

Updated:: 2024-09-06 09:59

Resolved:: 2020-12-28 10:36

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.