[MDEV-11799] InnoDB can abort if the doublewrite buffer contains a bad and a good copy - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 5.5(EOL), 10.0(EOL), 10.1(EOL), 10.2(EOL), 10.3(EOL), 10.4(EOL), 10.5
Fix Version/s: 10.2.33, 10.3.24, 10.4.14, 10.5.5
Component/s: Storage Engine - InnoDB
Labels:
- crash
- recovery

Sprint:
10.1.21

Description

This came up when testing ~~MDEV-11623~~:

# 2017-01-14T19:58:03 [2066] --------- Contents of /home/buildbot/upgrade-10.1-to-10.1/current1_1/mysql.err -------------

2017-01-14 19:58:02 139960389887872 [Note] /home/buildbot/build/sql/mysqld (mysqld 10.1.21-MariaDB-debug) starting as process 2233 ...

2017-01-14 19:58:02 139960389887872 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=debian-jessie-amd64' or '--log-bin=mysql-bin' to avoid this problem.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: innodb_page_size has been changed from default value 16384 to 8192.

2017-01-14 19:58:02 7f4b11526780 InnoDB: innodb-page-size has been changed from the default value 16384 to 8192.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using mutexes to ref count buffer pool pages

2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!

2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The InnoDB memory heap is disabled

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Compressed tables use zlib 1.2.8

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using Linux native AIO

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using generic crc32 instructions

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Initializing buffer pool, size = 128.0M

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Completed initialization of buffer pool

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 0 name ./ibdata1 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967280 name ./ib_logfile0 key_id 0 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967281 name arch_log_space key_id 0 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Highest supported file format is Barracuda.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The log sequence numbers 6373461 and 6373461 in ibdata files do not match the log sequence number 6373471 in the ib_logfiles!

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Database was not shutdown normally!

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Starting crash recovery.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Reading tablespace information from the .ibd files...

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 11 name test/t6 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 15 name test/oltp1 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 18 name test/oltp7 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 9 name test/t4 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 17 name test/oltp5 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 6 name test/t1 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 16 name test/oltp3 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 10 name test/t5 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 7 name test/t2 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 8 name test/t3 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 14 name test/t9 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 12 name test/t7 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 13 name test/t8 key_id 1 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0.

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Restoring possible half-written data pages

2017-01-14 19:58:02 139960389887872 [Note] InnoDB: from the doublewrite buffer...

InnoDB: Database page corruption or a failed

InnoDB: file read of space 0 page 674.

InnoDB: Trying to recover it from the doublewrite buffer.

…

InnoDB: Dump of corresponding page in doublewrite buffer:

The last quoted line is displayed by the function buf_dblwr_process():

                                } else if (buf_page_is_corrupted(true,

                                                                 page,

                                                                 zip_size)) {

                                        fprintf(stderr,

                                                "InnoDB: Dump of the page:\n");

                                        buf_page_print(

                                                read_buf, zip_size,

                                                BUF_PAGE_PRINT_NO_CRASH);

                                        fprintf(stderr,

                                                "InnoDB: Dump of"

                                                " corresponding page"

                                                " in doublewrite buffer:\n");

The logic is flawed, because nothing prevents us from having multiple copies of the same page in the doublewrite buffer. If the server was killed during a write to the doublewrite page, then restarted, and killed again during a wite to the data page so that the doublewrite buffer still contains the corrupted copy from the first kill, recovery would unnecessarily abort the server process, making the database inaccessible.

Attachments

Issue Links

duplicates

MDEV-22942 InnoDB: Failing assertion: page_offset(rec) <= page_header_get_field(page, PAGE_HEAP_TOP)

Closed

includes

MDEV-23231 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page ... during crash recovery

Closed

is part of

MDEV-11623 MariaDB 10.1 fails to start datadir created with MariaDB 10.0/MySQL 5.6 using innodb-page-size!=16K

Closed

relates to

MDEV-22942 InnoDB: Failing assertion: page_offset(rec) <= page_header_get_field(page, PAGE_HEAP_TOP)

Closed

MDEV-10217 innodb.innodb_bug59641 fails sporadically in buildbot: InnoDB: Failing assertion: current_rec != insert_rec in file page0cur.c line 1052

Closed

MDEV-12905 InnoDB occasionally skips the doublewrite buffer write for some pages

Closed

MDEV-13103 InnoDB: Deal with page_compressed page corruption

Closed

MDEV-16763 SIGNAL 6 ERROR

Closed

MDEV-18054 [Draft] Server crash in mach_read_from_1 or Failing assertion: ret > 0 in rem0rec.ic upon crash-upgrade from 10.1.22

Closed

(4 relates to)

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2017-01-15 17:11

Pushed to 10.1.21. I think that we should do it in 10.0 as well, so I am not closing this yet.

Marko Mäkelä added a comment - 2017-01-15 17:11 Pushed to 10.1.21. I think that we should do it in 10.0 as well, so I am not closing this yet.

Marko Mäkelä added a comment - 2017-01-16 06:56

I think that a slight improvement to the logic is needed. Currently, buf_dblwr_process() is choosing the first valid-looking copy of the page that it encounters, even though there could be newer copies (with later LSN). Restoring a too old copy of the page may cause information loss and corruption of the database. buf_dblwr_process() should restore a valid copy of the page with the latest LSN, similar to how fil_user_tablespace_restore_page() does it.

Marko Mäkelä added a comment - 2017-01-16 06:56 I think that a slight improvement to the logic is needed. Currently, buf_dblwr_process() is choosing the first valid-looking copy of the page that it encounters, even though there could be newer copies (with later LSN). Restoring a too old copy of the page may cause information loss and corruption of the database. buf_dblwr_process() should restore a valid copy of the page with the latest LSN, similar to how fil_user_tablespace_restore_page() does it.

Marko Mäkelä added a comment - 2017-01-16 08:10

While investigating this further, I noticed a bug in recv_dblwr_t::find_page() which is used by fil_user_tablespace_restore_page() for restoring the first page of each tablespace, if that page is corrupted.
The function recv_dblwr_t::find_page() is only looking for the page with the biggest LSN matching the tablespace ID and the page number. If that page is corrupted but there is a slightly older but good copy of the page, we will ruin our changes to recover from the crash.

The fix of this issue in 10.1.21 was incomplete, and more work will be needed.

Marko Mäkelä added a comment - 2017-01-16 08:10 While investigating this further, I noticed a bug in recv_dblwr_t::find_page() which is used by fil_user_tablespace_restore_page() for restoring the first page of each tablespace, if that page is corrupted. The function recv_dblwr_t::find_page() is only looking for the page with the biggest LSN matching the tablespace ID and the page number. If that page is corrupted but there is a slightly older but good copy of the page, we will ruin our changes to recover from the crash. The fix of this issue in 10.1.21 was incomplete, and more work will be needed.

Thirunarayanan Balathandayuthapani added a comment - 2020-07-29 06:37

Patch is in bb-10.2-~~MDEV-11799~~

Thirunarayanan Balathandayuthapani added a comment - 2020-07-29 06:37 Patch is in bb-10.2- MDEV-11799

Marko Mäkelä added a comment - 2020-07-29 08:29

Thank you, the high-level idea of the fix looks good. I spotted some further problems and potential for improvement.
I think that we will also need a 10.4 version of this for testing the ~~MDEV-23233~~ scenario.

This fix must be stress-tested with RQG (killing and restarting the server during a DML-only workload, to avoid hitting DDL-related problems that would hopefully some day be addressed by ~~MDEV-17567~~). Testing with Mariabackup is not going to be fruitful, because Mariabackup is not using the doublewrite buffer at all.

I do not think that this can be meaningfully tested with MTR. Only for the ~~MDEV-23233~~ scenario where we would wrongly replace a valid page_compressed page with an invalid uncompressed one, we could write a test, but only starting with 10.3, because the exact scenario depends on ~~MDEV-16328~~.

Marko Mäkelä added a comment - 2020-07-29 08:29 Thank you, the high-level idea of the fix looks good. I spotted some further problems and potential for improvement. I think that we will also need a 10.4 version of this for testing the MDEV-23233 scenario. This fix must be stress-tested with RQG (killing and restarting the server during a DML-only workload, to avoid hitting DDL-related problems that would hopefully some day be addressed by MDEV-17567 ). Testing with Mariabackup is not going to be fruitful, because Mariabackup is not using the doublewrite buffer at all. I do not think that this can be meaningfully tested with MTR. Only for the MDEV-23233 scenario where we would wrongly replace a valid page_compressed page with an invalid uncompressed one, we could write a test, but only starting with 10.3, because the exact scenario depends on MDEV-16328 .

Marko Mäkelä added a comment - 2020-07-31 15:13

I merged this up to 10.4 and tested 10.4 with the ~~MDEV-23231~~ data directory.

Marko Mäkelä added a comment - 2020-07-31 15:13 I merged this up to 10.4 and tested 10.4 with the MDEV-23231 data directory.

People

Assignee:: Marko Mäkelä

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2017-01-15 16:27

Updated:: 2020-08-07 16:05

Resolved:: 2020-07-31 15:13

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server