[MDEV-11799] InnoDB can abort if the doublewrite buffer contains a bad and a good copy Created: 2017-01-15  Updated: 2020-08-07  Resolved: 2020-07-31

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 5.5, 10.0, 10.1, 10.2, 10.3, 10.4, 10.5
Fix Version/s: 10.2.33, 10.3.24, 10.4.14, 10.5.5

Type: Bug Priority: Major
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: crash, recovery

Issue Links:
Duplicate
duplicates MDEV-22942 InnoDB: Failing assertion: page_offse... Closed
PartOf
includes MDEV-23231 [ERROR] [FATAL] InnoDB: Apparent corr... Closed
is part of MDEV-11623 MariaDB 10.1 fails to start datadir c... Closed
Relates
relates to MDEV-22942 InnoDB: Failing assertion: page_offse... Closed
relates to MDEV-10217 innodb.innodb_bug59641 fails sporadic... Closed
relates to MDEV-12905 InnoDB occasionally skips the doublew... Closed
relates to MDEV-13103 InnoDB: Deal with page_compressed pag... Closed
relates to MDEV-16763 SIGNAL 6 ERROR Closed
relates to MDEV-18054 [Draft] Server crash in mach_read_fro... Open
Sprint: 10.1.21

 Description   

This came up when testing MDEV-11623:

# 2017-01-14T19:58:03 [2066] --------- Contents of /home/buildbot/upgrade-10.1-to-10.1/current1_1/mysql.err -------------
2017-01-14 19:58:02 139960389887872 [Note] /home/buildbot/build/sql/mysqld (mysqld 10.1.21-MariaDB-debug) starting as process 2233 ...
2017-01-14 19:58:02 139960389887872 [Warning] No argument was provided to --log-bin and neither --log-basename or --log-bin-index where used;  This may cause repliction to break when this server acts as a master and has its hostname changed! Please use '--log-basename=debian-jessie-amd64' or '--log-bin=mysql-bin' to avoid this problem.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: innodb_page_size has been changed from default value 16384 to 8192.
2017-01-14 19:58:02 7f4b11526780 InnoDB: innodb-page-size has been changed from the default value 16384 to 8192.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_DEBUG switched on !!!!!!!!!
2017-01-14 19:58:02 139960389887872 [Note] InnoDB:  InnoDB: !!!!!!!! UNIV_SYNC_DEBUG switched on !!!!!!!!!
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The InnoDB memory heap is disabled
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Compressed tables use zlib 1.2.8
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using Linux native AIO
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Using generic crc32 instructions
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Completed initialization of buffer pool
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 0 name ./ibdata1 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967280 name ./ib_logfile0 key_id 0 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 4294967281 name arch_log_space key_id 0 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Highest supported file format is Barracuda.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: The log sequence numbers 6373461 and 6373461 in ibdata files do not match the log sequence number 6373471 in the ib_logfiles!
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Database was not shutdown normally!
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Starting crash recovery.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Reading tablespace information from the .ibd files...
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 11 name test/t6 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 15 name test/oltp1 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 18 name test/oltp7 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 9 name test/t4 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 17 name test/oltp5 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 6 name test/t1 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 16 name test/oltp3 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 10 name test/t5 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 7 name test/t2 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 8 name test/t3 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 14 name test/t9 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 12 name test/t7 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 13 name test/t8 key_id 1 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 2 name mysql/innodb_index_stats key_id 0 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 3 name mysql/gtid_slave_pos key_id 0 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Created tablespace for space 1 name mysql/innodb_table_stats key_id 0 encryption 0.
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: Restoring possible half-written data pages 
2017-01-14 19:58:02 139960389887872 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Database page corruption or a failed
InnoDB: file read of space 0 page 674.
InnoDB: Trying to recover it from the doublewrite buffer.
InnoDB: Dump of corresponding page in doublewrite buffer:

The last quoted line is displayed by the function buf_dblwr_process():

                                } else if (buf_page_is_corrupted(true,
                                                                 page,
                                                                 zip_size)) {
                                        fprintf(stderr,
                                                "InnoDB: Dump of the page:\n");
                                        buf_page_print(
                                                read_buf, zip_size,
                                                BUF_PAGE_PRINT_NO_CRASH);
                                        fprintf(stderr,
                                                "InnoDB: Dump of"
                                                " corresponding page"
                                                " in doublewrite buffer:\n");

The logic is flawed, because nothing prevents us from having multiple copies of the same page in the doublewrite buffer. If the server was killed during a write to the doublewrite page, then restarted, and killed again during a wite to the data page so that the doublewrite buffer still contains the corrupted copy from the first kill, recovery would unnecessarily abort the server process, making the database inaccessible.



 Comments   
Comment by Marko Mäkelä [ 2017-01-15 ]

Pushed to 10.1.21. I think that we should do it in 10.0 as well, so I am not closing this yet.

Comment by Marko Mäkelä [ 2017-01-16 ]

I think that a slight improvement to the logic is needed. Currently, buf_dblwr_process() is choosing the first valid-looking copy of the page that it encounters, even though there could be newer copies (with later LSN). Restoring a too old copy of the page may cause information loss and corruption of the database. buf_dblwr_process() should restore a valid copy of the page with the latest LSN, similar to how fil_user_tablespace_restore_page() does it.

Comment by Marko Mäkelä [ 2017-01-16 ]

While investigating this further, I noticed a bug in recv_dblwr_t::find_page() which is used by fil_user_tablespace_restore_page() for restoring the first page of each tablespace, if that page is corrupted.
The function recv_dblwr_t::find_page() is only looking for the page with the biggest LSN matching the tablespace ID and the page number. If that page is corrupted but there is a slightly older but good copy of the page, we will ruin our changes to recover from the crash.

The fix of this issue in 10.1.21 was incomplete, and more work will be needed.

Comment by Thirunarayanan Balathandayuthapani [ 2020-07-29 ]

Patch is in bb-10.2-MDEV-11799

Comment by Marko Mäkelä [ 2020-07-29 ]

Thank you, the high-level idea of the fix looks good. I spotted some further problems and potential for improvement.
I think that we will also need a 10.4 version of this for testing the MDEV-23233 scenario.

This fix must be stress-tested with RQG (killing and restarting the server during a DML-only workload, to avoid hitting DDL-related problems that would hopefully some day be addressed by MDEV-17567). Testing with Mariabackup is not going to be fruitful, because Mariabackup is not using the doublewrite buffer at all.

I do not think that this can be meaningfully tested with MTR. Only for the MDEV-23233 scenario where we would wrongly replace a valid page_compressed page with an invalid uncompressed one, we could write a test, but only starting with 10.3, because the exact scenario depends on MDEV-16328.

Comment by Marko Mäkelä [ 2020-07-31 ]

I merged this up to 10.4 and tested 10.4 with the MDEV-23231 data directory.

Generated at Thu Feb 08 07:52:44 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.