[MDEV-11125] Introduce a reduced doublewrite mode, handling error detection only Created: 2016-10-24  Updated: 2023-03-15  Resolved: 2021-10-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Jan Lindström (Inactive) Assignee: Marko Mäkelä
Resolution: Won't Fix Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-12699 Improve crash recovery of corrupted d... Closed
relates to MDEV-12905 InnoDB occasionally skips the doublew... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-19738 Doublewrite buffer is unnecessarily u... Closed
Sprint: 10.3.1-2

 Description   

https://github.com/webscalesql/webscalesql-5.6/commit/3676902ccd6df75ddb51da9ef1b04c93d7f3da5c



 Comments   
Comment by Jan Lindström (Inactive) [ 2017-07-26 ]

https://github.com/MariaDB/server/commit/f3a9a45a70c351cab678c0ac7b95af5a04ba6679

Comment by Marko Mäkelä [ 2017-08-08 ]

Please request another review after addressing my comments. I mostly concentrated on the tests; I did not review all code yet.

Comment by Marko Mäkelä [ 2017-08-18 ]

After analyzing the failed dataset of MDEV-12905 today, it occurred to me to question the usefulness of the reduced doublewrite mode (innodb_doublewrite=2) that this patch is introducing.

InnoDB uses write-ahead logging, in the manner of ARIES.
The InnoDB doublewrite buffer can be viewed as an additional safety layer. If the doublewrite buffer is disabled, I believe that the normal InnoDB redo log based recovery (trying to apply changes to all pages that were changed since the latest checkpoint) would detect corrupted recently written pages, just like the proposed parameter innodb_doublewrite=2 would do. So, why not just set innodb_doublewrite=0 and get a similar result, with zero overhead for the page writes?

Every recently written page must be referenced by the redo log that was written since the latest checkpoint. A checkpoint is not advanced in the redo log until all pages that were modified before that checkpoint have been written out to the data files. If our only goal is to detect writes that were interrupted by a server kill, the redo log will take care of that.

If the goal is to detect writes that were a bit older than the latest checkpoint, theoretically we could start the recovery from the older of the two latest checkpoints. (The InnoDB redo log header keeps track of the two most recent redo log checkpoints.) If the goal is to detect even older page corruption, CHECK TABLE could be used for covering most data pages, or the offline innochecksum utility could scan all data files.
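
To illustrate the argument above, here is a minimal sketch of how recovery could find interrupted page writes from the redo log alone. It is a toy model, not InnoDB code: the page format (a 4-byte CRC-32 followed by the payload), the (lsn, page id) redo records and all function names are assumptions made for this example.

```python
import zlib
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical page identifier: (tablespace id, page number).
PageId = Tuple[int, int]


def make_page(payload: bytes) -> bytes:
    """Build a toy page: 4-byte big-endian CRC-32 followed by the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def page_is_intact(page: bytes) -> bool:
    """Return True if the stored checksum matches the payload."""
    if len(page) < 4:
        return False
    stored = int.from_bytes(page[:4], "big")
    return stored == zlib.crc32(page[4:])


def pages_since_checkpoint(
    redo: Iterable[Tuple[int, PageId]], checkpoint_lsn: int
) -> Set[PageId]:
    """Collect the pages referenced by redo records at or after the checkpoint."""
    return {page_id for lsn, page_id in redo if lsn >= checkpoint_lsn}


def find_torn_pages(
    redo: Iterable[Tuple[int, PageId]],
    checkpoint_lsn: int,
    data_file: Dict[PageId, bytes],
) -> List[PageId]:
    """Check only the recently written pages and report those that fail the checksum."""
    suspects = pages_since_checkpoint(redo, checkpoint_lsn)
    return sorted(
        page_id
        for page_id in suspects
        if page_id in data_file and not page_is_intact(data_file[page_id])
    )
```

In this model, a page that was damaged before the checkpoint is never examined, which is the limitation discussed above: covering such pages would require starting from an older checkpoint or scanning the data files.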

Comment by Jan Lindström (Inactive) [ 2017-08-19 ]

You are correct in your analysis. This feature was developed for an InnoDB version that would easily crash if a corrupted page was read. Naturally, any page could be corrupted, and as we have only two checkpoints, a page that is older than both of them could indeed be corrupted. Furthermore, a page that we have not modified for ages could also get corrupted by some lower-level media failure. I do not know if we have enough proof that our InnoDB version could survive all page failures. Thus, the usefulness of this feature depends on what risks customers are willing to take. I know of at least one customer who was interested in using this feature; at that point, I would not have recommended using doublewrite=0.

Comment by Marko Mäkelä [ 2017-08-21 ]

Thank you for confirming my analysis.
Compared to innodb_doublewrite=0 and reading the recently written page identifiers from the redo log (starting from a recent or the latest checkpoint), the reduced innodb_doublewrite=2 mode could have the benefit that it might be able to record a larger number of recently written pages, especially if the most recent workload involves writing a few pages over and over, so that the redo log would only contain the few page identifiers of the busy pages.

There is another corruption mode, of which I have some anecdotal evidence: spontaneous corruption of data that was not accessed (read or written) for a long time. As far as I understand, neither the reduced doublewrite buffer nor the redo logs would help there.

I believe that before introducing the new innodb_doublewrite=2 mode, we should spend more effort on improving the corruption handling of InnoDB.

Comment by Marko Mäkelä [ 2017-10-12 ]

Instead of doing this, maybe we should disable the doublewrite buffering for pages that are being written for the first time. Once MDEV-12699 is implemented, we can also disable doublewrite buffering for pages that were (re)initialized by a mini-transaction.
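
The rule proposed above can be written as a small predicate. This is only a sketch: the PageWrite type and its two flags are hypothetical names for this example, not InnoDB data structures.

```python
from dataclasses import dataclass


@dataclass
class PageWrite:
    """Hypothetical description of a page about to be written to a data file."""

    previously_written: bool  # the page has been written to the data file before
    reinitialized: bool  # the page was (re)initialized by a mini-transaction


def needs_doublewrite(write: PageWrite) -> bool:
    """Use the doublewrite buffer only for pages that are neither new nor reinitialized."""
    return write.previously_written and not write.reinitialized
```

The second condition corresponds to the case that, per the comment above, depends on MDEV-12699.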

Comment by Marko Mäkelä [ 2021-10-25 ]

I think that MDEV-12699, MDEV-19738 and MDEV-15528 are already improving this as much as is feasible.

Furthermore, MDEV-23855 greatly reduced the latency of the doublewrite buffer, by issuing a 128-page asynchronous write request for the entire doublewrite buffer, and filling another 128-page doublewrite buffer in memory while one copy is being written.
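
The general idea of filling one buffer while the other is being written can be sketched as follows. This is an illustration of the double-buffering pattern only, not the MDEV-23855 implementation: the class name, the write_batch callback and the use of a worker thread in place of asynchronous I/O are all assumptions.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class DoubleBufferedWriter:
    """Collect pages into one in-memory batch while the previous batch is written."""

    def __init__(
        self, write_batch: Callable[[List[bytes]], None], capacity: int = 128
    ) -> None:
        self._write_batch = write_batch  # hypothetical callback doing the actual I/O
        self._capacity = capacity  # pages per batch
        self._active: List[bytes] = []  # batch currently being filled
        self._pending: Optional[Future] = None  # write of the other batch, if any
        self._pool = ThreadPoolExecutor(max_workers=1)

    def add_page(self, page: bytes) -> None:
        """Buffer one page and submit the batch once it is full."""
        self._active.append(page)
        if len(self._active) >= self._capacity:
            self._submit()

    def _submit(self) -> None:
        # At most one write is in flight: wait for it before starting the next.
        if self._pending is not None:
            self._pending.result()
        batch, self._active = self._active, []
        self._pending = self._pool.submit(self._write_batch, batch)

    def close(self) -> None:
        """Write any partially filled batch and wait for outstanding I/O."""
        if self._active:
            self._submit()
        if self._pending is not None:
            self._pending.result()
        self._pool.shutdown()
```

A caller only blocks in add_page when both batches are busy, that is, when one batch is still being written and the other has just become full.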

Comment by Marko Mäkelä [ 2022-07-26 ]

For the record, MySQL 8.0.30 includes WL#14710: InnoDB: Introducing REDUCED double write mode (Facebook Contribution)
I do not think that MySQL implements any of the following MariaDB recovery improvements yet: MDEV-12699, MDEV-19738, MDEV-15528, MDEV-24626.

Comment by Marko Mäkelä [ 2023-03-15 ]

I believe that MariaDB should handle corrupted pages gracefully during recovery. This was recently extensively tested in combination with MDEV-13542 and some related changes.

Generated at Thu Feb 08 07:47:30 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.