[MDEV-17638] Improve error message about corruption of encrypted page Created: 2018-11-07  Updated: 2019-08-01  Resolved: 2019-08-01

Status: Closed
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB
Affects Version/s: 10.2.12, 10.3.10, 10.1.37
Fix Version/s: 10.2.27, 10.1.42, 10.3.18, 10.4.8

Type: Bug Priority: Major
Reporter: Geoff Montee (Inactive) Assignee: Eugene Kosov (Inactive)
Resolution: Fixed Votes: 2
Labels: None

Issue Links:
Relates
relates to MDEV-11939 innochecksum mistakes a file for an e... Closed
relates to MDEV-12112 corruption in encrypted table may be ... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-17957 Make Innodb_checksum_algorithm strict... Closed
relates to MDEV-17958 Make bug-endian innodb_checksum_algor... Closed
relates to MDEV-18025 Mariabackup fails to detect corrupted... Closed
relates to MDEV-18529 InnoDB wrongly skips decryption of en... Closed

 Description   

See the following error message:

2018-11-05 3:53:42 139986889774848 [ERROR] InnoDB: Page 227027:2591 may be corrupted. Post encryption checksum 3926119398 stored [3479160576:3479160576] key_version 1
2018-11-05 03:53:42 0x7f513cd69700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580

This is printed here:

https://github.com/MariaDB/server/blob/5567a8c936ede50efcdf4a7f898dbaa058106d70/storage/innobase/fil/fil0crypt.cc#L2639

A lot of people seem to think that 227027 in the above message would refer to the page number, but it actually refers to the tablespace ID. I think we should improve this message. Maybe something like?:

2018-11-05 3:53:42 139986889774848 [ERROR] InnoDB: Page in tablespace 227027 with offset 2591 may be corrupted. Post encryption checksum 3926119398 stored [3479160576:3479160576] key_version 1
2018-11-05 03:53:42 0x7f513cd69700 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/storage/innobase/rem/rem0rec.cc line 580



 Comments   
Comment by Marko Mäkelä [ 2018-11-08 ]

MDEV-13542 would ultimately propagate the error to the SQL layer. Even before that, we definitely could improve the messages that are output to the error log.

Comment by Marko Mäkelä [ 2018-11-08 ]

I think that it would be better in these messages to display the file name. I would still keep the tablespace ID in the output.
Starting with 10.2, many such messages are emitted via

/** Print the given page_id_t object.
@param[in,out]	out	the output stream
@param[in]	page_id	the page_id_t object to be printed
@return the output stream */
std::ostream&
operator<<(
	std::ostream&		out,
	const page_id_t		page_id)
{
	out << "[page id: space=" << page_id.m_space
		<< ", page number=" << page_id.m_page_no << "]";
	return(out);
}

which produces more verbose output.

Comment by Marko Mäkelä [ 2019-01-31 ]

The message was originally introduced in MDEV-11759 (MariaDB 10.1.22, 10.2.5) and slightly changed in MDEV-11939 (MariaDB 10.1.26, 10.2.8). (The original wording was "Page %lu in space %s (%lu) maybe corrupted."

Ultimately, the message and the code path leading to it was removed in MDEV-12112 (MariaDB 10.1.38, 10.2.20, 10.3.12).

The presence of this message in an error log spells big trouble. The logic that was introduced in MDEV-11759 was based on the wrong assumption that a data page whose contents appears to carry a valid ‘before encryption’ checksum cannot have been encrypted. Then, MariaDB would wrongly skip the page decryption and hand over the garbled page contents to the caller of buf_page_get_gen(), which would then crash basically anywhere. If I remember correctly my debugging session from December 2018, in MariaDB Server 10.1, buf_page_get_gen() would return NULL in this case. Some of its callers are prepared for that error situation.

To make matters worse, there are several checksum algorithm implementations in InnoDB. It sufficed for one of the various algorithms to produce a matching checksum. MDEV-17957 and MDEV-17958 improved innodb_checksum_algorithm=strict_crc32 so that only one checksum will be attempted.

In the case of one customer, an encrypted page had the exact same CRC-32C checksum both for the before-encryption and after-encryption checksums.

MDEV-18025 extended the test mariabackup.encrypted_page_corruption to cover a scenario where an encrypted page appears to contain a valid checksum on the encrypted data. The test ensures that InnoDB will attempt page decryption of an intentionally corrupted encrypted page, and only then detect that the decrypted contents of the page does not match the before-encryption checksum.

Comment by Geoff Montee (Inactive) [ 2019-01-31 ]

Ah, OK. It's good to know that the code path was removed.

Thanks for the analysis!

It looks like innochecksum still prints a similar message about corruption in the "Page $space_id:$page_num" format that has confused some people if you'd want to change that one too:

https://github.com/MariaDB/server/blob/7d245083a43e34d94822e580037727bdbb50b6f0/extra/innochecksum.cc#L527

Comment by Marko Mäkelä [ 2019-02-05 ]

Thanks for the clarification. I am lowering the priority from Critical. I would keep the terse format in messages that are only present in instrumented debug builds, but I agree that a more verbose format could be easier for users.

Comment by Eugene Kosov (Inactive) [ 2019-08-01 ]

Diagnostics from topic message was removed in https://github.com/mariadb/server/commit/8c43f963882a9d5ac4e4289c8dd3dbcaeb40a0ce

Generated at Thu Feb 08 08:37:59 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.