[MDEV-31098] InnoDB Recovery doesn't display encryption message when no encryption configuration passed - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.6
Fix Version/s: 10.6.16, 10.10.7, 10.11.6, 11.0.4, 11.1.3
Component/s: Storage Engine - InnoDB
Labels:
- recovery

Description

An attempt to recover the attached data.tar.xz fails:

tar xJf data.tar.xz

mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data

10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6
2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.

A crude attempt at fixing this as follows caused a crash.

diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc

index f79eb015673..0f614068afd 100644

--- a/storage/innobase/fil/fil0fil.cc

+++ b/storage/innobase/fil/fil0fil.cc

@@ -2496,23 +2496,16 @@ fil_ibd_load(

 	os_offset_t	size;

-	bool		deferred_space = false;

 	/* Read and validate the first page of the tablespace.

 	Assign a tablespace name based on the tablespace type. */

 	switch (file.validate_for_recovery()) {

 		os_offset_t	minimum_size;

 	case DB_SUCCESS:

-		deferred_space = file.m_defer;

-		if (deferred_space) {

-			goto tablespace_check;

-		}

-		if (file.space_id() != space_id) {

+		if (!file.m_defer && file.space_id() != space_id) {

 			return(FIL_LOAD_ID_CHANGED);

-tablespace_check:

 		/* Get and test the file size. */

 		size = os_file_get_size(file.handle());

@@ -2528,9 +2521,7 @@ fil_ibd_load(

 			ib::error() << "Could not measure the size of"

 				" single-table tablespace file '"

 				<< file.filepath() << "'";

-		} else if (deferred_space) {

-			return FIL_LOAD_DEFER;

-		} else if (size < minimum_size) {

+		} else if (!file.m_defer && size < minimum_size) {

 			ib::error() << "The size of tablespace file '"

 				<< file.filepath() << "' is only " << size

 				<< ", should be at least " << minimum_size

The problem is that the FIL_LOAD_DEFER return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The FIL_PAGE_LSN in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a WRITE) that recv_sys_t::recover_deferred() would see for the page carries start_lsn = 79742990 and lsn = 79743157, which is even later. The deferred recovery fails because skip_read was not set for the recv_sys.pages entry.

In case the problem is actually on the 'write' side (which I doubt), you can access rr replay of the previously killed server as follows:

ssh sdp

rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1

It starts straight after bootstrap, at LSN 41114.

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

data.tar.xz
3.26 MB
2023-04-20 13:34
encryption_keys.txt
0.4 kB
2023-04-25 10:34

Issue Links

is caused by

MDEV-24626 Remove synchronous write of page0 and flushing file during file creation

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä created issue - 2023-04-20 13:37

Marko Mäkelä made changes - 2023-04-20 13:37

Field	Original Value	New Value
Link		This issue is caused by ~~MDEV-24626~~ [ ~~MDEV-24626~~ ]

Marko Mäkelä made changes - 2023-04-20 13:38

Fix Version/s		10.6 [ 24028 ]
Affects Version/s		10.6 [ 24028 ]

Marko Mäkelä made changes - 2023-04-20 13:38

Status

Open [ 1 ]

Confirmed [ 10101 ]

Thirunarayanan Balathandayuthapani made changes - 2023-04-25 10:34

Attachment

encryption_keys.txt [ 69747 ]

Thirunarayanan Balathandayuthapani added a comment - 2023-04-25 10:35

If we start the 10.6 server with the following command:

./sql/mysqld --basedir=$PWD --datadir=$PWD/data --socket=$PWD/data/mysql.sock --log-error=$PWD/data/mysql.log --plugin-dir=$PWD/plugin/file_key_management --file-key-management --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$PWD/../encryption_keys.txt --file-key-management-encryption-algorithm=aes_cbc --innodb_page_size=32k

There is no failure in plain 10.6. This issue is related to ~~MDEV-29911~~ Patch.

Thirunarayanan Balathandayuthapani added a comment - 2023-04-25 10:35 If we start the 10.6 server with the following command: ./sql/mysqld --basedir=$PWD --datadir=$PWD/data --socket=$PWD/data/mysql.sock --log-error=$PWD/data/mysql.log --plugin-dir=$PWD/plugin/file_key_management --file-key-management --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$PWD/../encryption_keys.txt --file-key-management-encryption-algorithm=aes_cbc --innodb_page_size=32k There is no failure in plain 10.6. This issue is related to MDEV-29911 Patch.

Thirunarayanan Balathandayuthapani made changes - 2023-04-25 10:35

Labels

recovery regression-10.6

recovery

Thirunarayanan Balathandayuthapani made changes - 2023-04-25 12:37

Priority

Critical [ 2 ]

Major [ 3 ]

Thirunarayanan Balathandayuthapani made changes - 2023-04-25 12:37

Summary

"InnoDB: Recovery cannot access file" on a valid data file

InnoDB Recovery doesn't display encryption message when there is no encryption configuration

Thirunarayanan Balathandayuthapani made changes - 2023-04-25 12:38

Summary

InnoDB Recovery doesn't display encryption message when there is no encryption configuration

InnoDB Recovery doesn't display encryption message when no encryption configuration passed

Marko Mäkelä added a comment - 2023-04-26 12:53

Thank you for debugging this and fixing the ~~MDEV-29911~~ code. As we discussed, it would be good to clarify the error message, to say that the page cannot be decrypted.

Marko Mäkelä added a comment - 2023-04-26 12:53 Thank you for debugging this and fixing the MDEV-29911 code. As we discussed, it would be good to clarify the error message, to say that the page cannot be decrypted.

Thirunarayanan Balathandayuthapani made changes - 2023-09-25 10:20

Status

Confirmed [ 10101 ]

In Progress [ 3 ]

Thirunarayanan Balathandayuthapani added a comment - 2023-09-26 09:13

Patch is in bb-10.6-MDEV-31851

Thirunarayanan Balathandayuthapani added a comment - 2023-09-26 09:13 Patch is in bb-10.6- MDEV-31851

Thirunarayanan Balathandayuthapani made changes - 2023-09-26 09:13

Assignee	Thirunarayanan Balathandayuthapani [ thiru ]	Marko Mäkelä [ marko ]
Status	In Progress [ 3 ]	In Review [ 10002 ]

Marko Mäkelä made changes - 2023-09-27 13:40

Description

An attempt to recover the attached [^data.tar.xz] fails:
{code:sh}
tar xJf data.tar.xz
mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data
{code}
{noformat:title=10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6}
2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
{noformat}
A crude attempt at fixing this as follows caused a crash.
{code:diff}
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index f79eb015673..0f614068afd 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -2496,23 +2496,16 @@ fil_ibd_load(
}

os_offset_t size;
- bool deferred_space = false;

/* Read and validate the first page of the tablespace.
Assign a tablespace name based on the tablespace type. */
switch (file.validate_for_recovery()) {
os_offset_t minimum_size;
case DB_SUCCESS:
- deferred_space = file.m_defer;
-
- if (deferred_space) {
- goto tablespace_check;
- }
-
- if (file.space_id() != space_id) {
+ if (!file.m_defer && file.space_id() != space_id) {
return(FIL_LOAD_ID_CHANGED);
}
-tablespace_check:
+
/* Get and test the file size. */
size = os_file_get_size(file.handle());

@@ -2528,9 +2521,7 @@ fil_ibd_load(
ib::error() << "Could not measure the size of"
" single-table tablespace file '"
<< file.filepath() << "'";
- } else if (deferred_space) {
- return FIL_LOAD_DEFER;
- } else if (size < minimum_size) {
+ } else if (!file.m_defer && size < minimum_size) {
ib::error() << "The size of tablespace file '"
<< file.filepath() << "' is only " << size
<< ", should be at least " << minimum_size
{code}
The problem is that the {{FIL_LOAD_DEFER}} return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The {{FIL_PAGE_LSN}} in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a {{WRITE}}) that {{recv_sys_t::recover_deferred()}} would see for the page carries {{start_lsn = 79742990}} and {{lsn = 79743157}, which is even later. The deferred recovery fails because {{skip_read}} was not set for the {{recv_sys.pages}} entry.

In case the problem is actually on the 'write' side (which I doubt), you can access {{rr replay}} of the previously killed server as follows:
{code:sh}
ssh sdp
rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1
{code}
It starts straight after bootstrap, at LSN 41114.

An attempt to recover the attached [^data.tar.xz] fails:
{code:sh}
tar xJf data.tar.xz
mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data
{code}
{noformat:title=10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6}
2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
{noformat}
A crude attempt at fixing this as follows caused a crash.
{code:diff}
diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
index f79eb015673..0f614068afd 100644
--- a/storage/innobase/fil/fil0fil.cc
+++ b/storage/innobase/fil/fil0fil.cc
@@ -2496,23 +2496,16 @@ fil_ibd_load(
}

os_offset_t size;
- bool deferred_space = false;

/* Read and validate the first page of the tablespace.
Assign a tablespace name based on the tablespace type. */
switch (file.validate_for_recovery()) {
os_offset_t minimum_size;
case DB_SUCCESS:
- deferred_space = file.m_defer;
-
- if (deferred_space) {
- goto tablespace_check;
- }
-
- if (file.space_id() != space_id) {
+ if (!file.m_defer && file.space_id() != space_id) {
return(FIL_LOAD_ID_CHANGED);
}
-tablespace_check:
+
/* Get and test the file size. */
size = os_file_get_size(file.handle());

@@ -2528,9 +2521,7 @@ fil_ibd_load(
ib::error() << "Could not measure the size of"
" single-table tablespace file '"
<< file.filepath() << "'";
- } else if (deferred_space) {
- return FIL_LOAD_DEFER;
- } else if (size < minimum_size) {
+ } else if (!file.m_defer && size < minimum_size) {
ib::error() << "The size of tablespace file '"
<< file.filepath() << "' is only " << size
<< ", should be at least " << minimum_size
{code}
The problem is that the {{FIL_LOAD_DEFER}} return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The {{FIL_PAGE_LSN}} in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a {{WRITE}}) that {{recv_sys_t::recover_deferred()}} would see for the page carries {{start_lsn = 79742990}} and {{lsn = 79743157}}, which is even later. The deferred recovery fails because {{skip_read}} was not set for the {{recv_sys.pages}} entry.

In case the problem is actually on the 'write' side (which I doubt), you can access {{rr replay}} of the previously killed server as follows:
{code:sh}
ssh sdp
rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1
{code}
It starts straight after bootstrap, at LSN 41114.

Marko Mäkelä added a comment - 2023-09-28 09:46

Thank you, this is a simple fix. I’d suggest to modify the test encryption.innodb-redo-nokeys to ensure that the error message will be reported.

Marko Mäkelä added a comment - 2023-09-28 09:46 Thank you, this is a simple fix. I’d suggest to modify the test encryption.innodb-redo-nokeys to ensure that the error message will be reported.

Marko Mäkelä made changes - 2023-09-28 09:46

Assignee	Marko Mäkelä [ marko ]	Thirunarayanan Balathandayuthapani [ thiru ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Thirunarayanan Balathandayuthapani made changes - 2023-10-10 06:15

Assignee	Thirunarayanan Balathandayuthapani [ thiru ]	Marko Mäkelä [ marko ]
Status	Stalled [ 10000 ]	In Review [ 10002 ]

Marko Mäkelä added a comment - 2023-10-11 15:05

I understand that due to the somewhat nondeterministic nature of page I/O, it may be hard to write a test that would invoke source include/search_pattern_in_file.inc to ensure that the error message was output.

Can you still try to do that? Please also address my minor comments about the added function fil_crypt_check().

Marko Mäkelä added a comment - 2023-10-11 15:05 I understand that due to the somewhat nondeterministic nature of page I/O, it may be hard to write a test that would invoke source include/search_pattern_in_file.inc to ensure that the error message was output. Can you still try to do that? Please also address my minor comments about the added function fil_crypt_check() .

Marko Mäkelä made changes - 2023-10-11 15:05

Assignee	Marko Mäkelä [ marko ]	Thirunarayanan Balathandayuthapani [ thiru ]
Status	In Review [ 10002 ]	Stalled [ 10000 ]

Thirunarayanan Balathandayuthapani made changes - 2023-10-13 13:26

Fix Version/s		10.6.16 [ 29014 ]
Fix Version/s	10.6 [ 24028 ]
Resolution		Fixed [ 1 ]
Status	Stalled [ 10000 ]	Closed [ 6 ]

Julien Fritsch made changes - 2023-10-13 13:26

Fix Version/s		10.10.7 [ 29018 ]
Fix Version/s		10.11.6 [ 29020 ]
Fix Version/s		11.0.4 [ 29021 ]
Fix Version/s		11.1.3 [ 29023 ]

People

Assignee:: Thirunarayanan Balathandayuthapani

Reporter:: Marko Mäkelä

Votes:: 0 Vote for this issue

Watchers:: 2 Start watching this issue

Dates

Created:: 2023-04-20 13:37

Updated:: 2023-10-13 13:26

Resolved:: 2023-10-13 13:26

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration