Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31098

InnoDB Recovery doesn't display encryption message when no encryption configuration passed

Details

    Description

      An attempt to recover the attached data.tar.xz fails:

      tar xJf data.tar.xz
      mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data
      

      10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6

      2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
      2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
      2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
      2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
      2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
      2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
      

      A crude attempt at fixing this as follows caused a crash.

      diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
      index f79eb015673..0f614068afd 100644
      --- a/storage/innobase/fil/fil0fil.cc
      +++ b/storage/innobase/fil/fil0fil.cc
      @@ -2496,23 +2496,16 @@ fil_ibd_load(
       	}
       
       	os_offset_t	size;
      -	bool		deferred_space = false;
       
       	/* Read and validate the first page of the tablespace.
       	Assign a tablespace name based on the tablespace type. */
       	switch (file.validate_for_recovery()) {
       		os_offset_t	minimum_size;
       	case DB_SUCCESS:
      -		deferred_space = file.m_defer;
      -
      -		if (deferred_space) {
      -			goto tablespace_check;
      -		}
      -
      -		if (file.space_id() != space_id) {
      +		if (!file.m_defer && file.space_id() != space_id) {
       			return(FIL_LOAD_ID_CHANGED);
       		}
      -tablespace_check:
      +
       		/* Get and test the file size. */
       		size = os_file_get_size(file.handle());
       
      @@ -2528,9 +2521,7 @@ fil_ibd_load(
       			ib::error() << "Could not measure the size of"
       				" single-table tablespace file '"
       				<< file.filepath() << "'";
      -		} else if (deferred_space) {
      -			return FIL_LOAD_DEFER;
      -		} else if (size < minimum_size) {
      +		} else if (!file.m_defer && size < minimum_size) {
       			ib::error() << "The size of tablespace file '"
       				<< file.filepath() << "' is only " << size
       				<< ", should be at least " << minimum_size
      

      The problem is that the FIL_LOAD_DEFER return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The FIL_PAGE_LSN in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a WRITE) that recv_sys_t::recover_deferred() would see for the page carries start_lsn = 79742990 and lsn = 79743157, which is even later. The deferred recovery fails because skip_read was not set for the recv_sys.pages entry.

      In case the problem is actually on the 'write' side (which I doubt), you can access rr replay of the previously killed server as follows:

      ssh sdp
      rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1
      

      It starts straight after bootstrap, at LSN 41114.

      Attachments

        Issue Links

          Activity

            marko Marko Mäkelä created issue -
            marko Marko Mäkelä made changes -
            Field Original Value New Value
            marko Marko Mäkelä made changes -
            Fix Version/s 10.6 [ 24028 ]
            Affects Version/s 10.6 [ 24028 ]
            marko Marko Mäkelä made changes -
            Status Open [ 1 ] Confirmed [ 10101 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Attachment encryption_keys.txt [ 69747 ]

            If we start the 10.6 server with the following command:

            ./sql/mysqld --basedir=$PWD --datadir=$PWD/data --socket=$PWD/data/mysql.sock --log-error=$PWD/data/mysql.log --plugin-dir=$PWD/plugin/file_key_management --file-key-management --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$PWD/../encryption_keys.txt --file-key-management-encryption-algorithm=aes_cbc --innodb_page_size=32k
            

            There is no failure in plain 10.6. This issue is related to MDEV-29911 Patch.

            thiru Thirunarayanan Balathandayuthapani added a comment - If we start the 10.6 server with the following command: ./sql/mysqld --basedir=$PWD --datadir=$PWD/data --socket=$PWD/data/mysql.sock --log-error=$PWD/data/mysql.log --plugin-dir=$PWD/plugin/file_key_management --file-key-management --plugin-load-add=file_key_management.so --loose-file-key-management-filename=$PWD/../encryption_keys.txt --file-key-management-encryption-algorithm=aes_cbc --innodb_page_size=32k There is no failure in plain 10.6. This issue is related to MDEV-29911 Patch.
            thiru Thirunarayanan Balathandayuthapani made changes -
            Labels recovery regression-10.6 recovery
            thiru Thirunarayanan Balathandayuthapani made changes -
            Priority Critical [ 2 ] Major [ 3 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Summary "InnoDB: Recovery cannot access file" on a valid data file InnoDB Recovery doesn't display encryption message when there is no encryption configuration
            thiru Thirunarayanan Balathandayuthapani made changes -
            Summary InnoDB Recovery doesn't display encryption message when there is no encryption configuration InnoDB Recovery doesn't display encryption message when no encryption configuration passed

            Thank you for debugging this and fixing the MDEV-29911 code. As we discussed, it would be good to clarify the error message, to say that the page cannot be decrypted.

            marko Marko Mäkelä added a comment - Thank you for debugging this and fixing the MDEV-29911 code. As we discussed, it would be good to clarify the error message, to say that the page cannot be decrypted.
            thiru Thirunarayanan Balathandayuthapani made changes -
            Status Confirmed [ 10101 ] In Progress [ 3 ]

            Patch is in bb-10.6-MDEV-31851

            thiru Thirunarayanan Balathandayuthapani added a comment - Patch is in bb-10.6- MDEV-31851
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            marko Marko Mäkelä made changes -
            Description An attempt to recover the attached [^data.tar.xz] fails:
            {code:sh}
            tar xJf data.tar.xz
            mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data
            {code}
            {noformat:title=10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6}
            2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
            2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
            2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
            2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
            2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
            2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
            {noformat}
            A crude attempt at fixing this as follows caused a crash.
            {code:diff}
            diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
            index f79eb015673..0f614068afd 100644
            --- a/storage/innobase/fil/fil0fil.cc
            +++ b/storage/innobase/fil/fil0fil.cc
            @@ -2496,23 +2496,16 @@ fil_ibd_load(
              }
             
              os_offset_t size;
            - bool deferred_space = false;
             
              /* Read and validate the first page of the tablespace.
              Assign a tablespace name based on the tablespace type. */
              switch (file.validate_for_recovery()) {
              os_offset_t minimum_size;
              case DB_SUCCESS:
            - deferred_space = file.m_defer;
            -
            - if (deferred_space) {
            - goto tablespace_check;
            - }
            -
            - if (file.space_id() != space_id) {
            + if (!file.m_defer && file.space_id() != space_id) {
              return(FIL_LOAD_ID_CHANGED);
              }
            -tablespace_check:
            +
              /* Get and test the file size. */
              size = os_file_get_size(file.handle());
             
            @@ -2528,9 +2521,7 @@ fil_ibd_load(
              ib::error() << "Could not measure the size of"
              " single-table tablespace file '"
              << file.filepath() << "'";
            - } else if (deferred_space) {
            - return FIL_LOAD_DEFER;
            - } else if (size < minimum_size) {
            + } else if (!file.m_defer && size < minimum_size) {
              ib::error() << "The size of tablespace file '"
              << file.filepath() << "' is only " << size
              << ", should be at least " << minimum_size
            {code}
            The problem is that the {{FIL_LOAD_DEFER}} return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The {{FIL_PAGE_LSN}} in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a {{WRITE}}) that {{recv_sys_t::recover_deferred()}} would see for the page carries {{start_lsn = 79742990}} and {{lsn = 79743157}, which is even later. The deferred recovery fails because {{skip_read}} was not set for the {{recv_sys.pages}} entry.

            In case the problem is actually on the 'write' side (which I doubt), you can access {{rr replay}} of the previously killed server as follows:
            {code:sh}
            ssh sdp
            rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1
            {code}
            It starts straight after bootstrap, at LSN 41114.
            An attempt to recover the attached [^data.tar.xz] fails:
            {code:sh}
            tar xJf data.tar.xz
            mariadbd --innodb-page-size=32k --datadir "$(pwd)"/data
            {code}
            {noformat:title=10.6 7e31a8e7fa97a87fc164381588d172bf0e76abb6}
            2023-04-20 16:23:44 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=14303353,21931175
            2023-04-20 16:23:45 0 [ERROR] InnoDB: Recovery cannot access file ./cool_down/t5.ibd (tablespace 28)
            2023-04-20 16:23:45 0 [Note] InnoDB: You may set innodb_force_recovery=1 to ignore this and possibly get a corrupted database.
            2023-04-20 16:23:45 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1458] with error Generic error
            2023-04-20 16:23:45 0 [Note] InnoDB: Starting shutdown...
            2023-04-20 16:23:45 0 [ERROR] Plugin 'InnoDB' init function returned error.
            {noformat}
            A crude attempt at fixing this as follows caused a crash.
            {code:diff}
            diff --git a/storage/innobase/fil/fil0fil.cc b/storage/innobase/fil/fil0fil.cc
            index f79eb015673..0f614068afd 100644
            --- a/storage/innobase/fil/fil0fil.cc
            +++ b/storage/innobase/fil/fil0fil.cc
            @@ -2496,23 +2496,16 @@ fil_ibd_load(
              }
             
              os_offset_t size;
            - bool deferred_space = false;
             
              /* Read and validate the first page of the tablespace.
              Assign a tablespace name based on the tablespace type. */
              switch (file.validate_for_recovery()) {
              os_offset_t minimum_size;
              case DB_SUCCESS:
            - deferred_space = file.m_defer;
            -
            - if (deferred_space) {
            - goto tablespace_check;
            - }
            -
            - if (file.space_id() != space_id) {
            + if (!file.m_defer && file.space_id() != space_id) {
              return(FIL_LOAD_ID_CHANGED);
              }
            -tablespace_check:
            +
              /* Get and test the file size. */
              size = os_file_get_size(file.handle());
             
            @@ -2528,9 +2521,7 @@ fil_ibd_load(
              ib::error() << "Could not measure the size of"
              " single-table tablespace file '"
              << file.filepath() << "'";
            - } else if (deferred_space) {
            - return FIL_LOAD_DEFER;
            - } else if (size < minimum_size) {
            + } else if (!file.m_defer && size < minimum_size) {
              ib::error() << "The size of tablespace file '"
              << file.filepath() << "' is only " << size
              << ", should be at least " << minimum_size
            {code}
            The problem is that the {{FIL_LOAD_DEFER}} return value causes InnoDB crash recovery to "think" that the file will be initialized by redo log records. The {{FIL_PAGE_LSN}} in page 0 of the file is 61,024,384, that is, after the checkpoint LSN 14,303,353. The first log record (a {{WRITE}}) that {{recv_sys_t::recover_deferred()}} would see for the page carries {{start_lsn = 79742990}} and {{lsn = 79743157}}, which is even later. The deferred recovery fails because {{skip_read}} was not set for the {{recv_sys.pages}} entry.

            In case the problem is actually on the 'write' side (which I doubt), you can access {{rr replay}} of the previously killed server as follows:
            {code:sh}
            ssh sdp
            rr replay /data1/results/1681982706/TBR-1080/1/rr/mysqld-1
            {code}
            It starts straight after bootstrap, at LSN 41114.

            Thank you, this is a simple fix. I’d suggest to modify the test encryption.innodb-redo-nokeys to ensure that the error message will be reported.

            marko Marko Mäkelä added a comment - Thank you, this is a simple fix. I’d suggest to modify the test encryption.innodb-redo-nokeys to ensure that the error message will be reported.
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Assignee Thirunarayanan Balathandayuthapani [ thiru ] Marko Mäkelä [ marko ]
            Status Stalled [ 10000 ] In Review [ 10002 ]

            I understand that due to the somewhat nondeterministic nature of page I/O, it may be hard to write a test that would invoke source include/search_pattern_in_file.inc to ensure that the error message was output.

            Can you still try to do that? Please also address my minor comments about the added function fil_crypt_check().

            marko Marko Mäkelä added a comment - I understand that due to the somewhat nondeterministic nature of page I/O, it may be hard to write a test that would invoke source include/search_pattern_in_file.inc to ensure that the error message was output. Can you still try to do that? Please also address my minor comments about the added function fil_crypt_check() .
            marko Marko Mäkelä made changes -
            Assignee Marko Mäkelä [ marko ] Thirunarayanan Balathandayuthapani [ thiru ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            thiru Thirunarayanan Balathandayuthapani made changes -
            Fix Version/s 10.6.16 [ 29014 ]
            Fix Version/s 10.6 [ 24028 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            julien.fritsch Julien Fritsch made changes -
            Fix Version/s 10.10.7 [ 29018 ]
            Fix Version/s 10.11.6 [ 29020 ]
            Fix Version/s 11.0.4 [ 29021 ]
            Fix Version/s 11.1.3 [ 29023 ]

            People

              thiru Thirunarayanan Balathandayuthapani
              marko Marko Mäkelä
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.