[MDEV-31373] innodb_undo_log_truncate=ON recovery results in a corrupted undo log Created: 2023-05-31  Updated: 2023-07-05  Resolved: 2023-06-01

Status: Closed
Project: MariaDB Server
Component/s: Backup, Storage Engine - InnoDB
Affects Version/s: 10.5, 10.6, 10.7, 10.8, 10.9, 10.10, 10.11, 11.0, 11.1
Fix Version/s: 10.5.22, 10.6.15, 10.9.8, 10.10.6, 10.11.5, 11.0.3, 11.1.2

Type: Bug Priority: Critical
Reporter: Marko Mäkelä Assignee: Marko Mäkelä
Resolution: Fixed Votes: 1
Labels: corruption, recovery, rr-profile-analyzed

Attachments: File data.tar.xz    
Issue Links:
Blocks
blocks MDEV-14795 InnoDB system tablespace cannot be sh... Closed
Relates
relates to MDEV-28840 innodb_undo_log_truncate is not crash... Closed
relates to MDEV-29999 innodb_undo_log_truncate=ON is not cr... Closed

 Description   

mleich produced rr replay traces of something that led to a corrupted database after mariadb-backup --prepare claimed to have completed successfully.

To reproduce the corruption with the attached copy of the backup data.tar.xz:

tar xJf data.tar.xz
mariabackup --prepare --target-dir "$(pwd)"/data
mariadbd --innodb-page-size=8k --datadir $(pwd)/data

The database will fail to start up as follows:

10.6 a6c0a2769663850c055e41a83b0cca9dd9ab89a3

2023-05-31 14:11:49 0 [Note] InnoDB: Opened 3 undo tablespaces
2023-05-31 14:11:49 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2023-05-31 14:11:49 0 [ERROR] InnoDB: Trying to read 8192 bytes at 6545408 outside the bounds of the file: .//undo001
2023-05-31 14:11:49 0 [ERROR] InnoDB: File './/undo001' is corrupted
2023-05-31 14:11:49 0 [Note] InnoDB: Retry with innodb_force_recovery=5
2023-05-31 14:11:49 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1484] with error Data structure corruption
2023-05-31 14:11:49 0 [Note] InnoDB: Starting shutdown...
2023-05-31 14:11:49 0 [ERROR] Plugin 'InnoDB' init function returned error.
2023-05-31 14:11:49 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

With the following fix, the recovery will succeed:

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 2764cc79829..91e87643ab1 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -3645,10 +3645,12 @@ void recv_sys_t::apply(bool last_batch)
         if (fil_space_t *space = fil_space_get(id + srv_undo_space_id_start))
         {
           ut_ad(UT_LIST_GET_LEN(space->chain) == 1);
+          ut_ad(space->recv_size >= t.pages);
           fil_node_t *file= UT_LIST_GET_FIRST(space->chain);
           ut_ad(file->is_open());
           os_file_truncate(file->name, file->handle,
-                           os_offset_t{t.pages} << srv_page_size_shift, true);
+                           os_offset_t{space->recv_size} <<
+                           srv_page_size_shift, true);
         }
       }
     }

An analysis of the rr replay trace of mariadb-backup --prepare showed the following:

  1. space->size will be recovered as 1536 by reading the first page of undo001, at FIL_PAGE_LSN=579965, in fil_node_t::read_page0()
  2. space->recv_size will be recovered as 640, 768, …, 1536 by recv_sys_t::parse()
  3. The file undo001 will be incorrectly truncated to t.pages (640) pages by the above code (without the patch).
  4. There will be subsequent calls to write pages after the truncated size. InnoDB is always supposed to extend data files before any writes take place. The server error message is referring to page 799, on which buf_page_t::write_complete() was invoked in the trace.


 Comments   
Comment by Matthias Leich [ 2023-06-01 ]

origin/bb-10.6-MDEV-30986 cf8cbe086bbe5db33d7fc78e67e8dbc5b1c93f06 2023-05-31T17:57:25+03:00
which contains the fix for the current MDEV performed well in RQG testing.

Generated at Thu Feb 08 10:23:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.