[MDEV-13471] Test failure on innodb.log_file_size,4k Created: 2017-08-08  Updated: 2021-05-13  Resolved: 2017-08-31

Status: Closed
Project: MariaDB Server
Component/s: Galera, Storage Engine - InnoDB, Storage Engine - XtraDB
Affects Version/s: 10.1, 10.2
Fix Version/s: 10.1.27, 10.0.32-galera, 10.2.9, 10.3.2

Type: Bug Priority: Critical
Reporter: Jan Lindström (Inactive) Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None


 Description   

WSREP XID is stored in the TRX_SYS page at innodb_page_size-3500, which overlaps with rseg undo slots, causing the following kind of error:

170808 12:51:17 [ERROR] InnoDB: Unable to open undo tablespace './undo30579'.
170808 12:51:17 [ERROR] Plugin 'InnoDB' init function returned error.
170808 12:51:17 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.

In a debug build, this could also cause innodb-alter-tempfile,4k,innodb_plugin to fail with an assertion:

2017-08-08 12:56:39 7f1f4b3edf40  InnoDB: Assertion failure in thread 139772383125312 in file buf0buf.cc line 2635
InnoDB: Failing assertion: (!((zip_size) & ((zip_size) - 1)))

For reference see:
https://github.com/codership/galera/issues/398



 Comments   
Comment by Jan Lindström (Inactive) [ 2017-08-09 ]

https://github.com/MariaDB/server/commit/b4a4e79865399a443abc6a54c809eaf60d460fee

This does not really require Galera knowledge.

Comment by Marko Mäkelä [ 2017-08-10 ]

The problem is that with innodb_page_size=4k, the Galera WSREP XID would be written to byte offset 4096-3500, overwriting rollback segment slots in a TRX_SYS page.
The first affected slot would be partly overwritten with a "wsre" tag:

#define TRX_SYS_WSREP_XID_MAGIC_N 0x77737265

In the first affected slot, the least significant 16 bits of TRX_SYS_RSEG_SPACE would be overwritten with 0x7773, and the most significant 16 bits of TRX_SYS_RSEG_PAGE_NO would be overwritten with 0x7265.
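
The overwrite described above can be reproduced on a small byte buffer. This is only a sketch, not InnoDB code: the 8-byte slot layout (a 4-byte big-endian space id followed by a 4-byte big-endian page number) and the position of the tag 2 bytes into the slot are assumptions inferred from the description above, and the names write_be32, read_be32, rseg_slot and clobber_slot are hypothetical.

```c
#include <stdint.h>

#define TRX_SYS_WSREP_XID_MAGIC_N 0x77737265 /* ASCII "wsre" */

/* Hypothetical helper: store a 32-bit value in big-endian byte order. */
static void write_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)(v >> 16);
    p[2] = (unsigned char)(v >> 8);
    p[3] = (unsigned char)v;
}

/* Hypothetical helper: load a 32-bit big-endian value. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Assumed slot layout: bytes 0..3 = space id, bytes 4..7 = page number. */
struct rseg_slot {
    uint32_t space;
    uint32_t page_no;
};

/* Build a slot holding (space, page_no), write the magic tag 2 bytes
   into it, and return what would be read back from the slot afterwards. */
static struct rseg_slot clobber_slot(uint32_t space, uint32_t page_no)
{
    unsigned char slot[8];
    struct rseg_slot out;

    write_be32(slot, space);
    write_be32(slot + 4, page_no);
    /* The tag straddles the low half of space and the high half of page_no. */
    write_be32(slot + 2, TRX_SYS_WSREP_XID_MAGIC_N);

    out.space = read_be32(slot);
    out.page_no = read_be32(slot + 4);
    return out;
}
```

For a slot that originally held undo tablespace 5, page 3, clobber_slot(5, 3) reads back space 0x7773 (30579 in decimal, matching './undo30579' in the error log) and page number 0x72650003.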

The default value of innodb_undo_logs (or innodb_rollback_segments) has always been 128 in MariaDB Server 10.x. A smaller, non-default value for that parameter might have avoided this collision. But, by default, Galera would never work with innodb_page_size=4k because of it.
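
A rough check of which page sizes are affected, and how many slots would fit below the WSREP XID data, might look like the following. It is a sketch under assumptions: both offsets are taken relative to the start of the TRX_SYS header, the slot array is assumed to begin 18 bytes into that header with 8 bytes per slot, and the function names are hypothetical. Whether a smaller innodb_undo_logs value would really have avoided the problem also depends on how InnoDB treats unused slots, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    RSEG_SLOTS_START  = 18,   /* assumed offset of slot 0 in the TRX_SYS header */
    RSEG_SLOT_SIZE    = 8,    /* 4-byte space id + 4-byte page number */
    WSREP_XID_BACKOFF = 3500  /* XID info was placed at page_size - 3500 */
};

/* Offset of the WSREP XID info for a given page size. */
static unsigned wsrep_xid_offset(unsigned page_size)
{
    assert(page_size > WSREP_XID_BACKOFF);
    return page_size - WSREP_XID_BACKOFF;
}

/* Number of slots that lie entirely below the XID offset. */
static unsigned slots_below_xid(unsigned page_size)
{
    unsigned xid = wsrep_xid_offset(page_size);

    if (xid <= RSEG_SLOTS_START) {
        return 0;
    }
    return (xid - RSEG_SLOTS_START) / RSEG_SLOT_SIZE;
}

/* True if an array of n_slots slots would reach into the XID info. */
static bool xid_collides(unsigned page_size, unsigned n_slots)
{
    return n_slots > slots_below_xid(page_size);
}
```

Under these assumptions, xid_collides(4096, 128) is true while xid_collides(8192, 128) is false, and with 4k pages only slots 0 to 71 stay clear of the XID data.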

Note: In slot 0, we should always have the TRX_SYS_RSEG_SPACE:TRX_SYS_RSEG_PAGE_NO pair 0:6. In slots 1 to 127, if the space_id is 0, the page number would be allocated from the system tablespace. If the database was used for a long time with innodb_undo_logs=1 (or if it was originally created before this parameter was introduced), it is possible that the subsequent rollback segment slots would have very high page numbers.

In slots 1 to 127, if the space_id is not 0, the page number should always be 3, because the page is allocated straight after the undo tablespace creation.

Originally, the undo tablespace ID would always be between 0 and 127. Starting with MySQL 5.6.36 which introduced
Bug #25551311 BACKPORT BUG #23517560 REMOVE SPACE_ID RESTRICTION FOR UNDO TABLESPACES
(merged to MariaDB 10.0.31)
it is possible for an undo tablespace ID to be 0x7773. But in this case, the page number should be 3, not 0x72650003.
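
The rules above suggest a simple plausibility check for the space_id:page_no pair of a slot. The sketch below is illustrative only: rseg_slot_plausible is a hypothetical helper, not an InnoDB function, and it does not handle unused slots.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if (space_id, page_no) in the given slot is consistent
   with the rules described above. */
static bool rseg_slot_plausible(unsigned slot, uint32_t space_id,
                                uint32_t page_no)
{
    if (slot == 0) {
        /* Slot 0 always holds the pair 0:6. */
        return space_id == 0 && page_no == 6;
    }
    if (space_id == 0) {
        /* Allocated from the system tablespace: any page number is
           possible, including a very high one. */
        return true;
    }
    /* Dedicated undo tablespace: the page is allocated right after the
       tablespace is created, so it should be page 3. */
    return page_no == 3;
}
```

With this check, the pair 0x7773:3 would pass, while the corrupted pair 0x7773:0x72650003 would be rejected.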

This is just the first collision. The WSREP XID data would overwrite subsequent slots.
Because it looks like Galera never really worked with innodb_page_size=4k, we can simply move the WSREP XID data fields to a safe place.

Comment by Marko Mäkelä [ 2017-08-10 ]

Please simplify the patch. We do not need to deal with 2 locations of the WSREP XID information when using innodb_page_size=4k.

Comment by Jan Lindström (Inactive) [ 2017-08-10 ]

5.5-galera is not affected.

Comment by Jan Lindström (Inactive) [ 2017-08-22 ]

10.0.32-galera:

commit 391b1af0fbb9723ce768d0b06830865fa983a8dd
Author: Jan Lindström <jan.lindstrom@mariadb.com>
Date:   Thu Aug 10 13:09:27 2017 +0300
 
    MDEV-13471: Test failure on innodb.log_file_size,4k
    
    Problem was that 4k page size is not really supported in
    Galera. For reference see:
            codership/galera#398
    
    Page size 4k is problematic because the WSREP XID info location
    was set to the constant UNIV_PAGE_SIZE - 3500, which conflicts
    with the rseg undo slots location if there are a lot of undo tablespaces.
    Undo tablespace identifiers and page numbers require
    at least 128*8=1024 bytes starting from offset 56. Therefore,
    WSREP XID starting from offset 596 would overwrite several
    space_id,page_no pairs starting from the 72nd undo log tablespace
    space_id,page_no pair at offset 594.
    This will cause an InnoDB startup failure seen as
    [ERROR] InnoDB: Unable to open undo tablespace './undo30579'.
    
    Originally, the undo tablespace ID would always be between
    0 and 127. Starting with MySQL 5.6.36 which introduced
    Bug #25551311 BACKPORT BUG #23517560 REMOVE SPACE_ID RESTRICTION
    FOR UNDO TABLESPACES (merged to MariaDB 10.0.31)
    it is possible for an undo tablespace ID to be 0x7773. But in
    this case, the page number should be 3, not 0x72650003.
    This is just the first collision. The WSREP XID data would
    overwrite subsequent slots.
    
    trx0sys.h
    
    trx0sys.cc
            Code formatting and add comments.
