[MDEV-28168] Compressed page is read for non-compressed space during recovery Created: 2022-03-24  Updated: 2023-11-28

Status: Confirmed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6
Fix Version/s: 10.4, 10.5, 10.6

Type: Bug Priority: Major
Reporter: Vladislav Lesin Assignee: Vladislav Lesin
Resolution: Unresolved Votes: 0
Labels: None

Attachments: File page-0.bin     File page.bin    
Issue Links:
Relates
relates to MDEV-18644 Support FULL_CRC32 for compressed pages. Closed

 Description   

During MDEV-27557 testing we found the following crash:

#2  0x000055cbfd573981 in ib::fatal::~fatal (this=0x27936b9ce500, __in_chrg=<optimized out>)
    at ./storage/innobase/ut/ut0ut.cc:519                                       
#3  0x000055cbfd6257b0 in fil_page_type_validate (space=0x612000024f40, page=0x7fed98338000 "")
    at ./storage/innobase/include/fil0fil.inl:131                               
#4  0x000055cbfd626f5e in buf_page_decrypt_after_read (bpage=0x7fed97b0d1a0, node=...)
    at ./storage/innobase/buf/buf0buf.cc:472                                    
#5  0x000055cbfd6386d3 in buf_page_t::read_complete (this=0x7fed97b0d1a0, node=...)
    at ./storage/innobase/buf/buf0buf.cc:3617                                   
#6  0x000055cbfd7523ae in fil_aio_callback (request=...) at ./storage/innobase/fil/fil0fil.cc:2930

buf_page_decrypt_after_read() does decompression only if node.space->is_compressed() && buf_page_is_compressed(dst_frame, flags) condition is true. For our case node.space->is_compressed() returns false as both FSP_FLAGS_FCRC32_GET_COMPRESSED_ALGO() and FSP_FLAGS_HAS_PAGE_COMPRESSION() are false. The space flags is 0x16. What means the space is not compressed. But the 15-th bit of FIL_PAGE_TYPE is set, what means the full_crc32 page is compressed, i.e. buf_page_is_compressed(dst_frame, flags) should return true.

So we have compressed page on non-compressed space. The question is how is it possible?



 Comments   
Comment by Vladislav Lesin [ 2022-03-24 ]

In the attachment there is page itself page.bin, and the first page of the space page-0.bin.

Comment by Marko Mäkelä [ 2022-03-24 ]

I think that we need an rr replay of the server that was killed before the failed recovery.

My current hypothesis is that this is repeatable with the default innodb_checksum_algorithm=full_crc32 and a test that runs something like this:

CREATE TABLE t ENGINE=InnoDB AS SELECT * FROM seq_1_to_10000;
ALTER TABLE t page_compressed=1;
ALTER TABLE t page_compressed=0;
ALTER TABLE t page_compressed=1;
ALTER TABLE t page_compressed=0;

and kills the server during the execution of one of the ALTER TABLE statements.

Disabling page_compressed should always cause the table to be rebuilt. It should be possible to enable page_compressed=1 instantly, by just updating the tablespace flags in the first page, because a page_compressed file may contain uncompressed pages. That is apparently being violated.

Here, the first page does not identify the table as being in page_compressed=1 format, but a page is in the compressed format nevertheless.

If I remember correctly, changes to page_compressed should not be reflected by any SYS_ tables, so this could theoretically also be a failure of DDL recovery, which currently largely depends on the DB_TRX_ID fields that are read from records of SYS_TABLES, SYS_INDEXES, and SYS_COLUMNS.

Comment by Vladislav Lesin [ 2022-03-24 ]

The history if space->flags changing:

1. the space creating, the initial value is 0x16:

0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0, mode=FIL_ENCRYPTION_DEFAULT)
    at ./storage/innobase/fil/fil0fil.cc:940                                    
940             space->flags = flags;                                           
(rr) bt                                                                         
#0  0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0,
    mode=FIL_ENCRYPTION_DEFAULT) at ./storage/innobase/fil/fil0fil.cc:940       
#1  0x0000558cf5f17add in fil_ibd_load (space_id=281, filename=0x65e06169e03a "./test/t7.ibd", space=@0x7fff50b3bb50: 0x0)
    at ./storage/innobase/fil/fil0fil.cc:2647                                   
#2  0x0000558cf5a05d1b in fil_name_process (name=0x65e06169e03a "./test/t7.ibd", len=13, space_id=281, deleted=false, lsn=333147302,
    store=0x7fff50b3ed50) at ./storage/innobase/log/log0recv.cc:1201            
#3  0x0000558cf5a11458 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=false)
    at ./storage/innobase/log/log0recv.cc:2578

2. redo record type WRITE is applied, flags value 0x76, so bits 6,7 are set, what means FSP_FLAGS_FCRC32_GET_COMPRESSED_ALGO() should return 3, i.e. the space is compressed:

Thread 1 hit Hardware watchpoint 1: -location node.space->flags                 
                                                                                
Old value = 22                                                                  
New value = 118                                                                 
fil_space_set_recv_size_and_flags (id=281, size=0, flags=118)                   
    at ./storage/innobase/fil/fil0fil.cc:1113                                   
1113      mysql_mutex_unlock(&fil_system.mutex);                                
(rr) bt                                                                         
#0  fil_space_set_recv_size_and_flags (id=281, size=0, flags=118)               
    at ./storage/innobase/fil/fil0fil.cc:1113                                   
#1  0x0000558cf5a0fd07 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=true)
    at ./storage/innobase/log/log0recv.cc:2409                                  
#2  0x0000558cf5a1a39e in recv_scan_log_recs (store=0x7fff50b3ed50, log_block=0x7b102cbf6000 "", checkpoint_lsn=324596146, start_lsn=332460032,
    end_lsn=332591104, contiguous_lsn=0x7fff50b3ee80, group_scanned_lsn=0x558cf8a6e408 <log_sys+520>)
    at ./storage/innobase/log/log0recv.cc:3554             

3. page 0 recovery, sets space flags to 0x16 again, i.e. cleans "compressed" flags:

Old value = 118                                                                 
New value = 22                                                                  
recv_recover_page (block=0x769534a7ba00, mtr=..., p=                            
      {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744
2744                                            + frame);                       
(rr) bt                                                                         
#0  recv_recover_page (block=0x769534a7ba00, mtr=..., p=                        
      {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744
#1  0x0000558cf5a150de in recv_recover_page (space=0x612000024f40, bpage=0x769534a7ba00)
    at ./storage/innobase/log/log0recv.cc:2906                                  
#2  0x0000558cf5e014d9 in buf_page_t::read_complete (this=0x769534a7ba00, node=...)
    at ./storage/innobase/buf/buf0buf.cc:3721                                   
#3  0x0000558cf5f1a3ae in fil_aio_callback (request=...) at ./storage/innobase/fil/fil0fil.cc:2930

And then the recovery of page 67 starts and the crash happens.

The tablespace flags on disk is 0x16, i.e. the space without compression. It looks like the page was not flushed. When redo log record on (2) was parsed, the the correct space flags were set. And then, when the page 0 was read, the space flags was set from non-flushed before killing page 0, from FSP_HEADER_OFFSET + FSP_SPACE_FLAGS (0x36) offset.

It looks like the issue is in recovery itself.

Comment by Marko Mäkelä [ 2022-03-24 ]

That adjustment of fil_space_t::flags in recv_recover_page() may be redundant.

If recv_sys_t::apply() wrote "Starting a batch to recover" instead of "Starting final batch", then I think that this is pretty clear: the tablespace flags in page 0 would only be set to the final value during a later batch.

Comment by Marko Mäkelä [ 2022-03-25 ]

I suspect that this may have been broken ever since MDEV-18644. The possibly redundant and harmful assignment would be this one (10.4 version, before the code was refactored in MDEV-12353):

diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 4359fb6b308..cca94530e4c 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1503,7 +1503,6 @@ recv_parse_or_apply_log_rec_body(
 
 				switch (offs) {
 				case FSP_HEADER_OFFSET + FSP_SPACE_FLAGS:
-					space->flags = val;
 					break;
 				case FSP_HEADER_OFFSET + FSP_SIZE:
 					space->size_in_header = val;

Possibly, we should adjust all these fil_space_t fields once when initially parsing the redo log, before starting to apply any records to any data pages. In MDEV-11556 we only did that for the FSP_SIZE, so that files could be extended to fil_space_t::recv_size before starting to access any data pages.

Generated at Thu Feb 08 09:58:36 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.