Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-28168

Compressed page is read for non-compressed space during recovery

Details

    Description

      During MDEV-27557 testing we found the following crash:

      #2  0x000055cbfd573981 in ib::fatal::~fatal (this=0x27936b9ce500, __in_chrg=<optimized out>)
          at ./storage/innobase/ut/ut0ut.cc:519                                       
      #3  0x000055cbfd6257b0 in fil_page_type_validate (space=0x612000024f40, page=0x7fed98338000 "")
          at ./storage/innobase/include/fil0fil.inl:131                               
      #4  0x000055cbfd626f5e in buf_page_decrypt_after_read (bpage=0x7fed97b0d1a0, node=...)
          at ./storage/innobase/buf/buf0buf.cc:472                                    
      #5  0x000055cbfd6386d3 in buf_page_t::read_complete (this=0x7fed97b0d1a0, node=...)
          at ./storage/innobase/buf/buf0buf.cc:3617                                   
      #6  0x000055cbfd7523ae in fil_aio_callback (request=...) at ./storage/innobase/fil/fil0fil.cc:2930
      

      buf_page_decrypt_after_read() does decompression only if node.space->is_compressed() && buf_page_is_compressed(dst_frame, flags) condition is true. For our case node.space->is_compressed() returns false as both FSP_FLAGS_FCRC32_GET_COMPRESSED_ALGO() and FSP_FLAGS_HAS_PAGE_COMPRESSION() are false. The space flags is 0x16. What means the space is not compressed. But the 15-th bit of FIL_PAGE_TYPE is set, what means the full_crc32 page is compressed, i.e. buf_page_is_compressed(dst_frame, flags) should return true.

      So we have compressed page on non-compressed space. The question is how is it possible?

      Attachments

        1. page.bin
          32 kB
        2. page-0.bin
          32 kB

        Issue Links

          Activity

            In the attachment there is page itself page.bin, and the first page of the space page-0.bin.

            vlad.lesin Vladislav Lesin added a comment - In the attachment there is page itself page.bin , and the first page of the space page-0.bin .

            I think that we need an rr replay of the server that was killed before the failed recovery.

            My current hypothesis is that this is repeatable with the default innodb_checksum_algorithm=full_crc32 and a test that runs something like this:

            CREATE TABLE t ENGINE=InnoDB AS SELECT * FROM seq_1_to_10000;
            ALTER TABLE t page_compressed=1;
            ALTER TABLE t page_compressed=0;
            ALTER TABLE t page_compressed=1;
            ALTER TABLE t page_compressed=0;
            …
            

            and kills the server during the execution of one of the ALTER TABLE statements.

            Disabling page_compressed should always cause the table to be rebuilt. It should be possible to enable page_compressed=1 instantly, by just updating the tablespace flags in the first page, because a page_compressed file may contain uncompressed pages. That is apparently being violated.

            Here, the first page does not identify the table as being in page_compressed=1 format, but a page is in the compressed format nevertheless.

            If I remember correctly, changes to page_compressed should not be reflected by any SYS_ tables, so this could theoretically also be a failure of DDL recovery, which currently largely depends on the DB_TRX_ID fields that are read from records of SYS_TABLES, SYS_INDEXES, and SYS_COLUMNS.

            marko Marko Mäkelä added a comment - I think that we need an rr replay of the server that was killed before the failed recovery. My current hypothesis is that this is repeatable with the default innodb_checksum_algorithm=full_crc32 and a test that runs something like this: CREATE TABLE t ENGINE=InnoDB AS SELECT * FROM seq_1_to_10000; ALTER TABLE t page_compressed=1; ALTER TABLE t page_compressed=0; ALTER TABLE t page_compressed=1; ALTER TABLE t page_compressed=0; … and kills the server during the execution of one of the ALTER TABLE statements. Disabling page_compressed should always cause the table to be rebuilt. It should be possible to enable page_compressed=1 instantly, by just updating the tablespace flags in the first page, because a page_compressed file may contain uncompressed pages. That is apparently being violated. Here, the first page does not identify the table as being in page_compressed=1 format, but a page is in the compressed format nevertheless. If I remember correctly, changes to page_compressed should not be reflected by any SYS_ tables, so this could theoretically also be a failure of DDL recovery, which currently largely depends on the DB_TRX_ID fields that are read from records of SYS_TABLES , SYS_INDEXES , and SYS_COLUMNS .
            vlad.lesin Vladislav Lesin added a comment - - edited

            The history if space->flags changing:

            1. the space creating, the initial value is 0x16:

            0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0, mode=FIL_ENCRYPTION_DEFAULT)
                at ./storage/innobase/fil/fil0fil.cc:940                                    
            940             space->flags = flags;                                           
            (rr) bt                                                                         
            #0  0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0,
                mode=FIL_ENCRYPTION_DEFAULT) at ./storage/innobase/fil/fil0fil.cc:940       
            #1  0x0000558cf5f17add in fil_ibd_load (space_id=281, filename=0x65e06169e03a "./test/t7.ibd", space=@0x7fff50b3bb50: 0x0)
                at ./storage/innobase/fil/fil0fil.cc:2647                                   
            #2  0x0000558cf5a05d1b in fil_name_process (name=0x65e06169e03a "./test/t7.ibd", len=13, space_id=281, deleted=false, lsn=333147302,
                store=0x7fff50b3ed50) at ./storage/innobase/log/log0recv.cc:1201            
            #3  0x0000558cf5a11458 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=false)
                at ./storage/innobase/log/log0recv.cc:2578
            

            2. redo record type WRITE is applied, flags value 0x76, so bits 6,7 are set, what means FSP_FLAGS_FCRC32_GET_COMPRESSED_ALGO() should return 3, i.e. the space is compressed:

            Thread 1 hit Hardware watchpoint 1: -location node.space->flags                 
                                                                                            
            Old value = 22                                                                  
            New value = 118                                                                 
            fil_space_set_recv_size_and_flags (id=281, size=0, flags=118)                   
                at ./storage/innobase/fil/fil0fil.cc:1113                                   
            1113      mysql_mutex_unlock(&fil_system.mutex);                                
            (rr) bt                                                                         
            #0  fil_space_set_recv_size_and_flags (id=281, size=0, flags=118)               
                at ./storage/innobase/fil/fil0fil.cc:1113                                   
            #1  0x0000558cf5a0fd07 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=true)
                at ./storage/innobase/log/log0recv.cc:2409                                  
            #2  0x0000558cf5a1a39e in recv_scan_log_recs (store=0x7fff50b3ed50, log_block=0x7b102cbf6000 "", checkpoint_lsn=324596146, start_lsn=332460032,
                end_lsn=332591104, contiguous_lsn=0x7fff50b3ee80, group_scanned_lsn=0x558cf8a6e408 <log_sys+520>)
                at ./storage/innobase/log/log0recv.cc:3554             
            

            3. page 0 recovery, sets space flags to 0x16 again, i.e. cleans "compressed" flags:

            Old value = 118                                                                 
            New value = 22                                                                  
            recv_recover_page (block=0x769534a7ba00, mtr=..., p=                            
                  {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744
            2744                                            + frame);                       
            (rr) bt                                                                         
            #0  recv_recover_page (block=0x769534a7ba00, mtr=..., p=                        
                  {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744
            #1  0x0000558cf5a150de in recv_recover_page (space=0x612000024f40, bpage=0x769534a7ba00)
                at ./storage/innobase/log/log0recv.cc:2906                                  
            #2  0x0000558cf5e014d9 in buf_page_t::read_complete (this=0x769534a7ba00, node=...)
                at ./storage/innobase/buf/buf0buf.cc:3721                                   
            #3  0x0000558cf5f1a3ae in fil_aio_callback (request=...) at ./storage/innobase/fil/fil0fil.cc:2930
            

            And then the recovery of page 67 starts and the crash happens.

            The tablespace flags on disk is 0x16, i.e. the space without compression. It looks like the page was not flushed. When redo log record on (2) was parsed, the the correct space flags were set. And then, when the page 0 was read, the space flags was set from non-flushed before killing page 0, from FSP_HEADER_OFFSET + FSP_SPACE_FLAGS (0x36) offset.

            It looks like the issue is in recovery itself.

            vlad.lesin Vladislav Lesin added a comment - - edited The history if space->flags changing: 1. the space creating, the initial value is 0x16: 0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0, mode=FIL_ENCRYPTION_DEFAULT) at ./storage/innobase/fil/fil0fil.cc:940 940 space->flags = flags; (rr) bt #0 0x0000558cf5f0b453 in fil_space_t::create (id=281, flags=22, purpose=FIL_TYPE_TABLESPACE, crypt_data=0x6130000011c0, mode=FIL_ENCRYPTION_DEFAULT) at ./storage/innobase/fil/fil0fil.cc:940 #1 0x0000558cf5f17add in fil_ibd_load (space_id=281, filename=0x65e06169e03a "./test/t7.ibd", space=@0x7fff50b3bb50: 0x0) at ./storage/innobase/fil/fil0fil.cc:2647 #2 0x0000558cf5a05d1b in fil_name_process (name=0x65e06169e03a "./test/t7.ibd", len=13, space_id=281, deleted=false, lsn=333147302, store=0x7fff50b3ed50) at ./storage/innobase/log/log0recv.cc:1201 #3 0x0000558cf5a11458 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=false) at ./storage/innobase/log/log0recv.cc:2578 2. redo record type WRITE is applied, flags value 0x76, so bits 6,7 are set, what means FSP_FLAGS_FCRC32_GET_COMPRESSED_ALGO() should return 3, i.e. the space is compressed: Thread 1 hit Hardware watchpoint 1: -location node.space->flags Old value = 22 New value = 118 fil_space_set_recv_size_and_flags (id=281, size=0, flags=118) at ./storage/innobase/fil/fil0fil.cc:1113 1113 mysql_mutex_unlock(&fil_system.mutex); (rr) bt #0 fil_space_set_recv_size_and_flags (id=281, size=0, flags=118) at ./storage/innobase/fil/fil0fil.cc:1113 #1 0x0000558cf5a0fd07 in recv_sys_t::parse (this=0x558cf7d10340 <recv_sys>, checkpoint_lsn=324596146, store=0x7fff50b3ed50, apply=true) at ./storage/innobase/log/log0recv.cc:2409 #2 0x0000558cf5a1a39e in recv_scan_log_recs (store=0x7fff50b3ed50, log_block=0x7b102cbf6000 "", checkpoint_lsn=324596146, start_lsn=332460032, end_lsn=332591104, contiguous_lsn=0x7fff50b3ee80, group_scanned_lsn=0x558cf8a6e408 <log_sys+520>) at ./storage/innobase/log/log0recv.cc:3554 3. page 0 recovery, sets space flags to 0x16 again, i.e. cleans "compressed" flags: Old value = 118 New value = 22 recv_recover_page (block=0x769534a7ba00, mtr=..., p= {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744 2744 + frame); (rr) bt #0 recv_recover_page (block=0x769534a7ba00, mtr=..., p= {first = {m_id = 1206885810176}, second = {state = page_recv_t::RECV_BEING_PROCESSED, last_offset = 62, log = {head = 0x7695351d7ca8, tail = 0x769534afe6f8}}}, space=0x612000024f40, init=0x0) at ./storage/innobase/log/log0recv.cc:2744 #1 0x0000558cf5a150de in recv_recover_page (space=0x612000024f40, bpage=0x769534a7ba00) at ./storage/innobase/log/log0recv.cc:2906 #2 0x0000558cf5e014d9 in buf_page_t::read_complete (this=0x769534a7ba00, node=...) at ./storage/innobase/buf/buf0buf.cc:3721 #3 0x0000558cf5f1a3ae in fil_aio_callback (request=...) at ./storage/innobase/fil/fil0fil.cc:2930 And then the recovery of page 67 starts and the crash happens. The tablespace flags on disk is 0x16, i.e. the space without compression. It looks like the page was not flushed. When redo log record on (2) was parsed, the the correct space flags were set. And then, when the page 0 was read, the space flags was set from non-flushed before killing page 0, from FSP_HEADER_OFFSET + FSP_SPACE_FLAGS (0x36) offset. It looks like the issue is in recovery itself.

            That adjustment of fil_space_t::flags in recv_recover_page() may be redundant.

            If recv_sys_t::apply() wrote "Starting a batch to recover" instead of "Starting final batch", then I think that this is pretty clear: the tablespace flags in page 0 would only be set to the final value during a later batch.

            marko Marko Mäkelä added a comment - That adjustment of fil_space_t::flags in recv_recover_page() may be redundant. If recv_sys_t::apply() wrote "Starting a batch to recover" instead of "Starting final batch", then I think that this is pretty clear: the tablespace flags in page 0 would only be set to the final value during a later batch.

            I suspect that this may have been broken ever since MDEV-18644. The possibly redundant and harmful assignment would be this one (10.4 version, before the code was refactored in MDEV-12353):

            diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
            index 4359fb6b308..cca94530e4c 100644
            --- a/storage/innobase/log/log0recv.cc
            +++ b/storage/innobase/log/log0recv.cc
            @@ -1503,7 +1503,6 @@ recv_parse_or_apply_log_rec_body(
             
             				switch (offs) {
             				case FSP_HEADER_OFFSET + FSP_SPACE_FLAGS:
            -					space->flags = val;
             					break;
             				case FSP_HEADER_OFFSET + FSP_SIZE:
             					space->size_in_header = val;
            

            Possibly, we should adjust all these fil_space_t fields once when initially parsing the redo log, before starting to apply any records to any data pages. In MDEV-11556 we only did that for the FSP_SIZE, so that files could be extended to fil_space_t::recv_size before starting to access any data pages.

            marko Marko Mäkelä added a comment - I suspect that this may have been broken ever since MDEV-18644 . The possibly redundant and harmful assignment would be this one (10.4 version, before the code was refactored in MDEV-12353 ): diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc index 4359fb6b308..cca94530e4c 100644 --- a/storage/innobase/log/log0recv.cc +++ b/storage/innobase/log/log0recv.cc @@ -1503,7 +1503,6 @@ recv_parse_or_apply_log_rec_body( switch (offs) { case FSP_HEADER_OFFSET + FSP_SPACE_FLAGS: - space->flags = val; break; case FSP_HEADER_OFFSET + FSP_SIZE: space->size_in_header = val; Possibly, we should adjust all these fil_space_t fields once when initially parsing the redo log, before starting to apply any records to any data pages. In MDEV-11556 we only did that for the FSP_SIZE , so that files could be extended to fil_space_t::recv_size before starting to access any data pages.

            MDEV-34830 could have changed some of this logic, and MDEV-35226 might be related to this. Also MDEV-34898 may change something related to this.

            marko Marko Mäkelä added a comment - MDEV-34830 could have changed some of this logic, and MDEV-35226 might be related to this. Also MDEV-34898 may change something related to this.

            People

              vlad.lesin Vladislav Lesin
              vlad.lesin Vladislav Lesin
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.