MariaDB Server: MDEV-23156

[ERROR] InnoDB: Record <number1> is above rec heap top <number2> hit during restart after crash

Details

    Description

      Workflow of the test
      1. Start the server
      2. One session creates and fills several tables
      3. One session runs ALTER TABLE `t8` FORCE again and again
      4. SIGKILL the server process at some point while step 3 is still running
      5. Attempt to restart that server, which fails with
            [ERROR] InnoDB: Record <number1> is above rec heap top <number2>
       
      Snip from the server error log
      ----------------------------------------------
      [rr 59733 3756]2020-07-13 17:59:00 0 [Note] InnoDB: Starting a batch to recover 649 pages from redo log.
      [rr 59733 6404]2020-07-13 17:59:00 0 [ERROR] InnoDB: Record 125 is above rec heap top 120
      [rr 59733 6408]2020-07-13 17:59:00 0 [Note] InnoDB: Uncompressed page, stored checksum in field1 0, calculated checksums for field1: crc32 326468837, innodb 2664178674,  page type 17855 == INDEX.none 3735928559, stored checksum in field2 41467269, calculated checksums for field2: crc32 326468837, innodb 1737752431, none 3735928559,  page LSN 0 41467269, low 4 bytes of LSN at page end 638991405, page number (if stored to page already) 7, space id (if created with >= MySQL-4.1.1 and stored already) 14
       [rr 59733 6412]2020-07-13 17:59:00 0 [Note] InnoDB: Page may be an index page where index id is 43
      [rr 59733 6416]2020-07-13 17:59:00 0 [ERROR] [FATAL] InnoDB: Apparent corruption of an index page [page id: space=14, page number=7] to be written to data file. We intentionally crash the server to prevent corrupt data from ending up in data files.
      [rr 59733 6437]200713 17:59:00 [rr 59733 6439][ERROR] mysqld got signal 6 ;
       
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
      #1  0x000038fd5b9648b1 in __GI_abort () at abort.c:79
      #2  0x000055c3ade1616c in ib::fatal::~fatal (this=<optimized out>, __in_chrg=<optimized out>) at storage/innobase/ut/ut0ut.cc:580
      #3  0x000055c3adea15f0 in buf_dblwr_assert_on_corrupt_block (block=block@entry=0x2b9e00e53600) at storage/innobase/buf/buf0dblwr.cc:769
      #4  0x000055c3adea16a0 in buf_dblwr_check_block (block=block@entry=0x2b9e00e53600) at storage/innobase/buf/buf0dblwr.cc:828
      #5  0x000055c3adea21b1 in buf_dblwr_flush_buffered_writes () at storage/innobase/buf/buf0dblwr.cc:944
      #6  0x000055c3adea368a in buf_dblwr_t::add_to_batch (this=0x55c3afe9a298, bpage=bpage@entry=0x2b9e00e51f80, flush=flush@entry=IORequest::FLUSH_LIST, size=<optimized out>) at storage/innobase/buf/buf0dblwr.cc:1060
      #7  0x000055c3adeb5592 in buf_flush_page (bpage=bpage@entry=0x2b9e00e51f80, flush_type=flush_type@entry=IORequest::FLUSH_LIST, space=space@entry=0x55c3afecc0a8, sync=sync@entry=false) at storage/innobase/buf/buf0flu.cc:1185
      #8  0x000055c3adeb6264 in buf_flush_try_neighbors (page_id=page_id@entry=..., flush=flush@entry=IORequest::FLUSH_LIST, n_flushed=n_flushed@entry=100, n_to_flush=n_to_flush@entry=18446744073709551614)
          at storage/innobase/buf/buf0flu.cc:1432
      #9  0x000055c3adeb710d in buf_do_flush_list_batch (min_n=min_n@entry=18446744073709551614, lsn_limit=lsn_limit@entry=18446744073709551615) at storage/innobase/buf/buf0flu.cc:1640
      #10 0x000055c3adeb755d in buf_flush_batch (lru=lru@entry=false, min_n=min_n@entry=18446744073709551614, lsn_limit=lsn_limit@entry=18446744073709551615, n=n@entry=0x7ca320410c70) at storage/innobase/buf/buf0flu.cc:1697
      #11 0x000055c3adeb75cf in buf_flush_do_batch (lru=lru@entry=false, min_n=18446744073709551614, lsn_limit=18446744073709551615, n=n@entry=0x7ca320410c70) at storage/innobase/buf/buf0flu.cc:1800
      #12 0x000055c3adeb79f3 in pc_flush_slot () at storage/innobase/buf/buf0flu.cc:2397
      #13 0x000055c3adeb7c7e in buf_flush_page_cleaner () at storage/innobase/buf/buf0flu.cc:2559
      #14 0x00006c31626036db in start_thread (arg=0x7ca320411700) at pthread_create.c:463
      #15 0x000038fd5ba45a3f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      (rr)
       
       
      10.5.5 origin/10.5 272620399425a1b59f6565ff130970bca6d9b4f9 2020-07-11T14:04:53+00:00
       
      RQG
      git clone https://github.com/mleich1/rqg --branch experimental RQG
      origin/experimental fb64e6bde15cd4efcd466562873156ebb9f9c8b0 2020-07-10T21:35:50+02:00
       
      Simplified YY grammar (the full grammar was innodb_compression_encryption.yy)
      query:
          ALTER TABLE `t8` FORCE ;
      query_init:
          ;
      thread_connect:
          ;
       
      perl rqg.pl \                        
      --duration=100 \
      --queries=10000000 \
      --threads=1 \
      --no_mask \
      --seed=random \
      --rpl_mode=none \
      --engine=InnoDB \
      --gendata=conf/mariadb/innodb_compression_encryption.zz \
      --sqltrace \
      --mysqld=--innodb-lock-wait-timeout=50 \
      --mysqld=--loose-idle_readonly_transaction_timeout=0 \
      --mysqld=--loose_innodb_use_native_aio=0 \
      --mysqld=--connect_timeout=60 \
      --mysqld=--plugin-load-add=file_key_management.so \
      --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
      --mysqld=--file-key-management-filename=conf/mariadb/encryption_keys.txt \
      --mysqld=--lock-wait-timeout=86400 \
      --mysqld=--log-bin \
      --mysqld=--innodb_stats_persistent=off \
      --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
      --mysqld=--wait_timeout=28800 \
      --mysqld=--loose-idle_transaction_timeout=0 \
      --mysqld=--log_bin_trust_function_creators=1 \
      --mysqld=--log-output=none \
      --mysqld=--net_read_timeout=30 \
      --mysqld=--slave_net_timeout=60 \
      --mysqld=--loose-table_lock_wait_timeout=50 \
      --mysqld=--loose-idle_write_transaction_timeout=0 \
      --mysqld=--net_write_timeout=60 \
      --mysqld=--interactive_timeout=28800 \
      --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
      --reporters=Backtrace,CrashRecovery1,Deadlock1,ErrorLog \
      --validators=None \
      --grammar=<simplified grammar> \
      --workdir=<local settings> \
      --vardir=<local settings> \
      --mtr-build-thread=<local settings> \
      --basedir1=<local settings> \
      --script_debug=_nix_ \
      --rr=Server \
      --rr_options=--chaos
      


          Activity

            I debugged a failure:

            10.5 272620399425a1b59f6565ff130970bca6d9b4f9

            [rr 59733 6404]2020-07-13 17:59:00 0 [ERROR] InnoDB: Record 125 is above rec heap top 120
            …
            Thread 1 hit Breakpoint 2, log_phys_t::apply (this=this@entry=0x2b9e018cd6b8, 
                block=..., last_offset=@0x55c3afeb7864: 1)
                at /home/mleich/10.5/storage/innobase/log/log0recv.cc:198
            198	  apply_status apply(const buf_block_t &block, uint16_t &last_offset) const
            (rr) n
            …
            208	      const byte b= *l++;
            (rr) p/x *l@6
            $1 = {0x12, 0xe, 0x7, 0xa1, 0x1, 0x0}
            (rr) c
            Continuing.
             
            Thread 1 hit Breakpoint 2, log_phys_t::apply (this=this@entry=0x2b9e018cd6d8, 
                block=..., last_offset=@0x55c3afeb7864: 24)
                at /home/mleich/10.5/storage/innobase/log/log0recv.cc:198
            198	  apply_status apply(const buf_block_t &block, uint16_t &last_offset) const
            (rr) n
            …
            208	      const byte b= *l++;
            (rr) p/x *l@13
            $2 = {0x45, 0xe, 0x7, 0x8, 0x8, 0xff, 0xb2, 0x31, 0x1, 0xb2, 0x7, 0x2b, 0x0}
            (rr) c
            Continuing.
             
            Thread 1 hit Breakpoint 2, log_phys_t::apply (this=this@entry=0x2b9e018cd700, 
                block=..., last_offset=@0x55c3afeb7864: 74)
                at /home/mleich/10.5/storage/innobase/log/log0recv.cc:198
            198	  apply_status apply(const buf_block_t &block, uint16_t &last_offset) const
            (rr) n
            …
            (rr) p/x *l@4
            $3 = {0xb2, 0x18, 0x1a, 0x0}
            

            Pretty-printing the records:
            0x12, 0xe, 0x7 = INIT_PAGE(14:7)
            0xa1, 0x1 = EXTENDED(INIT_ROW_FORMAT_DYNAMIC)
            0x0 = end of records
            0x45, 0xe, 0x7, 0x8, 0x8, 0xff = MEMSET(14:7, offset 8, 8 bytes) (initializing FIL_PAGE_PREV, FIL_PAGE_NEXT which were already initialized)
            0xb2, 0x31, 0x1 = WRITE(+0x31 bytes, 1) (PAGE_LEVEL=1)
            0xb2, 0x7, 0x2b = WRITE(+0x7 bytes, 0x2b) (PAGE_INDEX_ID=0x2b)
            0x0 = end of records
            0xb2, 0x18, 0x1a = WRITE(+0x18 bytes, 0x1a) (set the next record of the infimum record)
            0x0 = end of records
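            The byte sequences above follow the MariaDB 10.5 physical redo record encoding: the high nibble of the first byte is the record type, the 0x80 bit means "same page as the previous record", and the low nibble is the remaining length. A simplified Python sketch of that decoding (illustrative only; it assumes every field fits in a single byte, which holds for this trace, whereas the real format uses variable-length integers and separately encoded lengths >= 16) could look like:

            ```python
            # Simplified decoder for the 10.5 redo record bytes shown above.
            # Illustrative sketch only; the real parser lives in log0recv.cc.
            TYPES = {0x00: "FREE_PAGE", 0x10: "INIT_PAGE", 0x20: "EXTENDED",
                     0x30: "WRITE", 0x40: "MEMSET", 0x50: "MEMMOVE"}

            def decode(buf):
                out, i, page = [], 0, None
                while buf[i]:                  # a 0x00 byte ends the record group
                    b = buf[i]; i += 1
                    rtype = TYPES[b & 0x70]
                    length = b & 0x0f          # payload length (single-byte case)
                    same_page = bool(b & 0x80) # reuse page id of previous record
                    payload = buf[i:i + length]; i += length
                    if not same_page:          # payload starts with space id, page no
                        page = (payload[0], payload[1])
                        payload = payload[2:]
                    out.append((rtype, page, bytes(payload)))
                return out

            print(decode(bytes([0x12, 0x0e, 0x07, 0xa1, 0x01, 0x00])))
            # → [('INIT_PAGE', (14, 7), b''), ('EXTENDED', (14, 7), b'\x01')]
            ```

            Running it on the second sequence yields the MEMSET plus the two same-page WRITE records annotated above.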

            The start is looking good. At the time of the crash, we have a sorted list of records on the page:

            od -Ax -t x1 -w13 -j 0x78 page.bin
            

            000078 10 00 11 00 0d 00 00 00 01 00 00 00 05
            000085 00 00 19 00 0d 00 00 00 4a 00 00 00 06
            000092 00 00 21 00 0d 00 00 00 93 00 00 00 08
            …
            00198e 00 0f 81 00 0d 00 00 8c df 00 00 02 0d
            00199b 00 0f 89 00 0d 00 00 8d 28 00 00 02 0e
            0019a8 00 0f 91 e6 c3 00 00 8d 71 00 00 02 0f
            0019b5 00 00 00 00 00 00 00 00 00 00 00 00 00
            *
            003ff2 00 00 00 70 00 63 02 78 bd 85 26 16 3c
            003fff 2d
            004000
            

            Based on the above output, the PAGE_HEAP_TOP should be 0x19b5 but it is 0x78 instead. Also the page directory is empty (only pointing to the infimum at 0x63 and the supremum at 0x70). The corruption is detected because the next-record pointer of the infimum record is pointing to the above chain of records. If that pointer were pointing to the page supremum, the page would be valid.
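            The failed check itself is simple arithmetic: a record origin reached by following the next-record pointers must lie below PAGE_HEAP_TOP. A hypothetical Python rendition of that validation (the names echo, but are not, the InnoDB code) shows how the numbers in the error message arise: 0x78 = 120 is the stale heap top, and 0x7d = 125 is the origin of the first record reached from the infimum.

            ```python
            # Hypothetical rendition of the "record above heap top" check.
            # The heap top was left at its post-INIT_PAGE value (0x78), while
            # the infimum's next-record pointer leads to a record at 0x7d.
            PAGE_HEAP_TOP = 0x78   # stale: finishPage's update was in the next batch
            first_record  = 0x7d   # origin of the first user record after the infimum

            def check_rec(rec_offset, heap_top):
                if rec_offset >= heap_top:
                    return f"Record {rec_offset} is above rec heap top {heap_top}"
                return None

            print(check_rec(first_record, PAGE_HEAP_TOP))
            # → Record 125 is above rec heap top 120
            ```

            With the correct heap top of 0x19b5, the same record would pass the check.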

            The record list was created by the following:

            10.5 272620399425a1b59f6565ff130970bca6d9b4f9

            #0  mtr_t::write<2u, (mtr_t::write_type)0, unsigned short> (
                this=0x3dc8285539d8, block=..., ptr=0x472058a9c061, val=<optimized out>)
                at /home/mleich/10.5/storage/innobase/include/mtr0log.h:211
            #1  0x000055e1e9a1f641 in PageBulk::insertPage<(PageBulk::format)1> (
                this=this@entry=0x3dc8285539c8, rec=rec@entry=0x3dc828553fad "", 
                offsets=offsets@entry=0x3dc828553fb8)
                at /home/mleich/10.5/storage/innobase/btr/btr0bulk.cc:190
            

            A little later, we do update the page header fields:

            10.5 272620399425a1b59f6565ff130970bca6d9b4f9

            #0  PageBulk::finishPage<(PageBulk::format)1> (this=this@entry=0x3dc8285539c8)
                at /home/mleich/10.5/storage/innobase/btr/btr0bulk.cc:482
            #1  0x000055e1e9a1c533 in PageBulk::finish (this=0x3dc8285539c8)
                at /home/mleich/10.5/storage/innobase/btr/btr0bulk.cc:507
            #2  BtrBulk::pageCommit (this=this@entry=0x3dc8280bb998, 
                page_bulk=page_bulk@entry=0x3dc8285539c8, 
                next_page_bulk=next_page_bulk@entry=0x0, insert_father=<optimized out>)
                at /home/mleich/10.5/storage/innobase/btr/btr0bulk.cc:908
            #3  0x000055e1e9a1d32c in BtrBulk::finish (this=this@entry=0x3dc8280bb998, 
                err=DB_SUCCESS) at /home/mleich/10.5/storage/innobase/btr/btr0bulk.cc:1157
            

            I checked, and the log_sys.flushed_to_disk_lsn advanced well beyond our mtr_t::m_commit_lsn before the server was killed.

            The problem seems to occur during the recovery itself. We are executing a multi-batch recovery, and the batch boundary happens to fall between those two PageBulk mini-transactions. This bug should be fixed by fixing MDEV-21347 (which would ensure that the index page is always in a valid state, even after the first mini-transaction).
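            The failure mode can be illustrated with a toy model: recovery applies the buffered redo records in batches, and a batch boundary may force pages to be flushed (and double-write-checked) after only a prefix of the mini-transactions has been applied. The sketch below is purely illustrative, with mtr_insert standing in for PageBulk::insertPage and mtr_finish for PageBulk::finishPage:

            ```python
            # Toy model of multi-batch crash recovery. Purely illustrative:
            # mtr_insert links records from the infimum (insertPage),
            # mtr_finish updates PAGE_HEAP_TOP afterwards (finishPage).
            INFIMUM, SUPREMUM = 0x63, 0x70

            def fresh_page():
                return {"heap_top": 0x78, "infimum_next": SUPREMUM}

            def mtr_insert(page):      # records written, header not yet updated
                page["infimum_next"] = 0x7d

            def mtr_finish(page):      # header catches up with the record list
                page["heap_top"] = 0x19b5

            def page_is_valid(page):
                nxt = page["infimum_next"]
                return nxt == SUPREMUM or nxt < page["heap_top"]

            page = fresh_page()
            mtr_insert(page)
            assert not page_is_valid(page)  # batch boundary here -> flush fails
            mtr_finish(page)
            assert page_is_valid(page)      # both batches applied -> consistent
            ```

            With the MDEV-21347 approach, each mini-transaction leaves the page self-consistent, so a flush at any batch boundary is safe.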

            marko Marko Mäkelä added a comment

            I was no longer able to replay the bug on
            origin/10.5 054f10365c49b70a3868edc0c955ffe187f8344c 2020-07-16T07:15:06+03:00
            with the simplified RQG test.

            mleich Matthias Leich added a comment

            This was fixed by MDEV-21347.

            marko Marko Mäkelä added a comment

            People

              marko Marko Mäkelä
              mleich Matthias Leich