Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-29374

Frequent "Data structure corruption" in InnoDB after OOM

Details

    Description

      My company has around 200 CentOS 7 servers running MariaDB 10.6. Last week, after the 10.6.9 update was applied, we started seeing lots of InnoDB failures after an OOM had killed MariaDB.

      Our systems are kind of light on memory and do hit OOM's sometimes but it shouldn't cause a failure in InnoDB recovery. When 10.6.9 was updated, we started seeing multiple failures per day. I rolled back to 10.6.8 yesterday and have not seen any more issues. So I think something in 10.6.9 is causing a problem.

      Each time the issue happens, the error log looks like this.

      2022-08-21 10:41:34 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2022-08-21 10:41:34 0 [Note] InnoDB: Number of pools: 1
      2022-08-21 10:41:34 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-08-21 10:41:34 0 [Note] InnoDB: Using Linux native AIO
      2022-08-21 10:41:34 0 [Note] InnoDB: Initializing buffer pool, total size = 2550136832, chunk size = 134217728
      2022-08-21 10:41:34 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-08-21 10:41:34 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1383270444633,1383271021489
      2022-08-21 10:41:36 0 [Note] InnoDB: Starting final batch to recover 15643 pages from redo log.
      2022-08-21 10:41:37 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
      2022-08-21 10:41:37 0 [Note] InnoDB: Starting shutdown...
      2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' init function returned error.
      2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
      2022-08-21 10:41:38 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-08-21 10:41:38 0 [ERROR] Unknown/unsupported storage engine: InnoDB
      2022-08-21 10:41:38 0 [ERROR] Aborting
      

      There isn't anything unusual about our config. Here is one of them.

      default-storage-engine=InnoDB
      innodb_file_per_table=1
      innodb_buffer_pool_size=2400M
      innodb_log_file_size=600M
      innodb_strict_mode = 0
      innodb_use_native_aio = 1
      innodb_write_io_threads = 8
      innodb_read_io_threads = 8
      query_cache_type=0
      open_files_limit=50000
      thread_cache_size=4
      join_buffer_size=1024K
      tmp_table_size=30M
      max_heap_table_size=30M
      sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
      table_definition_cache=14000
      

      Fortunately, it's very easy to recover from this but it does take manual intervention to do. i.e. innodb_force_recovery=1

      Any ideas what could be causing this new issue and what we can do to correct it?

      Attachments

        Issue Links

          Activity

            I am reopening this, because mleich provided an rr replay trace of something that reproduces exactly this message, even when the fix of MDEV-29383 is present.

            What I found out so far:

            • The problematic page 10:255 is an index page that was freed near LSN 0x40296d.
            • The first recovered record for the page at LSN 0x3fee0f fails to apply because the page was read back as all-zero.
            • I did not check yet if a FREE_PAGE record for the page had been written or if it was ignored by recovery.
            marko Marko Mäkelä added a comment - I am reopening this, because mleich provided an rr replay trace of something that reproduces exactly this message, even when the fix of MDEV-29383 is present. What I found out so far: The problematic page 10:255 is an index page that was freed near LSN 0x40296d. The first recovered record for the page at LSN 0x3fee0f fails to apply because the page was read back as all-zero. I did not check yet if a FREE_PAGE record for the page had been written or if it was ignored by recovery.

            It turns out that we fail to write a FREE_PAGE record (it can only be written by invoking mtr_t::free()) for the following operation:

            bb-10.6-MDEV-19229 23ab688cec4b3031262bc93921592a04749a4341

            #2  0x0000564aad8eacef in buf_page_t::set_freed (this=0x7fb1125e7120, 
                prev_state=1610612739, count=0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/include/buf0buf.h:741
            #3  0x0000564aadca9fc5 in buf_page_free (space=0x612000041740, page=255, 
                mtr=0x7fb10f496ae0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/buf/buf0buf.cc:2130
            #4  0x0000564aadc00c2d in btr_page_free (index=0x6160008ba208, 
                block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:657
            #5  0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, 
                mtr=0x7fb10f496ae0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964
            #6  0x0000564aadc68034 in btr_cur_compress_if_useful (cursor=0x61a00000a9a8, 
                adjust=false, mtr=0x7fb10f496ae0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5406
            #7  0x0000564aadc6b3d8 in btr_cur_pessimistic_delete (err=0x7fb10f4967d0, 
                has_reserved_extents=0, cursor=0x61a00000a9a8, flags=0, rollback=false, 
                mtr=0x7fb10f496ae0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5863
            #8  0x0000564aadac19fa in row_purge_remove_clust_if_poss_low (
                node=0x61a00000a908, mode=522)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/row/row0purge.cc:220
            

            Due to the missing record for freeing the page, the recovery will attempt to load the page and fail, as follows.

            2022-08-29  9:26:51 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log.
            2022-08-29  9:26:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1555] with error Data structure corruption
            

            I must make the failure diagnostics more verbose; it was probably "cleaned up" too much in MDEV-13542.

            The root cause of this failure does not look like a recent regression. I think that it will affect 10.5 as well. 10.5 has a compatible log format, and it would crash on this corrupted data directory:

            10.5 29fa9bcee01cf5457c096bf37bb25c84ddee5a30

            2022-08-31 11:37:26 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log.
            mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:270: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.
            

            marko Marko Mäkelä added a comment - It turns out that we fail to write a FREE_PAGE record (it can only be written by invoking mtr_t::free() ) for the following operation: bb-10.6-MDEV-19229 23ab688cec4b3031262bc93921592a04749a4341 #2 0x0000564aad8eacef in buf_page_t::set_freed (this=0x7fb1125e7120, prev_state=1610612739, count=0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/include/buf0buf.h:741 #3 0x0000564aadca9fc5 in buf_page_free (space=0x612000041740, page=255, mtr=0x7fb10f496ae0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/buf/buf0buf.cc:2130 #4 0x0000564aadc00c2d in btr_page_free (index=0x6160008ba208, block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:657 #5 0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, mtr=0x7fb10f496ae0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964 #6 0x0000564aadc68034 in btr_cur_compress_if_useful (cursor=0x61a00000a9a8, adjust=false, mtr=0x7fb10f496ae0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5406 #7 0x0000564aadc6b3d8 in btr_cur_pessimistic_delete (err=0x7fb10f4967d0, has_reserved_extents=0, cursor=0x61a00000a9a8, flags=0, rollback=false, mtr=0x7fb10f496ae0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5863 #8 0x0000564aadac19fa in row_purge_remove_clust_if_poss_low ( node=0x61a00000a908, mode=522) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/row/row0purge.cc:220 Due to the missing record for freeing the page, the recovery will attempt to load the page and fail, as follows. 2022-08-29 9:26:51 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log. 2022-08-29 9:26:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1555] with error Data structure corruption I must make the failure diagnostics more verbose; it was probably "cleaned up" too much in MDEV-13542 . The root cause of this failure does not look like a recent regression. I think that it will affect 10.5 as well. 10.5 has a compatible log format, and it would crash on this corrupted data directory: 10.5 29fa9bcee01cf5457c096bf37bb25c84ddee5a30 2022-08-31 11:37:26 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log. mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:270: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.

            The root cause turns out to be that a write of a FREE_PAGE record is being skipped due to an early return here:

            #0  fseg_free_page_low (seg_inode=0x7fb112adc0f2 "", iblock=0x7fb1125ebe50, 
                space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, ahi=false)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2596
            #1  0x0000564aaddfcef7 in fseg_free_page (seg_header=0x7fb112ae004a "", 
                space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, have_latch=false)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2629
            #2  0x0000564aadc00b9e in btr_page_free (index=0x6160008ba208, 
                block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:650
            #3  0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, 
                mtr=0x7fb10f496ae0)
                at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964
            

            The culprit is an accidentally added extra return err; statement. (Only in case of an error, we should not write the FREE_PAGE record.)

            		if (UNIV_UNLIKELY(err != DB_SUCCESS)) {
            			return err;
            		}
            		return err;
            	}
             
            	mtr->free(*space, static_cast<uint32_t>(offset));
            	return DB_SUCCESS;
            }
            

            marko Marko Mäkelä added a comment - The root cause turns out to be that a write of a FREE_PAGE record is being skipped due to an early return here: #0 fseg_free_page_low (seg_inode=0x7fb112adc0f2 "", iblock=0x7fb1125ebe50, space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, ahi=false) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2596 #1 0x0000564aaddfcef7 in fseg_free_page (seg_header=0x7fb112ae004a "", space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, have_latch=false) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2629 #2 0x0000564aadc00b9e in btr_page_free (index=0x6160008ba208, block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:650 #3 0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, mtr=0x7fb10f496ae0) at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964 The culprit is an accidentally added extra return err; statement. (Only in case of an error, we should not write the FREE_PAGE record.) if (UNIV_UNLIKELY(err != DB_SUCCESS)) { return err; } return err; }   mtr-> free (*space, static_cast <uint32_t>(offset)); return DB_SUCCESS; }

            As part of fixing this, I am going to add some error messages to identify the corrupted page:

            innodb.corrupted_during_recovery 'innodb' w20 [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2022-08-31 13:12:59
            line
            2022-08-31 13:12:56 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=3]; set innodb_force_recovery to ignore
            2022-08-31 13:12:58 0 [Warning] InnoDB: Discarding log for corrupted page [page id: space=5, page number=3]
            

            marko Marko Mäkelä added a comment - As part of fixing this, I am going to add some error messages to identify the corrupted page: innodb.corrupted_during_recovery 'innodb' w20 [ fail ] Found warnings/errors in server log file! Test ended at 2022-08-31 13:12:59 line 2022-08-31 13:12:56 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=3]; set innodb_force_recovery to ignore 2022-08-31 13:12:58 0 [Warning] InnoDB: Discarding log for corrupted page [page id: space=5, page number=3]

            Summary: a crash shortly after a page merge (can be triggered by an UPDATE or DELETE or, say, rollback of an INSERT) can cause data corruption.

            serg Sergei Golubchik added a comment - Summary: a crash shortly after a page merge (can be triggered by an UPDATE or DELETE or, say, rollback of an INSERT) can cause data corruption.

            People

              marko Marko Mäkelä
              wk_bradp Brad
              Votes:
              0 Vote for this issue
              Watchers:
              7 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.