[MDEV-29374] Frequent "Data structure corruption" in InnoDB after OOM Created: 2022-08-24  Updated: 2023-03-23  Resolved: 2022-09-01

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1
Fix Version/s: 10.6.10, 10.7.6, 10.8.5, 10.9.3, 10.10.2

Type: Bug Priority: Blocker
Reporter: Brad Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None
Environment:

CentOS 7


Issue Links:
Blocks
Duplicate
is duplicated by MDEV-28784 mariadb-backup prepare fails with srv... Closed
Problem/Incident
is caused by MDEV-13542 Crashing on a corrupted page is unhel... Closed
Relates
relates to MDEV-29364 CRITICAL - MariaDB 10.8.4 creating co... Closed
relates to MDEV-29511 "Page read from tablespace is corrupt... Closed

 Description   

My company has around 200 CentOS 7 servers running MariaDB 10.6. Last week, after the 10.6.9 update was applied, we started seeing lots of InnoDB failures after an OOM had killed MariaDB.

Our systems are kind of light on memory and do hit OOM's sometimes but it shouldn't cause a failure in InnoDB recovery. When 10.6.9 was updated, we started seeing multiple failures per day. I rolled back to 10.6.8 yesterday and have not seen any more issues. So I think something in 10.6.9 is causing a problem.

Each time the issue happens, the error log looks like this.

2022-08-21 10:41:34 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2022-08-21 10:41:34 0 [Note] InnoDB: Number of pools: 1
2022-08-21 10:41:34 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-08-21 10:41:34 0 [Note] InnoDB: Using Linux native AIO
2022-08-21 10:41:34 0 [Note] InnoDB: Initializing buffer pool, total size = 2550136832, chunk size = 134217728
2022-08-21 10:41:34 0 [Note] InnoDB: Completed initialization of buffer pool
2022-08-21 10:41:34 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1383270444633,1383271021489
2022-08-21 10:41:36 0 [Note] InnoDB: Starting final batch to recover 15643 pages from redo log.
2022-08-21 10:41:37 0 [ERROR] InnoDB: Plugin initialization aborted with error Data structure corruption
2022-08-21 10:41:37 0 [Note] InnoDB: Starting shutdown...
2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' init function returned error.
2022-08-21 10:41:38 0 [ERROR] Plugin 'InnoDB' registration as a STORAGE ENGINE failed.
2022-08-21 10:41:38 0 [Note] Plugin 'FEEDBACK' is disabled.
2022-08-21 10:41:38 0 [ERROR] Unknown/unsupported storage engine: InnoDB
2022-08-21 10:41:38 0 [ERROR] Aborting

There isn't anything unusual about our config. Here is one of them.

default-storage-engine=InnoDB
innodb_file_per_table=1
innodb_buffer_pool_size=2400M
innodb_log_file_size=600M
innodb_strict_mode = 0
innodb_use_native_aio = 1
innodb_write_io_threads = 8
innodb_read_io_threads = 8
query_cache_type=0
open_files_limit=50000
thread_cache_size=4
join_buffer_size=1024K
tmp_table_size=30M
max_heap_table_size=30M
sql_mode="NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
table_definition_cache=14000

Fortunately, it's very easy to recover from this but it does take manual intervention to do. i.e. innodb_force_recovery=1

Any ideas what could be causing this new issue and what we can do to correct it?



 Comments   
Comment by Marko Mäkelä [ 2022-08-25 ]

Error handling was refactored in MDEV-13542 and in a few related tickets. The aim was that data corruption should no longer crash the server but result in errors being reported.

Comment by Marko Mäkelä [ 2022-08-25 ]

wk_bradp, can you provide a copy of a data directory on which MariaDB Server 10.6.9 fails to start up, and 10.6.8 does start up?

Comment by Marko Mäkelä [ 2022-08-25 ]

MDEV-29383 is a crash recovery bug that was introduced in MariaDB Server 10.6.9. It could potentially explain this.

Comment by Brad [ 2022-08-25 ]

Thanks for looking into this, Marko! I don't think I can provide a copy of the data directory for any of these servers because it does contain customer data. I haven't tried to reproduce this issue in a testing environment yet. But I can say that it started with 10.6.9 and hasn't happened again since the rollback to 10.6.8.

I can try to run some tests to try to reproduce the crash if needed but it does sound like that crash recovery bug you mentioned might be related.

Also, just to be clear on the order of things. I didn't downgrade to fix the data corruption, I used innodb_force_recovery=1 to ignore the problem. Only after that did I downgrade to 10.6.8. Meaning, I didn't downgrade to get it to start up, I downgraded to prevent the corruption from happening again. Just wanted to make that clear but let me know if it's not

Also, let me know what you need from me and I'll be happy to provide it.

Comment by Marko Mäkelä [ 2022-08-26 ]

wk_bradp, thank you, and sorry for introducing MDEV-29383. I am now convinced that it can have caused your recovery trouble. Specifically, since your corruption looks like something with the undo logs, the culprit could be a change to trx_purge_free_segment().

Ironically, the corruption was caused by MDEV-13542, the very change that aimed to improve our handling of data corruption. Even though the changes were tested very extensively internally, we did not catch the problem until this week, almost 3 months after the change had been implemented.

I think that you’d better stay at 10.6.8 until a 10.6.10 release is available.

Comment by Brad [ 2022-08-26 ]

Thanks so much for looking into this Marko! I'll leave our systems on 10.6.8 until the next patch is released.

Comment by Sergei Golubchik [ 2022-08-27 ]

I'll close it as a duplicate then, but please, do comment if the next release won't fix the issue

Comment by Marko Mäkelä [ 2022-08-30 ]

I am reopening this, because mleich provided an rr replay trace of something that reproduces exactly this message, even when the fix of MDEV-29383 is present.

What I found out so far:

  • The problematic page 10:255 is an index page that was freed near LSN 0x40296d.
  • The first recovered record for the page at LSN 0x3fee0f fails to apply because the page was read back as all-zero.
  • I did not check yet if a FREE_PAGE record for the page had been written or if it was ignored by recovery.
Comment by Marko Mäkelä [ 2022-08-31 ]

It turns out that we fail to write a FREE_PAGE record (it can only be written by invoking mtr_t::free()) for the following operation:

bb-10.6-MDEV-19229 23ab688cec4b3031262bc93921592a04749a4341

#2  0x0000564aad8eacef in buf_page_t::set_freed (this=0x7fb1125e7120, 
    prev_state=1610612739, count=0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/include/buf0buf.h:741
#3  0x0000564aadca9fc5 in buf_page_free (space=0x612000041740, page=255, 
    mtr=0x7fb10f496ae0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/buf/buf0buf.cc:2130
#4  0x0000564aadc00c2d in btr_page_free (index=0x6160008ba208, 
    block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:657
#5  0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, 
    mtr=0x7fb10f496ae0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964
#6  0x0000564aadc68034 in btr_cur_compress_if_useful (cursor=0x61a00000a9a8, 
    adjust=false, mtr=0x7fb10f496ae0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5406
#7  0x0000564aadc6b3d8 in btr_cur_pessimistic_delete (err=0x7fb10f4967d0, 
    has_reserved_extents=0, cursor=0x61a00000a9a8, flags=0, rollback=false, 
    mtr=0x7fb10f496ae0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0cur.cc:5863
#8  0x0000564aadac19fa in row_purge_remove_clust_if_poss_low (
    node=0x61a00000a908, mode=522)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/row/row0purge.cc:220

Due to the missing record for freeing the page, the recovery will attempt to load the page and fail, as follows.

2022-08-29  9:26:51 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log.
2022-08-29  9:26:53 0 [ERROR] InnoDB: Plugin initialization aborted at srv0start.cc[1555] with error Data structure corruption

I must make the failure diagnostics more verbose; it was probably "cleaned up" too much in MDEV-13542.

The root cause of this failure does not look like a recent regression. I think that it will affect 10.5 as well. 10.5 has a compatible log format, and it would crash on this corrupted data directory:

10.5 29fa9bcee01cf5457c096bf37bb25c84ddee5a30

2022-08-31 11:37:26 0 [Note] InnoDB: Starting final batch to recover 143 pages from redo log.
mariadbd: /mariadb/10.5/storage/innobase/log/log0recv.cc:270: log_phys_t::apply_status log_phys_t::apply(const buf_block_t&, uint16_t&) const: Assertion `mach_read_from_4(frame + 4U) == block.page.id().page_no()' failed.

Comment by Marko Mäkelä [ 2022-08-31 ]

The root cause turns out to be that a write of a FREE_PAGE record is being skipped due to an early return here:

#0  fseg_free_page_low (seg_inode=0x7fb112adc0f2 "", iblock=0x7fb1125ebe50, 
    space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, ahi=false)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2596
#1  0x0000564aaddfcef7 in fseg_free_page (seg_header=0x7fb112ae004a "", 
    space=0x612000041740, offset=255, mtr=0x7fb10f496ae0, have_latch=false)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/fsp/fsp0fsp.cc:2629
#2  0x0000564aadc00b9e in btr_page_free (index=0x6160008ba208, 
    block=0x7fb1125e7120, mtr=0x7fb10f496ae0, blob=false, space_latched=false)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:650
#3  0x0000564aadc186da in btr_compress (cursor=0x61a00000a9a8, adjust=false, 
    mtr=0x7fb10f496ae0)
    at /data/Server/bb-10.6-MDEV-19229/storage/innobase/btr/btr0btr.cc:3964

The culprit is an accidentally added extra return err; statement. (Only in case of an error, we should not write the FREE_PAGE record.)

		if (UNIV_UNLIKELY(err != DB_SUCCESS)) {
			return err;
		}
		return err;
	}
 
	mtr->free(*space, static_cast<uint32_t>(offset));
	return DB_SUCCESS;
}

Comment by Marko Mäkelä [ 2022-08-31 ]

As part of fixing this, I am going to add some error messages to identify the corrupted page:

innodb.corrupted_during_recovery 'innodb' w20 [ fail ]  Found warnings/errors in server log file!
        Test ended at 2022-08-31 13:12:59
line
2022-08-31 13:12:56 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=5, page number=3]; set innodb_force_recovery to ignore
2022-08-31 13:12:58 0 [Warning] InnoDB: Discarding log for corrupted page [page id: space=5, page number=3]

Comment by Sergei Golubchik [ 2022-09-08 ]

Summary: a crash shortly after a page merge (can be triggered by an UPDATE or DELETE or, say, rollback of an INSERT) can cause data corruption.

Generated at Thu Feb 08 10:08:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.