[MDEV-35577] Broken recovery after SET GLOBAL innodb_log_file_size - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: 10.11, 11.4, 11.8
Fix Version/s: 10.11.11, 11.4.5
Component/s: Storage Engine - InnoDB
Labels:
None

Description

While executing the innodb.log_file_size_online test case, it fails with

CURRENT_TEST: innodb.log_file_size_online
mysqltest: At line 59: query 'SELECT * FROM t WHERE a<10' failed: ER_UNKNOWN_STORAGE_ENGINE (1286): Unknown storage engine 'InnoDB'

2024-12-05 16:19:56 0 [ERROR] InnoDB: OPT_PAGE_CHECKSUM mismatch on [page id: space=5, page number=345]

2024-12-05 16:19:56 0 [ERROR] InnoDB: Set innodb_force_recovery=1 to ignore corruption.

2024-12-05 16:19:56 0 [Note] InnoDB: Set innodb_force_recovery=1 to ignore corrupted pages.

Debugged the test case failure, while startup fails with

Thread 2 hit Breakpoint 1, log_phys_t::page_checksum (block=...,

    l=0x78eeebcf335f "\267H\251N")

    at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:197

197	    ib::error() << "OPT_PAGE_CHECKSUM mismatch on " << block.page.id();

(rr) f 2

p#2  0x000063c80564908d in recv_recover_page (block=0x78eee99011e0,

    mtr=..., recs=..., space=0x63c807c73c68, init=0x0)

    at /home/thiru/source_file/server/10.11/storage/innobase/log/log0recv.cc:3248

3248			log_phys_t::apply_status a= l->apply(*block, recs.last_offset);

(rr) p *l

$2 = {<log_rec_t> = {next = 0x78eeebcf3368, lsn = 7415422},

  start_lsn = 7415393}

Checksum fails to match with page id is the following

(rr) p mach_read_from_4(l)

$1 = 3074992462

During recovery, these are the things to be noticed.

(rr) p checkpoint_lsn

$5 = 7294363

(rr) p end_lsn

$6 = 7415362

(rr) p first_lsn

$7 = 7292882

Let's visit before restart:
=====================
page_checksum calculated for page_id 21474836825 is 3074992462 during m_commit_lsn: 7323774

mtr::page_checksum calculated 7323774 for the page_id

(rr) p bpage.id_.m_id

$1 = 21474836825

(rr) p checksum

$2 = 3074992462

Thread 2 hit Hardware watchpoint 3: -location m_commit_lsn

Old value = 0

New value = 7323774

mtr_t::finish_writer<false, false> (mtr=0x7673079ff4e0, len=29)

    at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0mtr.cc:1391

1391	  return {start.first, log_close(mtr->m_commit_lsn)};

(rr) f 0

#0  log_t::append (

    d=@0x7673079ff0b8: 0x76730c80348f "G\005\200َ\022\006",

    s=0x7673079ff658, size=24)

    at /home/thiru/source_file/server/10.11/storage/innobase/mtr/mtr0m

tr.cc:1308

(rr) x/24b d

0x76730c80348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00

0x76730c803497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78

0x76730c80349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e

Above is the redo log written for checksum for the page

After that, InnoDB does flushes the log thrice. There are 3 log_t::write_buf() calls

At the end of the first log_t::write_buf()

(rr) p write_lsn

$2 = 7353588

checksum redo log was flushed to disk during first write_buf() call itself.
Note: Every write_buf() swaps resize_buf with resize_flush_buf

At the end of the 2nd log_t::write_buf():

(rr) p write_lsn

$4 = 7401982

let's have detailed look at the last log_t::write_buf():

At the start

(rr) p write_lsn

$16 = 7401982

(rr) p lsn

$17 = 7415393

Initially, length is 13455. We copy the last non-aligned
block to resize_flush_buf. Round off the length to 13824 (multiple of 512 bytes)
InnoDB writes the old resize_buffer garbage value to the log file for the remaining
349 bytes. Since InnoDb fails to mark the end of the log in resize_buf. During
recovery, InnoDB reads the old mtr record and fails.

(rr) x/24b resize_buf + 13455

0x76730420348f: 0x47    0x05    0x80    0xd9    0x8e    0x12    0x06  0x00

0x767304203497: 0xb2    0x00    0x80    0xc3    0x00    0x06    0x00  0x78

0x76730420349f: 0x05    0x80    0xd9    0x00    0xb7    0x48    0xa9  0x4e

Above bytes matches with mtr checksum value which was written during 7323774

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending
- Thumbnails
- List
- Download All

data.tar.gz
1.06 MB
2024-12-05 13:53
ib_logfile0.tar.gz
64 kB
2024-12-05 15:16

Issue Links

relates to

MDEV-35608 Fake PMEM on /dev/shm no longer works

Closed

MDEV-35411 innodb.log_file_size_online occasionally fails

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Thirunarayanan Balathandayuthapani

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2024-12-05 13:49

Updated:: 2024-12-17 09:06

Resolved:: 2024-12-17 09:06

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates

Git Integration