[ 'TBR-1535', 'ERROR: Reporter Mariabackup : First prepare returned.+ Executing backup: .+ Executing first prepare: .+mariadb-backup: srv_start\(\) returned 37 \(Data structure corruption\).+RESULT: The RQG run ended with status STATUS_BACKUP_FAILURE' ],
Attachments
Issue Links
duplicates
MDEV-29374Frequent "Data structure corruption" in InnoDB after OOM
The page is all-zero, so it did pass the consistency check of mariadb-backup --backup. But, we do have log records to be applied to this page. Hence, the page should actually be nonzero.
I think that in order to find the exact reason of the failure, the rr replay trace of mariadb-backup --backup will have to be analyzed. It might be something that cannot be fixed without implementing some type of server-assisted backup (MDEV-14992) where the server would stream log and data pages to the client.
Because this trace is in the 10.8 MDEV-14425 format, the log files should be easier to read, since they only contain plain log records, no block headers or footers.
Marko Mäkelä
added a comment - Thank you. I think that a similar problem occurred during the testing of MDEV-13542 .
I did some initial analysis with the following:
ssh pluto
rr replay /data/results/1654601840/TBR-1535/home/mleich/rqg/1654601840/219/1_clone/rr/latest-trace
watch -l recv_sys.found_corrupt_fs
continue
break buf_page_t::read_complete
reverse-continue
print/x *frame@srv_page_size
The page is all-zero, so it did pass the consistency check of mariadb-backup --backup . But, we do have log records to be applied to this page. Hence, the page should actually be nonzero.
I think that in order to find the exact reason of the failure, the rr replay trace of mariadb-backup --backup will have to be analyzed. It might be something that cannot be fixed without implementing some type of server-assisted backup ( MDEV-14992 ) where the server would stream log and data pages to the client.
Because this trace is in the 10.8 MDEV-14425 format, the log files should be easier to read, since they only contain plain log records, no block headers or footers.
An attempt to run a recent mariadb-backup --prepare on the corrupted backup provides a little more diagnostics:
10.8 e4b83f0febbd24b04687d4814871c758cbede111
2023-03-23 13:59:39 0 [Note] InnoDB: Starting final batch to recover 314 pages from redo log.
2023-03-23 13:59:39 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=7, page number=255]; set innodb_force_recovery to ignore
[00] 2023-03-23 13:59:39 mariadb-backup: srv_start() returned 37 (Data structure corruption).
I see that we could spend a little more effort and display the file name as well here.
The branch on which this was reproduced is 1 commit ahead of 10.8 cf57fa8d879d3621122acd26bfa21b605be79064. It includes the initial commit of MDEV-13542 but not the fixes of MDEV-29374 or MDEV-29383. The corruption is being flagged because the page is filled with NUL bytes.
10.8 e4b83f0febbd24b04687d4814871c758cbede111
#2 0x0000564bf54690b9 in buf_pool_t::corrupted_evict (
During the execution of the server that ran concurrently with the backup, the function buf_page_t::write_complete() was never invoked on this page. At log_sys.lsn=4997963 the page was removed from buf_pool.flush_list:
#2 0x00005620f96aa7f2 in buf_page_t::clear_oldest_modification (
this=0x7fb4ba113b60)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/include/buf0buf.h:2047
#3 0x00005620f96ab7fa in buf_pool_t::release_freed_page (
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:728
#4 0x00005620f96abe37 in buf_page_t::flush (this=0x7fb4ba113b60, lru=false,
space=0x612000097540)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:756
#5 0x00005620f96a259a in buf_do_flush_list_batch (max_n=200,
lsn=18446744073709551615)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:1426
#6 0x00005620f96a2f23 in buf_flush_list (max_n=200, lsn=18446744073709551615)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:1502
#7 0x00005620f96a800e in buf_flush_page_cleaner ()
At this time, the latest checkpoint LSN was 3495408. The backup preparation started at checkpoint LSN 4258151. After the page had been freed, the checkpoint LSN was updated to 3925936 and subsequently to 4258151. Now, why is recovery interested in a freed page? Here is where we recovery registers its interest in the page:
m = {<std::__atomic_base<unsigned long>> = {_M_i = 0x39ed8a}…
At this point of time, we still had log_sys.last_checkpoint_lsn=3377749. The recv_sys.lsn during recovery advances all the way to 5447384. This write started after the checkpoint LSN 4258151, but the oldest modification to the page was 0x39ed8a=3796362, before the checkpoint. Therefore, recovery is right in complaining that it fails to read a copy of the page so that it can apply this log record to it. Why was the block freed and never written out? Let us check this call to buf_page_free() at LSN=4824182:
/** Get read access to a compressed page (usually of type
That revision should not actually make any difference in this case, because at the time this thread invokes mtr_t::commit(), we do have another record that marked the page as modified in the mini-transaction:
(rr) p *(mtr_memo_slot_t*)m_memo.m_first_block.m_data@20
$26 = {{object = 0x616000906278, type = MTR_MEMO_SX_LOCK}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba514860, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x612000097540, type = MTR_MEMO_SPACE_X_LOCK}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba514860, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_SX_FIX}, {
object = 0x7fb4ba2138f0, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_FIX}}
Besides, our block 0x7fb4ba113b60 still carries the oldest_modification LSN of 3796362.
The mini-transaction is going to write the following log records, at LSN=4824182:
The first record is DELETE_ROW_FORMAT_DYNAMIC for the page 7:255. It is followed by 2 records for page 7:169, written by btr_page_reorganize_low(). For some reason, no FREE_PAGE record for our page 255 was written by this mini-transaction. I double checked that during the recovery, no FREE_PAGE record for the page was parsed:
last_offset= 1; /* the next record must not be same_page */
goto free_or_init_page;
Neither assertion was hit.
One more possible explanation could have been that the change under test was different from what was finally pushed as MDEV-28708. Even if this difference had caused buf_pool.flush_list to be in wrong order, it should not explain this recovery error, because the checkpoint from which the recovery started was executed much after the time the page had been evicted from the buffer pool.
fseg_free_page_low(): Remove an accidentally added return statement
that prevented mtr_t::free() from being called. This fixes a regression
that was introduced in
commit 0b47c126e31cddda1e94588799599e138400bcf8 (MDEV-13542).
That is, the accidental stray return statement that was removed in MDEV-29374 caused this bug. In other words, this bug report is a duplicate of MDEV-29374.
Marko Mäkelä
added a comment - An attempt to run a recent mariadb-backup --prepare on the corrupted backup provides a little more diagnostics:
10.8 e4b83f0febbd24b04687d4814871c758cbede111
2023-03-23 13:59:39 0 [Note] InnoDB: Starting final batch to recover 314 pages from redo log.
2023-03-23 13:59:39 0 [ERROR] InnoDB: Unable to apply log to corrupted page [page id: space=7, page number=255]; set innodb_force_recovery to ignore
[00] 2023-03-23 13:59:39 mariadb-backup: srv_start() returned 37 (Data structure corruption).
I see that we could spend a little more effort and display the file name as well here.
The branch on which this was reproduced is 1 commit ahead of 10.8 cf57fa8d879d3621122acd26bfa21b605be79064. It includes the initial commit of MDEV-13542 but not the fixes of MDEV-29374 or MDEV-29383 . The corruption is being flagged because the page is filled with NUL bytes.
10.8 e4b83f0febbd24b04687d4814871c758cbede111
#2 0x0000564bf54690b9 in buf_pool_t::corrupted_evict (
this=this@entry=0x564bf5942b00 <buf_pool>,
bpage=bpage@entry=0x7f44edc211c0, state=state@entry=2147483648)
at /mariadb/10.8/storage/innobase/buf/buf0lru.cc:1261
#3 0x0000564bf5445619 in buf_page_t::read_complete (this=0x7f44edc211c0,
node=@0x564bf6420300: {space = 0x564bf64201c0, name = 0x564bf6420350 "test/oltp3.ibd", handle = {m_file = 12}, on_ssd = 0, punch_hole = 0, atomic_write = 0, is_raw_disk = 0, deferred = 0, size = 704, init_size = 0, max_size = 4294967295, being_extended = {m = std::atomic<bool> = { false }}, chain = {prev = 0x0, next = 0x0}, block_size = 4096})
Page 255 would be at the end of this range of the file:
2e0000 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
*
400000 …
During the execution of the server that ran concurrently with the backup, the function buf_page_t::write_complete() was never invoked on this page. At log_sys.lsn=4997963 the page was removed from buf_pool.flush_list :
#2 0x00005620f96aa7f2 in buf_page_t::clear_oldest_modification (
this=0x7fb4ba113b60)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/include/buf0buf.h:2047
#3 0x00005620f96ab7fa in buf_pool_t::release_freed_page (
this=0x5620fb732f40 <buf_pool>, bpage=0x7fb4ba113b60)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:728
#4 0x00005620f96abe37 in buf_page_t::flush (this=0x7fb4ba113b60, lru=false,
space=0x612000097540)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:756
#5 0x00005620f96a259a in buf_do_flush_list_batch (max_n=200,
lsn=18446744073709551615)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:1426
#6 0x00005620f96a2f23 in buf_flush_list (max_n=200, lsn=18446744073709551615)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0flu.cc:1502
#7 0x00005620f96a800e in buf_flush_page_cleaner ()
At this time, the latest checkpoint LSN was 3495408. The backup preparation started at checkpoint LSN 4258151. After the page had been freed, the checkpoint LSN was updated to 3925936 and subsequently to 4258151. Now, why is recovery interested in a freed page? Here is where we recovery registers its interest in the page:
10.8 e4b83f0febbd24b04687d4814871c758cbede111
Thread 1 hit Breakpoint 6, recv_sys_t::add (
this=this@entry=0x564bf594e920 <recv_sys>,
it=it@entry={first = {m_id = 30064771327}, second = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x0, tail = 0x0}}},
start_lsn=start_lsn@entry=4296217, lsn=4296441,
l=0x7f44f8d8e4b2 " \200B\a\200\177\006\205<", len=210)
at /mariadb/10.8/storage/innobase/log/log0recv.cc:1993
The log record is INSERT_HEAP_DYNAMIC . Let us see under which circumstances this log record had been written:
Thread 16 hit Hardware watchpoint 12: -location log_sys.lsn._M_i
Old value = 4296217
New value = 4296441
0x00005620f92e65e6 in std::__atomic_base<unsigned long>::store (
__m=std::memory_order_relaxed, __i=4296441, this=0x5620fc347b40 <log_sys>)
at /usr/include/c++/9/bits/atomic_base.h:397
397 __atomic_store_n(&_M_i, __i, int(__m));
(rr) when
Current event: 295265
(rr) bt
#0 0x00005620f92e65e6 in std::__atomic_base<unsigned long>::store (
__m=std::memory_order_relaxed, __i=4296441, this=0x5620fc347b40 <log_sys>)
at /usr/include/c++/9/bits/atomic_base.h:397
#1 log_t::append_prepare<false> (this=0x5620fc347b40 <log_sys>, size=224,
ex=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/mtr/mtr0mtr.cc:1129
#2 0x00005620f92d72f8 in mtr_t::finish_write (this=0x7fb4a5b3dcb0, len=224)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/mtr/mtr0mtr.cc:1302
#3 0x00005620f92d69ec in mtr_t::do_write (this=0x7fb4a5b3dcb0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/mtr/mtr0mtr.cc:1280
#4 0x00005620f92d22f6 in mtr_t::commit (this=0x7fb4a5b3dcb0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/mtr/mtr0mtr.cc:715
#5 0x00005620f93f75c7 in row_ins_clust_index_entry_low (flags=0, mode=2,
index=0x616000906108, n_uniq=1, entry=0x6160054f7908, n_ext=0,
thr=0x6210004148d8)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/row/row0ins.cc:2797
Here is our page of interest:
(rr) p *(mtr_memo_slot_t*)m_memo.m_first_block.m_data@3
$7 = {{object = 0x0, type = MTR_MEMO_S_LOCK}, {object = 0x7fb4ba2139c0,
type = MTR_MEMO_PAGE_SX_FIX}, {object = 0x7fb4ba113b60,
type = MTR_MEMO_PAGE_X_MODIFY}}
(rr) p/x *(buf_block_t*)0x7fb4ba113b60
$8 = {page = {id_ = {m_id = 0x7000000ff}, hash = 0x0, oldest_modification_ = {
m = {<std::__atomic_base<unsigned long>> = {_M_i = 0x39ed8a}…
At this point of time, we still had log_sys.last_checkpoint_lsn=3377749 . The recv_sys.lsn during recovery advances all the way to 5447384. This write started after the checkpoint LSN 4258151, but the oldest modification to the page was 0x39ed8a=3796362, before the checkpoint. Therefore, recovery is right in complaining that it fails to read a copy of the page so that it can apply this log record to it. Why was the block freed and never written out? Let us check this call to buf_page_free() at LSN=4824182:
#0 buf_page_free (
space=0x5620f97bc18a <fseg_free_page(unsigned char*, fil_space_t*, unsigned int, mtr_t*, bool)+789>, page=32692, mtr=0x7fb4b70ca960)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/buf/buf0buf.cc:2184
#1 0x00005620f95b99c1 in btr_page_free (index=0x616000906108,
block=0x7fb4ba113b60, mtr=0x7fb4b70cbbe0, blob=false, space_latched=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0btr.cc:619
#2 0x00005620f95d154e in btr_compress (cursor=0x61a00000e5a8, adjust=false,
mtr=0x7fb4b70cbbe0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0btr.cc:3850
#3 0x00005620f9621ac5 in btr_cur_compress_if_useful (cursor=0x61a00000e5a8,
adjust=false, mtr=0x7fb4b70cbbe0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0cur.cc:5393
#4 0x00005620f9624d47 in btr_cur_pessimistic_delete (err=0x7fb4b70cb8d0,
has_reserved_extents=0, cursor=0x61a00000e5a8, flags=0, rollback=false,
mtr=0x7fb4b70cbbe0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0cur.cc:5838
#5 0x00005620f947c36b in row_purge_remove_clust_if_poss_low (
node=0x61a00000e508, mode=522)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/row/row0purge.cc:220
#6 0x00005620f947c619 in row_purge_remove_clust_if_poss (node=0x61a00000e508)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/row/row0purge.cc:260
#7 0x00005620f947e4fb in row_purge_del_mark (node=0x61a00000e508)
…
(rr) display
1: log_sys.lsn._M_i = 4824182
3: buf_pool.flush_list.start.oldest_modification_.m._M_i = 4816180
4: /x id_.m_id = <error: current stack frame does not contain a variable named `this'>
6: log_sys.last_checkpoint_lsn.m._M_i = 3377749
(rr) when
Current event: 393629
This looks like something that was fixed in MDEV-29374 , which revised buf_page_free() as follows:
diff --git a/storage/innobase/buf/buf0buf.cc b/storage/innobase/buf/buf0buf.cc
index eaf0f955a1f..c1e9dedf72d 100644
--- a/storage/innobase/buf/buf0buf.cc
+++ b/storage/innobase/buf/buf0buf.cc
@@ -2128,7 +2128,7 @@ void buf_page_free(fil_space_t *space, uint32_t page, mtr_t *mtr)
btr_search_drop_page_hash_index(block);
#endif /* BTR_CUR_HASH_ADAPT */
block->page.set_freed(block->page.state());
- mtr->memo_push(block, MTR_MEMO_PAGE_X_FIX);
+ mtr->memo_push(block, MTR_MEMO_PAGE_X_MODIFY);
}
/** Get read access to a compressed page (usually of type
That revision should not actually make any difference in this case, because at the time this thread invokes mtr_t::commit() , we do have another record that marked the page as modified in the mini-transaction:
(rr) p *(mtr_memo_slot_t*)m_memo.m_first_block.m_data@20
$26 = {{object = 0x616000906278, type = MTR_MEMO_SX_LOCK}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba514860, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x612000097540, type = MTR_MEMO_SPACE_X_LOCK}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba015560, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba514860, type = MTR_MEMO_PAGE_X_MODIFY}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_SX_FIX}, {
object = 0x7fb4ba2138f0, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba213750, type = MTR_MEMO_PAGE_SX_MODIFY}, {
object = 0x7fb4ba113b60, type = MTR_MEMO_PAGE_X_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_BUF_FIX}, {
object = 0x7fb4ba2139c0, type = MTR_MEMO_PAGE_X_FIX}}
Besides, our block 0x7fb4ba113b60 still carries the oldest_modification LSN of 3796362.
The mini-transaction is going to write the following log records, at LSN=4824182:
(rr) p/x *m_log.m_first_block.m_data@39
$33 = {0x29, 0x7, 0x80, 0x7f, 0x9, 0x8e, 0x16, 0x0, 0x80, 0x49,
0x30, 0x5, 0x7, 0x80, 0x29, 0x27, 0xe, 0x2b, 0xec, 0x80, 0x38, 0x0, 0x0, 0x0, 0x0,
0x2b, 0x23, 0x0, 0x2, 0x0, 0x35, 0xb3, 0x2b, 0x0, 0x1a, 0xb0, 0xaa, 0x30,
0x19}
The first record is DELETE_ROW_FORMAT_DYNAMIC for the page 7:255. It is followed by 2 records for page 7:169, written by btr_page_reorganize_low() . For some reason, no FREE_PAGE record for our page 255 was written by this mini-transaction. I double checked that during the recovery, no FREE_PAGE record for the page was parsed:
diff --git a/storage/innobase/log/log0recv.cc b/storage/innobase/log/log0recv.cc
index 793f7b327c8..88ed55f577a 100644
--- a/storage/innobase/log/log0recv.cc
+++ b/storage/innobase/log/log0recv.cc
@@ -1994,6 +1994,7 @@ inline void recv_sys_t::add(map::iterator it, lsn_t start_lsn, lsn_t lsn,
switch (*l & 0x70) {
case FREE_PAGE: case INIT_PAGE:
+ ut_ad(page_id.page_no() != 255);
recs.will_not_read();
mlog_init.add(page_id, start_lsn); /* FIXME: remove this! */
/* fall through */
@@ -2547,6 +2548,7 @@ inline recv_sys_t::parse_mtr_result recv_sys_t::parse(store_t store, source &l)
ut_ad(freed.find(id) == freed.end());
switch (b & 0x70) {
case FREE_PAGE:
+ ut_ad(id.page_no() != 255);
ut_ad(freed.emplace(id).second);
last_offset= 1; /* the next record must not be same_page */
goto free_or_init_page;
Neither assertion was hit.
One more possible explanation could have been that the change under test was different from what was finally pushed as MDEV-28708 . Even if this difference had caused buf_pool.flush_list to be in wrong order, it should not explain this recovery error, because the checkpoint from which the recovery started was executed much after the time the page had been evicted from the buffer pool.
The last clue in the MDEV-29374 commit message is useful:
fseg_free_page_low(): Remove an accidentally added return statement
that prevented mtr_t::free() from being called. This fixes a regression
that was introduced in
commit 0b47c126e31cddda1e94588799599e138400bcf8 ( MDEV-13542 ).
Let us check for that:
(rr) bt
#0 fseg_free_page_low (
seg_inode=0xabfbd9f3423bf700 <error: Cannot access memory at address 0xabfbd9f3423bf700>, iblock=0x612000097540, space=0x7f00b70cbc48, offset=22048,
mtr=0x5620f92d8535 <mtr_t::memo_contains(fil_space_t const&, bool)+365>,
ahi=12)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/fsp/fsp0fsp.cc:2486
#1 0x00005620f97bc18a in fseg_free_page (seg_header=0x7fb4ba24804a "",
space=0x612000097540, offset=255, mtr=0x7fb4b70cbbe0, have_latch=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/fsp/fsp0fsp.cc:2614
#2 0x00005620f95b9932 in btr_page_free (index=0x616000906108,
block=0x7fb4ba113b60, mtr=0x7fb4b70cbbe0, blob=false, space_latched=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0btr.cc:613
#3 0x00005620f95d154e in btr_compress (cursor=0x61a00000e5a8, adjust=false,
mtr=0x7fb4b70cbbe0)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/btr/btr0btr.cc:3850
(rr) finish
Run till exit from #0 fseg_free_page_low (
seg_inode=0xabfbd9f3423bf700 <error: Cannot access memory at address 0xabfbd9f3423bf700>, iblock=0x612000097540, space=0x7f00b70cbc48, offset=22048,
mtr=0x5620f92d8535 <mtr_t::memo_contains(fil_space_t const&, bool)+365>,
ahi=12)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/fsp/fsp0fsp.cc:2486
fseg_free_page (seg_header=0x7fb4ba24804a "", space=0x612000097540,
offset=255, mtr=0x7fb4b70cbbe0, have_latch=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/fsp/fsp0fsp.cc:2614
2614 return fseg_free_page_low(seg_inode, iblock, space, offset, mtr);
1: log_sys.lsn._M_i = 4824182
3: buf_pool.flush_list.start.oldest_modification_.m._M_i = 4816180
4: /x id_.m_id = <error: current stack frame does not contain a variable named `this'>
6: log_sys.last_checkpoint_lsn.m._M_i = 3377749
Value returned is $35 = DB_SUCCESS
(rr) disable display
(rr) reverse-step
fseg_free_page_low (seg_inode=0x7fb4ba2440f2 "", iblock=0x7fb4ba2138f0,
space=0x612000097540, offset=255, mtr=0x7fb4b70cbbe0, ahi=false)
at /data/Server/bb-10.8-MDEV-28708A/storage/innobase/fsp/fsp0fsp.cc:2586
2586 }
(rr) reverse-next
2486 {
(rr)
2581 return err;
(rr) list
2576 }
2577 err = fsp_free_extent(space, offset, mtr);
2578 if (UNIV_UNLIKELY(err != DB_SUCCESS)) {
2579 return err;
2580 }
2581 return err;
2582 }
2583
2584 mtr->free(*space, static_cast<uint32_t>(offset));
2585 return DB_SUCCESS;
That is, the accidental stray return statement that was removed in MDEV-29374 caused this bug. In other words, this bug report is a duplicate of MDEV-29374 .
People
Marko Mäkelä
Matthias Leich
Votes:
1Vote for this issue
Watchers:
3Start watching this issue
Dates
Created:
Updated:
Resolved:
Git Integration
Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.
Thank you. I think that a similar problem occurred during the testing of
MDEV-13542.I did some initial analysis with the following:
ssh pluto
rr replay /data/results/1654601840/TBR-1535/home/mleich/rqg/1654601840/219/1_clone/rr/latest-trace
watch -l recv_sys.found_corrupt_fs
continue
break buf_page_t::read_complete
reverse-continue
print/x *frame@srv_page_size
The page is all-zero, so it did pass the consistency check of mariadb-backup --backup. But, we do have log records to be applied to this page. Hence, the page should actually be nonzero.
I think that in order to find the exact reason of the failure, the rr replay trace of mariadb-backup --backup will have to be analyzed. It might be something that cannot be fixed without implementing some type of server-assisted backup (MDEV-14992) where the server would stream log and data pages to the client.
Because this trace is in the 10.8
MDEV-14425format, the log files should be easier to read, since they only contain plain log records, no block headers or footers.