|
The page in question was written here:
ssh pluto
|
rr replay /data/results/1674736672/TBR-1646/1/rr/mysqld-1
|
run 1928000
|
watch -l log_sys.lsn._M_i
|
reverse-continue
|
backtrace
|
…
|
Thread 1 hit Hardware watchpoint 1: -location log_sys.lsn._M_i
|
|
Old value = 93515647
|
New value = 93515562
|
0x0000563239a5a349 in std::__atomic_base<unsigned long>::store (
|
__m=std::memory_order_release, __i=93515647, this=<optimized out>)
|
at /usr/include/c++/9/bits/atomic_base.h:390
|
390 store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
|
…
|
#5 0x0000563239a4cb77 in mtr_t::commit (this=this@entry=0x7fbb382d6cb0)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:148
|
#6 0x0000563239d04be4 in trx_purge_free_segment (
|
rseg=rseg@entry=0x56323b7c12c0 <trx_sys+28544>, hdr_addr=...)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
|
#7 0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
|
#8 0x0000563239d10371 in trx_purge_truncate_history ()
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
|
#9 0x0000563239d16812 in trx_purge (n_tasks=<optimized out>,
|
…
|
At this point of execution, we can access the block here:
frame 4
|
set $block=(buf_block_t*)m_memo.small[5].object
|
Between this and the recovery, the page contents differs quite a bit:
--- /dev/fd/63 2023-01-27 10:22:59.045904382 +0200
|
+++ /dev/fd/62 2023-01-27 10:22:59.025903573 +0200
|
@@ -1,6 +1,6 @@
|
000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
|
-000010 00 00 00 00 05 92 ef 2a 00 03 00 00 00 00 00 00
|
-000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
|
+000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
|
+000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
|
000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
|
000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
*
|
@@ -11,7 +11,14 @@
|
000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
*
|
-0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
|
+0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
|
+0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
+*
|
+0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
+0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
|
+000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
+*
|
+0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
|
000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
*
|
000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
@@ -21,5 +28,12 @@
|
000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
*
|
-000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
|
+000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
|
+000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
+*
|
+000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
+000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
|
+000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
+*
|
+000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
|
001000
|
The mini-transaction generated the following 85 bytes of log:
(rr) p/x *m_log.m_first_block.m_data@m_log.m_size
|
$6 = {0x39, 0x2, 0x25, 0x32, 0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0xb7, 0x0,
|
0xff, 0xff, 0xff, 0xff, 0x0, 0x0, 0x34, 0x2, 0x25, 0x31, 0x0, 0x34, 0x2,
|
0x25, 0x2d, 0x0, 0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, 0x34, 0x2,
|
0x0, 0x3d, 0x5b, 0x2, 0x2, 0x37, 0x35, 0x2, 0x0, 0x80, 0x3b, 0xea, 0x48,
|
0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, 0x77, 0x2, 0x25, 0x0, 0x67,
|
0x57, 0x32, 0xe1, 0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, 0x77,
|
0x2, 0x0, 0x0, 0x67, 0x32, 0x82, 0x1f, 0x0}
|
Let us filter out the records for this page 2:0x241:
0x47, 0x2, 0x81, 0xc1, 0x82, 0x32, 0x4, 0xff, MEMSET, writing 4×0xff bytes at offset 690
0x48, 0x2, 0x81, 0xc1, 0x81, 0xf2, 0x81, 0xc0, 0x0, MEMSET, writing 576×0x0 bytes at offset 626
0x78, 0x2, 0x81, 0xc1, 0x0, 0x19, 0xee, 0x8b, 0x3c, PAGE_CHECKSUM 0x19ee8b3c (this trips the recovery failure)
At the start of the mini-transaction (and before applying any redo log to it on recovery), the page contents differs quite a bit as well:
--- /dev/fd/63 2023-01-27 10:48:35.250176986 +0200
|
+++ /dev/fd/62 2023-01-27 10:48:35.250176986 +0200
|
@@ -1,6 +1,6 @@
|
000000 00 00 00 00 00 00 02 41 ff ff ff ff ff ff ff ff
|
-000010 00 00 00 00 05 92 ee fa 00 03 00 00 00 00 00 00
|
-000020 00 00 00 00 00 02 ff ff ff ff 00 00 00 00 01 8e
|
+000010 00 00 00 00 03 82 bf 89 00 03 00 00 00 00 00 00
|
+000020 00 00 00 00 00 02 00 00 02 16 00 26 00 00 01 af
|
000030 00 26 00 00 00 00 00 00 00 d6 00 00 00 00 00 00
|
000040 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
*
|
@@ -8,17 +8,17 @@
|
000070 69 d2 00 00 00 f7 ff ff ff ff ff ff ff ff ff ff
|
000080 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
*
|
-000270 ff ff 00 00 00 00 00 00 00 e7 00 00 00 00 00 00
|
-000280 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
+000270 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
+000280 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
*
|
-0002a0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
-0002b0 69 d2 00 00 00 37 00 00 00 38 ff ff ff ff ff ff
|
-0002c0 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
+0004b0 00 00 00 00 00 00 00 00 00 a4 00 00 00 00 00 00
|
+0004c0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
*
|
-0004b0 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
-0004c0 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
+0004e0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
+0004f0 69 d2 00 00 02 47 ff ff ff ff ff ff ff ff ff ff
|
+000500 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
*
|
-0006f0 00 00 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
|
+0006f0 ff ff 00 00 00 00 00 00 00 d7 00 00 00 00 00 00
|
000700 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
*
|
000720 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
@@ -28,5 +28,12 @@
|
000930 ff ff 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
000940 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
|
*
|
-000ff0 00 00 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
|
+000db0 00 00 00 00 00 00 00 00 00 d8 00 00 00 00 00 00
|
+000dc0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 00 00
|
+*
|
+000de0 00 00 ff ff ff ff 00 00 ff ff ff ff 00 00 05 d6
|
+000df0 69 d2 00 00 00 f9 ff ff ff ff ff ff ff ff ff ff
|
+000e00 ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
|
+*
|
+000ff0 ff ff 00 00 00 00 00 00 03 82 bf 89 89 d0 04 6b
|
001000
|
So, we will have to track earlier changes to the page, to find where some change was not logged.
|
|
The FIL_PAGE_LSN of the page that was read in recovery was 0x382bf89 (58900361). That change was written to the buffer pool here:
(rr) when
|
Current event: 1276639
|
(rr) bt
|
#0 buf_flush_note_modification (end=58900361, start=58900260,
|
b=0x7fbb4551cb40)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0flu.h:118
|
#1 mtr_t::commit (this=this@entry=0x7fbb396e0cb0)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/mtr/mtr0mtr.cc:196
|
#2 0x0000563239d04be4 in trx_purge_free_segment (
|
rseg=rseg@entry=0x56323b7c0cc0 <trx_sys+27008>, hdr_addr=...)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:469
|
#3 0x0000563239d062a7 in trx_purge_truncate_rseg_history (rseg=..., limit=...)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:549
|
#4 0x0000563239d10371 in trx_purge_truncate_history ()
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/trx/trx0purge.cc:643
|
#5 0x0000563239d16812 in trx_purge (n_tasks=<optimized out>,
|
A little later, this same thread would invoke mtr_t::commit_shrink() on a different tablespace (3, undo003). The last write of our problematic page happened after that:
Thread 5 hit Hardware watchpoint 8: -location $block.page.oldest_modification_
|
|
Old value =
|
{m = {<std::__atomic_base<unsigned long>> = {_M_i = 22754130}, <No data fields>}}
|
New value =
|
{m = {<std::__atomic_base<unsigned long>> = {_M_i = 1}, <No data fields>}}
|
0x0000563239ee7eca in Atomic_relaxed<unsigned long>::store (
|
o=std::memory_order_release, i=1, this=0x7fbb4551cb50)
|
at /usr/include/c++/9/bits/atomic_base.h:390
|
390 store(__int_type __i, memory_order __m = memory_order_seq_cst) noexcept
|
1: log_sys.lsn._M_i = 71354177
|
(rr) when
|
Current event: 1543240
|
Probably soon after this, the checkpoint had been advanced to what the recovery reported:
2023-01-26 12:42:54 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=71646474,71825142
|
Some next interesting events for the page occur at:
| LSN |
when |
remarks |
| 75680224 |
1677754 |
update of FIL_PAGE_LSN |
| 75700082 |
1677868 |
update of FIL_PAGE_LSN |
| 89776724 |
1890477 |
mtr_t::page_checksum() calculates checksum 0xd732b6f5 |
| 89777019 |
1890477 |
update of FIL_PAGE_LSN |
| 91986805 |
1903303 |
mtr_t::page_checksum() calculates checksum 0x2beb6bbd |
| 91986862 |
1903303 |
update of FIL_PAGE_LSN |
| 91986862 |
1903308 |
mtr_t::page_checksum() calculates checksum 0xeb825f4 |
| 91986912 |
1903315 |
update of FIL_PAGE_LSN |
| … |
| 93515647 |
1928020 |
trx_purge_truncate_history() invokes buf_page_t::reset_oldest_modification(), discarding changes to this page! |
| 93515647 |
1928220 |
mtr_t::commit_shrink() is invoked on this tablespace |
| 116667020 |
2494112 |
the server process receives SIGKILL |
The results of mtr_t::page_checksum() calls will be appended to the end of the log at mtr_t::commit(), right before the FIL_PAGE_LSN is updated, so they should be consistent.
At LSN=93491365 and when 1926356 we also got a call to mtr_t::commit_shrink() to shrink undo001, a.k.a. tablespace 1.
Something appears to be wrong on the recovery side. Recovery somehow misses the fact that the tablespace was shrunk, and tries to apply a log record that had been written at LSN=93515562, which was before LSN=93515647, which is when shrinking the tablespace was in progress. The recv_sys.recovered_lsn reached 116666679, which only misses 1 or 2 last 512-byte log blocks that would have been written after the SIGKILL. There was a message about completed truncation of undo002 long before that.
|
|
On recovery, we have the following:
(rr) p recv_sys.truncated_undo_spaces
|
$13 = {{lsn = 93491365, pages = 640}, {lsn = 93515647, pages = 640}, {lsn = 0, pages = 0} <repeats 125 times>}
|
So, we are missing a check for the LSN somewhere. The page read was initiated here:
#5 0x000055580c887b3f in buf_page_init_for_read (mode=mode@entry=132,
|
page_id=..., zip_size=zip_size@entry=0, unzip=unzip@entry=true)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:121
|
#6 0x000055580c88cf67 in buf_read_page_low (err=err@entry=0x7ffce40ae060,
|
space=space@entry=0x612000023440, sync=sync@entry=false,
|
mode=mode@entry=132, page_id=..., zip_size=zip_size@entry=0,
|
unzip=<optimized out>)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:300
|
#7 0x000055580c894c95 in buf_read_recv_pages (space_id=<optimized out>,
|
page_nos=page_nos@entry=0x7ffce40ae720, n=n@entry=1)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/buf/buf0rea.cc:711
|
#8 0x000055580c352be5 in recv_read_in_area (page_id=..., i=
|
{first = {m_id = 12884901888}, second = {state = page_recv_t::RECV_NOT_PROCESSED, last_offset = 1, log = {head = 0x7ff54a6d92c8, tail = 0x7ff54dd7e600}}}) at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/include/buf0types.h:131
|
#9 0x000055580c36ea76 in recv_sys_t::apply (this=0x55580e0966a0 <recv_sys>,
|
last_batch=last_batch@entry=true)
|
at /data/Server/bb-10.6-MDEV-30400C/storage/innobase/log/log0recv.cc:3344
|
Already at this point, the latest data about undo tablespace truncation is available, and hence we should not have initiated the page read. To be on the safe side, we might both want to avoid initiating the page read, and check in recv_recover_page() if an unnecessary read took place.
|