[MDEV-33137] Assertion `end_lsn == page_lsn' failed in recv_recover_page - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Blocker
Resolution: Fixed
Affects Version/s: 10.6, 10.11, 11.0(EOL), 11.1(EOL), 11.2(EOL)
Fix Version/s: 10.6.17, 10.11.7, 11.0.5, 11.1.4, 11.2.3, 11.3.2
Component/s: Storage Engine - InnoDB
Labels:
- regression
- rr-profile

Description

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
2023-12-28 23:21:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=159699380
2023-12-28 23:21:31 0 [Note] InnoDB: End of log at LSN=173644026
2023-12-28 23:21:31 0 [Note] InnoDB: To recover: 1098 pages
mysqld: /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139: buf_block_t* recv_recover_page(buf_block_t, mtr_t&, page_recv_t&, fil_space_t, lsn_t): Assertion `end_lsn == page_lsn' failed.
(rr) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x000014fdbb011859 in __GI_abort () at abort.c:79
#2 0x000014fdbb011729 in __assert_fail_base (fmt=0x14fdbb1a7588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55fb4d1115df "end_lsn == page_lsn",
file=0x55fb4d10f8b8 "/data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc", line=3139, function=<optimized out>) at assert.c:92
#3 0x000014fdbb022fd6 in __GI___assert_fail (assertion=0x55fb4d1115df "end_lsn == page_lsn", file=0x55fb4d10f8b8 "/data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc", line=3139,
function=0x55fb4d111410 "buf_block_t* recv_recover_page(buf_block_t, mtr_t&, page_recv_t&, fil_space_t, lsn_t)") at assert.c:101
#4 0x000055fb4c86451c in recv_recover_page (block=0x68201ad0, mtr=..., recs=..., space=0x55fb4ff8be70, init_lsn=0) at /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139
#5 0x000055fb4c865308 in recv_recover_page (space=0x55fb4ff8be70, bpage=0x68201ad0) at /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3360
#6 0x000055fb4ca72dd5 in buf_page_t::read_complete (this=0x68201ad0, node=...) at /data/MDEV-33137/11.1/storage/innobase/buf/buf0buf.cc:3532
#7 0x000055fb4cafbccd in IORequest::read_complete (this=0x612b26e7c2b0, io_error=0) at /data/MDEV-33137/11.1/storage/innobase/fil/fil0fil.cc:2980
#8 0x000055fb4c8a9521 in read_io_callback (c=0x612b26e7c208) at /data/MDEV-33137/11.1/storage/innobase/os/os0file.cc:3416
#9 0x000055fb4cb8df7c in tpool::task_group::execute (this=0x55fb4ff5c6c0, t=0x612b26e7c290) at /data/MDEV-33137/11.1/tpool/task_group.cc:70
#10 0x000055fb4cb8e2c8 in tpool::task::execute (this=0x612b26e7b228) at /data/MDEV-33137/11.1/tpool/task.cc:32
#11 0x000055fb4cb877c1 in tpool::thread_pool_generic::worker_main (this=0x55fb4ff18370, thread_var=0x55fb4ff18730) at /data/MDEV-33137/11.1/tpool/tpool_generic.cc:583
#12 0x000055fb4cb8dd70 in std::__invoke_impl<void, void (tpool::thread_pool_generic::)(tpool::worker_data), tpool::thread_pool_generic, tpool::worker_data> (__f=
@0x55fb4ffb1238: (void (tpool::thread_pool_generic::)(class tpool::thread_pool_generic const, struct tpool::worker_data )) 0x55fb4cb876f2 <tpool::thread_pool_generic::worker_main(tpool::worker_data)>, __t=@0x55fb4ffb1230: 0x55fb4ff18370) at /usr/include/c++/9/bits/invoke.h:73
#13 0x000055fb4cb8dc52 in std::__invoke<void (tpool::thread_pool_generic::)(tpool::worker_data), tpool::thread_pool_generic, tpool::worker_data> (__fn=
@0x55fb4ffb1238: (void (tpool::thread_pool_generic::)(class tpool::thread_pool_generic const, struct tpool::worker_data )) 0x55fb4cb876f2 <tpool::thread_pool_generic::worker_main(tpool::worker_data)>) at /usr/include/c++/9/bits/invoke.h:95
#14 0x000055fb4cb8db71 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::)(tpool::worker_data), tpool::thread_pool_generic, tpool::worker_data> >::_M_invoke<0ul, 1ul, 2ul> (
this=0x55fb4ffb1228) at /usr/include/c++/9/thread:244
#15 0x000055fb4cb8db0c in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::)(tpool::worker_data), tpool::thread_pool_generic, tpool::worker_data> >::operator() (this=0x55fb4ffb1228)
at /usr/include/c++/9/thread:251
#16 0x000055fb4cb8daec in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::)(tpool::worker_data), tpool::thread_pool_generic, tpool::worker_data> > >::_M_run
(this=0x55fb4ffb1220) at /usr/include/c++/9/thread:195
#17 0x000055fb4e428de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#18 0x00000c5f66746609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#19 0x000014fdbb10e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

It looks like the failure appeared after this merge in 10.11

commit 583a7452992acbb3c9bd74122c4f0086b9e16997

Merge: 45fadb64c1c 9c5600adde6

Author: Marko Mäkelä

Date:   Tue Nov 21 10:23:11 2023 +0200

    Merge 10.6 into 10.11

However I couldn't reproduce it on 10.6.

A couple of test runs which reproduce the failure currently fairly well:

randgen 3468ff84518decaa5752778562bfe79444b201c6
# on 11.1, 11.2
perl ./run.pl --compatibility=110299 --engine=InnoDB --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt --grammar=conf/yy/desc_indexes.yy --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --gendata=conf/zz/trx_stress.zz --queries=1000000 --threads=6 --duration=200 --mysqld=--max-statement-time=30 --mysqld=--lock-wait-timeout=15 --mysqld=--innodb-lock-wait-timeout=10 --base-port=14000 --basedir=/data/bld/10.11-asan --vardir=/dev/shm/var12 --seed=1703604571 --nometadata-reload

# on 10.11
perl ./run.pl --compatibility=101199 --duration=600 --mysqld=--max-statement-time=60 --mysqld=--lock-wait-timeout=30 --mysqld=--innodb-lock-wait-timeout=15 --threads=6 --mysqld=--in_predicate_conversion_threshold=0 --mysqld=--session_track_transaction_info=STATE --mysqld=--strict_password_validation=OFF --mysqld=--expensive_subquery_limit=1000 --queries=1000000 --reporters=Backtrace,Deadlock,MemoryUsage,FeatureUsage --mysqld=--plugin-maturity=experimental --mysqld=--loose-debug_assert_on_not_freed_memory=0 --gendata=advanced --gendata=simple --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt --mysqld=--character-set-server=latin1 --mysqld=--collation-server=latin1_swedish_ci --engine=InnoDB --gendata=simple --grammar=conf/yy/replication.yy --grammar=conf/yy/current_timestamp.yy --gendata=conf/zz/current_timestamp.zz --grammar=conf/yy/alter_table.yy --grammar=conf/yy/all_selects.yy:0.0001 --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --variator=ExecuteAsUnion --base-port=14000 --basedir=/data/bld/10.11-bug-lsn --vardir=/dev/shm/var5 --seed=1701900615

Attachments

Issue Links

is caused by

MDEV-29593 Purge misses a chance to free not-yet-reused undo pages

Closed

MDEV-32820 Race condition between trx_purge_free_segment() and trx_undo_create()

Closed

relates to

MDEV-22110 InnoDB unnecessarily writes unmodified pages

Closed

MDEV-24705 add check that LSN of the last skipped log record equals to FIL_PAGE_LSN field

Closed

MDEV-32050 UNDO logs still growing for write-intensive workloads

Closed

Activity

Ascending order - Click to sort in descending order

Marko Mäkelä added a comment - 2024-01-03 12:34

The assertion failure is directly connected with something that we have been observing on many non-debug builders, related to ~~MDEV-22110~~ and ~~MDEV-24705~~:

			ut_ad(end_lsn == page_lsn);

			if (end_lsn != page_lsn) {

				sql_print_information(

					"InnoDB: The last skipped log record"

					" LSN " LSN_PF

					" is not equal to page LSN " LSN_PF,

					end_lsn, page_lsn);

Marko Mäkelä added a comment - 2024-01-03 12:34 The assertion failure is directly connected with something that we have been observing on many non-debug builders, related to MDEV-22110 and MDEV-24705 : ut_ad(end_lsn == page_lsn); if (end_lsn != page_lsn) { sql_print_information( "InnoDB: The last skipped log record" " LSN " LSN_PF " is not equal to page LSN " LSN_PF, end_lsn, page_lsn); }

Marko Mäkelä added a comment - 2024-01-03 13:36

Here is some data from the recovery run:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
(rr) print *l
$3 = {<log_rec_t> = {next = 0x22e056ae9400, lsn = 167115528}, start_lsn = 167115222}
(rr) i lo lsn
page_lsn = 159994270
start_lsn = 0
end_lsn = 159993940
recv_start_lsn = 167115222
(rr) print recv_sys.lsn
$5 = 173644026
(rr) print log_sys.next_checkpoint_lsn
$6 = 159699380

That is, we have a FIL_PAGE_LSN that is somewhere between the recovery start LSN (the checkpoint LSN) and the end of the recovered log. These do correspond to the log output:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
2023-12-28 23:21:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=159699380
2023-12-28 23:21:31 0 [Note] InnoDB: End of log at LSN=173644026
2023-12-28 23:21:31 0 [Note] InnoDB: To recover: 1098 pages
mysqld: /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139: buf_block_t* recv_recover_page(buf_block_t, mtr_t&, page_recv_t&, fil_space_t, lsn_t): Assertion `end_lsn == page_lsn' failed.

The last checkpoint had been completed rather shortly before the server had been killed:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
Thread 4 hit Hardware watchpoint 3: -location log_sys.next_checkpoint_lsn

Old value = 154809477
New value = 159699380
log_checkpoint_low (oldest_lsn=159699380, end_lsn=167367010) at /data/MDEV-33137/11.1/storage/innobase/buf/buf0flu.cc:1971
1971 log_sys.write_checkpoint(end_lsn);
Current event: 1159849

At that point the oldest_modification of the page 0x100000005 was 167115222, which is exactly the same as the local debug variable recv_start_lsn in the recovery trace. The write of the page (with FIL_PAGE_LSN=159994270) had taken place when log_sys.lsn and log_sys.flushed_to_disk_lsn were both 161771086, at when 1120317 of the server process. At the time of the write completion, the buf_flush_page_cleaner was in a timed wait for the start of its next 1-second slice.

Marko Mäkelä added a comment - 2024-01-03 13:36 Here is some data from the recovery run: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 (rr) print *l $3 = {<log_rec_t> = {next = 0x22e056ae9400, lsn = 167115528}, start_lsn = 167115222} (rr) i lo lsn page_lsn = 159994270 start_lsn = 0 end_lsn = 159993940 recv_start_lsn = 167115222 (rr) print recv_sys.lsn $5 = 173644026 (rr) print log_sys.next_checkpoint_lsn $6 = 159699380 That is, we have a FIL_PAGE_LSN that is somewhere between the recovery start LSN (the checkpoint LSN) and the end of the recovered log. These do correspond to the log output: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 2023-12-28 23:21:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=159699380 2023-12-28 23:21:31 0 [Note] InnoDB: End of log at LSN=173644026 2023-12-28 23:21:31 0 [Note] InnoDB: To recover: 1098 pages mysqld: /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, page_recv_t&, fil_space_t*, lsn_t): Assertion `end_lsn == page_lsn' failed. The last checkpoint had been completed rather shortly before the server had been killed: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 Thread 4 hit Hardware watchpoint 3: -location log_sys.next_checkpoint_lsn Old value = 154809477 New value = 159699380 log_checkpoint_low (oldest_lsn=159699380, end_lsn=167367010) at /data/MDEV-33137/11.1/storage/innobase/buf/buf0flu.cc:1971 1971 log_sys.write_checkpoint(end_lsn); Current event: 1159849 At that point the oldest_modification of the page 0x100000005 was 167115222, which is exactly the same as the local debug variable recv_start_lsn in the recovery trace. The write of the page (with FIL_PAGE_LSN =159994270) had taken place when log_sys.lsn and log_sys.flushed_to_disk_lsn were both 161771086, at when 1120317 of the server process. At the time of the write completion, the buf_flush_page_cleaner was in a timed wait for the start of its next 1-second slice.

Marko Mäkelä added a comment - 2024-01-03 15:25

After the last write of the undo page 0x100000005 and the server kill, we got a number of mini-transactions that had modified the page:

LSN	`when`	operation
167115528	1155381	`trx_undo_report_row_operation`
167241876	1155458	`trx_t::commit` in `dict_stats_save` during `ANALYZE TABLE`
167753178	1160455	`trx_purge_free_segment` in `purge_truncation_callback`
167753244	1160455	`trx_purge_free_segment` in `purge_truncation_callback`
167753310	1160455	`trx_purge_free_segment` in `purge_truncation_callback`
167753376	1160455	`trx_purge_free_segment` in `purge_truncation_callback`
167753508	1160455	`trx_purge_truncate_rseg_history` in `purge_truncation_callback`

The purge_truncation_callback is something that was refactored into a separate task in ~~MDEV-32050~~. Previously, this code was invoked much less frequently. The purpose of this code is to mark undo pages as freed after all history that they may contain has been purged. The rollback segment header page 0x100000005 itself was never marked as freed, and this execution is with innodb_undo_log_truncate=OFF.

But, we need to answer the question: Where did the LSN 159993940 come from, and what is causing that mismatch? This should be our mini-transaction of interest:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
#1 0x00005644c8b6d694 in mach_write_to_8 (b=0x68258010, n=159993940)
at /data/MDEV-33137/11.1/storage/innobase/include/mach0data.inl:342
#2 0x00005644c8b6f82a in mtr_t::commit (this=0x105506e366b0)
at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:439
#3 0x00005644c8ca3a90 in trx_purge_free_segment (rseg_hdr=0x68201040, …)
…
(rr) when
Current event: 1117280
(rr) frame 2
#2 0x00005644c8b6f82a in mtr_t::commit (this=0x105506e366b0)
at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:439
439 mach_write_to_8(bpage->frame + FIL_PAGE_LSN, m_commit_lsn);
(rr) p/x *m_log.m_first_block.m_data@m_log.m_size
$17 = {0x39, 0x1, 0x5, 0x17, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, 0xb7, 0x25,
0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, 0x34, 0x1, 0x5, 0x14, 0xf, 0x34, 0x1,
0x5, 0x8, 0xf, 0x46, 0x1, 0x2, 0xcf, 0x64, 0x29, 0xc6, 0x34, 0x1, 0x0, 0x93,
0xa, 0x2, 0x1, 0x3b, 0x35, 0x1, 0x0, 0x2e, 0xb, 0x20, 0x77, 0x1, 0x5, 0x25,
0x34, 0x0, 0x32, 0xb, 0x77, 0x1, 0x2, 0x6e, 0xbc, 0xf8, 0x6, 0xeb, 0x77,
0x1, 0x0, 0xae, 0x6e, 0xe4, 0x57, 0x8d}

The 73 bytes of log records of this mini-transaction are as follows (one per line):

bytes	explanation
0x39, 0x1, 0x5, 0x17, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c,	WRITE, page 1:5
0xb7, 0x25, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c,	WRITE, same page
0x34, 0x1, 0x5, 0x14, 0xf,	WRITE, page 1:5
0x34, 0x1, 0x5, 0x8, 0xf,	WRITE, page 1:5
0x46, 0x1, 0x2, 0xcf, 0x64, 0x29, 0xc6,	MEMSET, page 1:2
0x34, 0x1, 0x0, 0x93, 0xa,	WRITE, page 1:0
0x2, 0x1, 0x3b,	FREE_PAGE, page 1:0x3b
0x35, 0x1, 0x0, 0x2e, 0xb, 0x20,	WRITE, page 1:0
0x77, 0x1, 0x5, 0x25, 0x34, 0x0, 0x32, 0xb,	OPTION, page 1:5 (~~MDEV-18976~~ checksum)
0x77, 0x1, 0x2, 0x6e, 0xbc, 0xf8, 0x6, 0xeb,	OPTION, page 1:2 (~~MDEV-18976~~ checksum)
0x77, 0x1, 0x0, 0xae, 0x6e, 0xe4, 0x57, 0x8d	OPTION, page 1:0 (~~MDEV-18976~~ checksum)

Because we have innodb_encrypt_log=ON, the log record payload (after the record type and page identifier) is encrypted. That is why the OPTION records do not have the expected subtype byte 0 after the page number. We can observe that some WRITE logging for page 1:5 could be optimized to make use of the ~~MDEV-12353~~ ‘same page’ flag.

These log records were copied to the memory-mapped ib_logfile0 at log_sys.buf + 59342846. The circular log file had wrapped around exactly 1 time. We have log_sys.file_size = 100663296 and log_sys.first_lsn = 12288. Therefore, the start offset of the mini-transaction 59342846 should correspond to the LSN 100663296-12288+59342846 = 159993854. If this calculation is correct, the 159993940 that we saw during recovery should correspond to byte 86, which would be 13 bytes after the end of the above records. In the log_sys.buf the ‘extra’ bytes include the mini-transaction end marker. Because we have innodb_encrypt_log=ON, after the first byte (in this case 0 because the log has wrapped around an odd number of times) we have an 8-byte cryptographic nonce that is roughly the LSN at that time. Finally there is a CRC-32C checksum of the mini-transaction:
0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0x89, 0x4f, 0xfe, 0x60, 0x9d, 0x49, 0x5d
At this point of execution I see log_sys.lsn=159993940=0x9895054, and the nonce is a little smaller, 0x9894ffe.

So, the end_lsn=159993940 during recovery seems to correspond to the end LSN of this mini-transaction in the log. It is also equal to the m_commit_lsn.

After this we have further mini-transactions that modify the page:

LSN	`when`	operation
159994006	1117280	`trx_purge_free_segment`
159994072	1117280	`trx_purge_free_segment`
159994138	1117280	`trx_purge_free_segment`
159994204	1117280	`trx_purge_free_segment`
159994270	1117280	`trx_purge_truncate_rseg_history`
167115528	1155381	`trx_undo_report_row_operation` during `dict_stats_save` for `ANALYZE TABLE`

At recovery, the interesting LSN are 159993940 and 159994270.

This only identifies the place that I need to investigate deeper, tomorrow.

Marko Mäkelä added a comment - 2024-01-03 15:25 After the last write of the undo page 0x100000005 and the server kill, we got a number of mini-transactions that had modified the page: LSN when operation 167115528 1155381 trx_undo_report_row_operation 167241876 1155458 trx_t::commit in dict_stats_save during ANALYZE TABLE 167753178 1160455 trx_purge_free_segment in purge_truncation_callback 167753244 1160455 trx_purge_free_segment in purge_truncation_callback 167753310 1160455 trx_purge_free_segment in purge_truncation_callback 167753376 1160455 trx_purge_free_segment in purge_truncation_callback 167753508 1160455 trx_purge_truncate_rseg_history in purge_truncation_callback The purge_truncation_callback is something that was refactored into a separate task in MDEV-32050 . Previously, this code was invoked much less frequently. The purpose of this code is to mark undo pages as freed after all history that they may contain has been purged. The rollback segment header page 0x100000005 itself was never marked as freed, and this execution is with innodb_undo_log_truncate=OFF . But, we need to answer the question: Where did the LSN 159993940 come from, and what is causing that mismatch? This should be our mini-transaction of interest: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 #1 0x00005644c8b6d694 in mach_write_to_8 (b=0x68258010, n=159993940) at /data/MDEV-33137/11.1/storage/innobase/include/mach0data.inl:342 #2 0x00005644c8b6f82a in mtr_t::commit (this=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:439 #3 0x00005644c8ca3a90 in trx_purge_free_segment (rseg_hdr=0x68201040, …) … (rr) when Current event: 1117280 (rr) frame 2 #2 0x00005644c8b6f82a in mtr_t::commit (this=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:439 439 mach_write_to_8(bpage->frame + FIL_PAGE_LSN, m_commit_lsn); (rr) p/x *m_log.m_first_block.m_data@m_log.m_size $17 = {0x39, 0x1, 0x5, 0x17, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, 0xb7, 0x25, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, 0x34, 0x1, 0x5, 0x14, 0xf, 0x34, 0x1, 0x5, 0x8, 0xf, 0x46, 0x1, 0x2, 0xcf, 0x64, 0x29, 0xc6, 0x34, 0x1, 0x0, 0x93, 0xa, 0x2, 0x1, 0x3b, 0x35, 0x1, 0x0, 0x2e, 0xb, 0x20, 0x77, 0x1, 0x5, 0x25, 0x34, 0x0, 0x32, 0xb, 0x77, 0x1, 0x2, 0x6e, 0xbc, 0xf8, 0x6, 0xeb, 0x77, 0x1, 0x0, 0xae, 0x6e, 0xe4, 0x57, 0x8d} The 73 bytes of log records of this mini-transaction are as follows (one per line): bytes explanation 0x39, 0x1, 0x5, 0x17, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, WRITE, page 1:5 0xb7, 0x25, 0xf0, 0xc4, 0x69, 0xaf, 0xf0, 0x1c, WRITE, same page 0x34, 0x1, 0x5, 0x14, 0xf, WRITE, page 1:5 0x34, 0x1, 0x5, 0x8, 0xf, WRITE, page 1:5 0x46, 0x1, 0x2, 0xcf, 0x64, 0x29, 0xc6, MEMSET, page 1:2 0x34, 0x1, 0x0, 0x93, 0xa, WRITE, page 1:0 0x2, 0x1, 0x3b, FREE_PAGE, page 1:0x3b 0x35, 0x1, 0x0, 0x2e, 0xb, 0x20, WRITE, page 1:0 0x77, 0x1, 0x5, 0x25, 0x34, 0x0, 0x32, 0xb, OPTION, page 1:5 ( MDEV-18976 checksum) 0x77, 0x1, 0x2, 0x6e, 0xbc, 0xf8, 0x6, 0xeb, OPTION, page 1:2 ( MDEV-18976 checksum) 0x77, 0x1, 0x0, 0xae, 0x6e, 0xe4, 0x57, 0x8d OPTION, page 1:0 ( MDEV-18976 checksum) Because we have innodb_encrypt_log=ON , the log record payload (after the record type and page identifier) is encrypted. That is why the OPTION records do not have the expected subtype byte 0 after the page number. We can observe that some WRITE logging for page 1:5 could be optimized to make use of the MDEV-12353 ‘same page’ flag. These log records were copied to the memory-mapped ib_logfile0 at log_sys.buf + 59342846 . The circular log file had wrapped around exactly 1 time. We have log_sys.file_size = 100663296 and log_sys.first_lsn = 12288 . Therefore, the start offset of the mini-transaction 59342846 should correspond to the LSN 100663296-12288+59342846 = 159993854. If this calculation is correct, the 159993940 that we saw during recovery should correspond to byte 86, which would be 13 bytes after the end of the above records. In the log_sys.buf the ‘extra’ bytes include the mini-transaction end marker. Because we have innodb_encrypt_log=ON , after the first byte (in this case 0 because the log has wrapped around an odd number of times) we have an 8-byte cryptographic nonce that is roughly the LSN at that time. Finally there is a CRC-32C checksum of the mini-transaction: 0x0, 0x0, 0x0, 0x0, 0x0, 0x9, 0x89, 0x4f, 0xfe, 0x60, 0x9d, 0x49, 0x5d At this point of execution I see log_sys.lsn=159993940=0x9895054 , and the nonce is a little smaller, 0x9894ffe. So, the end_lsn=159993940 during recovery seems to correspond to the end LSN of this mini-transaction in the log. It is also equal to the m_commit_lsn . After this we have further mini-transactions that modify the page: LSN when operation 159994006 1117280 trx_purge_free_segment 159994072 1117280 trx_purge_free_segment 159994138 1117280 trx_purge_free_segment 159994204 1117280 trx_purge_free_segment 159994270 1117280 trx_purge_truncate_rseg_history 167115528 1155381 trx_undo_report_row_operation during dict_stats_save for ANALYZE TABLE At recovery, the interesting LSN are 159993940 and 159994270. This only identifies the place that I need to investigate deeper, tomorrow.

Marko Mäkelä added a comment - 2024-01-04 07:58

A peculiarity about the mini-transaction that ends at LSN 159994006 is that the only log record for the page 1:5 is the OPTION record written by mtr_t::page_checksum() (~~MDEV-18976~~). The record was written because the block was marked as modified, even though there was no modification, here:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
#0 0x00005644c8b75145 in mtr_t::set_modified (this=0x105506e366b0, block=...) at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:1537
#1 0x00005644c8a9bcea in mtr_t::memcpy_low (this=0x105506e366b0, block=..., offset=50, data=0x68258032, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:322
#2 0x00005644c8b98340 in mtr_t::memcpy (this=0x105506e366b0, b=..., offset=50, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:310
#3 0x00005644c8b9c713 in mtr_t::memcpy<(mtr_t::write_type)0> (this=0x105506e366b0, b=..., dest=0x68258032, str=0x105506e365b0, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:513
#4 0x00005644c8df7e89 in flst_write_addr (block=..., faddr=0x68258032 "\377\377\377\377", page=4294967295, boffset=0, mtr=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/fut/fut0lst.cc:65
#5 0x00005644c8df9de0 in flst_remove (base=0x68201040, boffset=46, cur=0x127749001f70, coffset=120, mtr=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/fut/fut0lst.cc:333
#6 0x00005644c8ca3ef4 in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:445
#7 0x00005644c8ca47c8 in purge_sys_t::iterator::free_history (this=0x105506e36ad0) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:555
#8 0x00005644c8c90a1f in purge_truncation_callback () at /data/MDEV-33137/11.1/storage/innobase/srv/srv0srv.cc:1104

The ~~MDEV-18976~~ records have only ever been written in debug builds. But, in non-debug builds we occasionally see test failures due to unexpected messages InnoDB: The last skipped log record LSN …. So, the problem is not with mtr_t::page_checksum() but with the incorrect flagging of blocks as modified. In other words, ~~MDEV-22110~~ had not been fully fixed.

Marko Mäkelä added a comment - 2024-01-04 07:58 A peculiarity about the mini-transaction that ends at LSN 159994006 is that the only log record for the page 1:5 is the OPTION record written by mtr_t::page_checksum() ( MDEV-18976 ). The record was written because the block was marked as modified, even though there was no modification, here: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 #0 0x00005644c8b75145 in mtr_t::set_modified (this=0x105506e366b0, block=...) at /data/MDEV-33137/11.1/storage/innobase/mtr/mtr0mtr.cc:1537 #1 0x00005644c8a9bcea in mtr_t::memcpy_low (this=0x105506e366b0, block=..., offset=50, data=0x68258032, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:322 #2 0x00005644c8b98340 in mtr_t::memcpy (this=0x105506e366b0, b=..., offset=50, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:310 #3 0x00005644c8b9c713 in mtr_t::memcpy<(mtr_t::write_type)0> (this=0x105506e366b0, b=..., dest=0x68258032, str=0x105506e365b0, len=6) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0log.h:513 #4 0x00005644c8df7e89 in flst_write_addr (block=..., faddr=0x68258032 "\377\377\377\377", page=4294967295, boffset=0, mtr=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/fut/fut0lst.cc:65 #5 0x00005644c8df9de0 in flst_remove (base=0x68201040, boffset=46, cur=0x127749001f70, coffset=120, mtr=0x105506e366b0) at /data/MDEV-33137/11.1/storage/innobase/fut/fut0lst.cc:333 #6 0x00005644c8ca3ef4 in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:445 #7 0x00005644c8ca47c8 in purge_sys_t::iterator::free_history (this=0x105506e36ad0) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:555 #8 0x00005644c8c90a1f in purge_truncation_callback () at /data/MDEV-33137/11.1/storage/innobase/srv/srv0srv.cc:1104 The MDEV-18976 records have only ever been written in debug builds. But, in non-debug builds we occasionally see test failures due to unexpected messages InnoDB: The last skipped log record LSN … . So, the problem is not with mtr_t::page_checksum() but with the incorrect flagging of blocks as modified. In other words, MDEV-22110 had not been fully fixed.

Marko Mäkelä added a comment - 2024-01-04 09:41

I was misled by a data watchpoint. The rollback segment header is actually falsely claimed as modified in a later mini-transaction that reuses the same memory buffer:

11.1 50799752dc6306c35a47405c8f556d0b9639a1a9
#2 0x00005644c8b48708 in mtr_t::memo_push (this=0x105506e366b0, block=0x68201040, type=MTR_MEMO_PAGE_X_MODIFY) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0mtr.h:415
#3 0x00005644c8ca3c1a in trx_purge_free_segment (rseg_hdr=0x68201040, block=0x127749001f70, mtr=...) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:379
#4 0x00005644c8ca412b in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:469
#5 0x00005644c8ca47c8 in purge_sys_t::iterator::free_history (this=0x105506e36ad0) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:555
#6 0x00005644c8c90a1f in purge_truncation_callback () at /data/MDEV-33137/11.1/storage/innobase/srv/srv0srv.cc:1104

At the time of the next mtr_t::commit() call, the mtr_t::m_memo refers to 5 buffer pool pages, all claimed to be modified. In fact, there are log records for only 3 of the pages: 2, 0, 0x3a. The pages 5 and 0x36 were not modified. They were registered as such at the start of the mini-transaction, at the end of the first loop body of trx_purge_free_segment(). I checked for other calls of mtr_t::memo_push() that would set the MODIFY flag. In buf_page_free() it is correct, because the page will actually be modified (or marked as free in the file). All other calls look incorrect to me:

diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc

index b500be548db..efbf2396991 100644

--- a/storage/innobase/trx/trx0purge.cc

+++ b/storage/innobase/trx/trx0purge.cc

@@ -378,8 +378,8 @@ static void trx_purge_free_segment(buf_block_t *rseg_hdr, buf_block_t *block,

     ut_ad(rseg_hdr->page.id() == rseg_hdr_id);

     block->page.lock.x_lock();

     ut_ad(block->page.id() == id);

-    mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_MODIFY);

-    mtr.memo_push(block, MTR_MEMO_PAGE_X_MODIFY);

+    mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_FIX);

+    mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX);

   while (!fseg_free_step(TRX_UNDO_SEG_HDR + TRX_UNDO_FSEG_HEADER +

@@ -502,7 +502,7 @@ trx_purge_truncate_rseg_history(trx_rseg_t &rseg,

   mtr.start();

   rseg_hdr->page.lock.x_lock();

   ut_ad(rseg_hdr->page.id() == rseg.page_id());

-  mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_MODIFY);

+  mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_FIX);

   goto loop;

These calls had been added to MariaDB Server 10.6 in ~~MDEV-32820~~ and ~~MDEV-29593~~. Usually, a call to mtr_t::set_modified() will take care of flagging blocks as modified. I checked all calls to that; they will write log records if the page is persistent.

Marko Mäkelä added a comment - 2024-01-04 09:41 I was misled by a data watchpoint. The rollback segment header is actually falsely claimed as modified in a later mini-transaction that reuses the same memory buffer: 11.1 50799752dc6306c35a47405c8f556d0b9639a1a9 #2 0x00005644c8b48708 in mtr_t::memo_push (this=0x105506e366b0, block=0x68201040, type=MTR_MEMO_PAGE_X_MODIFY) at /data/MDEV-33137/11.1/storage/innobase/include/mtr0mtr.h:415 #3 0x00005644c8ca3c1a in trx_purge_free_segment (rseg_hdr=0x68201040, block=0x127749001f70, mtr=...) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:379 #4 0x00005644c8ca412b in trx_purge_truncate_rseg_history (rseg=..., limit=..., all=true) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:469 #5 0x00005644c8ca47c8 in purge_sys_t::iterator::free_history (this=0x105506e36ad0) at /data/MDEV-33137/11.1/storage/innobase/trx/trx0purge.cc:555 #6 0x00005644c8c90a1f in purge_truncation_callback () at /data/MDEV-33137/11.1/storage/innobase/srv/srv0srv.cc:1104 At the time of the next mtr_t::commit() call, the mtr_t::m_memo refers to 5 buffer pool pages, all claimed to be modified. In fact, there are log records for only 3 of the pages: 2, 0, 0x3a. The pages 5 and 0x36 were not modified. They were registered as such at the start of the mini-transaction, at the end of the first loop body of trx_purge_free_segment() . I checked for other calls of mtr_t::memo_push() that would set the MODIFY flag. In buf_page_free() it is correct, because the page will actually be modified (or marked as free in the file). All other calls look incorrect to me: diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc index b500be548db..efbf2396991 100644 --- a/storage/innobase/trx/trx0purge.cc +++ b/storage/innobase/trx/trx0purge.cc @@ -378,8 +378,8 @@ static void trx_purge_free_segment(buf_block_t *rseg_hdr, buf_block_t *block, ut_ad(rseg_hdr->page.id() == rseg_hdr_id); block->page.lock.x_lock(); ut_ad(block->page.id() == id); - mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_MODIFY); - mtr.memo_push(block, MTR_MEMO_PAGE_X_MODIFY); + mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_FIX); + mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX); } while (!fseg_free_step(TRX_UNDO_SEG_HDR + TRX_UNDO_FSEG_HEADER + @@ -502,7 +502,7 @@ trx_purge_truncate_rseg_history(trx_rseg_t &rseg, mtr.start(); rseg_hdr->page.lock.x_lock(); ut_ad(rseg_hdr->page.id() == rseg.page_id()); - mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_MODIFY); + mtr.memo_push(rseg_hdr, MTR_MEMO_PAGE_X_FIX); goto loop; } These calls had been added to MariaDB Server 10.6 in MDEV-32820 and MDEV-29593 . Usually, a call to mtr_t::set_modified() will take care of flagging blocks as modified. I checked all calls to that; they will write log records if the page is persistent.

Marko Mäkelä added a comment - 2024-01-04 09:42

elenst, can you please test the above patch on a branch where you can easily reproduce this?

Marko Mäkelä added a comment - 2024-01-04 09:42 elenst , can you please test the above patch on a branch where you can easily reproduce this?

Elena Stepanova added a comment - 2024-01-05 23:03 - edited

I couldn't reproduce the failure on 10.11 or 11.2 with the patch above in a reasonable amount of attempts, so it apparently fixes at least the reported failure.

For the record, this is the test I used this time, both for 10.11 and 11.2:

randgen ff3b5a8b62b5735821092c043e718930f8c116ca

perl ./run.pl --compatibility=101199 --duration=600 --mysqld=--max-statement-time=60 --mysqld=--lock-wait-timeout=30 --mysqld=--innodb-lock-wait-timeout=15 --threads=6 --queries=1000000 --reporters=Backtrace  --mysqld=--loose-debug_assert_on_not_freed_memory=0  --gendata=simple --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt  --engine=InnoDB  --gendata=simple --grammar=conf/yy/dml.yy --grammar=conf/yy/current_timestamp.yy --gendata=conf/zz/current_timestamp.zz    --grammar=conf/yy/all_selects.yy:0.0001 --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --variator=ExecuteAsUnion --base-port=10700 --basedir=/data/bld/11.2-patch-MDEV-33137/ --vardir=/dev/shm/var-MDEV33137-5 --seed=1701900615 --trials=15 --mysqld=--enforce-storage-engine=InnoDB

Comparing to the tests in the description, it enforces InnoDB storage engine and is somewhat simpler (less DDL), to avoid irrelevant failures. This way, without the patch I got the failure in 1-2 attempts on each branch, and with the patch got a clean run of 15 attempts on each branch.

However, it only checks that there are no crashes, assertion failures or corruption-like errors upon/after crash recovery. It doesn't verify the functionality or performance.

Elena Stepanova added a comment - 2024-01-05 23:03 - edited I couldn't reproduce the failure on 10.11 or 11.2 with the patch above in a reasonable amount of attempts, so it apparently fixes at least the reported failure. For the record, this is the test I used this time, both for 10.11 and 11.2: randgen ff3b5a8b62b5735821092c043e718930f8c116ca perl ./run.pl --compatibility=101199 --duration=600 --mysqld=--max-statement-time=60 --mysqld=--lock-wait-timeout=30 --mysqld=--innodb-lock-wait-timeout=15 --threads=6 --queries=1000000 --reporters=Backtrace --mysqld=--loose-debug_assert_on_not_freed_memory=0 --gendata=simple --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt --engine=InnoDB --gendata=simple --grammar=conf/yy/dml.yy --grammar=conf/yy/current_timestamp.yy --gendata=conf/zz/current_timestamp.zz --grammar=conf/yy/all_selects.yy:0.0001 --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --variator=ExecuteAsUnion --base-port=10700 --basedir=/data/bld/11.2-patch-MDEV-33137/ --vardir=/dev/shm/var-MDEV33137-5 --seed=1701900615 --trials=15 --mysqld=--enforce-storage-engine=InnoDB Comparing to the tests in the description, it enforces InnoDB storage engine and is somewhat simpler (less DDL), to avoid irrelevant failures. This way, without the patch I got the failure in 1-2 attempts on each branch, and with the patch got a clean run of 15 attempts on each branch. However, it only checks that there are no crashes, assertion failures or corruption-like errors upon/after crash recovery. It doesn't verify the functionality or performance.

Marko Mäkelä added a comment - 2024-01-08 14:44

Thank you, elenst. I think that the ~~MDEV-18976~~ page checksum records should be rather good at catching recovery errors for DML workloads.

Marko Mäkelä added a comment - 2024-01-08 14:44 Thank you, elenst . I think that the MDEV-18976 page checksum records should be rather good at catching recovery errors for DML workloads.

Debarun Banerjee added a comment - 2024-01-10 07:11

1. Setting MTR_MEMO_PAGE_X_MODIFY without actually modifying the page could result in the problem. So, I agree here but we should work on a mtr test too.

It seems apparently harmless though for release builds

2. There is a second issue with setting MTR_MEMO_PAGE_X_MODIFY directly without the page being modified later.

This could result in mtr commit to add page to flush list without taking the flush order mutex. Which could in turn result in wrong checkpoint calculation and eventual data loss / corruption .
So, we should absolutely remove all direct setting of MTR_MEMO_PAGE_X_MODIFY other than mtr_t::set_modified(). Perhaps somehow enforce it or at least document well.

Perhaps assert in mtr commit
Possibly add a test

3. In purge code perhaps better/cleaner to reacquire the pages as opposed to pinning it. Usually pinning page to keep it alive is a special case and is better to be used when absolutely required. I suspect it is not going to cause any harm during purge if we get the required pages again.

Debarun Banerjee added a comment - 2024-01-10 07:11 1. Setting MTR_MEMO_PAGE_X_MODIFY without actually modifying the page could result in the problem. So, I agree here but we should work on a mtr test too. It seems apparently harmless though for release builds 2. There is a second issue with setting MTR_MEMO_PAGE_X_MODIFY directly without the page being modified later. This could result in mtr commit to add page to flush list without taking the flush order mutex. Which could in turn result in wrong checkpoint calculation and eventual data loss / corruption . So, we should absolutely remove all direct setting of MTR_MEMO_PAGE_X_MODIFY other than mtr_t::set_modified(). Perhaps somehow enforce it or at least document well. Perhaps assert in mtr commit Possibly add a test 3. In purge code perhaps better/cleaner to reacquire the pages as opposed to pinning it. Usually pinning page to keep it alive is a special case and is better to be used when absolutely required. I suspect it is not going to cause any harm during purge if we get the required pages again.

Marko Mäkelä added a comment - 2024-01-10 07:40

debarun, I mostly agree with you. For test coverage, we occasionally had some crash recovery tests failing due to this:

10.6 aff5ed3988a3add17d607e6184288f82b567dbee
innodb_zip.bug56680 '4k,innodb' w9 [ fail ] Found warnings/errors in server log file!
Test ended at 2023-12-17 11:57:59
line
2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315593 is not equal to page LSN 1315626
2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315984 is not equal to page LSN 1316017
2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1329658 is not equal to page LSN 1329707
^ Found warnings in /home/buildbot/amd64-ubuntu-2204/build/mysql-test/var/9/log/mysqld.1.err

I do not have any ideas how to improve the regression test suite with respect to this.

I did think of adding an assertion to mtr_t::memo_push() that would disallow parameters that include the MODIFY flag, but I decided against it, because then buf_page_free() would have had to be changed to less optimal code.

I prefer to buffer-fix some blocks across mini-transaction boundaries in operations that need to be split into multiple small mini-transactions. The buffer-fix will not block page writes or log checkpoints, but it will prevent page eviction. I agree that there are some dangers around buffer-fixing, like ~~MDEV-31767~~ taught me. If a block may have just been read into the buffer pool, we must not only acquire a page latch, but we must also verify that the page was not evicted due to corruption (~~MDEV-13542~~). In this case, this validation was already performed when the page was first looked up by buf_page_get_gen(). The buffer-fix will ensure that an already validated block can be safely accessed in a subsequent mini-transaction, after reacquiring the page latch (which will block file I/O again).

The benefit of the buffer-fix trick is twofold:

Correctness: Preventing a hard-to-test scenario that across the mini-transaction boundary, the block was written out, evicted, and read back as corrupted.
Performance: Being able to remove some code for attempting to handle the above scenario. Avoiding expensive operations like Block_hint::buffer_fix_block_if_still_valid() or buf_pool.page_hash lookups.

Note: In ~~MDEV-13542~~, I tried to write the corruption handling of page-freeing operations in such a way that when corruption is detected, we would stop the efforts to free further linked pages, but carry on with the higher-level operation. In other words, we’d prefer to ‘leak’ some pages (not marking them as free). This was extensively tested with some error injection (--debug-dbug=+d,intermittent_read_failure in fil_space_t::io()).

Marko Mäkelä added a comment - 2024-01-10 07:40 debarun , I mostly agree with you. For test coverage, we occasionally had some crash recovery tests failing due to this: 10.6 aff5ed3988a3add17d607e6184288f82b567dbee innodb_zip.bug56680 '4k,innodb' w9 [ fail ] Found warnings/errors in server log file! Test ended at 2023-12-17 11:57:59 line 2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315593 is not equal to page LSN 1315626 2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315984 is not equal to page LSN 1316017 2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1329658 is not equal to page LSN 1329707 ^ Found warnings in /home/buildbot/amd64-ubuntu-2204/build/mysql-test/var/9/log/mysqld.1.err I do not have any ideas how to improve the regression test suite with respect to this. I did think of adding an assertion to mtr_t::memo_push() that would disallow parameters that include the MODIFY flag, but I decided against it, because then buf_page_free() would have had to be changed to less optimal code. I prefer to buffer-fix some blocks across mini-transaction boundaries in operations that need to be split into multiple small mini-transactions. The buffer-fix will not block page writes or log checkpoints, but it will prevent page eviction. I agree that there are some dangers around buffer-fixing, like MDEV-31767 taught me. If a block may have just been read into the buffer pool, we must not only acquire a page latch, but we must also verify that the page was not evicted due to corruption ( MDEV-13542 ). In this case, this validation was already performed when the page was first looked up by buf_page_get_gen() . The buffer-fix will ensure that an already validated block can be safely accessed in a subsequent mini-transaction, after reacquiring the page latch (which will block file I/O again). The benefit of the buffer-fix trick is twofold: Correctness: Preventing a hard-to-test scenario that across the mini-transaction boundary, the block was written out, evicted, and read back as corrupted. Performance: Being able to remove some code for attempting to handle the above scenario. Avoiding expensive operations like Block_hint::buffer_fix_block_if_still_valid() or buf_pool.page_hash lookups. Note: In MDEV-13542 , I tried to write the corruption handling of page-freeing operations in such a way that when corruption is detected, we would stop the efforts to free further linked pages, but carry on with the higher-level operation. In other words, we’d prefer to ‘leak’ some pages (not marking them as free). This was extensively tested with some error injection ( --debug-dbug=+d,intermittent_read_failure in fil_space_t::io() ).

Vladislav Lesin added a comment - 2024-01-10 09:24

marko, debarun, thank you for such informative discussion. After all, I think that page fixing is reasonable solution for this case. As for testing, we could use the same technique for setting debug sync points in purge threads as it was used in ~~MDEV-31355~~ for "cursor-restore-locking.test" after I find the reason of the test instability.

Vladislav Lesin added a comment - 2024-01-10 09:24 marko , debarun , thank you for such informative discussion. After all, I think that page fixing is reasonable solution for this case. As for testing, we could use the same technique for setting debug sync points in purge threads as it was used in MDEV-31355 for "cursor-restore-locking.test" after I find the reason of the test instability.

People

Assignee:: Marko Mäkelä

Reporter:: Elena Stepanova

Votes:: 0 Vote for this issue

Watchers:: 4 Start watching this issue

Dates

Created:: 2023-12-28 23:08

Updated:: 2024-01-11 08:48

Resolved:: 2024-01-10 09:45

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.