MariaDB Server / MDEV-33137

Assertion `end_lsn == page_lsn' failed in recv_recover_page

Details

    Description

      11.1 50799752dc6306c35a47405c8f556d0b9639a1a9

      2023-12-28 23:21:30 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=159699380
      2023-12-28 23:21:31 0 [Note] InnoDB: End of log at LSN=173644026
      2023-12-28 23:21:31 0 [Note] InnoDB: To recover: 1098 pages
      mysqld: /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139: buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, page_recv_t&, fil_space_t*, lsn_t): Assertion `end_lsn == page_lsn' failed.
      (rr) bt
      #0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
      #1  0x000014fdbb011859 in __GI_abort () at abort.c:79
      #2  0x000014fdbb011729 in __assert_fail_base (fmt=0x14fdbb1a7588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x55fb4d1115df "end_lsn == page_lsn", 
          file=0x55fb4d10f8b8 "/data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc", line=3139, function=<optimized out>) at assert.c:92
      #3  0x000014fdbb022fd6 in __GI___assert_fail (assertion=0x55fb4d1115df "end_lsn == page_lsn", file=0x55fb4d10f8b8 "/data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc", line=3139, 
          function=0x55fb4d111410 "buf_block_t* recv_recover_page(buf_block_t*, mtr_t&, page_recv_t&, fil_space_t*, lsn_t)") at assert.c:101
      #4  0x000055fb4c86451c in recv_recover_page (block=0x68201ad0, mtr=..., recs=..., space=0x55fb4ff8be70, init_lsn=0) at /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3139
      #5  0x000055fb4c865308 in recv_recover_page (space=0x55fb4ff8be70, bpage=0x68201ad0) at /data/MDEV-33137/11.1/storage/innobase/log/log0recv.cc:3360
      #6  0x000055fb4ca72dd5 in buf_page_t::read_complete (this=0x68201ad0, node=...) at /data/MDEV-33137/11.1/storage/innobase/buf/buf0buf.cc:3532
      #7  0x000055fb4cafbccd in IORequest::read_complete (this=0x612b26e7c2b0, io_error=0) at /data/MDEV-33137/11.1/storage/innobase/fil/fil0fil.cc:2980
      #8  0x000055fb4c8a9521 in read_io_callback (c=0x612b26e7c208) at /data/MDEV-33137/11.1/storage/innobase/os/os0file.cc:3416
      #9  0x000055fb4cb8df7c in tpool::task_group::execute (this=0x55fb4ff5c6c0, t=0x612b26e7c290) at /data/MDEV-33137/11.1/tpool/task_group.cc:70
      #10 0x000055fb4cb8e2c8 in tpool::task::execute (this=0x612b26e7b228) at /data/MDEV-33137/11.1/tpool/task.cc:32
      #11 0x000055fb4cb877c1 in tpool::thread_pool_generic::worker_main (this=0x55fb4ff18370, thread_var=0x55fb4ff18730) at /data/MDEV-33137/11.1/tpool/tpool_generic.cc:583
      #12 0x000055fb4cb8dd70 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
          @0x55fb4ffb1238: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55fb4cb876f2 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x55fb4ffb1230: 0x55fb4ff18370) at /usr/include/c++/9/bits/invoke.h:73
      #13 0x000055fb4cb8dc52 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
          @0x55fb4ffb1238: (void (tpool::thread_pool_generic::*)(class tpool::thread_pool_generic * const, struct tpool::worker_data *)) 0x55fb4cb876f2 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
      #14 0x000055fb4cb8db71 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (
          this=0x55fb4ffb1228) at /usr/include/c++/9/thread:244
      #15 0x000055fb4cb8db0c in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x55fb4ffb1228)
          at /usr/include/c++/9/thread:251
      #16 0x000055fb4cb8daec in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run
          (this=0x55fb4ffb1220) at /usr/include/c++/9/thread:195
      #17 0x000055fb4e428de4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
      #18 0x00000c5f66746609 in start_thread (arg=<optimized out>) at pthread_create.c:477
      #19 0x000014fdbb10e133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
      
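The failing assertion encodes an invariant of redo log application: when recovery replays the records buffered for a page, applying a record stamps the page with that record's end LSN, so after the last applicable record the page LSN must equal that record's end LSN. A minimal sketch of that invariant follows (hypothetical simplified types, not the real recv_recover_page() code):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using lsn_t = uint64_t;

// One buffered redo record for a page; it covers [start_lsn, end_lsn).
struct log_rec { lsn_t start_lsn, end_lsn; };

// Replay all records that are newer than the page's current LSN.
// Applying a record stamps the page LSN with the record's end LSN,
// which is the relationship the `end_lsn == page_lsn' assertion checks.
lsn_t apply_log(lsn_t page_lsn, const std::vector<log_rec>& recs)
{
  for (const log_rec& r : recs)
    if (r.end_lsn > page_lsn)   // skip records the page already contains
      page_lsn = r.end_lsn;     // "apply" = stamp FIL_PAGE_LSN
  return page_lsn;
}
```

If a page carries an LSN that no buffered record produced (for example, because it was marked modified without a matching log record), the equality cannot hold and the assertion fires.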

      It looks like the failure appeared after this merge in 10.11

      commit 583a7452992acbb3c9bd74122c4f0086b9e16997
      Merge: 45fadb64c1c 9c5600adde6
      Author: Marko Mäkelä
      Date:   Tue Nov 21 10:23:11 2023 +0200
       
          Merge 10.6 into 10.11
      

      However, I couldn't reproduce it on 10.6.

      A couple of test runs which currently reproduce the failure fairly well:

      randgen 3468ff84518decaa5752778562bfe79444b201c6

      # on 11.1, 11.2
      perl ./run.pl --compatibility=110299 --engine=InnoDB --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt  --grammar=conf/yy/desc_indexes.yy --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff  --gendata=conf/zz/trx_stress.zz  --queries=1000000   --threads=6 --duration=200 --mysqld=--max-statement-time=30 --mysqld=--lock-wait-timeout=15 --mysqld=--innodb-lock-wait-timeout=10  --base-port=14000 --basedir=/data/bld/10.11-asan --vardir=/dev/shm/var12 --seed=1703604571 --nometadata-reload
       
      # on 10.11
      perl ./run.pl --compatibility=101199 --duration=600 --mysqld=--max-statement-time=60 --mysqld=--lock-wait-timeout=30 --mysqld=--innodb-lock-wait-timeout=15 --threads=6 --mysqld=--in_predicate_conversion_threshold=0 --mysqld=--session_track_transaction_info=STATE --mysqld=--strict_password_validation=OFF --mysqld=--expensive_subquery_limit=1000  --queries=1000000 --reporters=Backtrace,Deadlock,MemoryUsage,FeatureUsage --mysqld=--plugin-maturity=experimental --mysqld=--loose-debug_assert_on_not_freed_memory=0 --gendata=advanced --gendata=simple --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt --mysqld=--character-set-server=latin1 --mysqld=--collation-server=latin1_swedish_ci --engine=InnoDB  --gendata=simple --grammar=conf/yy/replication.yy --grammar=conf/yy/current_timestamp.yy --gendata=conf/zz/current_timestamp.zz   --grammar=conf/yy/alter_table.yy --grammar=conf/yy/all_selects.yy:0.0001 --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --variator=ExecuteAsUnion --base-port=14000 --basedir=/data/bld/10.11-bug-lsn --vardir=/dev/shm/var5 --seed=1701900615
      

      Attachments

        Issue Links

          Activity

            elenst Elena Stepanova added a comment - edited

            I couldn't reproduce the failure on 10.11 or 11.2 with the patch above in a reasonable amount of attempts, so it apparently fixes at least the reported failure.

            For the record, this is the test I used this time, both for 10.11 and 11.2:

            randgen ff3b5a8b62b5735821092c043e718930f8c116ca

            perl ./run.pl --compatibility=101199 --duration=600 --mysqld=--max-statement-time=60 --mysqld=--lock-wait-timeout=30 --mysqld=--innodb-lock-wait-timeout=15 --threads=6 --queries=1000000 --reporters=Backtrace  --mysqld=--loose-debug_assert_on_not_freed_memory=0  --gendata=simple --mysqld=--innodb-encrypt-tables --mysqld=--innodb-encrypt-log --mysqld=--innodb-encryption-threads=4 --mysqld=--plugin-load-add=file_key_management --mysqld=--file-key-management-filename=`pwd`/util/file_key_management_keys.txt  --engine=InnoDB  --gendata=simple --grammar=conf/yy/dml.yy --grammar=conf/yy/current_timestamp.yy --gendata=conf/zz/current_timestamp.zz    --grammar=conf/yy/all_selects.yy:0.0001 --scenario=AtomicDDL --filter=conf/ff/restrict_dynamic_vars.ff --variator=ExecuteAsUnion --base-port=10700 --basedir=/data/bld/11.2-patch-MDEV-33137/ --vardir=/dev/shm/var-MDEV33137-5 --seed=1701900615 --trials=15 --mysqld=--enforce-storage-engine=InnoDB
            

            Compared to the tests in the description, this one enforces the InnoDB storage engine and is somewhat simpler (less DDL), to avoid irrelevant failures. This way, without the patch I got the failure in 1-2 attempts on each branch, and with the patch got a clean run of 15 attempts on each branch.

            However, it only checks that there are no crashes, assertion failures or corruption-like errors upon/after crash recovery. It doesn't verify the functionality or performance.


            marko Marko Mäkelä added a comment

            Thank you, elenst. I think that the MDEV-18976 page checksum records should be rather good at catching recovery errors for DML workloads.

            debarun Debarun Banerjee added a comment

            1. Setting MTR_MEMO_PAGE_X_MODIFY without actually modifying the page could result in the problem. So I agree here, but we should work on an mtr test too.

            • It seems harmless for release builds, though.

            2. There is a second issue with setting MTR_MEMO_PAGE_X_MODIFY directly without the page being modified later.

            • This could result in the mtr commit adding the page to the flush list without taking the flush order mutex, which could in turn result in a wrong checkpoint calculation and eventual data loss or corruption.
            • So we should absolutely remove all direct setting of MTR_MEMO_PAGE_X_MODIFY other than through mtr_t::set_modified(). Perhaps somehow enforce it, or at least document it well.
            • Perhaps assert in mtr commit.
            • Possibly add a test.

            3. In the purge code it is perhaps better/cleaner to reacquire the pages as opposed to pinning them. Usually pinning a page to keep it alive is a special case and is better used only when absolutely required. I suspect it is not going to cause any harm during purge if we get the required pages again.
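The checkpoint hazard in point 2 can be illustrated with a minimal sketch (hypothetical types, not the real buf_pool code): the log checkpoint LSN may not advance past the oldest modification in the flush list, which is only safe if entries are appended in LSN order while holding the flush order mutex:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using lsn_t = uint64_t;

// Simplified flush list: dirty pages keyed by the LSN of their oldest
// unflushed modification, with newest insertions at the front.  A page
// that becomes dirty through mtr_t::set_modified() is appended under
// the flush order mutex, preserving the LSN ordering.
struct flush_list_t
{
  std::deque<lsn_t> oldest_modification;

  void append(lsn_t lsn)   // caller must hold the flush order mutex
  {
    // Ordering invariant: never older than the current front entry.
    assert(oldest_modification.empty()
           || lsn >= oldest_modification.front());
    oldest_modification.push_front(lsn);
  }

  // The checkpoint may not advance past the oldest dirty page.  If a
  // dirty page slipped in out of order (or was never added), this limit
  // could overshoot it, and a crash would lose its changes.
  lsn_t checkpoint_limit(lsn_t log_end) const
  {
    return oldest_modification.empty() ? log_end
                                       : oldest_modification.back();
  }
};
```

A page inserted without the mutex could land behind newer entries, so the computed limit would no longer be the true minimum over all dirty pages.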

            marko Marko Mäkelä added a comment

            debarun, I mostly agree with you. For test coverage, we occasionally had some crash recovery tests failing due to this:

            10.6 aff5ed3988a3add17d607e6184288f82b567dbee

            innodb_zip.bug56680 '4k,innodb'          w9 [ fail ]  Found warnings/errors in server log file!
                    Test ended at 2023-12-17 11:57:59
            line
            2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315593 is not equal to page LSN 1315626
            2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1315984 is not equal to page LSN 1316017
            2023-12-17 11:57:58 0 [Warning] InnoDB: The last skipped log record LSN 1329658 is not equal to page LSN 1329707
            ^ Found warnings in /home/buildbot/amd64-ubuntu-2204/build/mysql-test/var/9/log/mysqld.1.err
            

            I do not have any ideas on how to improve the regression test suite with respect to this.

            I did think of adding an assertion to mtr_t::memo_push() that would disallow parameters that include the MODIFY flag, but I decided against it, because buf_page_free() would then have had to be changed to less optimal code.

            I prefer to buffer-fix some blocks across mini-transaction boundaries in operations that need to be split into multiple small mini-transactions. The buffer-fix will not block page writes or log checkpoints, but it will prevent page eviction. I agree that there are some dangers around buffer-fixing, as MDEV-31767 taught me. If a block may have just been read into the buffer pool, we must not only acquire a page latch, but also verify that the page was not evicted due to corruption (MDEV-13542). In this case, this validation was already performed when the page was first looked up by buf_page_get_gen(). The buffer-fix will ensure that an already validated block can be safely accessed in a subsequent mini-transaction, after reacquiring the page latch (which will block file I/O again).

            The benefit of the buffer-fix trick is twofold:

            • Correctness: preventing a hard-to-test scenario in which, across the mini-transaction boundary, the block was written out, evicted, and read back as corrupted.
            • Performance: being able to remove some code for attempting to handle the above scenario, avoiding expensive operations like Block_hint::buffer_fix_block_if_still_valid() or buf_pool.page_hash lookups.

            Note: In MDEV-13542, I tried to write the corruption handling of page-freeing operations in such a way that when corruption is detected, we stop the effort to free further linked pages but carry on with the higher-level operation. In other words, we prefer to 'leak' some pages (not marking them as free). This was extensively tested with error injection (--debug-dbug=+d,intermittent_read_failure in fil_space_t::io()).
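The buffer-fix pattern described above can be sketched as follows (hypothetical simplified types; the real buf_block_t API with page latches, page_hash lookups, and I/O handling is omitted): a fix count pins the block in the buffer pool across mini-transaction boundaries without holding its latch, so page writes and checkpoints proceed while eviction is blocked:

```cpp
#include <cassert>

// Minimal model of buffer-fixing a block across mini-transaction
// boundaries.  While fix_count > 0 the block cannot be evicted from the
// buffer pool, but no latch is held, so page writes and log checkpoints
// are not blocked.  A later mini-transaction re-latches the already
// validated block instead of repeating a page_hash lookup.
struct block_t
{
  int  fix_count = 0;
  bool x_latched = false;

  void fix()     { ++fix_count; }       // pin before committing the mtr
  void unfix()   { assert(fix_count > 0); --fix_count; }
  void latch()   { x_latched = true; }  // reacquire in the next mtr
  void unlatch() { x_latched = false; }

  // Eviction is only possible for an unfixed, unlatched block.
  bool evictable() const { return fix_count == 0 && !x_latched; }
};
```

The usage pattern is: latch and validate the block in the first mini-transaction, fix it, commit the mini-transaction (releasing the latch), then re-latch the still-pinned block in the next mini-transaction and unfix it when done.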

            vlad.lesin Vladislav Lesin added a comment

            marko, debarun, thank you for such an informative discussion. After all, I think that page fixing is a reasonable solution for this case. As for testing, we could use the same technique for setting debug sync points in purge threads as was used in MDEV-31355 for "cursor-restore-locking.test", after I find the reason for the test instability.

            People

              marko Marko Mäkelä
              elenst Elena Stepanova