[MDEV-32681] Test case innodb.undo_truncate_recover frequent failure Created: 2023-11-04  Updated: 2023-12-05  Resolved: 2023-11-17

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5
Fix Version/s: 10.5.24

Type: Bug Priority: Major
Reporter: Kristian Nielsen Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None

Attachments: File data.tar.xz    
Issue Links:
Blocks
Relates
relates to MDEV-26055 Adaptive flushing is still not gettin... Closed
relates to MDEV-26827 Make page flushing even faster Closed
relates to MDEV-32757 innodb_undo_log_truncate=ON is not cr... Closed

 Description   

innodb.undo_truncate_recover '16k,2,innodb,strict_full_crc32' w15 [ fail ]
        Test ended at 2023-11-03 15:59:19
 
CURRENT_TEST: innodb.undo_truncate_recover
--- /home/buildbot/amd64-debian-11-msan/build/mysql-test/suite/innodb/r/undo_truncate_recover.result	2023-11-03 15:04:02.000000000 +0000
+++ /home/buildbot/amd64-debian-11-msan/build/mysql-test/suite/innodb/r/undo_truncate_recover.reject	2023-11-03 15:59:19.194596785 +0000
@@ -11,5 +11,5 @@
 drop table t1;
 call mtr.add_suppression("InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces");
 SET GLOBAL innodb_fast_shutdown=0;
-FOUND 1 /ib_undo_trunc/ in mysqld.1.err
+NOT FOUND /ib_undo_trunc/ in mysqld.1.err
 # restart: with restart_parameters

When it happens, the following appears in mysqld.1.err:

2023-11-03 15:58:19 0 [Note] InnoDB: Truncating .//undo001
2023-11-03 15:58:19 0 [Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing innodb_buffer_pool_size. Pending flushes (fsync) log: 0; buffer pool: 0. 1491 OS file reads, 1420 OS file writes, 231 OS fsyncs.
231103 15:59:17 [ERROR] mysqld got signal 6 ;

I think the signal 6 is likely caused by a timeout while waiting for the server to shut down (the test case manually restarts the server at this point).

At this point the server is stuck during shutdown in trx_purge_truncate_history(), trying to create the rsegs for the truncated tablespace but unable to get a buffer pool page, consistent with the warning in the error log:

#12 0x00007f662f46d90c in std::condition_variable::wait(std::unique_lock<std::mutex>&) () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#13 0x0000559380bb0bab in tpool::waitable_task::wait (this=0x55938203ece0 <purge_coordinator_task>, lk=...) at /kvm/src/my/sn-mdev30255/tpool/task.cc:79
#14 0x0000559380bb0bf9 in tpool::waitable_task::wait (this=0x55938203ece0 <purge_coordinator_task>) at /kvm/src/my/sn-mdev30255/tpool/task.cc:85
#15 0x00005593809b6216 in srv_purge_shutdown () at /kvm/src/my/sn-mdev30255/storage/innobase/srv/srv0srv.cc:2048
#16 0x00005593809be4a3 in innodb_preshutdown () at /kvm/src/my/sn-mdev30255/storage/innobase/srv/srv0start.cc:2003
#17 0x000055938043c47c in plugin_pre_shutdown (plugin=0x559382a31d78) at /kvm/src/my/sn-mdev30255/sql/handler.cc:1000
#18 0x00005593800ff526 in plugin_foreach_with_mask (thd=0x0, func=0x55938043c43b <plugin_pre_shutdown(THD*, plugin_ref, void*)>, type=1, state_mask=10, arg=0x0) at /kvm/src/my/sn-mdev30255/sql/sql_plugin.cc:2553
#19 0x000055938043c4a8 in ha_pre_shutdown () at /kvm/src/my/sn-mdev30255/sql/handler.cc:1007
#20 0x000055937ffa6c10 in mysqld_main (argc=170, argv=0x559382731150) at /kvm/src/my/sn-mdev30255/sql/mysqld.cc:5734
#21 0x000055937ff9d465 in main (argc=29, argv=0x7fff57493358) at /kvm/src/my/sn-mdev30255/sql/main.cc:25
 
Thread 2 (Thread 1407920.1407930 (mariadbd)):
#0  buf_LRU_get_free_block (have_mutex=false) at /kvm/src/my/sn-mdev30255/storage/innobase/buf/buf0lru.cc:486
#1  0x0000559380b2e530 in fsp_page_create (space=0x5593828ab850, offset=49, mtr=0x7f66248f55f0) at /kvm/src/my/sn-mdev30255/storage/innobase/fsp/fsp0fsp.cc:1080
#2  0x0000559380b2eb05 in fsp_alloc_free_page (space=0x5593828ab850, hint=0, mtr=0x7f66248f55f0, init_mtr=0x7f66248f55f0) at /kvm/src/my/sn-mdev30255/storage/innobase/fsp/fsp0fsp.cc:1197
#3  0x0000559380b323db in fseg_alloc_free_page_low (space=0x5593828ab850, seg_inode=0x7f662e0762b2 "", iblock=0x7f662dbd3020, hint=0, direction=111 'o', has_done_reservation=false, mtr=0x7f66248f55f0, init_mtr=0x7f66248f55f0) at /kvm/src/my/sn-mdev30255/storage/innobase/fsp/fsp0fsp.cc:2136
#4  0x0000559380b30a27 in fseg_create (space=0x5593828ab850, byte_offset=62, mtr=0x7f66248f55f0, has_done_reservation=false, block=0x0) at /kvm/src/my/sn-mdev30255/storage/innobase/fsp/fsp0fsp.cc:1759
#5  0x00005593809fda94 in trx_rseg_header_create (space=0x5593828ab850, rseg_id=93, max_trx_id=37, sys_header=0x7f662dbd2c00, mtr=0x7f66248f55f0) at /kvm/src/my/sn-mdev30255/storage/innobase/trx/trx0rseg.cc:321
#6  0x00005593809debaa in trx_purge_truncate_history () at /kvm/src/my/sn-mdev30255/storage/innobase/trx/trx0purge.cc:737
#7  0x00005593809b57f1 in srv_do_purge (n_total_purged=0x7f66248f5b18) at /kvm/src/my/sn-mdev30255/storage/innobase/srv/srv0srv.cc:1847
#8  0x00005593809b5d92 in purge_coordinator_callback_low () at /kvm/src/my/sn-mdev30255/storage/innobase/srv/srv0srv.cc:1945
#9  0x00005593809b5e8f in purge_coordinator_callback () at /kvm/src/my/sn-mdev30255/storage/innobase/srv/srv0srv.cc:1974
#10 0x0000559380bb0678 in tpool::task_group::execute (this=0x55938203ec40 <purge_coordinator_task_group>, t=0x55938203ece0 <purge_coordinator_task>) at /kvm/src/my/sn-mdev30255/tpool/task_group.cc:55
#11 0x0000559380bb09ce in tpool::task::execute (this=0x55938203ece0 <purge_coordinator_task>) at /kvm/src/my/sn-mdev30255/tpool/task.cc:47
#12 0x0000559380baa527 in tpool::thread_pool_generic::worker_main (this=0x55938279d090, thread_var=0x55938279d5c0) at /kvm/src/my/sn-mdev30255/tpool/tpool_generic.cc:599
#13 0x0000559380bb0418 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=@0x7f6614001798: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x559380baa492 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x7f6614001790: 0x55938279d090) at /usr/include/c++/10/bits/invoke.h:73
#14 0x0000559380bb0308 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=@0x7f6614001798: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x559380baa492 <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/10/bits/invoke.h:95
#15 0x0000559380bb023b in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x7f6614001788) at /usr/include/c++/10/thread:264
#16 0x0000559380bb01d8 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x7f6614001788) at /usr/include/c++/10/thread:271
#17 0x0000559380bb01bc in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x7f6614001780) at /usr/include/c++/10/thread:215
#18 0x00007f662f472ed0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#19 0x00007f662f22bea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#20 0x00007f662f149a2f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I was able to reproduce this with --rr and currently have the run saved.

I see that the code locks all dirty buffer pool pages in the undo tablespace and then moves on to re-creating all rseg header pages. I wonder if it managed to lock all pages in the buffer pool, thus locking itself out from creating new pages? Maybe there should be a check whether the code is trying to lock more than, say, 50% of the buffer pool pages, and if so, flush some pages explicitly or abort and retry later. Better still would be dropping the pages from the buffer pool without flushing, but I am not sure whether that can be done in a crash-recovery-safe way.
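The suggested safeguard could look roughly like the following sketch. This is purely hypothetical, not actual InnoDB code; the function name `may_latch_one_more` and its parameters are invented for illustration only.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical guard (not InnoDB code): before x-latching yet another
// dirty page of the undo tablespace for the truncation mini-transaction,
// check what fraction of the buffer pool would then be pinned.  If the
// limit would be exceeded, the caller should flush some pages explicitly
// (or abort and retry the truncation later) instead of latching more.
bool may_latch_one_more(std::size_t latched_pages,
                        std::size_t buf_pool_pages,
                        double max_fraction = 0.5)
{
  return (latched_pages + 1) <=
         static_cast<std::size_t>(buf_pool_pages * max_fraction);
}
```

With an 8M buffer pool of 512 16k pages, such a check would stop the truncation well before the roughly 500 latched pages observed in the hang.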

In 10.5, the test failure is easy to trigger with 16k pages; I can see that it hangs once it has locked or created around 500 pages (with an 8M buffer pool). It cannot be reproduced with a 16M buffer pool.

In 10.6, the failure no longer occurs. I am not sure whether this is accidental or whether the code is fixed. In 10.6 the code still locks all modified pages in the to-be-truncated undo tablespace, but the number of pages is smaller, so the code does not reach the roughly 500 locked and created pages that triggered the hang in 10.5.



 Comments   
Comment by Marko Mäkelä [ 2023-11-04 ]

I think that this would be fixed by backporting the fixes of MDEV-26827 and MDEV-26055 to 10.5.

Comment by Kristian Nielsen [ 2023-11-04 ]

Maybe we can just add --innodb-buffer-pool-size=16M to the undo_truncate_recover.opt to make the failure in buildbot go away?
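That is, the suggested (not committed) workaround would be a one-line mtr option file, presumably mysql-test/suite/innodb/t/undo_truncate_recover.opt by the usual test-suite naming convention:

```
--innodb-buffer-pool-size=16M
```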

Even if there is a theoretical bug here, it might not happen in practice that all pages in the buffer pool (except for fewer than 64) are dirty pages from a single undo tablespace, which seems to be what triggers the hang in this test failure.

marko, what do you think?

Comment by Marko Mäkelä [ 2023-11-09 ]

I can reproduce this easily. Between the time the following breakpoint was hit:

10.5 c68620df48cd1b0e7f5f1cb5ff8b905a5d9c80a5

#0  sql_print_warning (format=format@entry=0x55774dbea4a6 "InnoDB: %s") at /mariadb/10.5/sql/log.cc:9272
#1  0x000055774eaae451 in ib::warn::~warn (this=this@entry=0x7fc799ef1e40, __in_chrg=<optimized out>) at /usr/include/c++/13/bits/basic_string.h:222
#2  0x000055774e8e0517 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /mariadb/10.5/storage/innobase/buf/buf0lru.cc:472

and the time the SIGABRT was sent because the shutdown took more than 60 seconds, buf_pool.free.count remained at 0. However, the SIGABRT arrived not long after the warning message had been written. I maintain that this thread is not genuinely hung; freeing the blocks is simply taking a lot of time.

On the server restart after the SIGABRT, we got something more to worry about:

10.5 c68620df48cd1b0e7f5f1cb5ff8b905a5d9c80a5

2023-11-09 11:44:01 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=20379452,20379452
2023-11-09 11:44:01 0 [Note] InnoDB: Opened 2 undo tablespaces
2023-11-09 11:44:01 0 [Warning] InnoDB: innodb_undo_tablespaces=0 disables dedicated undo log tablespaces
2023-11-09 11:44:01 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 56386 row operations to undo
2023-11-09 11:44:01 0 [Note] InnoDB: Trx id counter is 37
2023-11-09 11:44:01 0 [Note] InnoDB: 128 rollback segments in 2 undo tablespaces are active.
2023-11-09 11:44:01 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2023-11-09 11:44:01 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2023-11-09 11:44:01 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2023-11-09 11:44:01 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2023-11-09 11:44:01 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
mariadbd: /mariadb/10.5/storage/innobase/buf/buf0buf.cc:3123: buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, const char*, unsigned int, mtr_t*, dberr_t*, bool): Assertion `mode == 16 || mode == 12 || fix_block->page.status != buf_page_t::FREED' failed.

I tried starting up MariaDB Server 10.6 with a copy of that data directory data.tar.xz, and it crashes during the rollback as well:

10.6 04477bd9364e7a753918c730b89e7012b1926e6a

#0  __assert_fail (assertion=0x5635444ce928 "mode == 16 || mode == 12", file=0x563544523a98 "/mariadb/10.6/storage/innobase/buf/buf0buf.cc", line=2552, function=0x5635445b2b90 "buf_block_t* buf_page_get_low(page_id_t, ulint, ulint, buf_block_t*, ulint, mtr_t*, dberr_t*, bool)") at ./assert/assert.c:101
#1  0x0000563545178675 in buf_page_get_low (page_id=page_id@entry={m_id = 4294967843}, zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=2, guess=<optimized out>, guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f2604a0ad80, err=0x7f2604a0ad7c, allow_ibuf_merge=false) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:2552
#2  0x0000563545179554 in buf_page_get_gen (page_id={m_id = 4294967843}, zip_size=zip_size@entry=0, rw_latch=rw_latch@entry=2, guess=guess@entry=0x0, mode=mode@entry=10, mtr=mtr@entry=0x7f2604a0ad80, err=0x7f2604a0ad7c, allow_ibuf_merge=false) at /mariadb/10.6/storage/innobase/buf/buf0buf.cc:2930
#3  0x00005635453a6b0e in trx_undo_truncate_end (
    undo=@0x5635468569e0: {id = 0, state = 1, trx_id = 31, xid = {formatID = -1, gtrid_length = 94786795125117, bqual_length = 94786797348224, data = "\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000\000P\221nY\375\177\000\000\004\000\000\000\001\000\000\000g\022*E5V\000\000\200\245bE5V\000\000\004\000\000\000\001", '\000' <repeats 11 times>, "\210\246bE5V\000\000P\214nY\375\177\000\000\203\257@E5V\000\000\200\245bE5V\000\000}\271@E5V\000\000\200\245bE5V\000\000\000\000\000\000\000\000\000\000\001\000\000\000\000\000\000"}, dict_operation = false, rseg = 0x563545633a00 <trx_sys+25536>, hdr_page_no = 68, last_page_no = 547, hdr_offset = 521, size = 451, top_page_no = 546, top_offset = 16176, top_undo_no = 56260, guess_block = 0x0, undo_list = {prev = 0x0, next = 0x0}}, limit=56261, is_temp=is_temp@entry=false) at /mariadb/10.6/storage/innobase/trx/trx0undo.cc:807
#4  0x00005635453a6cc6 in trx_undo_try_truncate (…)
#5  0x000056354535ca71 in row_undo_rec_get (node=node@entry=0x7f25f80018b0) at /mariadb/10.6/storage/innobase/row/row0undo.cc:267
#6  0x000056354535cf73 in row_undo (node=node@entry=0x7f25f80018b0, thr=thr@entry=0x7f25f80016d0) at /mariadb/10.6/storage/innobase/row/row0undo.cc:388
#7  0x000056354535daff in row_undo_step (thr=thr@entry=0x7f25f80016d0) at /mariadb/10.6/storage/innobase/row/row0undo.cc:438
#8  0x00005635452ec27e in que_thr_step (thr=thr@entry=0x7f25f80016d0) at /mariadb/10.6/storage/innobase/que/que0que.cc:586
#9  0x00005635452ec46f in que_run_threads_low (thr=thr@entry=0x7f25f80016d0) at /mariadb/10.6/storage/innobase/que/que0que.cc:644
#10 0x00005635452ec52c in que_run_threads (thr=0x7f25f80016d0) at /mariadb/10.6/storage/innobase/que/que0que.cc:664
#11 0x000056354539506c in trx_rollback_active (trx=trx@entry=0x7f288fbffb80) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:613
#12 0x0000563545396221 in trx_rollback_recovered (all=all@entry=true) at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:751
#13 0x00005635453964af in trx_rollback_all_recovered () at /mariadb/10.6/storage/innobase/trx/trx0roll.cc:797

I will have to analyze and fix this as well.

Backporting the fixes from 10.6 should make buf_flush_page_cleaner() guarantee some spare capacity in buf_pool.free.count.

Comment by Marko Mäkelä [ 2023-11-09 ]

The crash occurs when the 56261st row of the recovered transaction 31 is being rolled back. Initially there were 56386 rows that had to be rolled back. The immediate reason is that trx_undo_truncate_end() had already freed page 547 in undo tablespace 1 during a previous iteration of the loop:

    if (undo.last_page_no != undo.hdr_page_no)
    {
      err= trx_undo_free_last_page(&undo, &mtr);
      if (UNIV_UNLIKELY(err != DB_SUCCESS))
        goto func_exit;
      undo.rseg->latch.wr_unlock();
      mtr.commit();
      continue;
    }

The function trx_undo_free_page() returned the same page number to trx_undo_free_last_page(), so we would seem to attempt a double-free in a possibly infinite loop. The list TRX_UNDO_SEG_HDR + TRX_UNDO_PAGE_LIST on the header page 68 would appear to be corrupted. Before the server had been killed by the shutdown timeout, this part of the page was last touched in a read into the buffer pool. Similarly, during the failed crash recovery, there was no change to the header page at all before the first call to trx_undo_free_page() for the 56261st row of transaction 31.
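The failure mode described above (a page list whose tail link no longer advances, so "free the last page" keeps yielding the same page number) can be illustrated with a toy model. All names here (UndoSeg, free_last_page, truncate_end) are hypothetical and not InnoDB code; only the symptom is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Toy analogue of an undo segment with its TRX_UNDO_PAGE_LIST.
struct UndoSeg {
  uint32_t hdr_page_no;
  std::vector<uint32_t> page_list;  // first element is the header page
  bool corrupted;                   // stale tail link never advances

  uint32_t free_last_page() {
    uint32_t pn = page_list.back();
    if (!corrupted)
      page_list.pop_back();  // intact list: the tail advances
    return pn;               // corrupted list: same page number again
  }
};

// Loop "free the last page until only the header remains", as in the
// report.  Returns true on success, false when a double-free of the
// same page number is detected (otherwise the loop would never end).
bool truncate_end(UndoSeg &seg, std::set<uint32_t> &freed) {
  while (seg.page_list.size() > 1) {
    uint32_t pn = seg.free_last_page();
    if (!freed.insert(pn).second)
      return false;  // same page freed twice: corruption symptom
  }
  return true;
}
```

On the intact list the loop terminates normally; on the corrupted one, the second attempt to free page 547 is the double-free described above.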

I think that this corruption would be easier to debug by changing the test to use a large buffer pool and by adding a forced kill of the server. In that way, the page would remain at the same virtual memory address for its entire lifetime in the being-killed server process. In any case, the copy of the data directory data.tar.xz is not sufficient for determining the cause of this corruption.

Comment by Marko Mäkelä [ 2023-11-09 ]

Transaction 31 was actually committed before the server was killed due to the shutdown hang. It had written 60,000 undo log records, corresponding to the following SQL statements:

let SEARCH_PATTERN = ib_undo_trunc;
begin;
update t1 set c = 'MariaDB';
update t1 set c = 'InnoDB';
eval set global debug_dbug = '+d,$SEARCH_PATTERN';
commit;

The LSN of the transaction commit was 20365073, and the log was durably written up to that point, and ultimately to 20379464, which is also the recv_sys.recovered_lsn of the recovery. So, this transaction should have been recovered as committed, and no rollback should have been attempted in the first place.

Comment by Marko Mäkelä [ 2023-11-09 ]

The undo log page 68 state was changed from TRX_UNDO_ACTIVE to TRX_UNDO_TO_PURGE during the commit of transaction 31:

10.5 c68620df48cd1b0e7f5f1cb5ff8b905a5d9c80a5

#1  0x000055774eaa9de7 in trx_undo_set_state_at_finish (undo=0x557750a0adc0, mtr=mtr@entry=0x7fc789daef00) at /mariadb/10.5/storage/innobase/trx/trx0undo.cc:1321
#2  0x000055774ea8897c in trx_purge_add_undo_to_history (trx=trx@entry=0x7fc7a0170188, undo=@0x7fc7a0171228: 0x557750a0adc0, mtr=mtr@entry=0x7fc789daef00) at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:219
#3  0x000055774eaa2fdd in trx_write_serialisation_history (trx=trx@entry=0x7fc7a0170188, mtr=mtr@entry=0x7fc789daef00) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1089
#4  0x000055774eaa3414 in trx_t::commit_low (this=this@entry=0x7fc7a0170188, mtr=0x7fc789daef00) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1455
#5  0x000055774eaa35f3 in trx_t::commit (this=this@entry=0x7fc7a0170188) at /mariadb/10.5/storage/innobase/trx/trx0trx.cc:1502

The FIL_PAGE_LSN of this page at the time the SIGABRT is received is 20379428, and the oldest_modification is 0; that is, any changes had supposedly been written back. But on recovery, we read an older version of the page, with the FIL_PAGE_LSN being only 19400054. This looks like a "write side" error. The page had been wrongly marked as clean here:

10.5 c68620df48cd1b0e7f5f1cb5ff8b905a5d9c80a5

#0  buf_page_t::clear_oldest_modification (this=0x7fc79b8082a0, temporary=false) at /mariadb/10.5/storage/innobase/include/buf0buf.h:2250
#1  0x000055774ea8aa46 in trx_purge_truncate_history () at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:681
#2  0x000055774ea6ef4b in srv_do_purge (n_total_purged=n_total_purged@entry=0x7fc799ef2bb8) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1847
#3  0x000055774ea6f15f in purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1945
#4  0x000055774ea71706 in purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1974

At this point, the log_sys.flushed_to_disk_lsn as well as log_sys.lsn had already reached 20379464, the value at the time of the SIGABRT. But, we had not yet durably written any log record for truncating the undo tablespace. The error occurs when the log checkpoint is advanced from 19417270 to 20379452 by the buf_flush_page_cleaner() thread a little later. At that point, our undo tablespace truncation thread was busy allocating buffer pool pages:

10.5 c68620df48cd1b0e7f5f1cb5ff8b905a5d9c80a5

#11 0x000055774e99f57c in binary_semaphore::wait (this=0x7fc799ef36a0) at /mariadb/10.5/storage/innobase/log/log0sync.cc:126
#12 0x000055774e99f8f2 in group_commit_lock::acquire (this=this@entry=0x55774f607b00 <flush_lock>, num=num@entry=20379464) at /mariadb/10.5/storage/innobase/log/log0sync.cc:189
#13 0x000055774e98bf6d in log_write_up_to (lsn=20379464, flush_to_disk=flush_to_disk@entry=true, rotate_key=rotate_key@entry=false) at /mariadb/10.5/storage/innobase/log/log0log.cc:822
#14 0x000055774e98c14b in log_buffer_flush_to_disk (sync=sync@entry=true) at /mariadb/10.5/storage/innobase/log/log0log.cc:858
#15 0x000055774e8ea008 in buf_flush_LRU (max_n=32) at /mariadb/10.5/storage/innobase/buf/buf0flu.cc:1742
#16 0x000055774e8e01d5 in buf_LRU_get_free_block (have_mutex=have_mutex@entry=false) at /mariadb/10.5/storage/innobase/buf/buf0lru.cc:504
#17 0x000055774e8d4d56 in fsp_page_create (space=space@entry=0x5577509eba60, offset=51, mtr=mtr@entry=0x7fc799ef2680) at /mariadb/10.5/storage/innobase/fsp/fsp0fsp.cc:1080
#18 0x000055774e8d5344 in fsp_alloc_free_page (space=space@entry=0x5577509eba60, hint=<optimized out>, hint@entry=0, mtr=mtr@entry=0x7fc799ef2680, init_mtr=init_mtr@entry=0x7fc799ef2680) at /mariadb/10.5/storage/innobase/fsp/fsp0fsp.cc:1197
#19 0x000055774e8d61e4 in fseg_alloc_free_page_low (space=space@entry=0x5577509eba60, seg_inode=seg_inode@entry=0x7fc79bcb6432 "", iblock=0x7fc79b818ec0, hint=hint@entry=0, direction=direction@entry=111 'o', has_done_reservation=has_done_reservation@entry=false, mtr=0x7fc799ef2680, init_mtr=0x7fc799ef2680) at /mariadb/10.5/storage/innobase/fsp/fsp0fsp.cc:2136
#20 0x000055774e8d6b48 in fseg_create (space=space@entry=0x5577509eba60, byte_offset=byte_offset@entry=62, mtr=mtr@entry=0x7fc799ef2680, has_done_reservation=has_done_reservation@entry=false, block=block@entry=0x0) at /mariadb/10.5/storage/innobase/fsp/fsp0fsp.cc:1759
#21 0x000055774ea9ccea in trx_rseg_header_create (space=space@entry=0x5577509eba60, rseg_id=rseg_id@entry=97, max_trx_id=37, sys_header=sys_header@entry=0x7fc79b825620, mtr=mtr@entry=0x7fc799ef2680) at /mariadb/10.5/storage/innobase/trx/trx0rseg.cc:321
#22 0x000055774ea8ad7e in trx_purge_truncate_history () at /mariadb/10.5/storage/innobase/trx/trx0purge.cc:737
#23 0x000055774ea6ef4b in srv_do_purge (n_total_purged=n_total_purged@entry=0x7fc799ef2bb8) at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1847
#24 0x000055774ea6f15f in purge_coordinator_callback_low () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1945
#25 0x000055774ea71706 in purge_coordinator_callback () at /mariadb/10.5/storage/innobase/srv/srv0srv.cc:1974

In summary, the extremely slow buffer pool eviction in 10.5 let us find a recovery bug in innodb_undo_log_truncate=ON. It turns out that the following patch will fix this:

diff --git a/storage/innobase/trx/trx0purge.cc b/storage/innobase/trx/trx0purge.cc
index a297ca4921e..28a58c00772 100644
--- a/storage/innobase/trx/trx0purge.cc
+++ b/storage/innobase/trx/trx0purge.cc
@@ -677,10 +677,7 @@ void trx_purge_truncate_history()
         ut_ad(bpage->io_fix() == BUF_IO_NONE);
 
         if (bpage->oldest_modification() > 1)
-        {
-          bpage->clear_oldest_modification(false);
           mtr.memo_push(block, MTR_MEMO_PAGE_X_FIX);
-        }
         else
         {
           rw_lock_x_unlock(&block->lock);

That call is not only harmful but also redundant, because mtr_t::commit_shrink() would execute the following in Shrink::operator()(mtr_memo_slot_t*):

      if (bpage.oldest_modification() > 1)
        bpage.clear_oldest_modification(false);
      slot->type= static_cast<mtr_memo_type_t>(slot->type & ~MTR_MEMO_MODIFY);

However, the above patch is not enough. We would need to adjust the following condition in fsp_page_create():

  if (UNIV_UNLIKELY(space->is_being_truncated))
  {
    // …
    if (block && block->page.oldest_modification() <= 1)
      block= nullptr;

Before the call to clear_oldest_modification() was removed, this condition would hold and we would end up allocating another block for the same page identifier. This is actually related to running out of the buffer pool! Without the assignment block=nullptr we would hit an assertion failure page.state() != BUF_BLOCK_FILE_PAGE in buf_block_t::initialise() later in fsp_page_create():

      free_block= block;
      goto got_free_block;
  }
// …
got_free_block:
  block= buf_page_create(space, static_cast<uint32_t>(offset),
                         space->zip_size(), mtr, free_block);

I think that we must revise fsp_page_create() as well as Shrink::operator()(mtr_memo_slot_t*) so that the x-latched blocks of the mini-transaction will only be marked clean if they were not modified by the mini-transaction. In that way, we should be able to reuse the already x-latched blocks in the buffer pool.
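The proposed rule can be stated as a toy model (hypothetical names, not InnoDB code): when the shrink mini-transaction releases an x-latched block, it may mark the block clean only if the mini-transaction itself did not modify it.

```cpp
#include <cassert>

// Toy analogue of a mini-transaction memo slot for one block.
struct BlockSlot {
  bool modified_by_mtr;  // analogue of MTR_MEMO_MODIFY
  bool dirty;            // analogue of oldest_modification > 1
};

// Returns the dirty state of the block after release: only blocks the
// mini-transaction did NOT modify are marked clean; blocks it modified
// stay dirty so their changes are still written back.
bool release_for_shrink(BlockSlot &slot) {
  if (!slot.modified_by_mtr)
    slot.dirty = false;
  return slot.dirty;
}
```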

Comment by Marko Mäkelä [ 2023-11-10 ]

I filed MDEV-32757 for the recovery bug. My current work-in-progress fix of that does not prevent the test from occasionally running out of buffer pool, but it might have reduced the failure rate somewhat.

Comment by Marko Mäkelä [ 2023-11-10 ]

I ported MDEV-26055 to 10.5. It seems to fix this test failure. I do not think it makes sense to port MDEV-26827; it may depend on MDEV-24142 and other refactoring that is only present in 10.6.

Comment by Matthias Leich [ 2023-11-16 ]

origin/10.5-MDEV-32681 4ffeaec2dfe591d4961b80c95a6863a3100cbba6 2023-11-14T09:13:39+02:00
performed well in RQG testing.

Generated at Thu Feb 08 10:33:11 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.