[MDEV-30584] Assertion `state < buf_page_t::UNFIXED || page_id_t(page_get_space_id(block->page.frame), page_get_page_no(block->page.frame)) == block->page.id()' in mtr_t::page_lock Created: 2023-02-06  Updated: 2023-03-24

Status: Open
Project: MariaDB Server
Component/s: Encryption, Storage Engine - InnoDB
Affects Version/s: 10.6.6
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Matthias Leich Assignee: Thirunarayanan Balathandayuthapani
Resolution: Unresolved Votes: 0
Labels: rr-profile-analyzed

Attachments: File simp_TBR-1227.cfg     File simp_TBR-1227.yy    

 Description   

GIT_SHOW: HEAD, origin/bb-10.6-release 6dd07fbd042132729e5dea0a6a9b35f07a33e9e5 2022-02-03T16:51:08+02:00
 
mysqld: /data/Server/bb-10.6-releaseB/storage/innobase/mtr/mtr0mtr.cc:1207: void mtr_t::page_lock(buf_block_t*, ulint): Assertion `state < buf_page_t::UNFIXED || page_id_t(page_get_space_id(block->page.frame), page_get_page_no(block->page.frame)) == block->page.id()' failed.
...
(rr) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f66fe3cc859 in __GI_abort () at abort.c:79
#2  0x00007f66fe3cc729 in __assert_fail_base (fmt=0x7f66fe562588 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=0x564faa3ecf18 "state < buf_page_t::UNFIXED || page_id_t(page_get_space_id(block->page.frame), page_get_page_no(block->page.frame)) == block->page.id()", 
    file=0x564faa3ebf18 "/data/Server/bb-10.6-releaseB/storage/innobase/mtr/mtr0mtr.cc", line=1207, function=<optimized out>) at assert.c:92
#3  0x00007f66fe3ddf36 in __GI___assert_fail (
    assertion=0x564faa3ecf18 "state < buf_page_t::UNFIXED || page_id_t(page_get_space_id(block->page.frame), page_get_page_no(block->page.frame)) == block->page.id()", 
    file=0x564faa3ebf18 "/data/Server/bb-10.6-releaseB/storage/innobase/mtr/mtr0mtr.cc", line=1207, function=0x564faa3ece78 "void mtr_t::page_lock(buf_block_t*, ulint)") at assert.c:101
#4  0x0000564fa9b7c77d in mtr_t::page_lock (this=0x607509018820, block=0x607509051ca0, rw_latch=2) at /data/Server/bb-10.6-releaseB/storage/innobase/mtr/mtr0mtr.cc:1207
#5  0x0000564fa9d581eb in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, mtr=0x607509018820, err=0x0, allow_ibuf_merge=false)
    at /data/Server/bb-10.6-releaseB/storage/innobase/buf/buf0buf.cc:2957
#6  0x0000564fa9d58712 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, mtr=0x607509018820, err=0x0, allow_ibuf_merge=false)
    at /data/Server/bb-10.6-releaseB/storage/innobase/buf/buf0buf.cc:3028
#7  0x0000564fa9df4411 in fil_crypt_get_page_throttle (state=0x607509018d60, offset=312, mtr=0x607509018820, sleeptime_ms=0x6075090187c8)
    at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1705
#8  0x0000564fa9df4793 in fil_crypt_rotate_page (key_state=0x607509018d54, state=0x607509018d60) at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1788
#9  0x0000564fa9df4eef in fil_crypt_rotate_pages (key_state=0x607509018d54, state=0x607509018d60) at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1925
#10 0x0000564fa9df5785 in fil_crypt_thread () at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:2099
#11 0x0000564fa9d7c08f in std::__invoke_impl<void, void (*)()> (__f=@0x669f0c03b718: 0x564fa9df559f <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:60
#12 0x0000564fa9d7c03b in std::__invoke<void (*)()> (__fn=@0x669f0c03b718: 0x564fa9df559f <fil_crypt_thread()>) at /usr/include/c++/9/bits/invoke.h:95
#13 0x0000564fa9d7bfdc in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x669f0c03b718) at /usr/include/c++/9/thread:244
#14 0x0000564fa9d7bfae in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x669f0c03b718) at /usr/include/c++/9/thread:251
#15 0x0000564fa9d7bf8e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x669f0c03b710) at /usr/include/c++/9/thread:195
#16 0x00007ffc844ecde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#17 0x0000607509023609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#18 0x00007f66fe4c9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
pluto:/data/results/1675702417/TBR-1227$ _RR_TRACE_DIR=./1/rr/ rr replay --mark-stdio
(rr)
 
# git clone https://github.com/mleich1/rqg --branch <pick the right branch> RQG
#
# GIT_SHOW: HEAD -> master, origin/master, origin/HEAD 624894590f3b4e59919b9c621783265b584e5482 2023-01-30T17:12:10+01:00
# rqg.pl  : Version 4.2.1 (2022-12)
#
# $RQG_HOME/rqg.pl \
# --duration=300 \
# --queries=10000000 \
# --no_mask \
# --seed=random \
# --max_gd_duration=1200 \
# --skip_gendata \
# --engine=InnoDB \
# --rpl_mode=none \
# --gendata_sql=simp_TBR-1227.sql \
# --mysqld=--log-output=none \
# --mysqld=--wait_timeout=28800 \
# --mysqld=--loose-innodb_read_only_compressed=OFF \
# --mysqld=--plugin-load-add=file_key_management.so \
# --mysqld=--innodb_page_size=64K \
# --mysqld=--slave_net_timeout=60 \
# --mysqld=--log_bin_trust_function_creators=1 \
# --mysqld=--loose-debug_assert_on_not_freed_memory=0 \
# --mysqld=--loose-table_lock_wait_timeout=50 \
# --mysqld=--loose_innodb_lock_schedule_algorithm=fcfs \
# --mysqld=--loose-max-statement-time=30 \
# --mysqld=--lock-wait-timeout=86400 \
# --mysqld=--file-key-management-filename=$RQG_HOME/conf/mariadb/encryption_keys.txt \
# --mysqld=--net_read_timeout=30 \
# --mysqld=--loose-innodb_fatal_semaphore_wait_threshold=300 \
# --mysqld=--interactive_timeout=28800 \
# --mysqld=--log-bin \
# --mysqld=--loose-idle_transaction_timeout=0 \
# --mysqld=--net_write_timeout=60 \
# --mysqld=--innodb-lock-wait-timeout=50 \
# --mysqld=--loose-idle_readonly_transaction_timeout=0 \
# --mysqld=--connect_timeout=60 \
# --mysqld=--innodb-buffer-pool-size=24M \
# --mysqld=--loose-plugin-load-add=provider_lz4.so \
# --mysqld=--sync-binlog=1 \
# --mysqld=--loose-idle_write_transaction_timeout=0 \
# --reporters=Backtrace,Deadlock,None \
# --validators=None \
# --grammar=simp_TBR-1227.yy \
# --threads=1 \
# <local settings>
 
I tried to derive some MTR test but had no luck.
 
simp_TBR-1227.yy
----------------------------
my_int:
    { if (not defined $my_int) { $my_int = 1 } else { ++$my_int } ; return $my_int } ;
 
thread1:
    INSERT INTO t1 (col1, col_text) VALUES ( my_int , REPEAT(SUBSTR(CAST( $my_int AS CHAR),1,1), @fill_amount) ) ;
 
thread1_connect:
    SET AUTOCOMMIT = 0; SET @fill_amount = (@@innodb_page_size * 2 ) + 1 ;
 
thread1_init:
    SET GLOBAL innodb_encrypt_tables = 1 , GLOBAL innodb_random_read_ahead = 1 , GLOBAL innodb_encryption_threads = 1 ; CREATE TABLE t1 ( col1 INT, col_text TEXT ) ENGINE = InnoDB ROW_FORMAT = Dynamic ;



 Comments   
Comment by Thirunarayanan Balathandayuthapani [ 2023-02-07 ]

Thread 10 hit Hardware watchpoint 3: -location block->page.zip.fix.m._M_i
 
Old value = 2147483648
New value = 536870912
std::__atomic_base<unsigned int>::fetch_sub (__m=std::memory_order_relaxed, __i=1610612736, this=0x607509051cf8)
    at /usr/include/c++/9/bits/atomic_base.h:551
551	      { return __atomic_fetch_sub(&_M_i, __i, int(__m)); }
(rr) where
#0  std::__atomic_base<unsigned int>::fetch_sub (__m=std::memory_order_relaxed, __i=1610612736, this=0x607509051cf8)
    at /usr/include/c++/9/bits/atomic_base.h:551
#1  Atomic_relaxed<unsigned int>::fetch_sub (this=0x607509051cf8, i=1610612736, o=std::memory_order_relaxed)
    at /data/Server/bb-10.6-releaseB/include/my_atomic_wrapper.h:55
#2  0x0000564fa9d5bc80 in buf_page_t::read_complete (this=0x607509051ca0, node=...)
    at /data/Server/bb-10.6-releaseB/storage/innobase/buf/buf0buf.cc:3737
#3  0x0000564fa9de828c in fil_aio_callback (request=...) at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0fil.cc:2930
#4  0x0000564fa9b89466 in io_callback (cb=0x67f8598baee0) at /data/Server/bb-10.6-releaseB/storage/innobase/os/os0file.cc:3604
#5  0x0000564fa9e7628a in tpool::task_group::execute (this=0x564facd75400, t=0x67f8598baf38)
    at /data/Server/bb-10.6-releaseB/tpool/task_group.cc:55
#6  0x0000564fa9e765c4 in tpool::task::execute (this=0x67f8598baf38) at /data/Server/bb-10.6-releaseB/tpool/task.cc:32
#7  0x0000564fa9e6efa7 in tpool::thread_pool_generic::worker_main (this=0x564facd31260, thread_var=0x564facd40d80)
    at /data/Server/bb-10.6-releaseB/tpool/tpool_generic.cc:549
#8  0x0000564fa9e76092 in std::__invoke_impl<void, void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__f=
    @0x581f100028d8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x564fa9e6ef0e <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>, __t=@0x581f100028d0: 0x564facd31260) at /usr/include/c++/9/bits/invoke.h:73
#9  0x0000564fa9e75f74 in std::__invoke<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> (__fn=
    @0x581f100028d8: (void (tpool::thread_pool_generic::*)(tpool::thread_pool_generic * const, tpool::worker_data *)) 0x564fa9e6ef0e <tpool::thread_pool_generic::worker_main(tpool::worker_data*)>) at /usr/include/c++/9/bits/invoke.h:95
#10 0x0000564fa9e75e93 in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::_M_invoke<0ul, 1ul, 2ul> (this=0x581f100028c8) at /usr/include/c++/9/thread:244
#11 0x0000564fa9e75e2e in std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> >::operator() (this=0x581f100028c8) at /usr/include/c++/9/thread:251
#12 0x0000564fa9e75e0e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (tpool::thread_pool_generic::*)(tpool::worker_data*), tpool::thread_pool_generic*, tpool::worker_data*> > >::_M_run (this=0x581f100028c0) at /usr/include/c++/9/thread:195
#13 0x00007ffc844ecde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#14 0x0000607509023609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#15 0x00007f66fe4c9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
 
(rr) when
Current event: 37835
 
(rr) p read_id
$2 = {m_id = 0}

Above page is an uninitialized page and we mark the page state as UNFIXED.

Encryption threads tries to acquire the page and we end up crashing in

mysqld: /data/Server/bb-10.6-releaseB/storage/innobase/mtr/mtr0mtr.cc:1207: void mtr_t::page_lock(buf_block_t*, ulint): Assertion `state < buf_page_t::UNFIXED || page_id_t(page_get_space_id(block->page.frame), page_get_page_no(block->page.frame)) == block->page.id()'

 
Thread 3 hit Hardware watchpoint 2: -location block->page.zip.fix
 
Old value = {m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 536870912}, <No data fields>}}
New value = {m = {<std::__atomic_base<unsigned int>> = {static _S_alignment = 4, _M_i = 536870913}, <No data fields>}}
 
Thread 3 hit Hardware watchpoint 3: -location block->page.zip.fix.m._M_i
 
Old value = 536870912
New value = 536870913
std::__atomic_base<unsigned int>::fetch_add (__m=std::memory_order_relaxed, __i=1, this=0x607509051cf8)
    at /usr/include/c++/9/bits/atomic_base.h:541
541	      { return __atomic_fetch_add(&_M_i, __i, int(__m)); }
(rr) where
#0  std::__atomic_base<unsigned int>::fetch_add (__m=std::memory_order_relaxed, __i=1, this=0x607509051cf8)
    at /usr/include/c++/9/bits/atomic_base.h:541
#1  Atomic_relaxed<unsigned int>::fetch_add (this=0x607509051cf8, i=1, o=std::memory_order_relaxed)
    at /data/Server/bb-10.6-releaseB/include/my_atomic_wrapper.h:53
#2  0x0000564fa9b1a18d in buf_page_t::fix (this=0x607509051ca0, count=1) at /data/Server/bb-10.6-releaseB/storage/innobase/include/buf0buf.h:865
#3  0x0000564fa9d56a2f in buf_page_get_low (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, mtr=0x607509018820, err=0x0, 
    allow_ibuf_merge=false) at /data/Server/bb-10.6-releaseB/storage/innobase/buf/buf0buf.cc:2557
#4  0x0000564fa9d58712 in buf_page_get_gen (page_id=..., zip_size=0, rw_latch=2, guess=0x0, mode=12, mtr=0x607509018820, err=0x0, 
    allow_ibuf_merge=false) at /data/Server/bb-10.6-releaseB/storage/innobase/buf/buf0buf.cc:3028
#5  0x0000564fa9df4411 in fil_crypt_get_page_throttle (state=0x607509018d60, offset=312, mtr=0x607509018820, sleeptime_ms=0x6075090187c8)
    at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1705
#6  0x0000564fa9df4793 in fil_crypt_rotate_page (key_state=0x607509018d54, state=0x607509018d60)
    at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1788
#7  0x0000564fa9df4eef in fil_crypt_rotate_pages (key_state=0x607509018d54, state=0x607509018d60)
    at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:1925
#8  0x0000564fa9df5785 in fil_crypt_thread () at /data/Server/bb-10.6-releaseB/storage/innobase/fil/fil0crypt.cc:2099
#9  0x0000564fa9d7c08f in std::__invoke_impl<void, void (*)()> (__f=@0x669f0c03b718: 0x564fa9df559f <fil_crypt_thread()>)
    at /usr/include/c++/9/bits/invoke.h:60
#10 0x0000564fa9d7c03b in std::__invoke<void (*)()> (__fn=@0x669f0c03b718: 0x564fa9df559f <fil_crypt_thread()>)
    at /usr/include/c++/9/bits/invoke.h:95
#11 0x0000564fa9d7bfdc in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0ul> (this=0x669f0c03b718) at /usr/include/c++/9/thread:244
#12 0x0000564fa9d7bfae in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x669f0c03b718) at /usr/include/c++/9/thread:251
#13 0x0000564fa9d7bf8e in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x669f0c03b710)
    at /usr/include/c++/9/thread:195
#14 0x00007ffc844ecde4 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#15 0x0000607509023609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007f66fe4c9293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
(rr) when
Current event: 37891

In fil_crypt_get_page_throttle(), we need to check whether page is allocated before fetching the buffer pool?

        buf_block_t* block = buf_page_get_gen(page_id, zip_size, RW_X_LATCH,
                                              NULL,
                                              BUF_PEEK_IF_IN_POOL, mtr);
        if (block != NULL) {
                /* page was in buffer pool */
                state->crypt_stat.pages_read_from_cache++;
                return block;
        }
 
        if (space->is_stopping()) {
                return NULL;
        }
 
        if (DB_SUCCESS_LOCKED_REC
            != fseg_page_is_allocated(space, state->offset)) {
                /* page is already freed */
                return NULL;
        }

Comment by Marko Mäkelä [ 2023-02-07 ]

The purpose of the assertion in mtr_t::page_lock() is to catch any access to buffer pool pages that contain the wrong page number. I suppose that this page frame is filled with all zeroes. It likely is not a sign of a real problem, because we are invoking fseg_page_is_allocated().

We might simply want to relax this assertion for BUF_PEEK_IF_IN_POOL. I see that the call in btr_cur_pess_upd_restore_supremum() could be replaced with a call to mtr_t::get_already_latched(); in btr_search_drop_page_hash_when_freed() we could use a different (possibly new) mode. When a page is being freed, we would want move it to the least recently used end of buf_pool.LRU so that the garbage page can be evicted and reused soon.

Generated at Thu Feb 08 10:17:21 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.