[MDEV-29126] DB is crashing repeatedly with error message "InnoDB: Rec offset 99, cur1 offset 1723, cur2 offset 16095" Created: 2022-07-18  Updated: 2022-07-20  Resolved: 2022-07-20

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.13
Fix Version/s: 10.6.9, 10.7.5, 10.8.4, 10.9.2, 10.10.1

Type: Bug
Priority: Major
Reporter: Michael Qin (Inactive)
Assignee: Marko Mäkelä
Resolution: Duplicate
Votes: 0
Labels: None

Issue Links:
Duplicate
duplicates MDEV-13542 Crashing on a corrupted page is unhel... Closed
Relates
relates to MDEV-27734 Set innodb_change_buffering=none by d... Closed

 Description   

Initial crash:

2022-07-09  6:10:39 0 [ERROR] [FATAL] InnoDB: Rec offset 99, cur1 offset 1723, cur2 offset 16095
220709  6:10:39 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.13-MariaDB-log
key_buffer_size=16777216
read_buffer_size=262144
max_used_connections=12
max_threads=317
thread_count=7
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 754326 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x146f1deb11d8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x146ebc5ec438 thread_stack 0x40000
/rdsdbbin/mysql/bin/mysqld(my_print_stacktrace+0x2e)[0x557a41f30dae]
/rdsdbbin/mysql/bin/mysqld(handle_fatal_signal+0x51b)[0x557a416dd7fb]
sigaction.c:0(__restore_rt)[0x1470809308e0]
:0(__GI_raise)[0x1470805a7ca0]
:0(__GI_abort)[0x1470805a9148]
:0(ib::fatal::~fatal())[0x557a41d84301]
include/mach0data.ic:84(mach_read_from_2)[0x557a41cac710]
btr/btr0btr.cc:1381(btr_page_reorganize_low(page_cur_t*, dict_index_t*, mtr_t*) [clone .isra.64])[0x557a41d8aca6]
btr/btr0btr.cc:1646(btr_page_reorganize(page_cur_t*, dict_index_t*, mtr_t*))[0x557a41d8b7f7]
include/dict0mem.h:1840(dict_table_t::not_redundant() const)[0x557a41c6b2d3]
ibuf/ibuf0ibuf.cc:3881(ibuf_insert_to_index_page)[0x557a41c70d51]
buf/buf0buf.cc:3382(buf_page_get_low(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool))[0x557a41dbebd9]
buf/buf0buf.cc:3464(buf_page_get_gen(page_id_t, unsigned long, unsigned long, buf_block_t*, unsigned long, char const*, unsigned int, mtr_t*, dberr_t*, bool))[0x557a41dbecb6]
btr/btr0cur.cc:1624(btr_cur_search_to_nth_level_func(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_cur_t*, rw_lock_t*, char const*, unsigned int, mtr_t*, unsigned long))[0x557a41d9f0b1]
include/btr0pcur.ic:448(btr_pcur_open_low(dict_index_t*, unsigned long, dtuple_t const*, page_cur_mode_t, unsigned long, btr_pcur_t*, char const*, unsigned int, unsigned long, mtr_t*) [clone .constprop.37])[0x557a41d165a2]
row/row0row.cc:1306(row_search_index_entry(dict_index_t*, dtuple_t const*, unsigned long, btr_pcur_t*, mtr_t*))[0x557a41d1670f]
row/row0purge.cc:459(row_purge_remove_sec_if_poss_leaf(purge_node_t*, dict_index_t*, dtuple_t const*))[0x557a41d11239]
row/row0purge.cc:568(row_purge_remove_sec_if_poss)[0x557a41d12236]
row/row0purge.cc:1113(row_purge)[0x557a41d132da]
que/que0que.cc:946(que_thr_step)[0x557a41cc6fae]
srv/srv0srv.cc:1855(srv_task_execute)[0x557a41d33e36]
tpool/task_group.cc:57(tpool::task_group::execute(tpool::task*))[0x557a41e9d5cf]
tpool/tpool_generic.cc:544(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x557a41e9c58b]
bits/unique_ptr.h:78(std::default_delete<std::thread::_State>::operator()(std::thread::_State*) const)[0x557a42015d8f]
pthread_create.c:0(start_thread)[0x14708092644b]
:0(__GI___clone)[0x14708066140f]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 0
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /rdsdbdata/db
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            unlimited            unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            65535                65535                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       30936                30936                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: /rdsdbdata/tmp/core-%e-%p

Startup log after the restart, with no further crashes:

2022-07-11 18:20:48 0 [Note] InnoDB: !!! innodb_force_recovery is set to 2 !!!
2022-07-11 18:20:48 0 [Note] InnoDB: Uses event mutexes
2022-07-11 18:20:48 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
2022-07-11 18:20:48 0 [Note] InnoDB: Number of pools: 1
2022-07-11 18:20:48 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2022-07-11 18:20:48 0 [Note] InnoDB: Using Linux native AIO
2022-07-11 18:20:48 0 [Note] InnoDB: Initializing buffer pool, total size = 5368709120, chunk size = 134217728
2022-07-11 18:20:48 0 [Note] InnoDB: Completed initialization of buffer pool
2022-07-11 18:20:48 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=1019799709084,1019799709084
2022-07-11 18:20:48 0 [Note] InnoDB: Last binlog file '/rdsdbdata/log/binlog/mysql-bin-changelog.045232', position 602
2022-07-11 18:20:48 0 [Note] InnoDB: 128 rollback segments are active.
2022-07-11 18:20:48 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2022-07-11 18:20:48 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2022-07-11 18:20:48 0 [Note] InnoDB: Setting file '/rdsdbdata/db/innodb/ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2022-07-11 18:20:48 0 [Note] InnoDB: File '/rdsdbdata/db/innodb/ibtmp1' size is now 12 MB.
2022-07-11 18:20:48 0 [Note] InnoDB: 10.5.13 started; log sequence number 1019799709316; transaction id 1766426
2022-07-11 18:20:48 0 [Note] InnoDB: Loading buffer pool(s) from /rdsdbdata/db/innodb/ib_buffer_pool
2022-07-11 18:20:48 0 [Note] Recovering after a crash using /rdsdbdata/log/binlog/mysql-bin-changelog
2022-07-11 18:20:48 0 [Note] Starting crash recovery...
2022-07-11 18:20:48 0 [Note] Crash recovery finished.
2022-07-11 18:20:48 0 [Note] Server socket created on IP: '::'.
2022-07-11 18:20:48 0 [Note] Reading of all Master_info entries succeeded
2022-07-11 18:20:48 0 [Note] Added new Master_info '' to hash table
2022-07-11 18:20:48 0 [Note] /rdsdbbin/mysql/bin/mysqld: ready for connections.
Version: '10.5.13-MariaDB-log'  socket: '/tmp/mysql.sock'  port: 3306  managed by https://aws.amazon.com/rds/
2022-07-11 18:20:48 2 [Note] Event Scheduler: scheduler thread started with id 2
2022-07-11 18:20:52 7 [Warning] Aborted connection 7 to db: 'unconnected' user: 'rdsadmin' host: 'localhost' (Got an error reading communication packets)
2022-07-11 18:21:15 0 [Note] InnoDB: Buffer pool(s) load completed at 220711 18:21:15



 Comments   
Comment by Marko Mäkelä [ 2022-07-20 ]

The server crashes because a page is found to be corrupted during a change buffer merge.

The crash was fixed in MDEV-13542. The exact cause of this corruption is unknown; we have been unable to reproduce it internally. The change buffer was disabled by default in MDEV-27734.
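
For anyone triaging similar reports, the signature described above (a fatal "Rec offset" message together with an ibuf_insert_to_index_page frame in the backtrace) can be checked mechanically. The following is only a rough sketch: classify_crash and FATAL_RE are hypothetical names, not part of MariaDB, and the regular expression assumes the message format shown in this report's error log.

```python
import re

# Hypothetical helper, not part of MariaDB. The pattern assumes the exact
# message format seen in this report's error log.
FATAL_RE = re.compile(
    r"\[FATAL\] InnoDB: Rec offset (\d+), cur1 offset (\d+), cur2 offset (\d+)"
)


def classify_crash(log_text):
    """Return details of the crash signature, or None if it is absent."""
    match = FATAL_RE.search(log_text)
    if match is None:
        return None
    rec, cur1, cur2 = (int(group) for group in match.groups())
    return {
        "rec_offset": rec,
        "cur1_offset": cur1,
        "cur2_offset": cur2,
        # In this report, this frame sits just below btr_page_reorganize,
        # which indicates the page was touched by a change buffer merge.
        "during_change_buffer_merge": "ibuf_insert_to_index_page" in log_text,
    }


# Excerpt copied from the error log in the description.
sample = (
    "2022-07-09  6:10:39 0 [ERROR] [FATAL] InnoDB: Rec offset 99, "
    "cur1 offset 1723, cur2 offset 16095\n"
    "btr/btr0btr.cc:1646(btr_page_reorganize(page_cur_t*, dict_index_t*, "
    "mtr_t*))[0x557a41d8b7f7]\n"
    "ibuf/ibuf0ibuf.cc:3881(ibuf_insert_to_index_page)[0x557a41c70d51]\n"
)

result = classify_crash(sample)
print(result)
```

A match only suggests that the crash resembles this report; it says nothing about what corrupted the page in the first place.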
