[MDEV-27987] "[ERROR] InnoDB: Unable to find a record to delete-mark" shortly after upgrading slave from 10.5.10 to 10.6.5 - Jira

XML

Word

Printable

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Cannot Reproduce
Affects Version/s: 10.6.5
Fix Version/s: N/A
Component/s: Storage Engine - InnoDB, Upgrades
Labels:
None
Environment:
OpenSUSE 15.3

Description

Hello there,

A few weeks ago, I started upgrading our MariaDB machines from OpenSUSE 15.2 to 15.3. As part of the upgrade, MariaDB got updated from (I am fairly sure) 10.5.10 to 10.6.5.

At first I upgraded just one slave machine, which has been running fine.

Today, I upgraded the remaining two slaves as well as the master (in that order).

Then, suddenly, 2 hours after the upgrade, one of the slaves crashed and started restarting and crashing on start:

2022-03-02 16:28:21 4 [Note] Slave: connected to master 'repl@spire:3306',replication resumed in log 'mysql-bin.004769' at position 88403470

2022-03-02 18:13:49 0 [ERROR] InnoDB: Unable to find a record to delete-mark

InnoDB: tuple DATA TUPLE: 4 fields;

 0: len 9; hex 617070735f706f7374; asc apps_post;;

 1: len 7; hex 7075626c697368; asc publish;;

 2: len 5; hex 99ab64aa13; asc   d  ;;

 3: len 8; hex 00000000002dc510; asc      -  ;;

InnoDB: record PHYSICAL RECORD: n_fields 4; compact format; info bits 0

 0: len 9; hex 617070735f706f7374; asc apps_post;;

 1: len 7; hex 7075626c697368; asc publish;;

 2: len 5; hex 99ab645980; asc   dY ;;

 3: len 8; hex 00000000002e4c4c; asc      .LL;;

2022-03-02 18:13:49 0 [ERROR] InnoDB: page [page id: space=59276, page number=92474] (240 records, index id 44529).

2022-03-02 18:13:49 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

This repeats over and over until I see this:

2022-03-02 18:13:49 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/

2022-03-02 18:13:53 0 [ERROR] InnoDB: Corruption of an index tree: table `***********`.`wp_posts` index `post_modified_gmt_type_status`, father ptr page no 32786, child page no 92474

PHYSICAL RECORD: n_fields 4; compact format; info bits 0

 0: len 5; hex 617070735f; asc apps_;;

 1: len 9; hex 706f73747075626c69; asc postpubli;;

 2: len 7; hex 736899ab637282; asc sh  cr ;;

 3: len 8; hex 00000000002db631; asc      - 1;;

2022-03-02 18:13:53 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 163

PHYSICAL RECORD: n_fields 5; compact format; info bits 16

 0: len 5; hex 8000000000; asc      ;;

 1: len 9; hex 61706b5f73706c6974; asc apk_split;;

 2: len 5; hex 6472616674; asc draft;;

 3: len 8; hex 00000000000f62f0; asc       b ;;

 4: len 4; hex 00008012; asc     ;;

2022-03-02 18:13:53 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 165

2022-03-02 18:13:53 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport.

220302 18:13:53 [ERROR] mysqld got signal 6 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.6.5-MariaDB-log

key_buffer_size=536870912

read_buffer_size=262144

max_used_connections=232

max_threads=2050

thread_count=25

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5300842 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x560f8bc5f328

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x7fad76ffcb78 thread_stack 0x49000

??:0(my_print_stacktrace)[0x560f88883ddd]

??:0(handle_fatal_signal)[0x560f883648f5]

??:0(sem_open)[0x7fb451453f80]

??:0(__gconv_transform_internal_ascii)[0x7fb45056c18b]

??:0(__gconv_transform_utf8_internal)[0x7fb45056d585]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f88740fd1]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8874580d]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f887458e6]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8874b051]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8875d906]

??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8876561f]

??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e6cba]

??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e7e97]

??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e852a]

??:0(std::unique_lock<std::mutex>::unlock())[0x560f886a662d]

??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x560f88718b2f]

??:0(std::unique_lock<std::mutex>::unlock())[0x560f88707adf]

??:0(tpool::task_group::execute(tpool::task*))[0x560f88825921]

??:0(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x560f888246ef]

??:0(__gnu_debug::_Safe_sequence_base::_M_revalidate_singular())[0x7fb450f38cd4]

/lib64/libpthread.so.0(+0x894a)[0x7fb45144894a]

??:0(eval_expr)[0x7fb450632d0f]

Trying to get some variables.

Some pointers may be invalid and cause the dump to abort.

Query (0x0): (null)

Connection ID (thread ID): 0

Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

We think the query pointer is invalid, but we will try to print it anyway.

Query:

Writing a core file...

Working directory at /var/lib/mysql

Resource Limits:

Limit                     Soft Limit           Hard Limit           Units

Max cpu time              unlimited            unlimited            seconds

Max file size             unlimited            unlimited            bytes

Max data size             unlimited            unlimited            bytes

Max stack size            8388608              unlimited            bytes

Max core file size        unlimited            unlimited            bytes

Max resident set          unlimited            unlimited            bytes

Max processes             128328               128328               processes

Max open files            1048576              1048576              files

Max locked memory         8388608              8388608              bytes

Max address space         unlimited            unlimited            bytes

Max file locks            unlimited            unlimited            locks

Max pending signals       128328               128328               signals

Max msgqueue size         819200               819200               bytes

Max nice priority         0                    0

Max realtime priority     0                    0

Max realtime timeout      unlimited            unlimited            us

Core pattern: |/bin/false

2022-03-02 18:14:03 0 [Warning] You need to use --log-bin to make --binlog-format work.

2022-03-02 18:14:03 0 [Note] InnoDB: Compressed tables use zlib 1.2.11

2022-03-02 18:14:03 0 [Note] InnoDB: Number of pools: 1

2022-03-02 18:14:03 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2022-03-02 18:14:03 0 [Note] InnoDB: Using Linux native AIO

2022-03-02 18:14:03 0 [Note] InnoDB: Initializing buffer pool, total size = 25769803776, chunk size = 134217728

2022-03-02 18:14:03 0 [Note] InnoDB: Completed initialization of buffer pool

2022-03-02 18:14:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=17030859767851,17030859785816

2022-03-02 18:14:03 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo

2022-03-02 18:14:03 0 [Note] InnoDB: Trx id counter is 215031286094

2022-03-02 18:14:03 0 [Note] InnoDB: Starting final batch to recover 12213 pages from redo log.

2022-03-02 18:14:05 0 [Note] InnoDB: Last binlog file './mysql-bin.000524', position 142764333

2022-03-02 18:14:05 0 [Note] InnoDB: 128 rollback segments are active.

2022-03-02 18:14:05 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2022-03-02 18:14:05 0 [Note] InnoDB: Rolled back recovered transaction 215031286093

2022-03-02 18:14:05 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2022-03-02 18:14:05 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"

2022-03-02 18:14:05 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2022-03-02 18:14:05 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2022-03-02 18:14:05 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2022-03-02 18:14:05 0 [Note] InnoDB: 10.6.5 started; log sequence number 17030939799231; transaction id 215031286096

2022-03-02 18:14:05 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool

2022-03-02 18:14:05 0 [Note] Plugin 'FEEDBACK' is disabled.

2022-03-02 18:14:05 0 [Note] Server socket created on IP: '192.168.202.72'.

2022-03-02 18:14:05 4 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@spire:3306' in log 'mysql-bin.004771' at position 484601512

2022-03-02 18:14:05 5 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.004771' at position 483941269, relay log './mysqld-relay-bin.012431' position: 483931147

2022-03-02 18:14:05 0 [Note] /usr/sbin/mysqld: ready for connections.

Version: '10.6.5-MariaDB-log'  socket: '/var/run/mysql/mysql.sock'  port: 3306  MariaDB package

2022-03-02 18:14:05 4 [Note] Slave I/O thread: connected to master 'repl@spire:3306',replication started in log 'mysql-bin.004771' at position 484601512

2022-03-02 18:14:05 0 [ERROR] InnoDB: Corruption of an index tree: table `***********`.`wp_posts` index `post_modified_gmt_type_status`, father ptr page no 32786, child page no 92474

I found https://jira.mariadb.org/browse/MDEV-22373 and related issues, but they're all marked as fixed.

Any ideas? I'm dreading this happening again to other slaves and taking down our db setup (can it also happen to the master and crash it?.

Attachments

Issue Links

relates to

MDEV-25796 Failing assertion: !cursor->index->is_committed() in row0ins.cc

Open

MDEV-27734 Set innodb_change_buffering=none by default

Closed

MDEV-22373 Unable to find a record to delete-mark ends up crashing mysqld process after upgrading from 10.1.43 to 10.4

Closed

Activity

People

Assignee:: Marko Mäkelä

Reporter:: Artem Russakovskii

Votes:: 0 Vote for this issue

Watchers:: 3 Start watching this issue

Dates

Created:: 2022-03-03 06:10

Updated:: 2022-06-22 09:31

Resolved:: 2022-06-22 09:30

Time Tracking

Estimated:

Not Specified

Remaining:

Logged:

0.75h

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.