Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27987

"[ERROR] InnoDB: Unable to find a record to delete-mark" shortly after upgrading slave from 10.5.10 to 10.6.5

    XMLWordPrintable

Details

    Description

      Hello there,

      A few weeks ago, I started upgrading our MariaDB machines from OpenSUSE 15.2 to 15.3. As part of the upgrade, MariaDB got updated from (I am fairly sure) 10.5.10 to 10.6.5.

      At first I upgraded just one slave machine, which has been running fine.

      Today, I upgraded the remaining two slaves as well as the master (in that order).

      Then, suddenly, 2 hours after the upgrade, one of the slaves crashed and started restarting and crashing on start:

      2022-03-02 16:28:21 4 [Note] Slave: connected to master 'repl@spire:3306',replication resumed in log 'mysql-bin.004769' at position 88403470
      2022-03-02 18:13:49 0 [ERROR] InnoDB: Unable to find a record to delete-mark
      InnoDB: tuple DATA TUPLE: 4 fields;
       0: len 9; hex 617070735f706f7374; asc apps_post;;
       1: len 7; hex 7075626c697368; asc publish;;
       2: len 5; hex 99ab64aa13; asc   d  ;;
       3: len 8; hex 00000000002dc510; asc      -  ;;
       
      InnoDB: record PHYSICAL RECORD: n_fields 4; compact format; info bits 0
       0: len 9; hex 617070735f706f7374; asc apps_post;;
       1: len 7; hex 7075626c697368; asc publish;;
       2: len 5; hex 99ab645980; asc   dY ;;
       3: len 8; hex 00000000002e4c4c; asc      .LL;;
      2022-03-02 18:13:49 0 [ERROR] InnoDB: page [page id: space=59276, page number=92474] (240 records, index id 44529).
      2022-03-02 18:13:49 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      

      This repeats over and over until I see this:

      2022-03-02 18:13:49 0 [ERROR] InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      2022-03-02 18:13:53 0 [ERROR] InnoDB: Corruption of an index tree: table `***********`.`wp_posts` index `post_modified_gmt_type_status`, father ptr page no 32786, child page no 92474
      PHYSICAL RECORD: n_fields 4; compact format; info bits 0
       0: len 5; hex 617070735f; asc apps_;;
       1: len 9; hex 706f73747075626c69; asc postpubli;;
       2: len 7; hex 736899ab637282; asc sh  cr ;;
       3: len 8; hex 00000000002db631; asc      - 1;;
      2022-03-02 18:13:53 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 163
      PHYSICAL RECORD: n_fields 5; compact format; info bits 16
       0: len 5; hex 8000000000; asc      ;;
       1: len 9; hex 61706b5f73706c6974; asc apk_split;;
       2: len 5; hex 6472616674; asc draft;;
       3: len 8; hex 00000000000f62f0; asc       b ;;
       4: len 4; hex 00008012; asc     ;;
      2022-03-02 18:13:53 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 165
      2022-03-02 18:13:53 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport.
      220302 18:13:53 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.6.5-MariaDB-log
      key_buffer_size=536870912
      read_buffer_size=262144
      max_used_connections=232
      max_threads=2050
      thread_count=25
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 5300842 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x560f8bc5f328
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fad76ffcb78 thread_stack 0x49000
      ??:0(my_print_stacktrace)[0x560f88883ddd]
      ??:0(handle_fatal_signal)[0x560f883648f5]
      ??:0(sem_open)[0x7fb451453f80]
      ??:0(__gconv_transform_internal_ascii)[0x7fb45056c18b]
      ??:0(__gconv_transform_utf8_internal)[0x7fb45056d585]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f88740fd1]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8874580d]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f887458e6]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8874b051]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8875d906]
      ??:0(std::pair<std::_Rb_tree_iterator<std::pair<unsigned long const, bool> >, bool> std::_Rb_tree<unsigned long, std::pair<unsigned long const, bool>, std::_Select1st<std::pair<unsigned long const, bool> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, bool> > >::_M_emplace_unique<unsigned long&, bool>(unsigned long&, bool&&))[0x560f8876561f]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e6cba]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e7e97]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x560f886e852a]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x560f886a662d]
      ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x560f88718b2f]
      ??:0(std::unique_lock<std::mutex>::unlock())[0x560f88707adf]
      ??:0(tpool::task_group::execute(tpool::task*))[0x560f88825921]
      ??:0(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x560f888246ef]
      ??:0(__gnu_debug::_Safe_sequence_base::_M_revalidate_singular())[0x7fb450f38cd4]
      /lib64/libpthread.so.0(+0x894a)[0x7fb45144894a]
      ??:0(eval_expr)[0x7fb450632d0f]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 0
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
       
      We think the query pointer is invalid, but we will try to print it anyway.
      Query:
       
      Writing a core file...
      Working directory at /var/lib/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units
      Max cpu time              unlimited            unlimited            seconds
      Max file size             unlimited            unlimited            bytes
      Max data size             unlimited            unlimited            bytes
      Max stack size            8388608              unlimited            bytes
      Max core file size        unlimited            unlimited            bytes
      Max resident set          unlimited            unlimited            bytes
      Max processes             128328               128328               processes
      Max open files            1048576              1048576              files
      Max locked memory         8388608              8388608              bytes
      Max address space         unlimited            unlimited            bytes
      Max file locks            unlimited            unlimited            locks
      Max pending signals       128328               128328               signals
      Max msgqueue size         819200               819200               bytes
      Max nice priority         0                    0
      Max realtime priority     0                    0
      Max realtime timeout      unlimited            unlimited            us
      Core pattern: |/bin/false
       
      2022-03-02 18:14:03 0 [Warning] You need to use --log-bin to make --binlog-format work.
      2022-03-02 18:14:03 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2022-03-02 18:14:03 0 [Note] InnoDB: Number of pools: 1
      2022-03-02 18:14:03 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2022-03-02 18:14:03 0 [Note] InnoDB: Using Linux native AIO
      2022-03-02 18:14:03 0 [Note] InnoDB: Initializing buffer pool, total size = 25769803776, chunk size = 134217728
      2022-03-02 18:14:03 0 [Note] InnoDB: Completed initialization of buffer pool
      2022-03-02 18:14:03 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=17030859767851,17030859785816
      2022-03-02 18:14:03 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 1 row operations to undo
      2022-03-02 18:14:03 0 [Note] InnoDB: Trx id counter is 215031286094
      2022-03-02 18:14:03 0 [Note] InnoDB: Starting final batch to recover 12213 pages from redo log.
      2022-03-02 18:14:05 0 [Note] InnoDB: Last binlog file './mysql-bin.000524', position 142764333
      2022-03-02 18:14:05 0 [Note] InnoDB: 128 rollback segments are active.
      2022-03-02 18:14:05 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2022-03-02 18:14:05 0 [Note] InnoDB: Rolled back recovered transaction 215031286093
      2022-03-02 18:14:05 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2022-03-02 18:14:05 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2022-03-02 18:14:05 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2022-03-02 18:14:05 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2022-03-02 18:14:05 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2022-03-02 18:14:05 0 [Note] InnoDB: 10.6.5 started; log sequence number 17030939799231; transaction id 215031286096
      2022-03-02 18:14:05 0 [Note] InnoDB: Loading buffer pool(s) from /var/lib/mysql/ib_buffer_pool
      2022-03-02 18:14:05 0 [Note] Plugin 'FEEDBACK' is disabled.
      2022-03-02 18:14:05 0 [Note] Server socket created on IP: '192.168.202.72'.
      2022-03-02 18:14:05 4 [Note] Slave I/O thread: Start asynchronous replication to master 'repl@spire:3306' in log 'mysql-bin.004771' at position 484601512
      2022-03-02 18:14:05 5 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.004771' at position 483941269, relay log './mysqld-relay-bin.012431' position: 483931147
      2022-03-02 18:14:05 0 [Note] /usr/sbin/mysqld: ready for connections.
      Version: '10.6.5-MariaDB-log'  socket: '/var/run/mysql/mysql.sock'  port: 3306  MariaDB package
      2022-03-02 18:14:05 4 [Note] Slave I/O thread: connected to master 'repl@spire:3306',replication started in log 'mysql-bin.004771' at position 484601512
      2022-03-02 18:14:05 0 [ERROR] InnoDB: Corruption of an index tree: table `***********`.`wp_posts` index `post_modified_gmt_type_status`, father ptr page no 32786, child page no 92474
      

      I found https://jira.mariadb.org/browse/MDEV-22373 and related issues, but they're all marked as fixed.

      Any ideas? I'm dreading this happening again to other slaves and taking down our db setup (can it also happen to the master and crash it?.

      Attachments

        Issue Links

          Activity

            People

              marko Marko Mäkelä
              archon810 Artem Russakovskii
              Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.