  1. MariaDB Server
  2. MDEV-28167

MariaDB Core Dump / [ERROR] InnoDB: Corruption of an index tree

Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 10.5.10
    • Fix Version/s: None
    • Component/s: Galera SST
    • Labels: None
    • Environment: Compiled from source on FreeBSD 12.3

    Description

      My mariadb instance(s) crashed with the following error:

      2022-03-22 21:24:13 0 [ERROR] InnoDB: Corruption of an index tree: table `xxxx`.`yyy` index `UNIQ_1483A5E965AB1D88`, father ptr page no 3306, child page no 1420
      PHYSICAL RECORD: n_fields 2; compact format; info bits 0
       0: len 28; hex 593346463266526354774d6250504a337766623837334e65675a5832; asc Y3FF2fRcTwMbPPJ3wfb873NegZX2;;
       1: len 30; hex 31346266653932392d393139622d346364662d396337312d323064333331; asc 14bfe929-919b-4cdf-9c71-20d331; (total 36 bytes);
      2022-03-22 21:24:13 0 [Note] InnoDB: n_owned: 0; heap_no: 2; next rec: 200
      PHYSICAL RECORD: n_fields 3; compact format; info bits 0
       0: len 28; hex 4f5044356775356f30634e55536f536d445657617236514635336f31; asc OPD5gu5o0cNUSoSmDVWar6QF53o1;;
       1: len 30; hex 31383661376531652d303631662d346538372d396430662d626430646437; asc 186a7e1e-061f-4e87-9d0f-bd0dd7; (total 36 bytes);
       2: len 4; hex 00000cea; asc     ;;
      2022-03-22 21:24:13 0 [Note] InnoDB: n_owned: 0; heap_no: 99; next rec: 8184
      2022-03-22 21:24:13 0 [ERROR] [FATAL] InnoDB: You should dump + drop + reimport the table to fix the corruption. If the crash happens at database startup. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery. Then dump + drop + reimport.
      220322 21:24:13 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.5.10-MariaDB-log
      Server version: 10.5.10-MariaDB-log
      key_buffer_size=33554432
      read_buffer_size=8388608
      max_used_connections=49
      max_threads=2002
      thread_count=16
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 32883467 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x128471b698
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fffded4de50 thread_stack 0x49000
      0x13155bc <my_print_stacktrace+0x3c> at /usr/local/libexec/mariadbd
      0xc77f2f <handle_fatal_signal+0x28f> at /usr/local/libexec/mariadbd
      0x80190cb70 <_pthread_sigmask+0x530> at /lib/libthr.so.3
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): (null)
      Connection ID (thread ID): 0
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
       
      We think the query pointer is invalid, but we will try to print it anyway.
      Query:
      

      I tried opening the core file with gdb to get more insight, but it did not really give me any clues:

      warning: exec file is newer than core file.
      [New LWP 100784]
      [New LWP 100847]
      [New LWP 100921]
      [New LWP 100922]
      [New LWP 100923]
      [New LWP 100932]
      [New LWP 100936]
      [New LWP 100938]
      [New LWP 100968]
      [New LWP 100970]
      [New LWP 100982]
      [New LWP 100983]
      [New LWP 100997]
      [New LWP 100999]
      [New LWP 101044]
      [New LWP 100929]
      [New LWP 101219]
      [New LWP 100822]
      [New LWP 100947]
      [New LWP 101490]
      [New LWP 101658]
      [New LWP 100858]
      [New LWP 100722]
      [New LWP 100992]
      [New LWP 101097]
      [New LWP 101137]
      [New LWP 101164]
      [New LWP 101220]
      [New LWP 101084]
      [New LWP 100606]
      [New LWP 101227]
      Core was generated by `/usr/local/libexec/mariadbd --defaults-extra-file=/usr/local/etc/mysql/my.cnf --'.
      Program terminated with signal SIGABRT, Aborted.
      Sent by kill() from pid 46919 and user 88.
      #0  0x0000000801b7cbda in ?? ()
      [Current thread is 1 (LWP 100784)]
      (gdb) thread 1
      [Switching to thread 1 (LWP 100784)]
      #0  0x0000000801b7cbda in ?? ()
      (gdb) bt
      #0  0x0000000801b7cbda in ?? ()
      #1  0x0000000000c7814b in ?? ()
      #2  0x06fa6e0a0000000d in ?? ()
      #3  0x000000095d9fb578 in ?? ()
      #4  0x0000000100000000 in ?? ()
      #5  0x000000180000000d in ?? ()
      #6  0x0000001600000015 in ?? ()
      #7  0x0000007a00000002 in ?? ()
      #8  0x0000005000000002 in ?? ()
      #9  0x00007fff00000000 in ?? ()
      #10 0x0000000000000e10 in ?? ()
      #11 0x0000000801e212a1 in ?? ()
      #12 0x00000000623a306d in ?? ()
      #13 0x0000000000000008 in ?? ()
      #14 0x0065726f632e4e25 in ?? ()
      #15 0x00000009a5a85b80 in ?? ()
      #16 0x00007fffded4bb00 in ?? ()
      #17 0x0000000801beb017 in ?? ()
      #18 0x00007fffded4bb40 in ?? ()
      #19 0x000000000000004d in ?? ()
      #20 0x0000000000000000 in ?? ()
      

      I guess SIGABRT is in libc?

        Activity

          michbsd Michael Landin added a comment:

          Lol.. I am not able to reproduce the crash on the production system. I have no idea what caused it, which is what I am trying to figure out.
          The build system I used for the new version of mariadbd had the same libc, the same source build path, and so on, and the build was done from the FreeBSD ports system, so it should be "OK".

          But regardless, I would say the focus should be on these lines: "[ERROR] InnoDB: Corruption of an index tree: table", right? Those caused InnoDB to invoke abort(). Or did I miss something?
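
To make the quoted error line easier to act on, the affected table, index, and page numbers can be pulled out of the error log programmatically. The following is a minimal sketch, assuming the message appears on a single line and keeps the wording shown in this report; the function name `find_corrupted_indexes` and the field names are illustrative and not part of MariaDB.

```python
import re

# Pattern modelled only on the log line quoted in this issue.
# Assumption: other MariaDB versions may word the message differently.
CORRUPTION_RE = re.compile(
    r"InnoDB: Corruption of an index tree: "
    r"table `(?P<schema>[^`]+)`\.`(?P<table>[^`]+)` "
    r"index `(?P<index>[^`]+)`, "
    r"father ptr page no (?P<parent_page>\d+), "
    r"child page no (?P<child_page>\d+)"
)


def find_corrupted_indexes(log_text):
    """Return one dict per 'Corruption of an index tree' message in log_text."""
    results = []
    for match in CORRUPTION_RE.finditer(log_text):
        info = match.groupdict()
        # Page numbers are easier to compare as integers.
        info["parent_page"] = int(info["parent_page"])
        info["child_page"] = int(info["child_page"])
        results.append(info)
    return results
```

Calling `find_corrupted_indexes(open(path).read())` on an error log (where `path` is whatever location the server logs to) would list each table and index named in such a message, which narrows down where to look first.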

          marko Marko Mäkelä added a comment:

          The fatal message is output when the internal links of a table are found to be corrupted. CHECK TABLE without QUICK should exercise this code. You could start by executing CHECK TABLE on every InnoDB table. Once you have identified the corrupted table, the schema of that table would be helpful to know.

          I do not have any idea what could cause this type of corruption in normal circumstances.

          Abnormal circumstances could include the following:

          • Memory corruption due to faulty hardware, or a software bug. InnoDB only updates or validates page checksums when writing or reading from a data file; there is no checksum validation while the data remains cached in the buffer pool. So, it could happily compute a valid checksum right before writing out a page that was corrupted while it resided in the buffer pool.
          • Unsafe copying of the data directory while the server is running. Use mariadb-backup or a file system snapshot.
          • Deleting the ib_logfile0 to ‘fix’ crash recovery trouble.
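
The point about checksums in the first bullet can be illustrated with a toy model. This is only a sketch under stated assumptions: it uses Python's `zlib.crc32` and a made-up four-byte header, not InnoDB's real page format or checksum algorithm, and the names `write_page`, `read_page`, and `flip_bit` are hypothetical.

```python
import zlib


def write_page(payload: bytes) -> bytes:
    """Toy 'page write': compute a checksum over the payload and prepend it."""
    checksum = zlib.crc32(payload)
    return checksum.to_bytes(4, "big") + payload


def read_page(stored: bytes) -> bytes:
    """Toy 'page read': verify the stored checksum, then return the payload."""
    expected = int.from_bytes(stored[:4], "big")
    payload = stored[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch")
    return payload


def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with one bit inverted, simulating corruption."""
    buf = bytearray(data)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


original = b"father ptr page no 3306"

# Case 1: the page is damaged while cached in memory, then written out.
# The checksum is computed over the damaged bytes, so the later read passes.
cached = flip_bit(original, 3)
assert read_page(write_page(cached)) == cached
assert cached != original

# Case 2: the page is damaged after the checksum was computed (on disk).
# Here the read detects the mismatch.
damaged_on_disk = flip_bit(write_page(original), 40)
try:
    read_page(damaged_on_disk)
except ValueError as exc:
    print("detected:", exc)
```

In this model the checksum protects only the path between write and read. Damage that happens before the checksum is computed is written out with a matching checksum, which is the scenario described in the first bullet.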

          michbsd Michael Landin added a comment:

          Thank you for the insights.

          I restarted mariadb with the --innodb-force-recovery=2 flag. After that I ran OPTIMIZE on the database showing problems, and then everything was fine again.

          I do not believe it was faulty hardware, as I run Galera on top of my mariadb and the change caused all 3 cluster nodes to core dump with the same error.
          No operations (like deleting or copying files) were done.
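
The CHECK TABLE pass suggested earlier in the thread and the OPTIMIZE step described in this comment could be scripted by generating the statements and feeding them to a client. The sketch below only builds SQL strings and assumes nothing about how they are executed; the helper names are hypothetical, and the `information_schema.tables` query is one common way to list InnoDB tables.

```python
# Query that lists every InnoDB table; run it with any SQL client.
LIST_INNODB_TABLES = (
    "SELECT table_schema, table_name FROM information_schema.tables "
    "WHERE engine = 'InnoDB'"
)


def quote_identifier(name: str) -> str:
    """Quote an identifier with backticks, doubling any embedded backtick."""
    return "`" + name.replace("`", "``") + "`"


def check_statements(tables):
    """Build one CHECK TABLE statement (without QUICK) per (schema, table)."""
    return [
        f"CHECK TABLE {quote_identifier(schema)}.{quote_identifier(table)};"
        for schema, table in tables
    ]


def optimize_statement(schema: str, table: str) -> str:
    """Build the OPTIMIZE TABLE statement for a single table."""
    return f"OPTIMIZE TABLE {quote_identifier(schema)}.{quote_identifier(table)};"


for statement in check_statements([("xxxx", "yyy")]):
    print(statement)
print(optimize_statement("xxxx", "yyy"))
```

Running the generated CHECK TABLE statements again after the rebuild would be one way to confirm that the index is consistent on each node.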

          marko Marko Mäkelä added a comment:

          Thank you for clarifying that Galera snapshot transfers are involved. I think that there have been some problems with that, and also some recent fixes by sysprg.

          FreeBSD does not support asynchronous I/O (or at least we do not implement any API for that), and therefore my remarks in MDEV-24845 about innodb_disallow_writes not blocking InnoDB page writes should not apply to FreeBSD.

          Which Galera snapshot transfer method do you use?

          michbsd Michael Landin added a comment:

          I use rsync

          People

            Assignee: Unassigned
            Reporter: michbsd Michael Landin
            Votes: 1
            Watchers: 4
