Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-31065

MariaDB crashes with [ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202

Details

    Description

      MariaDB crashes with `[ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202` error. Crash occurs every ~20seconds if mariaDB is set on innodb_force_recovery = 0 or 1 modes. Starting innodb_force_recovery = 2 keeps MariaBD running, though setting recovery mode lower will make it crash again.

      Error log below:

      Apr 17 11:44:03 server mysqld: 2023-04-17 11:44:03 0 [ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202
      Apr 17 11:44:03 server mysqld: 230417 11:44:03 [ERROR] mysqld got signal 6 ;
      Apr 17 11:44:03 server mysqld: This could be because you hit a bug. It is also possible that this binary
      Apr 17 11:44:03 server mysqld: or one of the libraries it was linked against is corrupt, improperly built,
      Apr 17 11:44:03 server mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
      Apr 17 11:44:03 server mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
      Apr 17 11:44:03 server mysqld: We will try our best to scrape up some info that will hopefully help
      Apr 17 11:44:03 server mysqld: diagnose the problem, but since we have already crashed,
      Apr 17 11:44:03 server mysqld: something is definitely wrong and this may fail.
      Apr 17 11:44:03 server mysqld: Server version: 10.5.12-MariaDB-cll-lve
      Apr 17 11:44:03 server mysqld: key_buffer_size=2147483648
      Apr 17 11:44:03 server mysqld: read_buffer_size=4194304
      Apr 17 11:44:03 server mysqld: max_used_connections=38
      Apr 17 11:44:03 server mysqld: max_threads=2002
      Apr 17 11:44:03 server mysqld: thread_count=38
      Apr 17 11:44:03 server mysqld: It is possible that mysqld could use up to
      Apr 17 11:44:03 server mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18548686 K  bytes of memory
      Apr 17 11:44:03 server mysqld: Hope that's ok; if not, decrease some variables in the equation.
      Apr 17 11:44:03 server mysqld: Thread pointer: 0x7f3e8fc13018
      Apr 17 11:44:03 server mysqld: Attempting backtrace. You can use the following information to find out
      Apr 17 11:44:03 server mysqld: where mysqld died. If you see no messages after this, something went
      Apr 17 11:44:03 server mysqld: terribly wrong...
      Apr 17 11:44:03 server mysqld: stack_bottom = 0x7f3e8fbfeb98 thread_stack 0x49000
      Apr 17 11:44:04 server mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55cf0ff8bbfe]
      Apr 17 11:44:04 server mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x485)[0x55cf0f9f8645]
      Apr 17 11:44:14 server mysqld: sigaction.c:0(__restore_rt)[0x7f424f1b6630]
      Apr 17 11:44:14 server mysqld: :0(__GI_raise)[0x7f424d486387]
      Apr 17 11:44:14 server mysqld: :0(__GI_abort)[0x7f424d487a78]
      Apr 17 11:44:15 server mysqld: /usr/sbin/mysqld(+0x63f5d1)[0x55cf0f6d65d1]
      Apr 17 11:44:15 server mysqld: /usr/sbin/mysqld(+0x63fe23)[0x55cf0f6d6e23]
      Apr 17 11:44:16 server mysqld: /usr/sbin/mysqld(+0xddccd7)[0x55cf0fe73cd7]
      Apr 17 11:44:16 server mysqld: /usr/sbin/mysqld(+0xcd613a)[0x55cf0fd6d13a]
      Apr 17 11:44:17 server mysqld: /usr/sbin/mysqld(+0xcd9dac)[0x55cf0fd70dac]
      Apr 17 11:44:17 server mysqld: /usr/sbin/mysqld(+0xe0fb9d)[0x55cf0fea6b9d]
      Apr 17 11:44:18 server mysqld: /usr/sbin/mysqld(+0xe0fd08)[0x55cf0fea6d08]
      Apr 17 11:44:18 server mysqld: /usr/sbin/mysqld(+0xdefca5)[0x55cf0fe86ca5]
      Apr 17 11:44:19 server mysqld: /usr/sbin/mysqld(+0xd72653)[0x55cf0fe09653]
      Apr 17 11:44:19 server mysqld: /usr/sbin/mysqld(+0xd7272e)[0x55cf0fe0972e]
      Apr 17 11:44:20 server mysqld: /usr/sbin/mysqld(+0xd6d5ec)[0x55cf0fe045ec]
      Apr 17 11:44:20 server mysqld: /usr/sbin/mysqld(+0xd6e683)[0x55cf0fe05683]
      Apr 17 11:44:21 server mysqld: /usr/sbin/mysqld(+0xd6f10a)[0x55cf0fe0610a]
      Apr 17 11:44:21 server mysqld: /usr/sbin/mysqld(+0xd2c618)[0x55cf0fdc3618]
      Apr 17 11:44:22 server mysqld: /usr/sbin/mysqld(+0xda3d52)[0x55cf0fe3ad52]
      Apr 17 11:44:22 server mysqld: /usr/sbin/mysqld(+0xd8fb37)[0x55cf0fe26b37]
      Apr 17 11:44:23 server mysqld: /usr/sbin/mysqld(_ZN5tpool10task_group7executeEPNS_4taskE+0x8c)[0x55cf0fff096c]
      Apr 17 11:44:23 server mysqld: ut/ut0ut.cc:576(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55cf0ffef83f]
      Apr 17 11:44:23 server mysqld: btr/btr0btr.cc:1411(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55cf1003a2cf]
      Apr 17 11:44:26 server mysqld: pthread_create.c:0(start_thread)[0x7f424f1aeea5]
      Apr 17 11:44:27 server mysqld: ??:0(__clone)[0x7f424d54eb0d]
      Apr 17 11:44:27 server mysqld: Trying to get some variables.
      Apr 17 11:44:27 server mysqld: Some pointers may be invalid and cause the dump to abort.
      Apr 17 11:44:27 server mysqld: Query (0x0): (null)
      Apr 17 11:44:27 server mysqld: Connection ID (thread ID): 0
      Apr 17 11:44:27 server mysqld: Status: NOT_KILLED
      Apr 17 11:44:27 server mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
      Apr 17 11:44:27 server mysqld: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      Apr 17 11:44:27 server mysqld: information that should help you find out what is causing the crash.
      Apr 17 11:44:27 server mysqld: We think the query pointer is invalid, but we will try to print it anyway.
      Apr 17 11:44:27 server mysqld: Query:
      Apr 17 11:44:27 server mysqld: Writing a core file...
      Apr 17 11:44:27 server mysqld: Working directory at /var/lib/mysql
      Apr 17 11:44:27 server mysqld: Resource Limits:
      Apr 17 11:44:27 server mysqld: Limit                     Soft Limit           Hard Limit           Units
      Apr 17 11:44:27 server mysqld: Max cpu time              unlimited            unlimited            seconds
      Apr 17 11:44:27 server mysqld: Max file size             unlimited            unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max data size             unlimited            unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max stack size            8388608              unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max core file size        unlimited            unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max resident set          unlimited            unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max processes             1028624              1028624              processes
      Apr 17 11:44:27 server mysqld: Max open files            300000               300000               files
      Apr 17 11:44:27 server mysqld: Max locked memory         65536                65536                bytes
      Apr 17 11:44:27 server mysqld: Max address space         unlimited            unlimited            bytes
      Apr 17 11:44:27 server mysqld: Max file locks            unlimited            unlimited            locks
      Apr 17 11:44:27 server mysqld: Max pending signals       1028624              1028624              signals
      Apr 17 11:44:27 server mysqld: Max msgqueue size         819200               819200               bytes
      Apr 17 11:44:27 server mysqld: Max nice priority         0                    0
      Apr 17 11:44:27 server mysqld: Max realtime priority     0                    0
      Apr 17 11:44:27 server mysqld: Max realtime timeout      unlimited            unlimited            us
      Apr 17 11:44:27 server mysqld: Core pattern: core
      

      Attaching Full Backtraces For All Threads From a Core File, which was generated with the following command:

      gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" /usr/sbin/mariadbd /var/lib/mysql/core.932  > mariadbd_full_bt_all_threads.txt
      

      Please let me know in case any other info is needed, thank you!

      Attachments

        Issue Links

          Activity

            mariadbd_full_bt_all_threads.txt shows that the calling thread is executing ibuf_insert_to_index_page_low(). I suspect that some stale change buffer entries are being applied twice. This was found out to be the actual root cause of MDEV-30009. I described that bug in my FOSDEM 2023 speech.

            marko Marko Mäkelä added a comment - mariadbd_full_bt_all_threads.txt shows that the calling thread is executing ibuf_insert_to_index_page_low() . I suspect that some stale change buffer entries are being applied twice. This was found out to be the actual root cause of MDEV-30009 . I described that bug in my FOSDEM 2023 speech .

            Can you please check what happens when upgrading to MariaDB 10.6.12 or later? I hope that it will only mark the secondary index as corrupted, and not crash.

            This actual corruption can be fixed by dropping the corrupted secondary index, and creating it again, in two ALTER TABLE statements. If the table is not too large, you might just simply execute OPTIMIZE TABLE to completely rebuild it.

            marko Marko Mäkelä added a comment - Can you please check what happens when upgrading to MariaDB 10.6.12 or later? I hope that it will only mark the secondary index as corrupted, and not crash. This actual corruption can be fixed by dropping the corrupted secondary index, and creating it again, in two ALTER TABLE statements. If the table is not too large, you might just simply execute OPTIMIZE TABLE to completely rebuild it.

            Thank you for the quick response! Another confirmation I should visit FOSDEM next year

            Unfortunately this server is downtime sensitive and upgrading major version would take some time as we're dealing with relatively large data sets.

            We've tried OPTIMIZE TABLE on affected table(s), however, optimize function forced MariaDB to crash again, attaching full strace log in case you'd like to take a look:

            mariadbd_full_bt_all_threads3.txt

            We're thinking about possibility dropping those affected databases and restoring them from the backups.

            Paulius Mickus Paulius mickus added a comment - Thank you for the quick response! Another confirmation I should visit FOSDEM next year Unfortunately this server is downtime sensitive and upgrading major version would take some time as we're dealing with relatively large data sets. We've tried OPTIMIZE TABLE on affected table(s), however, optimize function forced MariaDB to crash again, attaching full strace log in case you'd like to take a look: mariadbd_full_bt_all_threads3.txt We're thinking about possibility dropping those affected databases and restoring them from the backups.

            Did you upgrade to MariaDB 10.5.19 (where MDEV-30009 has been fixed) before executing the OPTIMIZE TABLE?

            If you are using the 10.5 series, you’d better execute the OPTIMIZE TABLE or DROP INDEX while the server is started up with innodb_force_recovery=2, to prevent the purge of history from causing a change buffer merge before the problematic index has been dropped. If you are executing ALTER TABLE…ADD INDEX or CREATE INDEX in a version older than 10.5.19, then the phantom change buffer entries will not be removed and may cause trouble later, even if you are using the recent default (MDEV-27734) setting innodb_change_buffering=none.

            mariadbd_full_bt_all_threads3.txt shows a crash in fts_sync_during_ddl(). That would be a separate bug to be filed. It might be fixed in a newer release than the one that you are using. thiru could perhaps comment on that.

            marko Marko Mäkelä added a comment - Did you upgrade to MariaDB 10.5.19 (where MDEV-30009 has been fixed) before executing the OPTIMIZE TABLE ? If you are using the 10.5 series, you’d better execute the OPTIMIZE TABLE or DROP INDEX while the server is started up with innodb_force_recovery=2 , to prevent the purge of history from causing a change buffer merge before the problematic index has been dropped. If you are executing ALTER TABLE…ADD INDEX or CREATE INDEX in a version older than 10.5.19, then the phantom change buffer entries will not be removed and may cause trouble later, even if you are using the recent default ( MDEV-27734 ) setting innodb_change_buffering=none . mariadbd_full_bt_all_threads3.txt shows a crash in fts_sync_during_ddl() . That would be a separate bug to be filed. It might be fixed in a newer release than the one that you are using. thiru could perhaps comment on that.

            No, we were still using 10.5.12, going to update to 10.5.19 shortly.

            Paulius Mickus Paulius mickus added a comment - No, we were still using 10.5.12, going to update to 10.5.19 shortly.

            After upgrading to 10.5.19 optimize went successfully on crashing databases and MariaDB server is now running smoothly with innodb_force_recovery=0, so it seems that crash in fts_sync_during_ddl() is also related to older versions only. Thank you for your help!

            Paulius Mickus Paulius mickus added a comment - After upgrading to 10.5.19 optimize went successfully on crashing databases and MariaDB server is now running smoothly with innodb_force_recovery=0, so it seems that crash in fts_sync_during_ddl() is also related to older versions only. Thank you for your help!

            Thank you. The FTS bug could have been MDEV-26273.

            marko Marko Mäkelä added a comment - Thank you. The FTS bug could have been MDEV-26273 .

            People

              marko Marko Mäkelä
              Paulius Mickus Paulius mickus
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.