[MDEV-31065] MariaDB crashes with [ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202 Created: 2023-04-17  Updated: 2023-04-18  Resolved: 2023-04-18

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.5.12
Fix Version/s: 10.11.2, 10.5.19, 10.6.12, 10.8.7, 10.9.5, 10.10.3

Type: Bug Priority: Major
Reporter: Paulius mickus Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: None
Environment:

CloudLinux release 7.9 4.18.0-305.19.1.lve.el7h.x86_64


Attachments: Text File mariadbd_full_bt_all_threads.txt     Text File mariadbd_full_bt_all_threads3.txt    
Issue Links:
Duplicate
duplicates MDEV-30009 InnoDB shutdown hangs when the change... Closed

 Description   

MariaDB crashes with `[ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202` error. Crash occurs every ~20seconds if mariaDB is set on innodb_force_recovery = 0 or 1 modes. Starting innodb_force_recovery = 2 keeps MariaBD running, though setting recovery mode lower will make it crash again.

Error log below:

Apr 17 11:44:03 server mysqld: 2023-04-17 11:44:03 0 [ERROR] [FATAL] InnoDB: Page old data size 8483 new data size 7800, page old max ins size 7519 new max ins size 8202
Apr 17 11:44:03 server mysqld: 230417 11:44:03 [ERROR] mysqld got signal 6 ;
Apr 17 11:44:03 server mysqld: This could be because you hit a bug. It is also possible that this binary
Apr 17 11:44:03 server mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Apr 17 11:44:03 server mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Apr 17 11:44:03 server mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Apr 17 11:44:03 server mysqld: We will try our best to scrape up some info that will hopefully help
Apr 17 11:44:03 server mysqld: diagnose the problem, but since we have already crashed,
Apr 17 11:44:03 server mysqld: something is definitely wrong and this may fail.
Apr 17 11:44:03 server mysqld: Server version: 10.5.12-MariaDB-cll-lve
Apr 17 11:44:03 server mysqld: key_buffer_size=2147483648
Apr 17 11:44:03 server mysqld: read_buffer_size=4194304
Apr 17 11:44:03 server mysqld: max_used_connections=38
Apr 17 11:44:03 server mysqld: max_threads=2002
Apr 17 11:44:03 server mysqld: thread_count=38
Apr 17 11:44:03 server mysqld: It is possible that mysqld could use up to
Apr 17 11:44:03 server mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18548686 K  bytes of memory
Apr 17 11:44:03 server mysqld: Hope that's ok; if not, decrease some variables in the equation.
Apr 17 11:44:03 server mysqld: Thread pointer: 0x7f3e8fc13018
Apr 17 11:44:03 server mysqld: Attempting backtrace. You can use the following information to find out
Apr 17 11:44:03 server mysqld: where mysqld died. If you see no messages after this, something went
Apr 17 11:44:03 server mysqld: terribly wrong...
Apr 17 11:44:03 server mysqld: stack_bottom = 0x7f3e8fbfeb98 thread_stack 0x49000
Apr 17 11:44:04 server mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55cf0ff8bbfe]
Apr 17 11:44:04 server mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x485)[0x55cf0f9f8645]
Apr 17 11:44:14 server mysqld: sigaction.c:0(__restore_rt)[0x7f424f1b6630]
Apr 17 11:44:14 server mysqld: :0(__GI_raise)[0x7f424d486387]
Apr 17 11:44:14 server mysqld: :0(__GI_abort)[0x7f424d487a78]
Apr 17 11:44:15 server mysqld: /usr/sbin/mysqld(+0x63f5d1)[0x55cf0f6d65d1]
Apr 17 11:44:15 server mysqld: /usr/sbin/mysqld(+0x63fe23)[0x55cf0f6d6e23]
Apr 17 11:44:16 server mysqld: /usr/sbin/mysqld(+0xddccd7)[0x55cf0fe73cd7]
Apr 17 11:44:16 server mysqld: /usr/sbin/mysqld(+0xcd613a)[0x55cf0fd6d13a]
Apr 17 11:44:17 server mysqld: /usr/sbin/mysqld(+0xcd9dac)[0x55cf0fd70dac]
Apr 17 11:44:17 server mysqld: /usr/sbin/mysqld(+0xe0fb9d)[0x55cf0fea6b9d]
Apr 17 11:44:18 server mysqld: /usr/sbin/mysqld(+0xe0fd08)[0x55cf0fea6d08]
Apr 17 11:44:18 server mysqld: /usr/sbin/mysqld(+0xdefca5)[0x55cf0fe86ca5]
Apr 17 11:44:19 server mysqld: /usr/sbin/mysqld(+0xd72653)[0x55cf0fe09653]
Apr 17 11:44:19 server mysqld: /usr/sbin/mysqld(+0xd7272e)[0x55cf0fe0972e]
Apr 17 11:44:20 server mysqld: /usr/sbin/mysqld(+0xd6d5ec)[0x55cf0fe045ec]
Apr 17 11:44:20 server mysqld: /usr/sbin/mysqld(+0xd6e683)[0x55cf0fe05683]
Apr 17 11:44:21 server mysqld: /usr/sbin/mysqld(+0xd6f10a)[0x55cf0fe0610a]
Apr 17 11:44:21 server mysqld: /usr/sbin/mysqld(+0xd2c618)[0x55cf0fdc3618]
Apr 17 11:44:22 server mysqld: /usr/sbin/mysqld(+0xda3d52)[0x55cf0fe3ad52]
Apr 17 11:44:22 server mysqld: /usr/sbin/mysqld(+0xd8fb37)[0x55cf0fe26b37]
Apr 17 11:44:23 server mysqld: /usr/sbin/mysqld(_ZN5tpool10task_group7executeEPNS_4taskE+0x8c)[0x55cf0fff096c]
Apr 17 11:44:23 server mysqld: ut/ut0ut.cc:576(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55cf0ffef83f]
Apr 17 11:44:23 server mysqld: btr/btr0btr.cc:1411(Wsrep_server_service::log_dummy_write_set(wsrep::client_state&, wsrep::ws_meta const&))[0x55cf1003a2cf]
Apr 17 11:44:26 server mysqld: pthread_create.c:0(start_thread)[0x7f424f1aeea5]
Apr 17 11:44:27 server mysqld: ??:0(__clone)[0x7f424d54eb0d]
Apr 17 11:44:27 server mysqld: Trying to get some variables.
Apr 17 11:44:27 server mysqld: Some pointers may be invalid and cause the dump to abort.
Apr 17 11:44:27 server mysqld: Query (0x0): (null)
Apr 17 11:44:27 server mysqld: Connection ID (thread ID): 0
Apr 17 11:44:27 server mysqld: Status: NOT_KILLED
Apr 17 11:44:27 server mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
Apr 17 11:44:27 server mysqld: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
Apr 17 11:44:27 server mysqld: information that should help you find out what is causing the crash.
Apr 17 11:44:27 server mysqld: We think the query pointer is invalid, but we will try to print it anyway.
Apr 17 11:44:27 server mysqld: Query:
Apr 17 11:44:27 server mysqld: Writing a core file...
Apr 17 11:44:27 server mysqld: Working directory at /var/lib/mysql
Apr 17 11:44:27 server mysqld: Resource Limits:
Apr 17 11:44:27 server mysqld: Limit                     Soft Limit           Hard Limit           Units
Apr 17 11:44:27 server mysqld: Max cpu time              unlimited            unlimited            seconds
Apr 17 11:44:27 server mysqld: Max file size             unlimited            unlimited            bytes
Apr 17 11:44:27 server mysqld: Max data size             unlimited            unlimited            bytes
Apr 17 11:44:27 server mysqld: Max stack size            8388608              unlimited            bytes
Apr 17 11:44:27 server mysqld: Max core file size        unlimited            unlimited            bytes
Apr 17 11:44:27 server mysqld: Max resident set          unlimited            unlimited            bytes
Apr 17 11:44:27 server mysqld: Max processes             1028624              1028624              processes
Apr 17 11:44:27 server mysqld: Max open files            300000               300000               files
Apr 17 11:44:27 server mysqld: Max locked memory         65536                65536                bytes
Apr 17 11:44:27 server mysqld: Max address space         unlimited            unlimited            bytes
Apr 17 11:44:27 server mysqld: Max file locks            unlimited            unlimited            locks
Apr 17 11:44:27 server mysqld: Max pending signals       1028624              1028624              signals
Apr 17 11:44:27 server mysqld: Max msgqueue size         819200               819200               bytes
Apr 17 11:44:27 server mysqld: Max nice priority         0                    0
Apr 17 11:44:27 server mysqld: Max realtime priority     0                    0
Apr 17 11:44:27 server mysqld: Max realtime timeout      unlimited            unlimited            us
Apr 17 11:44:27 server mysqld: Core pattern: core

Attaching Full Backtraces For All Threads From a Core File, which was generated with the following command:

gdb --batch --eval-command="set print frame-arguments all" --eval-command="thread apply all bt full" /usr/sbin/mariadbd /var/lib/mysql/core.932  > mariadbd_full_bt_all_threads.txt

Please let me know in case any other info is needed, thank you!



 Comments   
Comment by Marko Mäkelä [ 2023-04-17 ]

mariadbd_full_bt_all_threads.txt shows that the calling thread is executing ibuf_insert_to_index_page_low(). I suspect that some stale change buffer entries are being applied twice. This was found out to be the actual root cause of MDEV-30009. I described that bug in my FOSDEM 2023 speech.

Comment by Marko Mäkelä [ 2023-04-17 ]

Can you please check what happens when upgrading to MariaDB 10.6.12 or later? I hope that it will only mark the secondary index as corrupted, and not crash.

This actual corruption can be fixed by dropping the corrupted secondary index, and creating it again, in two ALTER TABLE statements. If the table is not too large, you might just simply execute OPTIMIZE TABLE to completely rebuild it.

Comment by Paulius mickus [ 2023-04-17 ]

Thank you for the quick response! Another confirmation I should visit FOSDEM next year

Unfortunately this server is downtime sensitive and upgrading major version would take some time as we're dealing with relatively large data sets.

We've tried OPTIMIZE TABLE on affected table(s), however, optimize function forced MariaDB to crash again, attaching full strace log in case you'd like to take a look:

mariadbd_full_bt_all_threads3.txt

We're thinking about possibility dropping those affected databases and restoring them from the backups.

Comment by Marko Mäkelä [ 2023-04-17 ]

Did you upgrade to MariaDB 10.5.19 (where MDEV-30009 has been fixed) before executing the OPTIMIZE TABLE?

If you are using the 10.5 series, you’d better execute the OPTIMIZE TABLE or DROP INDEX while the server is started up with innodb_force_recovery=2, to prevent the purge of history from causing a change buffer merge before the problematic index has been dropped. If you are executing ALTER TABLE…ADD INDEX or CREATE INDEX in a version older than 10.5.19, then the phantom change buffer entries will not be removed and may cause trouble later, even if you are using the recent default (MDEV-27734) setting innodb_change_buffering=none.

mariadbd_full_bt_all_threads3.txt shows a crash in fts_sync_during_ddl(). That would be a separate bug to be filed. It might be fixed in a newer release than the one that you are using. thiru could perhaps comment on that.

Comment by Paulius mickus [ 2023-04-17 ]

No, we were still using 10.5.12, going to update to 10.5.19 shortly.

Comment by Paulius mickus [ 2023-04-17 ]

After upgrading to 10.5.19 optimize went successfully on crashing databases and MariaDB server is now running smoothly with innodb_force_recovery=0, so it seems that crash in fts_sync_during_ddl() is also related to older versions only. Thank you for your help!

Comment by Marko Mäkelä [ 2023-04-18 ]

Thank you. The FTS bug could have been MDEV-26273.

Generated at Thu Feb 08 10:21:01 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.