[MDEV-24162] InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.7/storage/innobase/include/fut0lst.h line 127 Created: 2020-11-07  Updated: 2021-01-13  Resolved: 2020-11-09

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.4.16, 10.5.7
Fix Version/s: 10.4.17, 10.5.8

Type: Bug Priority: Major
Reporter: Otto Kekäläinen Assignee: Marko Mäkelä
Resolution: Duplicate Votes: 0
Labels: corruption

Issue Links:
Duplicate
duplicates MDEV-24096 Server crash, InnoDB fatal error, Ass... Closed
duplicates MDEV-24155 Database not starting Closed
Problem/Incident
is caused by MDEV-23672 Assertion `v.v_indexes.empty()' faile... Closed
Relates
relates to MDEV-24578 MariaDB 10.5 fails to join Galera clu... Closed

 Description   

While starting a MariaDB Galera 10.5.7 node on machine in a production environment it failed to start with error message:

...
mariadbd[48385]: 2020-11-07 17:26:12 0 [Note] InnoDB: Setting log file ./ib_logfile101 size to 100663296 bytes
mariadbd[48385]: 2020-11-07 17:26:12 0 [Note] InnoDB: Renaming log file ./ib_logfile101 to ./ib_logfile0
mariadbd[48385]: 2020-11-07 17:26:12 0 [Note] InnoDB: New log file created, LSN=1273138439286
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: 128 rollback segments are active.
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: Creating sys_virtual system tables.
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: Creating shared tablespace for temporary tables
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
mariadbd[48385]: 2020-11-07 17:26:13 0 [Note] InnoDB: 10.5.7 started; log sequence number 0; transaction id 8895517680547275320
mariadbd[48385]: 2020-11-07 17:26:13 0x7f5c5effd700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.7/storage/innobase/include/fut0lst.h line 127
mariadbd[48385]: InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
mariadbd[48385]: 201107 17:26:13 [ERROR] mysqld got signal 6 ;

This seems very odd. Why are there buildbot paths embedded in the released binary?

The lines in fut0lst.h around 127 look like:

/** @return a file address */
inline fil_addr_t flst_read_addr(const byte *faddr)
{
  fil_addr_t addr= { mach_read_from_4(faddr + FIL_ADDR_PAGE),
                     mach_read_from_2(faddr + FIL_ADDR_BYTE) };
  ut_a(addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA);
  ut_a(ut_align_offset(faddr, srv_page_size) >= FIL_PAGE_DATA);
  return addr;
}

The commits around this section were:

  • February 2020: 7ae21b18a6b73bbc3bf1ff448faf60c29ac1d386
  • November 2019: a6e8a7df82f8434fc522470a8b7ffd45160cef71
  • May 2009: 3945d5e5549187a18c64a112899f90a7f6a320d6

Does marko have any idea what might be going on here?



 Comments   
Comment by Marko Mäkelä [ 2020-11-09 ]

The fix of MDEV-23672 caused a serious buffer pool corruption issue that will be fixed in MDEV-24096.

The failing assertion is likely being reported for some unrelated page that happened to be located adjacent to the undo log page in the buffer pool when the overwrite happened.

Comment by Tero Auvinen [ 2020-11-20 ]

We are still seeing these two errors in production with 10.5.8, happens after successful SST while joining v10.1.48 cluster

Nov 12 11:50:45 mariadbd[35358]: 2020-11-12 11:50:45 0x7f9720bdd800 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/include/fut0lst.h line 127
Nov 12 11:50:45 mariadbd[35358]: InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
Nov 12 11:50:45 mariadbd[35358]: 201112 11:50:45 [ERROR] mysqld got signal 6 ;
Nov 12 11:50:45 mariadbd[35358]: Server version: 10.5.8-MariaDB-1:10.5.8+maria~bionic

Nov 12 12:01:00 sh[56947]: 2020-11-12 12:00:59 0x7f43085d2800 InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/dict/dict0load.cc line 1816
Nov 12 12:01:00 sh[56947]: InnoDB: Failing assertion: btr_pcur_is_on_user_rec(&pcur)
Nov 12 12:01:00 sh[56947]: 201112 12:00:59 [ERROR] mysqld got signal 6 ;
Nov 12 12:01:00 sh[56947]: Server version: 10.5.8-MariaDB-1:10.5.8+maria~bionic

Mora data available if needed.

Comment by Marko Mäkelä [ 2020-11-20 ]

serter, was the data directory previously corrupted by running a server version that was affected by MDEV-24096? If yes, this can still be a duplicate of that bug.

Comment by Tero Auvinen [ 2020-11-22 ]

Starting point was always totally empty /var/lib/mysql so these happen after SST from v10.1.48 donor

Comment by Otto Kekäläinen [ 2021-01-12 ]

I came across this same error, with a pure 10.5.8 server joining a 10.1 cluster with no prior data directory.

sh[21888]: 2021-01-12  6:23:08 0 [ERROR] InnoDB: Your database may be corrupt or you may have copied the InnoDB tablespace but not the InnoDB log files. Please refer to https://mariadb.com/kb/en/library/innodb-recovery-modes/ for information about forcing recovery.
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: 128 rollback segments are active.
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: Creating shared tablespace for temporary tables
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
sh[21888]: 2021-01-12  6:23:09 0 [Note] InnoDB: 10.5.8 started; log sequence number 1584788811151; transaction id 7795002400535230266
sh[21888]: 2021-01-12  6:23:09 0 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
sh[21888]: 2021-01-12 06:23:09 0x7f80a4f46700  InnoDB: Assertion failure in file /home/buildbot/buildbot/build/mariadb-10.5.8/storage/innobase/include/fut0lst.h line 127
sh[21888]: InnoDB: Failing assertion: addr.page == FIL_NULL || addr.boffset >= FIL_PAGE_DATA
sh[21888]: InnoDB: We intentionally generate a memory trap.
sh[21888]: InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
sh[21888]: InnoDB: If you get repeated assertion failures or crashes, even
sh[21888]: InnoDB: immediately after the mysqld startup, there may be
sh[21888]: InnoDB: corruption in the InnoDB tablespace. Please refer to
sh[21888]: InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
sh[21888]: InnoDB: about forcing recovery.
sh[21888]: 210112  6:23:09 [ERROR] mysqld got signal 6 ;
sh[21888]: This could be because you hit a bug. It is also possible that this binary
sh[21888]: or one of the libraries it was linked against is corrupt, improperly built,
sh[21888]: or misconfigured. This error can also be caused by malfunctioning hardware.
sh[21888]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
sh[21888]: We will try our best to scrape up some info that will hopefully help
sh[21888]: diagnose the problem, but since we have already crashed,
sh[21888]: something is definitely wrong and this may fail.
sh[21888]: Server version: 10.5.8-MariaDB-1:10.5.8+maria~bionic
sh[21888]: key_buffer_size=134217728
sh[21888]: read_buffer_size=2097152
sh[21888]: max_used_connections=0
sh[21888]: max_threads=1002
sh[21888]: thread_count=0
sh[21888]: It is possible that mysqld could use up to
sh[21888]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6312783 K  bytes of memory
sh[21888]: Hope that's ok; if not, decrease some variables in the equation.
sh[21888]: Thread pointer: 0x7f8090000cf8
sh[21888]: Attempting backtrace. You can use the following information to find out
sh[21888]: where mysqld died. If you see no messages after this, something went
sh[21888]: terribly wrong...
sh[21888]: 2021-01-12  6:23:09 0 [Note] Plugin 'FEEDBACK' is disabled.
sh[21888]: stack_bottom = 0x7f80a4f45c98 thread_stack 0x49000
sh[21888]: 2021-01-12  6:23:09 0 [Note] Recovering after a crash using tc.log
sh[21888]: 2021-01-12  6:23:09 0 [Note] Starting crash recovery...
sh[21888]: 2021-01-12  6:23:09 0 [Note] Crash recovery finished.
sh[21888]: 2021-01-12  6:23:09 0 [Note] Server socket created on IP: '0.0.0.0'.
sh[21888]: 2021-01-12  6:23:09 0 [Note] WSREP: Recovered position: 7f74fdea-cda3-11e9-ae2f-ab188b9a01f2:338545194
sh[21888]: ??:0(my_print_stacktrace)[0x55614e3ad25e]
sh[21888]: ??:0(handle_fatal_signal)[0x55614ddbb715]
sh[21888]: ??:0(__restore_rt)[0x7f8146a25980]
sh[21888]: ??:0(gsignal)[0x7f8145d35fb7]
sh[21888]: ??:0(abort)[0x7f8145d37921]
sh[21888]: /usr/sbin/mysqld(+0x605032)[0x55614dab7032]
sh[21888]: ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x55614e1d5546]
sh[21888]: ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x55614e1d5849]
sh[21888]: ??:0(std::_Rb_tree<unsigned int, unsigned int, std::_Identity<unsigned int>, std::less<unsigned int>, std::allocator<unsigned int> >::_M_erase(std::_Rb_tree_node<unsigned int>*))[0x55614e1dab87]
sh[21888]: ??:0(std::unique_lock<std::mutex>::unlock())[0x55614e1bdc73]
sh[21888]: ??:0(tpool::task_group::execute(tpool::task*))[0x55614e335224]
sh[21888]: ??:0(tpool::thread_pool_generic::worker_main(tpool::worker_data*))[0x55614e3340ef]
sh[21888]: ??:0(std::error_code::default_error_condition() const)[0x7f81465436df]
sh[21888]: ??:0(start_thread)[0x7f8146a1a6db]
sh[21888]: ??:0(clone)[0x7f8145e1871f]
sh[21888]: Trying to get some variables.
sh[21888]: Some pointers may be invalid and cause the dump to abort.
sh[21888]: Query (0x0): (null)
sh[21888]: Connection ID (thread ID): 0
sh[21888]: Status: NOT_KILLED
sh[21888]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
sh[21888]: The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
sh[21888]: information that should help you find out what is causing the crash.
sh[21888]: We think the query pointer is invalid, but we will try to print it anyway.
sh[21888]: Query:
sh[21888]: Writing a core file...
sh[21888]: Working directory at /var/lib/mysql
sh[21888]: Resource Limits:
sh[21888]: Limit                     Soft Limit           Hard Limit           Units
sh[21888]: Max cpu time              unlimited            unlimited            seconds
sh[21888]: Max file size             unlimited            unlimited            bytes
sh[21888]: Max data size             unlimited            unlimited            bytes
sh[21888]: Max stack size            8388608              unlimited            bytes
sh[21888]: Max core file size        0                    unlimited            bytes
sh[21888]: Max resident set          unlimited            unlimited            bytes
sh[21888]: Max processes             63772                63772                processes
sh[21888]: Max open files            1048576              1048576              files
sh[21888]: Max locked memory         16777216             16777216             bytes
sh[21888]: Max address space         unlimited            unlimited            bytes
sh[21888]: Max file locks            unlimited            unlimited            locks
sh[21888]: Max pending signals       63772                63772                signals
sh[21888]: Max msgqueue size         819200               819200               bytes
sh[21888]: Max nice priority         0                    0
sh[21888]: Max realtime priority     0                    0
sh[21888]: Max realtime timeout      unlimited            unlimited            us
sh[21888]: Core pattern: core
sh[21888]: Aborted'

Comment by Marko Mäkelä [ 2021-01-13 ]

serter, please note that otto filed the new ticket MDEV-24578 for the upgrade issue. The corruption bug that this ticket was filed for only affects the MariaDB 10.4.16 and MariaDB 10.5.7 releases, only in cases where the database contains indexed virtual columns (for which MDEV-5800 and an import of the InnoDB changes from MySQL 5.7.9 introduced support). Let us continue debugging the upgrade problem further in MDEV-24578.

Generated at Thu Feb 08 09:27:54 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.