[MDEV-24286] InnoDB: Assertion failure in file .. trx/trx0undo.cc line 616 Created: 2020-11-26  Updated: 2021-04-25  Resolved: 2021-04-25

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.3.24
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Allen Lee (Inactive) Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

Virtualized, On Premise, CentOS, MariaDB community server.


Attachments: Zip Archive core.zip    

 Description   

The customer reported very frequent assertion failures, as shown below. The server restarts every couple of minutes, or sometimes every few hours.

2020-11-23 18:36:58 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=26661390142
2020-11-23 18:36:58 0x7f2618e898c0  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.24/storage/innobase/trx/trx0undo.cc line 616
InnoDB: Failing assertion: free + TRX_UNDO_LOG_XA_HDR_SIZE < srv_page_size - 100
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
201123 18:36:58 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.3.24-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=0
max_threads=2002
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4532509 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d488aee51e]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55d488583a4f]
sigaction.c:0(__restore_rt)[0x7f2618a70630]
:0(__GI_raise)[0x7f2616d41387]
:0(__GI_abort)[0x7f2616d42a78]
/usr/sbin/mysqld(+0x4ed8a0)[0x55d4882be8a0]
/usr/sbin/mysqld(+0xa9ecb4)[0x55d48886fcb4]
/usr/sbin/mysqld(+0x9b0052)[0x55d488781052]
/usr/sbin/mysqld(+0x9b17b5)[0x55d4887827b5]
/usr/sbin/mysqld(+0x4e86b2)[0x55d4882b96b2]
/usr/sbin/mysqld(+0xae9eec)[0x55d4888baeec]
/usr/sbin/mysqld(+0xb07477)[0x55d4888d8477]
/usr/sbin/mysqld(+0xae6daa)[0x55d4888b7daa]
/usr/sbin/mysqld(+0xa8d9e3)[0x55d48885e9e3]
/usr/sbin/mysqld(+0xa9a108)[0x55d48886b108]
/usr/sbin/mysqld(+0xa62e9b)[0x55d488833e9b]
/usr/sbin/mysqld(+0x9519c1)[0x55d4887229c1]
/usr/sbin/mysqld(_Z24ha_initialize_handlertonP13st_plugin_int+0x64)[0x55d488586184]
/usr/sbin/mysqld(+0x5e1a60)[0x55d4883b2a60]
/usr/sbin/mysqld(_Z11plugin_initPiPPci+0x9ba)[0x55d4883b3bea]
/usr/sbin/mysqld(+0x513897)[0x55d4882e4897]
/usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x4ae)[0x55d4882eb2fe]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f2616d2d555]
/usr/sbin/mysqld(+0x50d55d)[0x55d4882de55d]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /mariadb/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             unlimited            unlimited            processes 
Max open files            1048576              1048576              files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       127912               127912               signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core

I will attach the core files once I receive them from the customer.
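
The failing assertion in the log above reads as a bounds check: the offset `free` plus the size of an undo log header with XA fields must stay at least 100 bytes below the page size. The following is a minimal Python sketch of that inequality only, not the server code. The function name is hypothetical, and the value 186 for TRX_UNDO_LOG_XA_HDR_SIZE and the 16384-byte page size are assumptions that should be checked against trx0undo.h and the innodb_page_size setting of the affected build.

```python
def undo_header_fits(free, xa_hdr_size=186, page_size=16384):
    """Mirror the logged condition:
    free + TRX_UNDO_LOG_XA_HDR_SIZE < srv_page_size - 100

    free        -- offset of the first free byte on the undo page
    xa_hdr_size -- ASSUMED value of TRX_UNDO_LOG_XA_HDR_SIZE
    page_size   -- ASSUMED value of srv_page_size (innodb_page_size)
    """
    return free + xa_hdr_size < page_size - 100


if __name__ == "__main__":
    # A small offset leaves room for a new header.
    print(undo_header_fits(free=1000))
    # An offset near the end of the page does not, which is the
    # situation in which the server would abort.
    print(undo_header_fits(free=16300))
```

If the `free` value stored on the page is itself garbage, for example because the page was overwritten, the check would fail regardless of how much space is really in use, which is consistent with the corruption hint printed in the log.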



 Comments   
Comment by Marko Mäkelä [ 2020-12-21 ]

Can we please get a resolved stack trace? We must identify the page number of the corrupted page. Then we would need a copy of that page from the data directory.
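
Once the page number is known, copying that page out of a data file could be sketched as below. This is a minimal Python sketch under stated assumptions, not an official tool: the function names are hypothetical, the page size is assumed to be the 16384-byte default, the file is assumed to be uncompressed and unencrypted, and the page number is assumed to be stored big-endian at byte offset 4 of each page (FIL_PAGE_OFFSET). The server should be shut down while copying so that the page is not modified during the read.

```python
def extract_page(tablespace_path, page_no, page_size=16384):
    """Return the raw bytes of one page from a tablespace file.

    page_size is an ASSUMPTION (default innodb_page_size); adjust it
    if the instance was created with a different value.
    """
    with open(tablespace_path, "rb") as f:
        f.seek(page_no * page_size)
        page = f.read(page_size)
    if len(page) != page_size:
        raise ValueError("page %d is beyond the end of the file" % page_no)
    return page


def stored_page_number(page):
    """Read the page number recorded in the page header.

    ASSUMPTION: a 4-byte big-endian field at byte offset 4.
    """
    return int.from_bytes(page[4:8], "big")


def save_page(tablespace_path, page_no, out_path, page_size=16384):
    """Copy one page into a separate file that can be attached to a report."""
    page = extract_page(tablespace_path, page_no, page_size)
    with open(out_path, "wb") as out:
        out.write(page)
    return stored_page_number(page)
```

A call such as `save_page("/mariadb/data/ibdata1", n, "page_n.bin")` would write the page to `page_n.bin`; the file name `ibdata1` is the default system tablespace name and `n` stands for the page number taken from the resolved stack trace. If the returned number differs from `n`, either the page size assumption is wrong or the page header itself is damaged.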

Comment by Marko Mäkelä [ 2021-03-19 ]

allen.lee@mariadb.com, core files have not included the buffer pool contents ever since MDEV-10814. Hence, they are not sufficient for debugging this.
The stack trace that you posted does not include parameters to any functions. This probably happens because the package with debug symbols was not installed.
Last, unless core.zip contains the executable and all linked dynamic libraries, it would not be useful for retrieving any further information, such as the identity of the page.

Assuming that the installation is not using multiple undo tablespaces, the undo page would be in the system tablespace. MDEV-24449 may cause arbitrary corruption of the system tablespace.

Generated at Thu Feb 08 09:28:50 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.