[MDEV-26893] innodb assertion on startup - rem0rec:877 Created: 2021-10-24  Updated: 2022-01-20  Resolved: 2022-01-06

Status: Closed
Project: MariaDB Server
Component/s: Storage Engine - InnoDB
Affects Version/s: 10.6.4
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Nigi Uorfori Assignee: Marko Mäkelä
Resolution: Incomplete Votes: 0
Labels: assertion

Issue Links:
Relates
relates to MDEV-11131 InnoDB: Assertion failure - rem0rec.c... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed
relates to MDEV-27411 Index Corruption in mariadb 10.5.11 s... Closed

 Description   

Server version: 10.6.4-MariaDB-1:10.6.4+maria~bullseye
key_buffer_size=2147483648
read_buffer_size=52428800
max_used_connections=0
max_threads=10002
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 668090610 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000



 Comments   
Comment by Daniel Black [ 2021-10-24 ]

nigelorfori, thanks for the bug report. There's not enough information to show what caused this as it is. Was a core dump generated?

Can you install debug symbols?

And then take a backtrace?

Was the crash query related? Can you include the query? The SHOW CREATE TABLE tablename for the tables involved? And EXPLAIN query

Comment by Nigi Uorfori [ 2021-10-24 ]

Hello,

I'm not very experienced with MariaDB where can I find this? I found out 2 tables (InnoDB) are crashed. How do I fix this?

It are those 2 tables they keep doing these queries:
https://prnt.sc/1x9cajz

Comment by Nigi Uorfori [ 2021-10-24 ]

Hello,

It did run hundreds of queries I can not say exactly which query was the cause. I can not stop Mariadb service either because transactions or something :/

[Note] InnoDB: Waiting for 258 active transactions to finish
[Note] InnoDB: Waiting for 258 active transactions to finish
[Note] InnoDB: Waiting for 258 active transactions to finish
[Note] InnoDB: Waiting for 258 active transactions to finish
[Note] InnoDB: Waiting for 258 active transactions to finish

Comment by Daniel Black [ 2021-10-24 ]

SELECT COUNT FROM table can be a slow query, especially on large tables and isn't necessary tied to the crash you experienced.

I'm having trouble resolving these seemingly conflicting statuses:

max_used_connections=0
max_threads=10002
thread_count=1

No progress on "active transactions to finish" could just be a really large rollback.

Can you include journalctl -n 300 -u mariadb.service to include just crash back to when the MariaDB started that time, until the next startup?

What was happening on the server before the crash occured? What mariadb config have you got? How much memory do you have?

Comment by Nigi Uorfori [ 2021-10-25 ]

16GB RAM 8vCPU

It runs daily like hundreds of thousands queries and the database is like 400gb in total. I currently closed everything (website/server) to slow down the queries but then I found out this.

https://prnt.sc/1x9eqoq (journalctl)

https://prnt.sc/1x9etfh (logs.err)

I actually don't know what happend before the crash.

This is my mariadb.cnf:

max_connections = 10000
max_user_connections = 0
connect_timeout = 28800
max_connect_errors = 1000000
wait_timeout = 28800
interactive_timeout = 28800
max_allowed_packet = 50M
thread_cache_size = 128
sort_buffer_size = 15M
bulk_insert_buffer_size = 16M
tmp_table_size = 128M
max_heap_table_size = 32M

key_buffer_size = 2G

table_open_cache = 15000
myisam_sort_buffer_size = 256M
concurrent_insert = 2
read_buffer_size = 50M
read_rnd_buffer_size = 100M

query_cache_limit = 1M
query_cache_size = 128M
query_cache_type = OFF

max_binlog_size = 100M
sql_mode =

default_storage_engine = InnoDB

innodb_log_file_size = 1G
innodb_buffer_pool_size = 8G
innodb_log_buffer_size = 16M
innodb_file_per_table = 1
innodb_open_files = 400
innodb_io_capacity = 30000
innodb_io_capacity_max = 40000
innodb_flush_method = O_DIRECT
innodb_read_io_threads = 16
innodb_write_io_threads = 16

innodb_flush_log_at_trx_commit = 2
innodb_log_buffer_size = 24M
innodb_thread_concurrency = 8

Comment by Nigi Uorfori [ 2021-10-25 ]

Update:

0x7f370171a700 InnoDB: Assertion failure in file ./storage/innobase/rem/rem0rec.cc line 877
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mariadbd startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.

Comment by Daniel Black [ 2021-10-25 ]

Thanks better. It looks like maybe some index corruption similar to MDEV-11131

If you're able to include one more thing as text, can you do:

install debug symbol packages per first comment.

apt-get install -y gdb

sudo -u mysql gdb /usr/sbin/mariadbd

On (gdb) prompt

set print frame-arguments all
run

(when assertion occurs on gdb prompt)

thread apply all bt full

unrelated but your innodb_io_capacity looks too high, and I'm not sure you server could handle max_connections (memory usage).

Comment by Nigi Uorfori [ 2021-10-25 ]

Doesn't seem to work for me :/

https://prnt.sc/1x9nto8

and what max_connections / capacity do you recommend? I got enough memory but use 16gb for this server

Comment by Nigi Uorfori [ 2021-10-25 ]

Update:

is could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.6.4-MariaDB-1:10.6.4+maria~bullseye
key_buffer_size=2147483648
read_buffer_size=52428800
max_used_connections=5
max_threads=10002
thread_count=5
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 668090610 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fd844000c58
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fda7c295d58 thread_stack 0x49000
Printing to addr2line failed
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x564d38daedfe]
/usr/sbin/mariadbd(handle_fatal_signal+0x485)[0x564d3887c125]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140)[0x7fdb153d0140]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0x141)[0x7fdb14f19ce1]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x123)[0x7fdb14f03537]
/usr/sbin/mariadbd(+0x653183)[0x564d38521183]
/usr/sbin/mariadbd(+0x63e84c)[0x564d3850c84c]
/usr/sbin/mariadbd(+0xcfa18e)[0x564d38bc818e]
/usr/sbin/mariadbd(+0xccee26)[0x564d38b9ce26]
/usr/sbin/mariadbd(+0xccfe3b)[0x564d38b9de3b]
/usr/sbin/mariadbd(+0xddc32c)[0x564d38caa32c]
/usr/sbin/mariadbd(+0xdcdaca)[0x564d38c9baca]
/usr/sbin/mariadbd(+0xd5e047)[0x564d38c2c047]
/usr/sbin/mariadbd(+0xc8f545)[0x564d38b5d545]
/usr/sbin/mariadbd(_ZN7handler13ha_index_nextEPh+0x2b7)[0x564d38882847]
/usr/sbin/mariadbd(+0x7cf22c)[0x564d3869d22c]
/usr/sbin/mariadbd(_Z10sub_selectP4JOINP13st_join_tableb+0x1c5)[0x564d3868fef5]
/usr/sbin/mariadbd(_ZN4JOIN10exec_innerEv+0xf94)[0x564d386bd404]
/usr/sbin/mariadbd(_ZN4JOIN4execEv+0x35)[0x564d386bd7d5]
/usr/sbin/mariadbd(_Z12mysql_selectP3THDP10TABLE_LISTR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0xf6)[0x564d386bb8a6]
/usr/sbin/mariadbd(_Z13handle_selectP3THDP3LEXP13select_resultm+0x14f)[0x564d386bc05f]
/usr/sbin/mariadbd(+0x778a41)[0x564d38646a41]
/usr/sbin/mariadbd(_Z21mysql_execute_commandP3THDb+0x4262)[0x564d38655342]
/usr/sbin/mariadbd(_Z11mysql_parseP3THDPcjP12Parser_state+0x1e7)[0x564d38656aa7]
/usr/sbin/mariadbd(_Z16dispatch_command19enum_server_commandP3THDPcjb+0xf85)[0x564d38658ab5]
/usr/sbin/mariadbd(_Z10do_commandP3THDb+0x135)[0x564d3865aa85]
/usr/sbin/mariadbd(_Z24do_handle_one_connectionP7CONNECTb+0x3af)[0x564d38763bff]
/usr/sbin/mariadbd(handle_one_connection+0x5d)[0x564d38763f4d]
/usr/sbin/mariadbd(+0xbe6ec9)[0x564d38ab4ec9]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7)[0x7fdb153c4ea7]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f)[0x7fdb14fdbdef]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fd8440108b0): SELECT COUNT(*) FROM `user_access`
 
Connection ID (thread ID): 61
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             63904                63904                processes 
Max open files            32768                32768                files     
Max locked memory         2097152              2097152              bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       63904                63904                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core

Comment by Marko Mäkelä [ 2021-10-25 ]

This assertion failure is indeed equivalent to the one that was reported in MDEV-11131.

Based on the available information, we know that you have a corrupted InnoDB index record in a table that does not use the ROW_FORMAT=REDUNDANT format.

I wonder why this corruption was not caught by the page checksum. A possible explanation is that the record was corrupted while the page was loaded in the buffer pool, and the corrupted page with a correct checksum was later written back to the data file. I am not aware of such memory corruption bugs in our code base (other than MDEV-24096 that was caused by an incorrect fix of MDEV-23672).

I would suggest you to test the memory, to check if the data could have been corrupted spontaneuously in the memory. With memtester you can do that even without rebooting the operating system.

Generated at Thu Feb 08 09:48:45 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.