[MDEV-31245] Assertion Failure results in crash Created: 2023-05-11  Updated: 2023-10-16  Resolved: 2023-10-16

Status: Closed
Project: MariaDB Server
Component/s: Server
Affects Version/s: 10.4.29
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Rob Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: regression
Environment:

Centos 7


Issue Links:
Duplicate
duplicates MDEV-22739 !cursor->index->is_committed() in row... Closed
Relates
relates to MDEV-9663 InnoDB assertion failure: *cursor->in... Closed
relates to MDEV-13542 Crashing on a corrupted page is unhel... Closed

 Description   

Server crashes on simple inserts from backup files after upgrade from v28 to v29.
After downgrade and optimize tables for various tables, it works again.

Maybe something to do also with parallel replication

2023-05-11 09:05:12 0x7fbdda556700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.29/storage/innobase/row/row0ins.cc line 219
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
230511  9:05:12 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.4.29-MariaDB-log source revision: 2594da7a33580bf03590502a011679c878487d0c
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=3
max_threads=2002
thread_count=49
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4411983 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fba180008c8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fbdda555cf0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55c3608c156e]
/usr/sbin/mysqld(handle_fatal_signal+0x317)[0x55c360313857]
/lib64/libpthread.so.0(+0xf630)[0x7ff1665ab630]
/lib64/libc.so.6(gsignal+0x37)[0x7ff1659f6387]
/lib64/libc.so.6(abort+0x148)[0x7ff1659f7a78]
/usr/sbin/mysqld(+0x5d7040)[0x55c360014040]
/usr/sbin/mysqld(+0xc63055)[0x55c3606a0055]
/usr/sbin/mysqld(+0xc636d3)[0x55c3606a06d3]
/usr/sbin/mysqld(+0xc63cc2)[0x55c3606a0cc2]
/usr/sbin/mysqld(+0xc75c30)[0x55c3606b2c30]
/usr/sbin/mysqld(+0xbb36ed)[0x55c3605f06ed]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPKh+0x2af)[0x55c36032130f]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x6a)[0x55c3600e71ba]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x1284)[0x55c3600edba4]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3ae0)[0x55c36011d6e0]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x22b)[0x55c36012250b]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x1956)[0x55c360124e26]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x112)[0x55c360126332]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x229)[0x55c36020aab9]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x55c36020ab7d]
pthread_create.c:0(start_thread)[0x7ff1665a3ea5]
/lib64/libc.so.6(clone+0x6d)[0x7ff165abeb0d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fba1805ae40): insert into director.proc (......
 
Connection ID (thread ID): 1804685
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1029879              1029879              processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1029879              1029879              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Kernel version: Linux version 3.10.0-1160.81.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Fri Dec 16 17:29:43 UTC 2022

My.cnf (partial)

[mysqld]
tmpdir=/tmp # /data/tmp if the linux drive is too small 
datadir=/data/mysql
port            = 3306
socket          = /var/lib/mysql/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet=1G
table_open_cache = 256
sort_buffer_size = 1M
read_buffer_size = 1M
read_rnd_buffer_size = 4M
myisam_sort_buffer_size = 64M
thread_cache_size = 8
query_cache_size= 100M
 
lower_case_table_names=1
default_storage_engine=InnoDB
 
tmp_table_size = 4096M #64m for each GB
max_heap_table_size = 4096M
 
# replication
server-id = 100              
 
#multi source
#master-info-repository=TABLE
#relay-log-info-repository=TABLE
#Addition for GTID
#binlog_format=ROW
#gtid-domain-id=1 #up one on each master
#enforce-gtid-consistency
#log-slave-updates
#lower usage of network link
slave_compressed_protocol = 1
 
# on master
log-bin=mysql-bin
log-bin=/data/mysql/mysql-bin
expire_logs_days=10                                           
innodb_flush_log_at_trx_commit=1
sync_binlog=1
 
# for master master
auto_increment_increment=2
auto_increment_offset=1 
 
relay-log=ctc-80-relay-bin     
#parallel replication
slave_parallel_threads=32
slave_parallel_mode=optimistic
 
#to ensure that users with problems do not kill other users interaction
max_connections=2000 # number of directors times max_user_connections
max_user_connections=250
 
#for fulltext search
ft_min_word_len=2
 
# Innodb parameters
innodb_buffer_pool_size = 200G #70% of physical memory
 
innodb_log_file_size = 4G
#innodb_log_group_home_dir=
innodb_file_per_table
innodb_lock_wait_timeout=100
# Calculate index stats more often
innodb_stats_sample_pages=100
 
innodb_log_buffer_size=4M
innodb_thread_concurrency = 128  
 
innodb_io_capacity=400
innodb_io_capacity_max=4000
 
innodb_strict_mode=on



 Comments   
Comment by Alice Sherepa [ 2023-05-11 ]

Could you please provide the output of SHOW CREATE TABLE director.proc ;

Comment by Marko Mäkelä [ 2023-05-12 ]

This is basically a continuation of the infamous MDEV-9663.

Until MDEV-27734 in MariaDB Server 10.5.15 and 10.6.7, InnoDB change buffering was enabled by default. It has been the source of many corruption bugs, the MDEV-30009 "double buffering" bug being the latest that I am aware of. I think that it could explain this corruption and crash.

If the affected index includes virtual columns (either explicitly or via MDEV-371), there could be other open bugs that would explain the corruption.

The crash upon encountering this particular corruption has not been fixed in any version yet. MDEV-13542 in MariaDB Server 10.6 did remove many crashes on corruption, but not this one. It could make sense to go through each and every ut_a() in MariaDB Server 10.6.

Comment by Rob [ 2023-05-12 ]

Thanks for your review
Here is the requested create table.
As you can see it is a small table with few rows.
The crash seemed to have occurred at a moment when SET FOREIGN_KEY_CHECKS=0 and the insert is performed after deleting the same records;
The issue was also replicated so it affected all masters and slaves. The other masters and slaves end up in a loop of crashing, after each restart it attempts to do the query again, crashing the server. This was resolved by updating my.cnf not to start replication.
There were about 40 tables restored like this and about 10 of them got corrupted (like Proc). This were all small tables with few rows. No such restore was attempted on large tables.
Optimize table resolved the corruption after we downgrade to version 10.4.28. 24 hours later it is still working like sunshine on the 4.28.
Looking at the release notes it is likely a replication failure. Given that we crashed 10 tables, probably on the master doing inserts it was working and the corruption occurred on the other master and slaves only.

CREATE TABLE `proc` (
`PROCID` int(10) unsigned NOT NULL AUTO_INCREMENT,
`GOALID` int(10) unsigned NOT NULL,
`TEST` varchar(5) CHARACTER SET latin1 COLLATE latin1_swedish_ci NOT NULL,
.....
PRIMARY KEY (`PROCID`) USING BTREE,
KEY `new_fk_constraint` (`GOALID`),
CONSTRAINT `new_fk_constraint` FOREIGN KEY (`GOALID`) REFERENCES `goal` (`GOALID`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=611 DEFAULT CHARSET=utf8 COLLATE=utf8_general_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4

Comment by Marko Mäkelä [ 2023-05-12 ]

Hrehora, have you executed ALTER TABLE…ADD INDEX or CREATE INDEX or added a foreign key constraint (which could implicitly create an index) on any of these tables in the past? If yes, this could be due to MDEV-30009.

Comment by Rob [ 2023-05-12 ]

The addition or change of an index to the proc table is at least 10 years ago...

Comment by Marko Mäkelä [ 2023-05-17 ]

Without a repeatable test case, it is hard or impossible to say what caused the corruption in the first place.

What I can do is to replace the crash with proper error propagation. This report duplicates MDEV-22739.

Comment by Sergei Golubchik [ 2023-06-06 ]

This proper error propagation was implemented in 10.6.15 (planned release date — end of July). Could you try it, when it's out and comment here how it helped?

Comment by Rob [ 2023-06-06 ]

Oke I am going to check it when it comes out. Thanks

Comment by Elena Stepanova [ 2023-09-10 ]

Hrehora,
10.6.15 has been out for a while, did you have a chance to try it out?

Comment by Rob [ 2023-09-13 ]

Regretfully on v31 I continue to have the same problem.

Rob

Comment by Rob [ 2023-09-14 ]

What we did

  • We were running 10.4.28 master- master on centos 7
  • We yum update to 10.4.31
  • When we perform a restore of a table we get the error below
  • We downgrade to 10.4.28
  • We have to repair a set of innodb tables affected with optimize table

the master master runs great again
Let me know if I can help.

2023-09-13 07:42:14 0x7f4cae4c3700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.31/storage/innobase/row/row0ins.cc line 219
InnoDB: Failing assertion: !cursor->index->is_committed()
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
230913  7:42:14 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.4.31-MariaDB-log source revision: 2aea9387497cecb5668ef605b8f80886f9de812c
key_buffer_size=268435456
read_buffer_size=1048576
max_used_connections=264
max_threads=2002
thread_count=49
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4412014 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7f47580008c8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f4cae4c2cf0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x563c4684ce5e]
/usr/sbin/mysqld(handle_fatal_signal+0x317)[0x563c4629aa07]
sigaction.c:0(__restore_rt)[0x7f7fdd68e630]
/lib64/libc.so.6(gsignal+0x37)[0x7f7fdcad9387]
/lib64/libc.so.6(abort+0x148)[0x7f7fdcadaa78]
/usr/sbin/mysqld(+0x5d6fb0)[0x563c45f98fb0]
/usr/sbin/mysqld(+0xc69a35)[0x563c4662ba35]
/usr/sbin/mysqld(+0xc6a0b3)[0x563c4662c0b3]
/usr/sbin/mysqld(+0xc6a6a2)[0x563c4662c6a2]
/usr/sbin/mysqld(+0xc7c610)[0x563c4663e610]
/usr/sbin/mysqld(+0xbb78fd)[0x563c465798fd]
/usr/sbin/mysqld(_ZN7handler12ha_write_rowEPKh+0x2af)[0x563c462a85ef]
/usr/sbin/mysqld(_Z12write_recordP3THDP5TABLEP12st_copy_info+0x6a)[0x563c4606c53a]
/usr/sbin/mysqld(_Z12mysql_insertP3THDP10TABLE_LISTR4ListI4ItemERS3_IS5_ES6_S6_15enum_duplicatesb+0x1284)[0x563c46073144]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3e78)[0x563c460a2c58]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x39d)[0x563c460a838d]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x2113)[0x563c460ab403]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x112)[0x563c460acc42]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x229)[0x563c46191ce9]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x563c46191dad]
pthread_create.c:0(start_thread)[0x7f7fdd686ea5]
/lib64/libc.so.6(clone+0x6d)[0x7f7fdcba1b0d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f475801e1b0): insert into log.templateacttp (TEMPLATEID,ACTTYPEID) VALUES 
(8176,23863),(8176,23864),(8176,23868),(8176,23876),(8177,23851),(8177,23863),(8177,23864),(8177,23868),(8177,23876),(8178,23851),(8178,23863),(8178,23864),(8178,23868),(8178,23876),(8179,23851),(8179,23863),(8179,23864),(8179,23868),(8179,23876),(8180,23851)
 
Connection ID (thread ID): 3691832
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        0                    unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1029880              1029880              processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1029880              1029880              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        

Kernel version: Linux version 3.10.0-1160.83.1.el7.x86_64 (mockbuild@kbuilder.bsys.centos.org) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) ) #1 SMP Wed Jan 25 16:41:43 UTC 2023

Comment by Marko Mäkelä [ 2023-09-18 ]

Hrehora, by design, InnoDB used to deal with corruption by intentionally crashing itself. This was fixed in MariaDB Server 10.6 by MDEV-13542 and some follow-up fixes, such as MDEV-22739, where the assertion that fails for you in MariaDB Server 10.4 was removed.

A possible cause of the corruption is MDEV-32132.

It is not feasible to port these fixes to earlier major versions than 10.6. Can you please try an upgrade to MariaDB Server 10.6.15?

Generated at Thu Feb 08 10:22:23 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.