[MDEV-25296] Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0 sporadically on SELECT with certain tables Created: 2021-03-30  Updated: 2023-03-14  Resolved: 2021-05-04

Status: Closed
Project: MariaDB Server
Component/s: Platform RedHat, Replication, Storage Engine - InnoDB, Triggers
Affects Version/s: 10.3.27
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Markus Nägele Assignee: Unassigned
Resolution: Incomplete Votes: 0
Labels: need_feedback
Environment:

CentOS release 6.10


Attachments: File my.cnf    
Issue Links:
Relates
relates to MDEV-24581 MDL race condition in trans_rollback_... Closed
relates to MDEV-25163 Rowid filter does not process storage... Closed

 Description   

Hi there,
we experience a failing assertion like this very often, when certain SELECT statements are running:

2021-03-29 21:02:40 0x7faae5bea700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.27/storage/innobase/trx/trx0trx.cc line 1360
InnoDB: Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0
InnoDB: We intentionally generate a memory trap.
InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
InnoDB: If you get repeated assertion failures or crashes, even
InnoDB: immediately after the mysqld startup, there may be
InnoDB: corruption in the InnoDB tablespace. Please refer to
InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
InnoDB: about forcing recovery.
210329 21:02:40 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.3.27-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=79
max_threads=1502
thread_count=71
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3433243 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7fa909024fa8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7faae5be9d40 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x559a93cfb35e]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x559a9379418f]
/lib64/libpthread.so.0(+0xf7e0)[0x7fb9510697e0]
/lib64/libc.so.6(gsignal+0x35)[0x7fb94f6c34f5]
/lib64/libc.so.6(abort+0x175)[0x7fb94f6c4cd5]
/usr/sbin/mysqld(+0x5453fe)[0x559a934d23fe]
/usr/sbin/mysqld(+0xae927d)[0x559a93a7627d]
/usr/sbin/mysqld(+0xae9414)[0x559a93a76414]
/usr/sbin/mysqld(+0xae9724)[0x559a93a76724]
/usr/sbin/mysqld(+0x99f5b8)[0x559a9392c5b8]
/usr/sbin/mysqld(+0x99f78b)[0x559a9392c78b]
/usr/sbin/mysqld(+0x9ac765)[0x559a93939765]
/usr/sbin/mysqld(_ZN7handler16ha_external_lockEP3THDi+0x6f)[0x559a9379f65f]
/usr/sbin/mysqld(+0x8f7e3a)[0x559a93884e3a]
/usr/sbin/mysqld(_Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock+0x68)[0x559a93885628]
/usr/sbin/mysqld(_ZN4JOIN9join_freeEv+0x169)[0x559a9360d649]
/usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x9ed)[0x559a9360e04d]
/usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x559a9360e303]
/usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x11a)[0x559a9360e45a]
/usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1dc)[0x559a9360ef7c]
/usr/sbin/mysqld(+0x531ff9)[0x559a934beff9]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x244a)[0x559a935b4e5a]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1d2)[0x559a935bb762]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xf4f)[0x559a935bcfaf]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x11b)[0x559a935beadb]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1d6)[0x559a93694ac6]
/usr/sbin/mysqld(handle_one_connection+0x3d)[0x559a93694bdd]
/usr/sbin/mysqld(+0xd1f3ad)[0x559a93cac3ad]
/lib64/libpthread.so.0(+0x7aa1)[0x7fb951061aa1]
/lib64/libc.so.6(clone+0x6d)[0x7fb94f779c4d]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7fa909078320): select d.data_id, d.table_name, d.event_type, d.row_data as row_data, d.pk_data as pk_data, d.old_data as old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list from sym_data d where d.channel_id='default'  and ((d.data_id between 9677318210 and 14677318209))   order by d.data_id asc
 
Connection ID (thread ID): 84913
Status: NOT_KILLED
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Core pattern: core

This instance is configured as a replica and additionally runs another trigger-based replication tool SymmetriDS (https://www.symmetricds.org/).
SymmetricDS runs plenty of statements like this once a day to clean old data:

select d.data_id, d.table_name, d.event_type, d.row_data as row_data, d.pk_data as pk_data, d.old_data as old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list from sym_data d where d.channel_id='default'  and ((d.data_id between 9677318210 and 14677318209))   order by d.data_id asc

Unfortunately there is no safe way to reproduce it, but the more statements like this are running, the higher is the chance that the failure above happens after some minutes.

The schema of the queried table is as follows:

CREATE DEFINER=`SymmetricDS`@`127.0.0.1` TRIGGER `SYM_ON_D_FOR_JBDT_V_LNGN` AFTER DELETE ON `jobdata` FOR EACH ROW begin                                                                                                                                                     
                                   
                                  if 1=1 and @sync_triggers_disabled is null then                                                                                                 
                                    insert into `replication_info`.sym_data (table_name, event_type, trigger_hist_id, pk_data, old_data, channel_id, transaction_id, source_node_id, external_data, create_time)
                                    values(                                                                                                                                                            
                                      'jobdata',                                                                                                                                            
                                      'D',                                                                                                                                                             
                                      494,                                                                                                                                             
                                      concat(
          if(old.`OID` is null,'',concat('"',cast(old.`OID` as char),'"'))                                                                                                                                                
                                       ),                                                                                                                                                              
                                       concat(
          if(old.`OID` is null,'',concat('"',cast(old.`OID` as char),'"')),',',
          if(old.`IDJOB` is null,'',concat('"',cast(old.`IDJOB` as char),'"')),',',
          if(old.`IDATTR` is null,'',concat('"',cast(old.`IDATTR` as char),'"')),',',
          cast(if(old.`SIDE` is null,'',concat('"',replace(replace(old.`SIDE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
          cast(if(old.`VALUE` is null,'',concat('"',replace(replace(old.`VALUE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
          cast(if(old.`BIGVALUE` is null,'',concat('"',replace(replace(old.`BIGVALUE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
          if(old.`OptimisticLockField` is null,'',concat('"',cast(old.`OptimisticLockField` as char),'"')),',',
          if(old.`DVALUE` is null,'',concat('"',cast(old.`DVALUE` as char),'"'))                                                                                                                                            
                                       ),                                                                                                                                                              
                                      'default', `replication_info`.sym_transaction_id_pre_5_7_6(), @sync_node_disabled,                                                                                                        
                                      (select CONCAT(c.KDLKZ, ' ', c.A_PJHG, ' ', c.A_PJUG, ' ', c.A_VKGB) from optovision.cusd001 c
join optovision.jobhead h on old.IDJOB = h.OID
where c.A_KDNR = h.colACCN),                                                                                                                                               
                                      CURRENT_TIMESTAMP                                                                                                                                                
                                    );                                                                                                                                                                 
                                  end if;                                                                                                                                                              
                                                                                                                                                                                  
                                end

The function sym_transaction_id_pre_5_7_6 that is used in the triggers is defined like this:

CREATE DEFINER=`SymmetricDS`@`127.0.0.1` FUNCTION `sym_transaction_id_pre_5_7_6`()
RETURNS varchar(50) CHARSET utf8
LANGUAGE SQL
NOT DETERMINISTIC
READS SQL DATA
SQL SECURITY DEFINER
COMMENT ''
begin                                                                                                                          
    declare comm_value varchar(50);                                                                                             
    declare comm_cur cursor for select VARIABLE_VALUE from INFORMATION_SCHEMA.SESSION_STATUS where VARIABLE_NAME='COM_COMMIT';  
    open comm_cur;                                                                                                              
    fetch comm_cur into comm_value;                                                                                             
    close comm_cur;                                                                                                             
    return concat(concat(connection_id(), '.'), comm_value);                                                                    
 end

You can find our my.cnf attached. If needed, I can also provide a core dump, but it's quite big (14 GB compressed).

We had the very same configuration in production with version 10.3.20 for over a year ( also with different versions of 10.1 even longer) without any issues. Since we updated from 10.3.20 to 10.3.27 this happens almost every day.

Please let me know, if the core dump or any further details are needed.

The defintion of the table sym_data is as follows:

CREATE TABLE `sym_data` (
	`data_id` BIGINT(20) NOT NULL AUTO_INCREMENT,
	`table_name` VARCHAR(255) NOT NULL COLLATE 'utf8mb4_unicode_ci',
	`event_type` CHAR(1) NOT NULL COLLATE 'utf8mb4_unicode_ci',
	`row_data` MEDIUMTEXT NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`pk_data` MEDIUMTEXT NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`old_data` MEDIUMTEXT NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`trigger_hist_id` INT(11) NOT NULL,
	`channel_id` VARCHAR(128) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`transaction_id` VARCHAR(255) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`source_node_id` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`external_data` VARCHAR(50) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`node_list` VARCHAR(255) NULL DEFAULT NULL COLLATE 'utf8mb4_unicode_ci',
	`create_time` DATETIME NULL DEFAULT NULL,
	PRIMARY KEY (`data_id`) USING BTREE,
	INDEX `sym_idx_d_channel_id` (`data_id`, `channel_id`) USING BTREE
)
COLLATE='utf8mb4_unicode_ci'
ENGINE=InnoDB
AUTO_INCREMENT=15653039877;



 Comments   
Comment by Marko Mäkelä [ 2021-03-30 ]

This looks similar to MDEV-25163. A repeatable test case would be preferred, because this should be the result of a race condition. A core dump could deliver us at most a snapshot of a bad state, but not the steps that lead to it.

Ideally, this should be repeated under rr record (https://rr-project.org) but those traces are not portable at all. Hence, a repeatable test case (starting with CREATE TABLE and INSERT) would be easiest to work with.

Comment by Roel Van de Paar [ 2023-03-10 ]

optonaegele Hi! Could you please provide the table def for sym_data? Thank you.

Comment by Markus Nägele [ 2023-03-14 ]

@Roel Hi, thanks for your feedback. I added the definition of sym_data to the description.

Generated at Thu Feb 08 09:36:37 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.