Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25296

Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0 sporadically on SELECT with certain tables

    XMLWordPrintable

    Details

      Description

      Hi there,
      we experience a failing assertion like this very often, when certain SELECT statements are running:

      2021-03-29 21:02:40 0x7faae5bea700  InnoDB: Assertion failure in file /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.3.27/storage/innobase/trx/trx0trx.cc line 1360
      InnoDB: Failing assertion: UT_LIST_GET_LEN(trx->lock.trx_locks) == 0
      InnoDB: We intentionally generate a memory trap.
      InnoDB: Submit a detailed bug report to https://jira.mariadb.org/
      InnoDB: If you get repeated assertion failures or crashes, even
      InnoDB: immediately after the mysqld startup, there may be
      InnoDB: corruption in the InnoDB tablespace. Please refer to
      InnoDB: https://mariadb.com/kb/en/library/innodb-recovery-modes/
      InnoDB: about forcing recovery.
      210329 21:02:40 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.3.27-MariaDB-log
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=79
      max_threads=1502
      thread_count=71
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3433243 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7fa909024fa8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7faae5be9d40 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x559a93cfb35e]
      /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x559a9379418f]
      /lib64/libpthread.so.0(+0xf7e0)[0x7fb9510697e0]
      /lib64/libc.so.6(gsignal+0x35)[0x7fb94f6c34f5]
      /lib64/libc.so.6(abort+0x175)[0x7fb94f6c4cd5]
      /usr/sbin/mysqld(+0x5453fe)[0x559a934d23fe]
      /usr/sbin/mysqld(+0xae927d)[0x559a93a7627d]
      /usr/sbin/mysqld(+0xae9414)[0x559a93a76414]
      /usr/sbin/mysqld(+0xae9724)[0x559a93a76724]
      /usr/sbin/mysqld(+0x99f5b8)[0x559a9392c5b8]
      /usr/sbin/mysqld(+0x99f78b)[0x559a9392c78b]
      /usr/sbin/mysqld(+0x9ac765)[0x559a93939765]
      /usr/sbin/mysqld(_ZN7handler16ha_external_lockEP3THDi+0x6f)[0x559a9379f65f]
      /usr/sbin/mysqld(+0x8f7e3a)[0x559a93884e3a]
      /usr/sbin/mysqld(_Z24mysql_unlock_read_tablesP3THDP13st_mysql_lock+0x68)[0x559a93885628]
      /usr/sbin/mysqld(_ZN4JOIN9join_freeEv+0x169)[0x559a9360d649]
      /usr/sbin/mysqld(_ZN4JOIN10exec_innerEv+0x9ed)[0x559a9360e04d]
      /usr/sbin/mysqld(_ZN4JOIN4execEv+0x33)[0x559a9360e303]
      /usr/sbin/mysqld(_Z12mysql_selectP3THDP10TABLE_LISTjR4ListI4ItemEPS4_jP8st_orderS9_S7_S9_yP13select_resultP18st_select_lex_unitP13st_select_lex+0x11a)[0x559a9360e45a]
      /usr/sbin/mysqld(_Z13handle_selectP3THDP3LEXP13select_resultm+0x1dc)[0x559a9360ef7c]
      /usr/sbin/mysqld(+0x531ff9)[0x559a934beff9]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x244a)[0x559a935b4e5a]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x1d2)[0x559a935bb762]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0xf4f)[0x559a935bcfaf]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x11b)[0x559a935beadb]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1d6)[0x559a93694ac6]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x559a93694bdd]
      /usr/sbin/mysqld(+0xd1f3ad)[0x559a93cac3ad]
      /lib64/libpthread.so.0(+0x7aa1)[0x7fb951061aa1]
      /lib64/libc.so.6(clone+0x6d)[0x7fb94f779c4d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fa909078320): select d.data_id, d.table_name, d.event_type, d.row_data as row_data, d.pk_data as pk_data, d.old_data as old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list from sym_data d where d.channel_id='default'  and ((d.data_id between 9677318210 and 14677318209))   order by d.data_id asc
       
      Connection ID (thread ID): 84913
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
       
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /var/lib/mysql
      Core pattern: core
      

      This instance is configured as a replica and additionally runs another trigger-based replication tool SymmetriDS (https://www.symmetricds.org/).
      SymmetricDS runs plenty of statements like this once a day to clean old data:

      select d.data_id, d.table_name, d.event_type, d.row_data as row_data, d.pk_data as pk_data, d.old_data as old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, d.node_list from sym_data d where d.channel_id='default'  and ((d.data_id between 9677318210 and 14677318209))   order by d.data_id asc
      

      Unfortunately there is no safe way to reproduce it, but the more statements like this are running, the higher is the chance that the failure above happens after some minutes.

      The schema of the queried table is as follows:

      CREATE DEFINER=`SymmetricDS`@`127.0.0.1` TRIGGER `SYM_ON_D_FOR_JBDT_V_LNGN` AFTER DELETE ON `jobdata` FOR EACH ROW begin                                                                                                                                                     
                                         
                                        if 1=1 and @sync_triggers_disabled is null then                                                                                                 
                                          insert into `replication_info`.sym_data (table_name, event_type, trigger_hist_id, pk_data, old_data, channel_id, transaction_id, source_node_id, external_data, create_time)
                                          values(                                                                                                                                                            
                                            'jobdata',                                                                                                                                            
                                            'D',                                                                                                                                                             
                                            494,                                                                                                                                             
                                            concat(
                if(old.`OID` is null,'',concat('"',cast(old.`OID` as char),'"'))                                                                                                                                                
                                             ),                                                                                                                                                              
                                             concat(
                if(old.`OID` is null,'',concat('"',cast(old.`OID` as char),'"')),',',
                if(old.`IDJOB` is null,'',concat('"',cast(old.`IDJOB` as char),'"')),',',
                if(old.`IDATTR` is null,'',concat('"',cast(old.`IDATTR` as char),'"')),',',
                cast(if(old.`SIDE` is null,'',concat('"',replace(replace(old.`SIDE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
                cast(if(old.`VALUE` is null,'',concat('"',replace(replace(old.`VALUE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
                cast(if(old.`BIGVALUE` is null,'',concat('"',replace(replace(old.`BIGVALUE`,'\\','\\\\'),'"','\\"'),'"')) as char),',',
                if(old.`OptimisticLockField` is null,'',concat('"',cast(old.`OptimisticLockField` as char),'"')),',',
                if(old.`DVALUE` is null,'',concat('"',cast(old.`DVALUE` as char),'"'))                                                                                                                                            
                                             ),                                                                                                                                                              
                                            'default', `replication_info`.sym_transaction_id_pre_5_7_6(), @sync_node_disabled,                                                                                                        
                                            (select CONCAT(c.KDLKZ, ' ', c.A_PJHG, ' ', c.A_PJUG, ' ', c.A_VKGB) from optovision.cusd001 c
      join optovision.jobhead h on old.IDJOB = h.OID
      where c.A_KDNR = h.colACCN),                                                                                                                                               
                                            CURRENT_TIMESTAMP                                                                                                                                                
                                          );                                                                                                                                                                 
                                        end if;                                                                                                                                                              
                                                                                                                                                                                        
                                      end
      

      The function sym_transaction_id_pre_5_7_6 that is used in the triggers is defined like this:

      CREATE DEFINER=`SymmetricDS`@`127.0.0.1` FUNCTION `sym_transaction_id_pre_5_7_6`()
      RETURNS varchar(50) CHARSET utf8
      LANGUAGE SQL
      NOT DETERMINISTIC
      READS SQL DATA
      SQL SECURITY DEFINER
      COMMENT ''
      begin                                                                                                                          
          declare comm_value varchar(50);                                                                                             
          declare comm_cur cursor for select VARIABLE_VALUE from INFORMATION_SCHEMA.SESSION_STATUS where VARIABLE_NAME='COM_COMMIT';  
          open comm_cur;                                                                                                              
          fetch comm_cur into comm_value;                                                                                             
          close comm_cur;                                                                                                             
          return concat(concat(connection_id(), '.'), comm_value);                                                                    
       end
      

      You can find our my.cnf attached. If needed, I can also provide a core dump, but it's quite big (14 GB compressed).

      We had the very same configuration in production with version 10.3.20 for over a year ( also with different versions of 10.1 even longer) without any issues. Since we updated from 10.3.20 to 10.3.27 this happens almost every day.

      Please let me know, if the core dump or any further details are needed.

        Attachments

          Issue Links

            Activity

              People

              Assignee:
              Unassigned
              Reporter:
              optonaegele Markus Nägele
              Votes:
              0 Vote for this issue
              Watchers:
              6 Start watching this issue

                Dates

                Created:
                Updated:
                Resolved: