Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-32822

Crash signal 11 in spider_sys_open_table from spider_internal_xa_rollback_by_xid during recovery after crash

Details

    Description

      It's was long time it's happend to me last time : (version of MariaDB : 10.11.6, on ubuntu 22.04)

      2023-11-16 12:19:11 0 [Note] Starting MariaDB 10.11.6-MariaDB-1:10.11.6+maria~ubu2204-log source revision fecd78b83785d5ae96f2c6ff340375be803cd299 as process 37569
      2023-11-16 12:19:11 0 [Note] mariadbd: Aria engine: starting recovery
      recovered pages: 0% 10% 20% 32% 46% 60% 71% 86% 96% 100% (0.0 seconds); tables to flush: 1 0
       (0.0 seconds); 
      2023-11-16 12:19:11 0 [Note] mariadbd: Aria engine: recovery done
      2023-11-16 12:19:11 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2023-11-16 12:19:11 0 [Note] InnoDB: Number of transaction pools: 1
      2023-11-16 12:19:11 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-11-16 12:19:11 0 [Note] InnoDB: Using liburing
      2023-11-16 12:19:11 0 [Note] InnoDB: Initializing buffer pool, total size = 47.000GiB, chunk size = 0.734GiB
      2023-11-16 12:19:11 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-11-16 12:19:11 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
      2023-11-16 12:19:11 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4695286604
      2023-11-16 12:19:12 0 [Note] InnoDB: End of log at LSN=4868790831
      2023-11-16 12:19:12 0 [Note] InnoDB: To recover: 464 pages
      2023-11-16 12:19:13 0 [Note] InnoDB: Last binlog file '/srv/mysql/binlog/mariadb-bin.000620', position 93990914
      2023-11-16 12:19:13 0 [Note] InnoDB: 128 rollback segments are active.
      2023-11-16 12:19:13 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2023-11-16 12:19:13 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2023-11-16 12:19:13 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      2023-11-16 12:19:13 0 [Note] InnoDB: log sequence number 4868790831; transaction id 2877698
      2023-11-16 12:19:13 0 [Note] InnoDB: Loading buffer pool(s) from /srv/mysql/data/ib_buffer_pool
      2023-11-16 12:19:13 0 [Note] Plugin 'FEEDBACK' is disabled.
      231116 12:19:13 server_audit: MariaDB Audit Plugin version 1.4.14 STARTED.
      231116 12:19:13 server_audit: logging started to the file /srv/mysql/log/audit.log.
      2023-11-16 12:19:13 0 [Note] InnoDB: Buffer pool(s) load completed at 231116 12:19:13
      2023-11-16 12:19:13 0 [Note] RocksDB: 2 column families found
      2023-11-16 12:19:13 0 [Note] RocksDB: Column Families at start:
      2023-11-16 12:19:13 0 [Note]   cf=default
      2023-11-16 12:19:13 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:19:13 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:19:13 0 [Note]   cf=__system__
      2023-11-16 12:19:13 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:19:13 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:19:16 0 [Note] RocksDB: Table_store: loaded DDL data for 270 tables
      2023-11-16 12:19:16 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2023-11-16 12:19:16 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      2023-11-16 12:19:16 0 [Note] Recovering after a crash using /srv/mysql/binlog/mariadb-bin
      2023-11-16 12:19:16 0 [Note] Starting table crash recovery...
      2023-11-16 12:19:16 0 [Note] Found 1 prepared transaction(s) in ROCKSDB
      2023-11-16 12:19:16 0 [Note] Crash table recovery finished.
      2023-11-16 12:19:16 0 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 0
      231116 12:19:16 [ERROR] mysqld got signal 11 ;
      Sorry, we probably made a mistake, and this is a bug.
       
      Your assistance in bug reporting will enable us to fix this for the next release.
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.11.6-MariaDB-1:10.11.6+maria~ubu2204-log source revision: fecd78b83785d5ae96f2c6ff340375be803cd299
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=0
      max_threads=1002
      thread_count=20
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 6314082 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x49000
      /usr/sbin/mariadbd(my_print_stacktrace+0x32)[0x56188120a8b2]
      /usr/sbin/mariadbd(handle_fatal_signal+0x478)[0x561880cd6e68]
      libc_sigaction.c:0(__restore_rt)[0x7fb66e042520]
      /usr/lib/mysql/plugin/ha_spider.so(_Z21spider_sys_open_tableP3THDP10TABLE_LISTPP15start_new_trans+0x1b)[0x7faa4593420b]
      /usr/lib/mysql/plugin/ha_spider.so(_Z21spider_open_sys_tableP3THDPKcibPP15start_new_transbPi+0xf6)[0x7faa45934496]
      /usr/lib/mysql/plugin/ha_spider.so(_Z34spider_internal_xa_rollback_by_xidP3THDP21st_spider_transactionP5xid_t+0x15b)[0x7faa4594238b]
      /usr/lib/mysql/plugin/ha_spider.so(_Z25spider_xa_rollback_by_xidP10handlertonP5xid_t+0x3f)[0x7faa4594534f]
      /usr/sbin/mariadbd(+0xa8b09d)[0x561880cd909d]
      /usr/sbin/mariadbd(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x1a9)[0x561880a8dc99]
      /usr/sbin/mariadbd(+0xa89dcf)[0x561880cd7dcf]
      /usr/sbin/mariadbd(my_hash_iterate+0x50)[0x5618811eb3c0]
      /usr/sbin/mariadbd(_Z19ha_recover_completeP7st_hashPSt4pairIjyE+0x30)[0x561880cdcc30]
      /usr/sbin/mariadbd(_ZN16Recovery_context8completeEP13MYSQL_BIN_LOGR7st_hash+0x6f)[0x561880df8b8f]
      /usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x780)[0x561880df9a90]
      /usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x303)[0x561880df9ee3]
      /usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG4openEPKc+0x3f)[0x561880e0043f]
      /usr/sbin/mariadbd(+0x7147a9)[0x5618809627a9]
      /usr/sbin/mariadbd(_Z11mysqld_mainiPPc+0x424)[0x561880967384]
      nptl/libc_start_call_main.h:58(__libc_start_call_main)[0x7fb66e029d90]
      csu/libc-start.c:128(call_init)[0x7fb66e029e40]
      /usr/sbin/mariadbd(_start+0x25)[0x56188095be65]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mariadbd/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /srv/mysql/data
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        0                    unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             255361               255361               processes 
      Max open files            32768                32768                files     
      Max locked memory         524288               524288               bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       255361               255361               signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: |/usr/share/apport/apport -p%p -s%s -c%c -d%d -P%P -u%u -g%g -- %E
       
      Kernel version: Linux version 6.2.0-36-generic (buildd@lcy02-amd64-050) (x86_64-linux-gnu-gcc-11 (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, GNU ld (GNU Binutils for Ubuntu) 2.38) #37~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct  9 15:34:04 UTC 2
       
      2023-11-16 12:20:45 0 [Note] Starting MariaDB 10.11.6-MariaDB-1:10.11.6+maria~ubu2204-log source revision fecd78b83785d5ae96f2c6ff340375be803cd299 as process 37740
      2023-11-16 12:20:45 0 [Note] mysqld: Aria engine: starting recovery
      recovered pages: 0% 10% 20% 32% 46% 60% 71% 86% 96% 100% (0.0 seconds); tables to flush: 1 0
       (0.0 seconds); 
      2023-11-16 12:20:45 0 [Note] mysqld: Aria engine: recovery done
      2023-11-16 12:20:45 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2023-11-16 12:20:45 0 [Note] InnoDB: Number of transaction pools: 1
      2023-11-16 12:20:45 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-11-16 12:20:45 0 [Note] InnoDB: Using liburing
      2023-11-16 12:20:45 0 [Note] InnoDB: Initializing buffer pool, total size = 47.000GiB, chunk size = 0.734GiB
      2023-11-16 12:20:45 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-11-16 12:20:45 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
      2023-11-16 12:20:45 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=4695286604
      2023-11-16 12:20:47 0 [Note] InnoDB: End of log at LSN=4868791288
      2023-11-16 12:20:47 0 [Note] InnoDB: To recover: 464 pages
      2023-11-16 12:20:47 0 [Note] InnoDB: Last binlog file '/srv/mysql/binlog/mariadb-bin.000620', position 93990914
      2023-11-16 12:20:47 0 [Note] InnoDB: 128 rollback segments are active.
      2023-11-16 12:20:47 0 [Note] InnoDB: Removed temporary tablespace data file: "./ibtmp1"
      2023-11-16 12:20:47 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2023-11-16 12:20:47 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      2023-11-16 12:20:47 0 [Note] InnoDB: log sequence number 4868791288; transaction id 2877698
      2023-11-16 12:20:47 0 [Note] Plugin 'FEEDBACK' is disabled.
      231116 12:20:47 server_audit: MariaDB Audit Plugin version 1.4.14 STARTED.
      231116 12:20:47 server_audit: logging started to the file /srv/mysql/log/audit.log.
      2023-11-16 12:20:47 0 [Note] InnoDB: Loading buffer pool(s) from /srv/mysql/data/ib_buffer_pool
      2023-11-16 12:20:47 0 [Note] InnoDB: Buffer pool(s) load completed at 231116 12:20:47
      2023-11-16 12:20:47 0 [Note] RocksDB: 2 column families found
      2023-11-16 12:20:47 0 [Note] RocksDB: Column Families at start:
      2023-11-16 12:20:47 0 [Note]   cf=default
      2023-11-16 12:20:47 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:20:47 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:20:47 0 [Note]   cf=__system__
      2023-11-16 12:20:47 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:20:47 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:20:50 0 [Note] RocksDB: Table_store: loaded DDL data for 270 tables
      2023-11-16 12:20:50 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2023-11-16 12:20:50 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      2023-11-16 12:20:50 0 [Note] Heuristic crash recovery mode
      2023-11-16 12:20:50 0 [Note] Found 1 prepared transaction(s) in ROCKSDB
      2023-11-16 12:20:50 0 [Note] Please restart without --tc-heuristic-recover
      2023-11-16 12:20:50 0 [Note] Recovering after a crash using /srv/mysql/binlog/mariadb-bin
      2023-11-16 12:20:50 0 [ERROR] Error in Log_event::read_log_event(): 'Event invalid', data_len: 0, event_type: 0
      2023-11-16 12:20:50 0 [ERROR] Can't init tc log
      2023-11-16 12:20:50 0 [ERROR] Aborting
      231116 12:20:50 server_audit: STOPPED
      2023-11-16 12:21:00 0 [Note] Starting MariaDB 10.11.6-MariaDB-1:10.11.6+maria~ubu2204-log source revision fecd78b83785d5ae96f2c6ff340375be803cd299 as process 38171
      2023-11-16 12:21:00 0 [Note] InnoDB: Compressed tables use zlib 1.2.11
      2023-11-16 12:21:00 0 [Note] InnoDB: Number of transaction pools: 1
      2023-11-16 12:21:00 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2023-11-16 12:21:00 0 [Note] InnoDB: Using liburing
      2023-11-16 12:21:00 0 [Note] InnoDB: Initializing buffer pool, total size = 47.000GiB, chunk size = 0.734GiB
      2023-11-16 12:21:00 0 [Note] InnoDB: Completed initialization of buffer pool
      2023-11-16 12:21:00 0 [Note] InnoDB: Buffered log writes (block size=512 bytes)
      2023-11-16 12:21:00 0 [Note] InnoDB: End of log at LSN=4868792460
      2023-11-16 12:21:00 0 [Note] InnoDB: 128 rollback segments are active.
      2023-11-16 12:21:00 0 [Note] InnoDB: Setting file './ibtmp1' size to 12.000MiB. Physically writing the file full; Please wait ...
      2023-11-16 12:21:00 0 [Note] InnoDB: File './ibtmp1' size is now 12.000MiB.
      2023-11-16 12:21:00 0 [Note] InnoDB: log sequence number 4868792460; transaction id 2877698
      2023-11-16 12:21:00 0 [Note] Plugin 'FEEDBACK' is disabled.
      2023-11-16 12:21:00 0 [Note] InnoDB: Loading buffer pool(s) from /srv/mysql/data/ib_buffer_pool
      231116 12:21:00 server_audit: MariaDB Audit Plugin version 1.4.14 STARTED.
      231116 12:21:00 server_audit: logging started to the file /srv/mysql/log/audit.log.
      2023-11-16 12:21:00 0 [Note] InnoDB: Buffer pool(s) load completed at 231116 12:21:00
      2023-11-16 12:21:01 0 [Note] RocksDB: 2 column families found
      2023-11-16 12:21:01 0 [Note] RocksDB: Column Families at start:
      2023-11-16 12:21:01 0 [Note]   cf=default
      2023-11-16 12:21:01 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:21:01 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:21:01 0 [Note]   cf=__system__
      2023-11-16 12:21:01 0 [Note]     write_buffer_size=67108864
      2023-11-16 12:21:01 0 [Note]     target_file_size_base=67108864
      2023-11-16 12:21:01 0 [Note] RocksDB: Table_store: loaded DDL data for 270 tables
      2023-11-16 12:21:01 0 [Note] RocksDB: global statistics using get_sched_indexer_t indexer
      2023-11-16 12:21:01 0 [Note] MyRocks storage engine plugin has been successfully initialized.
      2023-11-16 12:21:01 0 [Note] Server socket created on IP: '0.0.0.0'.
      2023-11-16 12:21:01 22 [Note] Event Scheduler: scheduler thread started with id 22
      2023-11-16 12:21:01 0 [Note] /usr/sbin/mariadbd: ready for connections.
      Version: '10.11.6-MariaDB-1:10.11.6+maria~ubu2204-log'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  mariadb.org binary distribution
      

      I guess it's happened when my laptop was out of battery

      to solve it :

      mysqld --tc-heuristic-recover=rollback
      

      and start again mysql :

      service mysql start
      

      Attachments

        Issue Links

          Activity

            ycp Yuchen Pei added a comment -

            I managed to modify the binlog.binlog_xa_recover test to use a spider table to the point of usual crash site when adding the abort(); to xarecover_do_commit_or_rollback() in the original test. However, the function is not called in the updated test, because there is no xa transaction to recover. In the original case, there are four.

            It seems like spider skips internal XA without setting spider_internal_xa to 1:

            // in spider_internal_start_trx()
                if (
                  thd->transaction->xid_state.is_explicit_XA() && // external XA
                  spider_param_support_xa()
                ) {
                  trx->trx_xa = TRUE;
                  thd_get_xid(thd, (MYSQL_XID*) &trx->xid);
                }
             
                if (
                  !trx->trx_xa &&
                  trx->internal_xa &&       // from the spider_internal_xa variable
                  (!trx->trx_consistent_snapshot || trx->internal_xa_snapshot == 3) &&
                  spider->wide_handler->sql_command != SQLCOM_LOCK_TABLES
                ) {
                  trx->trx_xa = TRUE;

            When we do set spider_internal_xa to 1, we get "xid already exists" failure, most likely because spider tries to START XA with duplicate xid in its pre-query.

            Furthermore, spider triggers DBUG_SUICIDE() at xa_recover_expect_master_bin_000004 at the first server, as the binlog_checkpoint_name is master-bin.000007, rather than the expected master-bin.000004:

            // in TC_LOG_BINLOG::recover()
                if (round == 1)
                {
                  if (!binlog_checkpoint_found)
                    break;
                  DBUG_EXECUTE_IF("xa_recover_expect_master_bin_000004",
                      if (0 != strcmp("./master-bin.000004", binlog_checkpoint_name) &&
                          0 != strcmp(".\\master-bin.000004", binlog_checkpoint_name))
                        DBUG_SUICIDE();
                    );
                  if (find_log_pos(linfo, binlog_checkpoint_name, 1))

            I suspect this is caused by spider not doing internal XA. So the first step will be to find a way to make spider do internal XA before the first server crash and restart. The current attempt is at

            upstream/bb-10.11-mdev-27438-debug 58c293f2ec8266d441f699f46676bfe7a61c0cdc [demo][debug] spider: debugging MDEV-27438

            ycp Yuchen Pei added a comment - I managed to modify the binlog.binlog_xa_recover test to use a spider table to the point of usual crash site when adding the abort(); to xarecover_do_commit_or_rollback() in the original test. However, the function is not called in the updated test, because there is no xa transaction to recover. In the original case, there are four. It seems like spider skips internal XA without setting spider_internal_xa to 1: // in spider_internal_start_trx() if ( thd->transaction->xid_state.is_explicit_XA() && // external XA spider_param_support_xa() ) { trx->trx_xa = TRUE; thd_get_xid(thd, (MYSQL_XID*) &trx->xid); }   if ( !trx->trx_xa && trx->internal_xa && // from the spider_internal_xa variable (!trx->trx_consistent_snapshot || trx->internal_xa_snapshot == 3) && spider->wide_handler->sql_command != SQLCOM_LOCK_TABLES ) { trx->trx_xa = TRUE; When we do set spider_internal_xa to 1, we get "xid already exists" failure, most likely because spider tries to START XA with duplicate xid in its pre-query. Furthermore, spider triggers DBUG_SUICIDE() at xa_recover_expect_master_bin_000004 at the first server, as the binlog_checkpoint_name is master-bin.000007 , rather than the expected master-bin.000004 : // in TC_LOG_BINLOG::recover() if (round == 1) { if (!binlog_checkpoint_found) break ; DBUG_EXECUTE_IF( "xa_recover_expect_master_bin_000004" , if (0 != strcmp ( "./master-bin.000004" , binlog_checkpoint_name) && 0 != strcmp ( ".\\master-bin.000004" , binlog_checkpoint_name)) DBUG_SUICIDE(); ); if (find_log_pos(linfo, binlog_checkpoint_name, 1)) I suspect this is caused by spider not doing internal XA. So the first step will be to find a way to make spider do internal XA before the first server crash and restart. The current attempt is at upstream/bb-10.11-mdev-27438-debug 58c293f2ec8266d441f699f46676bfe7a61c0cdc [demo][debug] spider: debugging MDEV-27438

            That's an amazing news !

            Aurelien_LEQUOY Aurélien LEQUOY added a comment - That's an amazing news !
            ycp Yuchen Pei added a comment -

            Hi Aurelien_LEQUOY, can you try with this patch applied and let me know if the crash persists? Same goes for MDEV-34302. Thanks

            b7e0895e731 upstream/bb-10.5-mdev-34925 MDEV-34925 Fix segv on thd in spider xa recovery.
            

            ycp Yuchen Pei added a comment - Hi Aurelien_LEQUOY , can you try with this patch applied and let me know if the crash persists? Same goes for MDEV-34302 . Thanks b7e0895e731 upstream/bb-10.5-mdev-34925 MDEV-34925 Fix segv on thd in spider xa recovery.
            ycp Yuchen Pei added a comment -

            I'm marking this fixed by MDEV-34925, which is likely the same problem. Please reopen if the problem remains with the patch.

            ycp Yuchen Pei added a comment - I'm marking this fixed by MDEV-34925 , which is likely the same problem. Please reopen if the problem remains with the patch.

            i will try it tomorow

            Aurelien_LEQUOY Aurélien LEQUOY added a comment - i will try it tomorow

            People

              ycp Yuchen Pei
              Aurelien_LEQUOY Aurélien LEQUOY
              Votes:
              1 Vote for this issue
              Watchers:
              6 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.