Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-27438

Spider: crash in spider_sys_open_table, directly after crash recovery finished

Details

    Description

      After crash (probably related to MDEV-24769) eventually MariaDB 10.5.12-8 instance got a different crash directly after startup, as we can see in this log snippet:

      ...
      2021-12-09  1:14:49 0 [Note] Loaded 'server_audit2.so' with offset 0x7fb034965000
      2021-12-09  1:14:49 0 [Note] Loaded 'disks.so' with offset 0x7fb034762000
      2021-12-09  1:14:49 0 [Note] Loaded 'auth_ed25519.so' with offset 0x7fb034551000
      2021-12-09  1:14:49 0 [Note] Loaded 'simple_password_check.so' with offset 0x7fb03434e000
      2021-12-09  1:14:49 0 [Note] Loaded 'ha_spider.so' with offset 0x7fb02fc9e000
      2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: starting recovery
      tables to flush: 9 8 7 6 5 4 3 2 1 0
       (0.0 seconds); 
      2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: recovery done
      2021-12-09  1:14:53 0 [Warning] The parameter innodb_thread_concurrency is deprecated and has no effect.
      2021-12-09  1:14:53 0 [Warning] InnoDB: innodb_open_files 1000000 should not be greater than the open_files_limit 32768
      2021-12-09  1:14:53 0 [Note] InnoDB: Uses event mutexes
      2021-12-09  1:14:53 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
      2021-12-09  1:14:53 0 [Note] InnoDB: Number of pools: 1
      2021-12-09  1:14:53 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
      2021-12-09  1:14:53 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
      2021-12-09  1:14:53 0 [Note] InnoDB: Using Linux native AIO
      2021-12-09  1:14:53 0 [Note] InnoDB: Initializing buffer pool, total size = 214748364800, chunk size = 134217728
      2021-12-09  1:15:04 0 [Note] InnoDB: Completed initialization of buffer pool
      2021-12-09  1:15:04 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9289381264203,9289408027088
      2021-12-09  1:15:08 0 [Note] InnoDB: Transaction 1550231304 was in the XA prepared state.
      2021-12-09  1:15:08 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
      2021-12-09  1:15:08 0 [Note] InnoDB: Trx id counter is 1550231305
      2021-12-09  1:15:08 0 [Note] InnoDB: Starting final batch to recover 39401 pages from redo log.
      2021-12-09  1:15:11 0 [Note] InnoDB: Last binlog file '/mariadblog/binlog/db2-bin.000240', position 567549
      2021-12-09  1:15:11 0 [Note] InnoDB: 128 rollback segments are active.
      2021-12-09  1:15:11 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
      2021-12-09  1:15:11 0 [Note] InnoDB: Rollback of non-prepared transactions completed
      2021-12-09  1:15:11 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
      2021-12-09  1:15:11 0 [Note] InnoDB: Creating shared tablespace for temporary tables
      2021-12-09  1:15:11 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
      2021-12-09  1:15:11 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
      2021-12-09  1:15:11 0 [Note] InnoDB: 10.5.12 started; log sequence number 9289421791370; transaction id 1550231306
      2021-12-09  1:15:11 0 [Note] InnoDB: Loading buffer pool(s) from /data/mysql/ib_buffer_pool
      2021-12-09  1:15:11 0 [Note] Plugin 'FEEDBACK' is disabled.
      2021-12-09  1:15:11 server_audit: MariaDB Audit Plugin version 2.4.0 STARTED.
      2021-12-09  1:15:11 0 [Note] Loaded 'sql_errlog.so' with offset 0x7f77a05f6000
      2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider' already installed
      2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider_alloc_mem' already installed
      2021-12-09  1:15:11 0 [Note] Loaded 'metadata_lock_info.so' with offset 0x7f77a03f3000
      2021-12-09  1:15:11 0 [Note] Using unique option prefix 'sql_error_log_file' is error-prone and can break in the future. Please use the full name 'sql-error-log-filename' instead.
      2021-12-09  1:15:11 0 [Note] Recovering after a crash using /mariadblog/binlog/db2-bin
      2021-12-09  1:15:11 0 [Note] Starting crash recovery...
      2021-12-09  1:15:11 0 [Note] InnoDB: Starting recovery for XA transactions...
      2021-12-09  1:15:11 0 [Note] InnoDB: Transaction 1550231304 in prepared state after recovery
      2021-12-09  1:15:11 0 [Note] InnoDB: Transaction contains changes to 1 rows
      2021-12-09  1:15:11 0 [Note] InnoDB: 1 transactions in prepared state after recovery
      2021-12-09  1:15:11 0 [Note] Found 1 prepared transaction(s) in InnoDB
      2021-12-09  1:15:11 0 [Note] Crash recovery finished.
      211209  1:15:11 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.5.12-8-MariaDB-enterprise-log
      key_buffer_size=8388608
      read_buffer_size=2097152
      max_used_connections=0
      max_threads=65537
      thread_count=20
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 939623430 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x0 thread_stack 0x16d000
      /usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x557b97dabd0e]
      /usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x557b9777bcc7]
      sigaction.c:0(__restore_rt)[0x7fb0388b8630]
      ??:0(spider_sys_open_table(THD*, TABLE_LIST*, start_new_trans**))[0x7fb02fcff5bd]
      ??:0(spider_open_sys_table(THD*, char const*, int, bool, start_new_trans**, bool, int*))[0x7fb02fcff839]
      ??:0(spider_internal_xa_rollback_by_xid(THD*, st_spider_transaction*, xid_t*))[0x7fb02fd0d889]
      ??:0(spider_xa_rollback_by_xid(handlerton*, xid_t*))[0x7fb02fd108da]
      /usr/sbin/mariadbd(+0x994d36)[0x557b9777dd36]
      /usr/sbin/mariadbd(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x148)[0x557b9757de08]
      /usr/sbin/mariadbd(+0x9940db)[0x557b9777d0db]
      /usr/sbin/mariadbd(my_hash_iterate+0x50)[0x557b97d89b00]
      /usr/sbin/mariadbd(_Z19ha_recover_completeP7st_hashPSt4pairIjyE+0x2c)[0x557b9777fbbc]
      /usr/sbin/mariadbd(_ZN16Recovery_context8completeEP13MYSQL_BIN_LOGR7st_hash+0x8a)[0x557b97875b1a]
      /usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x5c3)[0x557b97876863]
      /usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x28d)[0x557b97876bfd]
      sql/handler.cc:2479(Vers_type_trx::check_sys_fields(st_mysql_const_lex_string const&, Column_definition const*, Column_definition const*) const)[0x557b9787b77a]
      sql/sql_plugin.cc:2508(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*))[0x557b9749e263]
      sql/handler.cc:2535(Discovered_table_list::add_file(char const*))[0x557b974a3cfb]
      ??:0(__libc_start_main)[0x7fb037cef555]
      /usr/sbin/mariadbd(+0x6af503)[0x557b97498503]
      The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
      information that should help you find out what is causing the crash.
      Writing a core file...
      Working directory at /data/mysql
      Resource Limits:
      Limit                     Soft Limit           Hard Limit           Units     
      Max cpu time              unlimited            unlimited            seconds   
      Max file size             unlimited            unlimited            bytes     
      Max data size             unlimited            unlimited            bytes     
      Max stack size            8388608              unlimited            bytes     
      Max core file size        unlimited            unlimited            bytes     
      Max resident set          unlimited            unlimited            bytes     
      Max processes             1006755              1006755              processes 
      Max open files            32768                32768                files     
      Max locked memory         65536                65536                bytes     
      Max address space         unlimited            unlimited            bytes     
      Max file locks            unlimited            unlimited            locks     
      Max pending signals       1006755              1006755              signals   
      Max msgqueue size         819200               819200               bytes     
      Max nice priority         0                    0                    
      Max realtime priority     0                    0                    
      Max realtime timeout      unlimited            unlimited            us        
      Core pattern: core
      ...
      

      After that the instance continues crashing like that on every startup.

      Attachments

        Issue Links

          Activity

            ycp Is it not possible to generate a test case where we do an XA commit in spider and then cause it to crash and check if recovery works

            Have you tried that?
            You could also ask QA to try to do the above, but I suggest that you try to do it first yourself

            monty Michael Widenius added a comment - ycp Is it not possible to generate a test case where we do an XA commit in spider and then cause it to crash and check if recovery works Have you tried that? You could also ask QA to try to do the above, but I suggest that you try to do it first yourself
            ycp Yuchen Pei added a comment -

            monty I took a look at the code. The trace looks a bit confusing -
            for example, the line handler.h:3952 appears twice with different
            functions (handler::delete_table(char const*) and
            Discovered_table_list::add_file(char const*)), neither agrees with the
            function on that line at the commit hash quoted above
            (0c6b89c694a88aae764b152045bf5bdb234d7edf). So I did a bit of manual
            fuzzy match (against CS 10.6 4e7d2e73c0978707fb862cdbd842090ba635163a)
            and the closest I got is the following "trace", where "..." separates
            function of the stack frame from the relevant line in that function
            that calls the function in the upper stack frame.

            spider_hton->commit_by_xid = spider_xa_commit_by_xid;
             
            static void xarecover_do_commit_or_rollback(handlerton *hton,
            ...
              rc= xarecover_decide_to_commit(member, ptr_commit_max) ?
                hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);
             
            static my_bool xarecover_do_commit_or_rollback_handlerton(THD *unused,
            ...
                xarecover_do_commit_or_rollback(hton, (xarecover_complete_arg *) arg);
             
            static my_bool xarecover_complete_and_count(void *member_arg,
            ...
              (void) plugin_foreach(NULL, xarecover_do_commit_or_rollback_handlerton,
                                    MYSQL_STORAGE_ENGINE_PLUGIN, complete_params);
             
            uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord)
            ...
              (void) my_hash_iterate(commit_list, xarecover_complete_and_count, &complete);
             
            bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids)
            ...
                uint count_in_prepare=
                  ha_recover_complete(&xids,
                                      !do_truncate ? NULL :
                                      (truncate_gtid.seq_no > 0 ?
                                       &binlog_truncate_coord : &last_gtid_coord));
             
            int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
            ...
            #ifndef HAVE_REPLICATION
                  if (ha_recover_complete(&xids))
            #else
                  if (ctx.complete(this, xids))
            #endif
                    goto err2;
             
            MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
            ...
            [two occurrences of TC_LOG_BINLOG::recover()]

            And it looks like the recovery requires replication which I am not
            very familiar with, so let us consult someone who is. Elkin: what
            is the simplest way to trigger a recovery with mtr that reaches some
            of these functions like MYSQL_BIN_LOG::do_binlog_recovery() and
            TC_LOG_BINLOG::recover()?

            ycp Yuchen Pei added a comment - monty I took a look at the code. The trace looks a bit confusing - for example, the line handler.h:3952 appears twice with different functions (handler::delete_table(char const*) and Discovered_table_list::add_file(char const*)), neither agrees with the function on that line at the commit hash quoted above (0c6b89c694a88aae764b152045bf5bdb234d7edf). So I did a bit of manual fuzzy match (against CS 10.6 4e7d2e73c0978707fb862cdbd842090ba635163a) and the closest I got is the following "trace", where "..." separates function of the stack frame from the relevant line in that function that calls the function in the upper stack frame. spider_hton->commit_by_xid = spider_xa_commit_by_xid;   static void xarecover_do_commit_or_rollback(handlerton *hton, ... rc= xarecover_decide_to_commit(member, ptr_commit_max) ? hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);   static my_bool xarecover_do_commit_or_rollback_handlerton(THD *unused, ... xarecover_do_commit_or_rollback(hton, (xarecover_complete_arg *) arg);   static my_bool xarecover_complete_and_count(void *member_arg, ... (void) plugin_foreach(NULL, xarecover_do_commit_or_rollback_handlerton, MYSQL_STORAGE_ENGINE_PLUGIN, complete_params);   uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord) ... (void) my_hash_iterate(commit_list, xarecover_complete_and_count, &complete);   bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids) ... uint count_in_prepare= ha_recover_complete(&xids, !do_truncate ? NULL : (truncate_gtid.seq_no > 0 ? &binlog_truncate_coord : &last_gtid_coord));   int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, ... #ifndef HAVE_REPLICATION if (ha_recover_complete(&xids)) #else if (ctx.complete(this, xids)) #endif goto err2;   MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery) ... [two occurrences of TC_LOG_BINLOG::recover()] And it looks like the recovery requires replication which I am not very familiar with, so let us consult someone who is. Elkin : what is the simplest way to trigger a recovery with mtr that reaches some of these functions like MYSQL_BIN_LOG::do_binlog_recovery() and TC_LOG_BINLOG::recover()?
            Elkin Andrei Elkin added a comment -

            ycp, these functions work in the normal binlogging server case. binlog.binlog_xa_recover could be instructive for you to understand low-level details. Specifically ha_recover_complete concludes the binlog-based server recovery with either committing of rolling back transactions in doubt (those that are present in binlog, but not committed yet in engine(s)).
            Just in case, when the xid of the transaction is defined by the user the recovery may not work yet - see MDEV-21469 whose patch is in its completion phase.

            Elkin Andrei Elkin added a comment - ycp , these functions work in the normal binlogging server case. binlog.binlog_xa_recover could be instructive for you to understand low-level details. Specifically ha_recover_complete concludes the binlog-based server recovery with either committing of rolling back transactions in doubt (those that are present in binlog, but not committed yet in engine(s)). Just in case, when the xid of the transaction is defined by the user the recovery may not work yet - see MDEV-21469 whose patch is in its completion phase.
            ycp Yuchen Pei added a comment -

            Hi elenst, I noticed that you changed the status to Open. Has a
            test case been provided?

            ycp Yuchen Pei added a comment - Hi elenst , I noticed that you changed the status to Open. Has a test case been provided?
            elenst Elena Stepanova added a comment - - edited

            Not that I know of; but it doesn't need to be set to "Needs Feedback" if it's already assigned to Valerii from whom you expect the action. This status doesn't do what you may expect it to do.

            That said, if Valerii wants it to be in this status (e.g. because he is waiting for something from somebody to whom the ticket cannot be assigned), he can of course set the status himself, in this case it will make sense.

            elenst Elena Stepanova added a comment - - edited Not that I know of; but it doesn't need to be set to "Needs Feedback" if it's already assigned to Valerii from whom you expect the action. This status doesn't do what you may expect it to do. That said, if Valerii wants it to be in this status (e.g. because he is waiting for something from somebody to whom the ticket cannot be assigned), he can of course set the status himself, in this case it will make sense.

            People

              Unassigned Unassigned
              valerii Valerii Kravchuk
              Votes:
              1 Vote for this issue
              Watchers:
              14 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.