[MDEV-27438] Spider: crash in spider_sys_open_table, directly after crash recovery finished - Jira

Details

Type: Bug
Status: Closed (View Workflow)
Priority: Critical
Resolution: Cannot Reproduce
Affects Version/s: 10.6.14, 10.5
Fix Version/s: N/A
Component/s: Storage Engine - Spider, XA
Labels:
- crash

Description

After crash (probably related to ~~MDEV-24769~~) eventually MariaDB 10.5.12-8 instance got a different crash directly after startup, as we can see in this log snippet:

...

2021-12-09  1:14:49 0 [Note] Loaded 'server_audit2.so' with offset 0x7fb034965000

2021-12-09  1:14:49 0 [Note] Loaded 'disks.so' with offset 0x7fb034762000

2021-12-09  1:14:49 0 [Note] Loaded 'auth_ed25519.so' with offset 0x7fb034551000

2021-12-09  1:14:49 0 [Note] Loaded 'simple_password_check.so' with offset 0x7fb03434e000

2021-12-09  1:14:49 0 [Note] Loaded 'ha_spider.so' with offset 0x7fb02fc9e000

2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: starting recovery

tables to flush: 9 8 7 6 5 4 3 2 1 0

 (0.0 seconds);

2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: recovery done

2021-12-09  1:14:53 0 [Warning] The parameter innodb_thread_concurrency is deprecated and has no effect.

2021-12-09  1:14:53 0 [Warning] InnoDB: innodb_open_files 1000000 should not be greater than the open_files_limit 32768

2021-12-09  1:14:53 0 [Note] InnoDB: Uses event mutexes

2021-12-09  1:14:53 0 [Note] InnoDB: Compressed tables use zlib 1.2.7

2021-12-09  1:14:53 0 [Note] InnoDB: Number of pools: 1

2021-12-09  1:14:53 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions

2021-12-09  1:14:53 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)

2021-12-09  1:14:53 0 [Note] InnoDB: Using Linux native AIO

2021-12-09  1:14:53 0 [Note] InnoDB: Initializing buffer pool, total size = 214748364800, chunk size = 134217728

2021-12-09  1:15:04 0 [Note] InnoDB: Completed initialization of buffer pool

2021-12-09  1:15:04 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9289381264203,9289408027088

2021-12-09  1:15:08 0 [Note] InnoDB: Transaction 1550231304 was in the XA prepared state.

2021-12-09  1:15:08 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo

2021-12-09  1:15:08 0 [Note] InnoDB: Trx id counter is 1550231305

2021-12-09  1:15:08 0 [Note] InnoDB: Starting final batch to recover 39401 pages from redo log.

2021-12-09  1:15:11 0 [Note] InnoDB: Last binlog file '/mariadblog/binlog/db2-bin.000240', position 567549

2021-12-09  1:15:11 0 [Note] InnoDB: 128 rollback segments are active.

2021-12-09  1:15:11 0 [Note] InnoDB: Starting in background the rollback of recovered transactions

2021-12-09  1:15:11 0 [Note] InnoDB: Rollback of non-prepared transactions completed

2021-12-09  1:15:11 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"

2021-12-09  1:15:11 0 [Note] InnoDB: Creating shared tablespace for temporary tables

2021-12-09  1:15:11 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...

2021-12-09  1:15:11 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.

2021-12-09  1:15:11 0 [Note] InnoDB: 10.5.12 started; log sequence number 9289421791370; transaction id 1550231306

2021-12-09  1:15:11 0 [Note] InnoDB: Loading buffer pool(s) from /data/mysql/ib_buffer_pool

2021-12-09  1:15:11 0 [Note] Plugin 'FEEDBACK' is disabled.

2021-12-09  1:15:11 server_audit: MariaDB Audit Plugin version 2.4.0 STARTED.

2021-12-09  1:15:11 0 [Note] Loaded 'sql_errlog.so' with offset 0x7f77a05f6000

2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider' already installed

2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider_alloc_mem' already installed

2021-12-09  1:15:11 0 [Note] Loaded 'metadata_lock_info.so' with offset 0x7f77a03f3000

2021-12-09  1:15:11 0 [Note] Using unique option prefix 'sql_error_log_file' is error-prone and can break in the future. Please use the full name 'sql-error-log-filename' instead.

2021-12-09  1:15:11 0 [Note] Recovering after a crash using /mariadblog/binlog/db2-bin

2021-12-09  1:15:11 0 [Note] Starting crash recovery...

2021-12-09  1:15:11 0 [Note] InnoDB: Starting recovery for XA transactions...

2021-12-09  1:15:11 0 [Note] InnoDB: Transaction 1550231304 in prepared state after recovery

2021-12-09  1:15:11 0 [Note] InnoDB: Transaction contains changes to 1 rows

2021-12-09  1:15:11 0 [Note] InnoDB: 1 transactions in prepared state after recovery

2021-12-09  1:15:11 0 [Note] Found 1 prepared transaction(s) in InnoDB

2021-12-09  1:15:11 0 [Note] Crash recovery finished.

211209  1:15:11 [ERROR] mysqld got signal 11 ;

This could be because you hit a bug. It is also possible that this binary

or one of the libraries it was linked against is corrupt, improperly built,

or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help

diagnose the problem, but since we have already crashed,

something is definitely wrong and this may fail.

Server version: 10.5.12-8-MariaDB-enterprise-log

key_buffer_size=8388608

read_buffer_size=2097152

max_used_connections=0

max_threads=65537

thread_count=20

It is possible that mysqld could use up to

key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 939623430 K  bytes of memory

Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0

Attempting backtrace. You can use the following information to find out

where mysqld died. If you see no messages after this, something went

terribly wrong...

stack_bottom = 0x0 thread_stack 0x16d000

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x557b97dabd0e]

/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x557b9777bcc7]

sigaction.c:0(__restore_rt)[0x7fb0388b8630]

??:0(spider_sys_open_table(THD*, TABLE_LIST*, start_new_trans**))[0x7fb02fcff5bd]

??:0(spider_open_sys_table(THD*, char const*, int, bool, start_new_trans**, bool, int*))[0x7fb02fcff839]

??:0(spider_internal_xa_rollback_by_xid(THD*, st_spider_transaction*, xid_t*))[0x7fb02fd0d889]

??:0(spider_xa_rollback_by_xid(handlerton*, xid_t*))[0x7fb02fd108da]

/usr/sbin/mariadbd(+0x994d36)[0x557b9777dd36]

/usr/sbin/mariadbd(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x148)[0x557b9757de08]

/usr/sbin/mariadbd(+0x9940db)[0x557b9777d0db]

/usr/sbin/mariadbd(my_hash_iterate+0x50)[0x557b97d89b00]

/usr/sbin/mariadbd(_Z19ha_recover_completeP7st_hashPSt4pairIjyE+0x2c)[0x557b9777fbbc]

/usr/sbin/mariadbd(_ZN16Recovery_context8completeEP13MYSQL_BIN_LOGR7st_hash+0x8a)[0x557b97875b1a]

/usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x5c3)[0x557b97876863]

/usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x28d)[0x557b97876bfd]

sql/handler.cc:2479(Vers_type_trx::check_sys_fields(st_mysql_const_lex_string const&, Column_definition const*, Column_definition const*) const)[0x557b9787b77a]

sql/sql_plugin.cc:2508(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*))[0x557b9749e263]

sql/handler.cc:2535(Discovered_table_list::add_file(char const*))[0x557b974a3cfb]

??:0(__libc_start_main)[0x7fb037cef555]

/usr/sbin/mariadbd(+0x6af503)[0x557b97498503]

The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains

information that should help you find out what is causing the crash.

Writing a core file...

Working directory at /data/mysql

Resource Limits:

Limit                     Soft Limit           Hard Limit           Units

Max cpu time              unlimited            unlimited            seconds

Max file size             unlimited            unlimited            bytes

Max data size             unlimited            unlimited            bytes

Max stack size            8388608              unlimited            bytes

Max core file size        unlimited            unlimited            bytes

Max resident set          unlimited            unlimited            bytes

Max processes             1006755              1006755              processes

Max open files            32768                32768                files

Max locked memory         65536                65536                bytes

Max address space         unlimited            unlimited            bytes

Max file locks            unlimited            unlimited            locks

Max pending signals       1006755              1006755              signals

Max msgqueue size         819200               819200               bytes

Max nice priority         0                    0

Max realtime priority     0                    0

Max realtime timeout      unlimited            unlimited            us

Core pattern: core

...

After that the instance continues crashing like that on every startup.

Attachments

Issue Links

is blocked by

MDEV-34925 Mariadb server 11.4.3-1-1 crashes on bootup in spider_sys_open_table

Closed

relates to

MDEV-32822 Crash signal 11 in spider_sys_open_table from spider_internal_xa_rollback_by_xid during recovery after crash

Closed

MDEV-34302 Crash with RocksDB

Closed

MDEV-34925 Mariadb server 11.4.3-1-1 crashes on bootup in spider_sys_open_table

Closed

split to

MDEV-30191 SIGSEGV & heap-use-after-free in spider_db_print_item_type, SIGABRT in __cxa_pure_virtual/spider_db_print_item_type, Got error 128 "Out of memory in engine", 56/112 memory not freed, and Assertion `fixed()' failed in Item_sp_variable::val_str on SP call

Closed

Activity

Ascending order - Click to sort in descending order

View 14 older comments

Michael Widenius added a comment - 2023-09-08 13:38

ycp Is it not possible to generate a test case where we do an XA commit in spider and then cause it to crash and check if recovery works

Have you tried that?
You could also ask QA to try to do the above, but I suggest that you try to do it first yourself

Michael Widenius added a comment - 2023-09-08 13:38 ycp Is it not possible to generate a test case where we do an XA commit in spider and then cause it to crash and check if recovery works Have you tried that? You could also ask QA to try to do the above, but I suggest that you try to do it first yourself

Yuchen Pei added a comment - 2023-09-13 03:26

monty I took a look at the code. The trace looks a bit confusing -
for example, the line handler.h:3952 appears twice with different
functions (handler::delete_table(char const*) and
Discovered_table_list::add_file(char const*)), neither agrees with the
function on that line at the commit hash quoted above
(0c6b89c694a88aae764b152045bf5bdb234d7edf). So I did a bit of manual
fuzzy match (against CS 10.6 4e7d2e73c0978707fb862cdbd842090ba635163a)
and the closest I got is the following "trace", where "..." separates
function of the stack frame from the relevant line in that function
that calls the function in the upper stack frame.

spider_hton->commit_by_xid = spider_xa_commit_by_xid;

static void xarecover_do_commit_or_rollback(handlerton *hton,

...

  rc= xarecover_decide_to_commit(member, ptr_commit_max) ?

    hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);

static my_bool xarecover_do_commit_or_rollback_handlerton(THD *unused,

...

    xarecover_do_commit_or_rollback(hton, (xarecover_complete_arg *) arg);

static my_bool xarecover_complete_and_count(void *member_arg,

...

  (void) plugin_foreach(NULL, xarecover_do_commit_or_rollback_handlerton,

                        MYSQL_STORAGE_ENGINE_PLUGIN, complete_params);

uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord)

...

  (void) my_hash_iterate(commit_list, xarecover_complete_and_count, &complete);

bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids)

...

    uint count_in_prepare=

      ha_recover_complete(&xids,

                          !do_truncate ? NULL :

                          (truncate_gtid.seq_no > 0 ?

                           &binlog_truncate_coord : &last_gtid_coord));

int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,

...

#ifndef HAVE_REPLICATION

      if (ha_recover_complete(&xids))

#else

      if (ctx.complete(this, xids))

#endif

        goto err2;

MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)

...

[two occurrences of TC_LOG_BINLOG::recover()]

And it looks like the recovery requires replication which I am not
very familiar with, so let us consult someone who is. Elkin: what
is the simplest way to trigger a recovery with mtr that reaches some
of these functions like MYSQL_BIN_LOG::do_binlog_recovery() and
TC_LOG_BINLOG::recover()?

Yuchen Pei added a comment - 2023-09-13 03:26 monty I took a look at the code. The trace looks a bit confusing - for example, the line handler.h:3952 appears twice with different functions (handler::delete_table(char const*) and Discovered_table_list::add_file(char const*)), neither agrees with the function on that line at the commit hash quoted above (0c6b89c694a88aae764b152045bf5bdb234d7edf). So I did a bit of manual fuzzy match (against CS 10.6 4e7d2e73c0978707fb862cdbd842090ba635163a) and the closest I got is the following "trace", where "..." separates function of the stack frame from the relevant line in that function that calls the function in the upper stack frame. spider_hton->commit_by_xid = spider_xa_commit_by_xid; static void xarecover_do_commit_or_rollback(handlerton *hton, ... rc= xarecover_decide_to_commit(member, ptr_commit_max) ? hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x); static my_bool xarecover_do_commit_or_rollback_handlerton(THD *unused, ... xarecover_do_commit_or_rollback(hton, (xarecover_complete_arg *) arg); static my_bool xarecover_complete_and_count(void *member_arg, ... (void) plugin_foreach(NULL, xarecover_do_commit_or_rollback_handlerton, MYSQL_STORAGE_ENGINE_PLUGIN, complete_params); uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord) ... (void) my_hash_iterate(commit_list, xarecover_complete_and_count, &complete); bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids) ... uint count_in_prepare= ha_recover_complete(&xids, !do_truncate ? NULL : (truncate_gtid.seq_no > 0 ? &binlog_truncate_coord : &last_gtid_coord)); int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name, ... #ifndef HAVE_REPLICATION if (ha_recover_complete(&xids)) #else if (ctx.complete(this, xids)) #endif goto err2; MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery) ... [two occurrences of TC_LOG_BINLOG::recover()] And it looks like the recovery requires replication which I am not very familiar with, so let us consult someone who is. Elkin : what is the simplest way to trigger a recovery with mtr that reaches some of these functions like MYSQL_BIN_LOG::do_binlog_recovery() and TC_LOG_BINLOG::recover()?

Andrei Elkin added a comment - 2023-09-13 07:18

ycp, these functions work in the normal binlogging server case. binlog.binlog_xa_recover could be instructive for you to understand low-level details. Specifically ha_recover_complete concludes the binlog-based server recovery with either committing of rolling back transactions in doubt (those that are present in binlog, but not committed yet in engine(s)).
Just in case, when the xid of the transaction is defined by the user the recovery may not work yet - see MDEV-21469 whose patch is in its completion phase.

Andrei Elkin added a comment - 2023-09-13 07:18 ycp , these functions work in the normal binlogging server case. binlog.binlog_xa_recover could be instructive for you to understand low-level details. Specifically ha_recover_complete concludes the binlog-based server recovery with either committing of rolling back transactions in doubt (those that are present in binlog, but not committed yet in engine(s)). Just in case, when the xid of the transaction is defined by the user the recovery may not work yet - see MDEV-21469 whose patch is in its completion phase.

Yuchen Pei added a comment - 2023-11-07 23:01

Hi elenst, I noticed that you changed the status to Open. Has a
test case been provided?

Yuchen Pei added a comment - 2023-11-07 23:01 Hi elenst , I noticed that you changed the status to Open. Has a test case been provided?

Elena Stepanova added a comment - 2023-11-07 23:11 - edited

Not that I know of; but it doesn't need to be set to "Needs Feedback" if it's already assigned to Valerii from whom you expect the action. This status doesn't do what you may expect it to do.

That said, if Valerii wants it to be in this status (e.g. because he is waiting for something from somebody to whom the ticket cannot be assigned), he can of course set the status himself, in this case it will make sense.

Elena Stepanova added a comment - 2023-11-07 23:11 - edited Not that I know of; but it doesn't need to be set to "Needs Feedback" if it's already assigned to Valerii from whom you expect the action. This status doesn't do what you may expect it to do. That said, if Valerii wants it to be in this status (e.g. because he is waiting for something from somebody to whom the ticket cannot be assigned), he can of course set the status himself, in this case it will make sense.

People

Assignee:: Unassigned

Reporter:: Valerii Kravchuk

Votes:: 1 Vote for this issue

Watchers:: 14 Start watching this issue

Dates

Created:: 2021-12-09 14:14

Updated:: 2024-10-16 06:24

Resolved:: 2024-02-15 10:45

Git Integration

Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.

MariaDB Server