[MDEV-27438] Spider: crash in spider_sys_open_table, directly after crash recovery finished Created: 2021-12-09  Updated: 2023-12-19

Status: Open
Project: MariaDB Server
Component/s: Storage Engine - Spider, XA
Affects Version/s: 10.5, 10.6.14
Fix Version/s: 10.5, 10.6

Type: Bug Priority: Critical
Reporter: Valerii Kravchuk Assignee: Valerii Kravchuk
Resolution: Unresolved Votes: 1
Labels: crash

Issue Links:
Issue split
split to MDEV-30191 SIGSEGV & heap-use-after-free in spid... Closed
Relates
relates to MDEV-32822 Crash signal 11 with Rocksb Open

 Description   

After crash (probably related to MDEV-24769) eventually MariaDB 10.5.12-8 instance got a different crash directly after startup, as we can see in this log snippet:

...
2021-12-09  1:14:49 0 [Note] Loaded 'server_audit2.so' with offset 0x7fb034965000
2021-12-09  1:14:49 0 [Note] Loaded 'disks.so' with offset 0x7fb034762000
2021-12-09  1:14:49 0 [Note] Loaded 'auth_ed25519.so' with offset 0x7fb034551000
2021-12-09  1:14:49 0 [Note] Loaded 'simple_password_check.so' with offset 0x7fb03434e000
2021-12-09  1:14:49 0 [Note] Loaded 'ha_spider.so' with offset 0x7fb02fc9e000
2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: starting recovery
tables to flush: 9 8 7 6 5 4 3 2 1 0
 (0.0 seconds); 
2021-12-09  1:14:53 0 [Note] mariadbd: Aria engine: recovery done
2021-12-09  1:14:53 0 [Warning] The parameter innodb_thread_concurrency is deprecated and has no effect.
2021-12-09  1:14:53 0 [Warning] InnoDB: innodb_open_files 1000000 should not be greater than the open_files_limit 32768
2021-12-09  1:14:53 0 [Note] InnoDB: Uses event mutexes
2021-12-09  1:14:53 0 [Note] InnoDB: Compressed tables use zlib 1.2.7
2021-12-09  1:14:53 0 [Note] InnoDB: Number of pools: 1
2021-12-09  1:14:53 0 [Note] InnoDB: Using crc32 + pclmulqdq instructions
2021-12-09  1:14:53 0 [Note] mariadbd: O_TMPFILE is not supported on /tmp (disabling future attempts)
2021-12-09  1:14:53 0 [Note] InnoDB: Using Linux native AIO
2021-12-09  1:14:53 0 [Note] InnoDB: Initializing buffer pool, total size = 214748364800, chunk size = 134217728
2021-12-09  1:15:04 0 [Note] InnoDB: Completed initialization of buffer pool
2021-12-09  1:15:04 0 [Note] InnoDB: Starting crash recovery from checkpoint LSN=9289381264203,9289408027088
2021-12-09  1:15:08 0 [Note] InnoDB: Transaction 1550231304 was in the XA prepared state.
2021-12-09  1:15:08 0 [Note] InnoDB: 1 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2021-12-09  1:15:08 0 [Note] InnoDB: Trx id counter is 1550231305
2021-12-09  1:15:08 0 [Note] InnoDB: Starting final batch to recover 39401 pages from redo log.
2021-12-09  1:15:11 0 [Note] InnoDB: Last binlog file '/mariadblog/binlog/db2-bin.000240', position 567549
2021-12-09  1:15:11 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-09  1:15:11 0 [Note] InnoDB: Starting in background the rollback of recovered transactions
2021-12-09  1:15:11 0 [Note] InnoDB: Rollback of non-prepared transactions completed
2021-12-09  1:15:11 0 [Note] InnoDB: Removed temporary tablespace data file: "ibtmp1"
2021-12-09  1:15:11 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-09  1:15:11 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-12-09  1:15:11 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-12-09  1:15:11 0 [Note] InnoDB: 10.5.12 started; log sequence number 9289421791370; transaction id 1550231306
2021-12-09  1:15:11 0 [Note] InnoDB: Loading buffer pool(s) from /data/mysql/ib_buffer_pool
2021-12-09  1:15:11 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-12-09  1:15:11 server_audit: MariaDB Audit Plugin version 2.4.0 STARTED.
2021-12-09  1:15:11 0 [Note] Loaded 'sql_errlog.so' with offset 0x7f77a05f6000
2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider' already installed
2021-12-09  1:15:11 0 [ERROR] mariadbd: Plugin 'spider_alloc_mem' already installed
2021-12-09  1:15:11 0 [Note] Loaded 'metadata_lock_info.so' with offset 0x7f77a03f3000
2021-12-09  1:15:11 0 [Note] Using unique option prefix 'sql_error_log_file' is error-prone and can break in the future. Please use the full name 'sql-error-log-filename' instead.
2021-12-09  1:15:11 0 [Note] Recovering after a crash using /mariadblog/binlog/db2-bin
2021-12-09  1:15:11 0 [Note] Starting crash recovery...
2021-12-09  1:15:11 0 [Note] InnoDB: Starting recovery for XA transactions...
2021-12-09  1:15:11 0 [Note] InnoDB: Transaction 1550231304 in prepared state after recovery
2021-12-09  1:15:11 0 [Note] InnoDB: Transaction contains changes to 1 rows
2021-12-09  1:15:11 0 [Note] InnoDB: 1 transactions in prepared state after recovery
2021-12-09  1:15:11 0 [Note] Found 1 prepared transaction(s) in InnoDB
2021-12-09  1:15:11 0 [Note] Crash recovery finished.
211209  1:15:11 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.5.12-8-MariaDB-enterprise-log
key_buffer_size=8388608
read_buffer_size=2097152
max_used_connections=0
max_threads=65537
thread_count=20
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 939623430 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x16d000
/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x557b97dabd0e]
/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x557b9777bcc7]
sigaction.c:0(__restore_rt)[0x7fb0388b8630]
??:0(spider_sys_open_table(THD*, TABLE_LIST*, start_new_trans**))[0x7fb02fcff5bd]
??:0(spider_open_sys_table(THD*, char const*, int, bool, start_new_trans**, bool, int*))[0x7fb02fcff839]
??:0(spider_internal_xa_rollback_by_xid(THD*, st_spider_transaction*, xid_t*))[0x7fb02fd0d889]
??:0(spider_xa_rollback_by_xid(handlerton*, xid_t*))[0x7fb02fd108da]
/usr/sbin/mariadbd(+0x994d36)[0x557b9777dd36]
/usr/sbin/mariadbd(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x148)[0x557b9757de08]
/usr/sbin/mariadbd(+0x9940db)[0x557b9777d0db]
/usr/sbin/mariadbd(my_hash_iterate+0x50)[0x557b97d89b00]
/usr/sbin/mariadbd(_Z19ha_recover_completeP7st_hashPSt4pairIjyE+0x2c)[0x557b9777fbbc]
/usr/sbin/mariadbd(_ZN16Recovery_context8completeEP13MYSQL_BIN_LOGR7st_hash+0x8a)[0x557b97875b1a]
/usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x5c3)[0x557b97876863]
/usr/sbin/mariadbd(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x28d)[0x557b97876bfd]
sql/handler.cc:2479(Vers_type_trx::check_sys_fields(st_mysql_const_lex_string const&, Column_definition const*, Column_definition const*) const)[0x557b9787b77a]
sql/sql_plugin.cc:2508(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*))[0x557b9749e263]
sql/handler.cc:2535(Discovered_table_list::add_file(char const*))[0x557b974a3cfb]
??:0(__libc_start_main)[0x7fb037cef555]
/usr/sbin/mariadbd(+0x6af503)[0x557b97498503]
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /data/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             1006755              1006755              processes 
Max open files            32768                32768                files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       1006755              1006755              signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: core
...

After that the instance continues crashing like that on every startup.



 Comments   
Comment by Julien Fritsch [ 2021-12-09 ]

nayuta-yanagisawa this looks super bad, can you please have a look tomorrow?

Comment by Nayuta Yanagisawa (Inactive) [ 2021-12-10 ]

There is no bug report including the function spider_xa_rollback_by_xid(). So, this seems to be a new bug.

spider_xa_rollback_by_xid() can be called only via hton->rollback_by_xid and it is set only when spider_param_support_xa() returns true. So, a workaround would be to set spider_support_xa to OFF (ON by default), while this is only possible when the user doesn't use XA.

int spider_db_init(
  void *p
) {
  ...
  if (spider_param_support_xa())
  {
    spider_hton->prepare = spider_xa_prepare;
    spider_hton->recover = spider_xa_recover;
    spider_hton->commit_by_xid = spider_xa_commit_by_xid;
    spider_hton->rollback_by_xid = spider_xa_rollback_by_xid;
  }
  ...
}

Comment by Nayuta Yanagisawa (Inactive) [ 2022-03-16 ]

Thread pointer: 0x0

The stack trace is incomplete but it seems that Spider crashed due to the NULL pointer dereference on the thread pointer, thd. I guess that this happened because the macro current_thd returned NULL and the function spider_sys_open_table() didn't assume the case where the thread pointer is NULL.

Comment by Roel Van de Paar [ 2022-03-17 ]

c++filt version of the stack

/usr/sbin/mariadbd(my_print_stacktrace+0x2e)[0x557b97dabd0e]
/usr/sbin/mariadbd(handle_fatal_signal+0x307)[0x557b9777bcc7]
sigaction.c:0(__restore_rt)[0x7fb0388b8630]
??:0(spider_sys_open_table(THD*, TABLE_LIST*, start_new_trans**))[0x7fb02fcff5bd]
??:0(spider_open_sys_table(THD*, char const*, int, bool, start_new_trans**, bool, int*))[0x7fb02fcff839]
??:0(spider_internal_xa_rollback_by_xid(THD*, st_spider_transaction*, xid_t*))[0x7fb02fd0d889]
??:0(spider_xa_rollback_by_xid(handlerton*, xid_t*))[0x7fb02fd108da]
/usr/sbin/mariadbd(+0x994d36)[0x557b9777dd36]
/usr/sbin/mariadbd(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*)+0x148)[0x557b9757de08]
/usr/sbin/mariadbd(+0x9940db)[0x557b9777d0db]
/usr/sbin/mariadbd(my_hash_iterate+0x50)[0x557b97d89b00]
/usr/sbin/mariadbd(ha_recover_complete(st_hash*, std::pair<unsigned int, unsigned long long>*)+0x2c)[0x557b9777fbbc]
/usr/sbin/mariadbd(Recovery_context::complete(MYSQL_BIN_LOG*, st_hash&)+0x8a)[0x557b97875b1a]
/usr/sbin/mariadbd(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool)+0x5c3)[0x557b97876863]
/usr/sbin/mariadbd(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool)+0x28d)[0x557b97876bfd]
sql/handler.cc:2479(Vers_type_trx::check_sys_fields(st_mysql_const_lex_string const&, Column_definition const*, Column_definition const*) const)[0x557b9787b77a]
sql/sql_plugin.cc:2508(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*))[0x557b9749e263]
sql/handler.cc:2535(Discovered_table_list::add_file(char const*))[0x557b974a3cfb]

I too could not locate any matching bug.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-03-18 ]

If the above guess of me is true, the bug is likely to reproduce on 10.2.

Comment by Nayuta Yanagisawa (Inactive) [ 2022-06-14 ]

I gave some tries to reproduce the issue but got no success. I don't feel that we can solve this problem without a reproducible test case. So, I change the status to NEED_FEEDBACK.

Having a reproducible test case is the best but I'd at least like to see the stack of the case spider_support_xa=OFF to understand the problem deeper.

Comment by Kathryn Sizemore [ 2022-12-09 ]

Attached sql to reproduce crash – with the stored procedure call usp_Archiving_AuditRecords_test()

Comment by Roel Van de Paar [ 2022-12-10 ]

Preliminary stack form 10.11 reproduction (debug build):

10.11.2 936436ef437c73911c18854a8ce8dad1216331b8 (Debug)

Core was generated by `/test/MD291122-mariadb-10.11.2-linux-x86_64-dbg/bin/mysqld --no-defaults --core'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000015366039eefb in spider_db_print_item_type (item=0x15362c17c6a8, 
    field=field@entry=0x0, spider=0x15362c123810, 
    str=str@entry=0x15362c130b20, alias=alias@entry=0x0, 
    alias_length=alias_length@entry=0, dbton_id=0, use_fields=false, 
    fields=0x0) at /test/10.11_dbg/storage/spider/spd_db_conn.cc:7384
7384	  switch (item->type())
[Current thread is 1 (Thread 0x1536604af700 (LWP 3425826))]
(gdb) bt
#0  0x000015366039eefb in spider_db_print_item_type (item=0x15362c17c6a8, field=field@entry=0x0, spider=0x15362c123810, str=str@entry=0x15362c130b20, alias=alias@entry=0x0, alias_length=alias_length@entry=0, dbton_id=0, use_fields=false, fields=0x0) at /test/10.11_dbg/storage/spider/spd_db_conn.cc:7384
#1  0x000015366041d133 in spider_mbase_handler::append_condition (this=this@entry=0x15362c130ac0, str=str@entry=0x15362c130b20, alias=alias@entry=0x0, alias_length=alias_length@entry=0, start_where=false, sql_type=sql_type@entry=1) at /test/10.11_dbg/storage/spider/spd_db_mysql.cc:11040
#2  0x000015366041d287 in spider_mbase_handler::append_condition_part (this=0x15362c130ac0, alias=0x0, alias_length=0, sql_type=1, test_flg=<optimized out>) at /test/10.11_dbg/storage/spider/spd_db_mysql.cc:10996
#3  0x00001536603eb2da in ha_spider::append_condition_sql_part (this=0x15362c123810, alias=0x0, alias_length=0, sql_type=sql_type@entry=1, test_flg=test_flg@entry=false) at /test/10.11_dbg/storage/spider/ha_spider.cc:10977
#4  0x000015366039f65c in spider_db_append_condition (spider=spider@entry=0x15362c123810, alias=alias@entry=0x0, alias_length=alias_length@entry=0, test_flg=test_flg@entry=false) at /test/10.11_dbg/storage/spider/spd_db_conn.cc:8248
#5  0x00001536603f043c in ha_spider::rnd_next_internal (this=this@entry=0x15362c123810, buf=buf@entry=0x15362c124360 "") at /test/10.11_dbg/storage/spider/ha_spider.cc:5669
#6  0x00001536603f10f8 in ha_spider::rnd_next (this=0x15362c123810, buf=0x15362c124360 "") at /test/10.11_dbg/storage/spider/ha_spider.cc:5963
#7  0x00005589a5a4f48a in handler::ha_rnd_next (this=0x15362c123810, buf=0x15362c124360 "") at /test/10.11_dbg/sql/handler.cc:3414
#8  0x00005589a5682cbf in rr_sequential (info=0x15362c17b858) at /test/10.11_dbg/sql/records.h:82
#9  0x00005589a57dcf37 in READ_RECORD::read_record (this=0x15362c17b858) at /test/10.11_dbg/sql/records.h:81
#10 join_init_read_record (tab=0x15362c17b790) at /test/10.11_dbg/sql/sql_select.cc:22838
#11 0x00005589a57c3e81 in sub_select (join=0x15362c11f890, join_tab=0x15362c17b790, end_of_records=false) at /test/10.11_dbg/sql/sql_select.cc:21841
#12 0x00005589a57f8f0d in do_select (procedure=<optimized out>, join=0x15362c11f890) at /test/10.11_dbg/sql/sql_select.cc:21387
#13 JOIN::exec_inner (this=this@entry=0x15362c11f890) at /test/10.11_dbg/sql/sql_select.cc:4824
#14 0x00005589a57f93f6 in JOIN::exec (this=0x15362c11f890) at /test/10.11_dbg/sql/sql_select.cc:4602
#15 0x00005589a5b5920c in subselect_single_select_engine::exec (this=0x15362c100390) at /test/10.11_dbg/sql/item_subselect.cc:4101
#16 0x00005589a5b5885a in Item_subselect::exec (this=0x15362c1001f0) at /test/10.11_dbg/sql/item_subselect.cc:811
#17 0x00005589a5b5774d in Item_exists_subselect::val_bool (this=0x15362c1001f0) at /test/10.11_dbg/sql/item_subselect.cc:1853
#18 0x00005589a56a6aff in sp_instr_jump_if_not::exec_core (this=0x15362c1003d0, thd=<optimized out>, nextp=0x1536604ad364) at /test/10.11_dbg/sql/sp_head.cc:4161
#19 0x00005589a56b0012 in sp_lex_keeper::reset_lex_and_exec_core (this=0x15362c100428, thd=0x15362c000d48, nextp=<optimized out>, open_tables=open_tables@entry=true, instr=<optimized out>) at /test/10.11_dbg/sql/sp_head.cc:3582
#20 0x00005589a56b0b53 in sp_instr_jump_if_not::execute (this=<optimized out>, thd=<optimized out>, nextp=<optimized out>) at /test/10.11_dbg/sql/sp_head.cc:4143
#21 0x00005589a56a8c8e in sp_head::execute (this=this@entry=0x15362c0a3130, thd=thd@entry=0x15362c000d48, merge_da_on_success=merge_da_on_success@entry=true) at /test/10.11_dbg/sql/sp_head.cc:1459
#22 0x00005589a56aaf23 in sp_head::execute_procedure (this=0x15362c0a3130, thd=thd@entry=0x15362c000d48, args=0x15362c005e88) at /test/10.11_dbg/sql/sp_head.cc:2446
#23 0x00005589a5764973 in do_execute_sp (thd=thd@entry=0x15362c000d48, sp=sp@entry=0x15362c0a3130) at /test/10.11_dbg/sql/sql_parse.cc:3026
#24 0x00005589a5769ece in Sql_cmd_call::execute (this=0x15362c013210, thd=0x15362c000d48) at /test/10.11_dbg/sql/sql_parse.cc:3271
#25 0x00005589a5776a03 in mysql_execute_command (thd=thd@entry=0x15362c000d48, is_called_from_prepared_stmt=is_called_from_prepared_stmt@entry=false) at /test/10.11_dbg/sql/sql_parse.cc:5999
#26 0x00005589a575f5a6 in mysql_parse (thd=thd@entry=0x15362c000d48, rawbuf=<optimized out>, length=<optimized out>, parser_state=parser_state@entry=0x1536604ae300) at /test/10.11_dbg/sql/sql_parse.cc:7998
#27 0x00005589a576cae1 in dispatch_command (command=command@entry=COM_QUERY, thd=thd@entry=0x15362c000d48, packet=packet@entry=0x15362c00adf9 "call usp_Archiving_AuditRecords()", packet_length=packet_length@entry=33, blocking=blocking@entry=true) at /test/10.11_dbg/sql/sql_class.h:1346
#28 0x00005589a576ef1f in do_command (thd=0x15362c000d48, blocking=blocking@entry=true) at /test/10.11_dbg/sql/sql_parse.cc:1407
#29 0x00005589a58c9b27 in do_handle_one_connection (connect=<optimized out>, connect@entry=0x5589a7b33928, put_in_cache=put_in_cache@entry=true) at /test/10.11_dbg/sql/sql_connect.cc:1415
#30 0x00005589a58c9ff6 in handle_one_connection (arg=0x5589a7b33928) at /test/10.11_dbg/sql/sql_connect.cc:1317
#31 0x000015367930b609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#32 0x0000153678ef7133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Comment by Roel Van de Paar [ 2022-12-10 ]

As the stack does not match the issue description, this is a separate bug. I will create a new ticket for the crash.

Comment by Roel Van de Paar [ 2022-12-10 ]

As for the (close to startup) crash issue/original description, can anyone who sees this please clarify if

--spider_table_crd_thread_count=1 --spider_table_sts_thread_count=1

were in use (in combination with --plugin-load-add=ha_spider.so)?

Please try without these options first to see if the issue remains (though even if it remains, albeit unlikely - but not impossible - it could be due to prior corruption).

Comment by Roel Van de Paar [ 2022-12-10 ]

Created MDEV-30191 (SIGSEGV in spider_db_print_item_type on Stored Procedure call) for the crash.

This ticket is reserved for the (close to startup) crash in spider_sys_open_table.

Comment by Roel Van de Paar [ 2022-12-10 ]

MDEV-30191 Has shown that the SP Call SQL listed there can cause corruption (heap-use-after-free), so it is possible, though somewhat remote, that the issue seen here could have been caused by that bug. Outside of this (remote) connection, there remains no link between the two for now.

Comment by Roel Van de Paar [ 2023-09-02 ]

c++filt resolved stack of last comment:

sigaction.c:0(__restore_rt)[0x7f9855b52630]
/usr/lib64/mysql/plugin/ha_spider.so(spider_sys_open_table(THD*, TABLE_LIST*, start_new_trans**)+0x17)[0x7f9850f1f187]
/usr/lib64/mysql/plugin/ha_spider.so(spider_open_sys_table(THD*, char const*, int, bool, start_new_trans**, bool, int*)+0xf5)[0x7f9850f1f415]
/usr/lib64/mysql/plugin/ha_spider.so(spider_internal_xa_commit_by_xid(THD*, st_spider_transaction*, xid_t*)+0x13f)[0x7f9850f2ca6f]
/usr/lib64/mysql/plugin/ha_spider.so(spider_xa_commit_by_xid(handlerton*, xid_t*)+0x3b)[0x7f9850f3000b]
/usr/sbin/mariadbd(+0xa3273d)[0x561f3766873d]
sql/handler.h:3952(handler::delete_table(char const*))[0x561f37433bf6]
sql/sql_list.h:685(plugin_foreach_with_mask(THD*, char (*)(THD*, st_plugin_int*, void*), int, unsigned int, void*))[0x561f3766756b]
sql/handler.h:3952(Discovered_table_list::add_file(char const*))[0x561f37ba7888]
/usr/sbin/mariadbd(ha_recover_complete(st_hash*, std::pair<unsigned int, unsigned long long>*)+0x2c)[0x561f3766c40c]
/usr/sbin/mariadbd(Recovery_context::complete(MYSQL_BIN_LOG*, st_hash&)+0x67)[0x561f37771bc7]
wsrep/exception.hpp:30(my_hash_iterate)[0x561f37772a80]
sql/handler.h:3952(ha_recover_complete(st_hash*, std::pair<unsigned int, unsigned long long>*))[0x561f37772ed3]
wsrep/exception.hpp:30(Recovery_context::complete(MYSQL_BIN_LOG*, st_hash&))[0x561f3777943b]
wsrep/exception.hpp:30(MYSQL_BIN_LOG::recover(st_log_info*, char const*, st_io_cache*, Format_description_log_event*, bool))[0x561f3733e51a]
wsrep/exception.hpp:30(MYSQL_BIN_LOG::do_binlog_recovery(char const*, bool))[0x561f37343450]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f9854f89555]
/usr/sbin/mariadbd(+0x7022e1)[0x561f373382e1]

Comment by Yuchen Pei [ 2023-09-05 ]

julien.fritsch Sure, re-assigned myself

Comment by Michael Widenius [ 2023-09-08 ]

ycp Is it not possible to generate a test case where we do an XA commit in spider and then cause it to crash and check if recovery works

Have you tried that?
You could also ask QA to try to do the above, but I suggest that you try to do it first yourself

Comment by Yuchen Pei [ 2023-09-13 ]

monty I took a look at the code. The trace looks a bit confusing -
for example, the line handler.h:3952 appears twice with different
functions (handler::delete_table(char const*) and
Discovered_table_list::add_file(char const*)), neither agrees with the
function on that line at the commit hash quoted above
(0c6b89c694a88aae764b152045bf5bdb234d7edf). So I did a bit of manual
fuzzy match (against CS 10.6 4e7d2e73c0978707fb862cdbd842090ba635163a)
and the closest I got is the following "trace", where "..." separates
function of the stack frame from the relevant line in that function
that calls the function in the upper stack frame.

spider_hton->commit_by_xid = spider_xa_commit_by_xid;
 
static void xarecover_do_commit_or_rollback(handlerton *hton,
...
  rc= xarecover_decide_to_commit(member, ptr_commit_max) ?
    hton->commit_by_xid(hton, &x) : hton->rollback_by_xid(hton, &x);
 
static my_bool xarecover_do_commit_or_rollback_handlerton(THD *unused,
...
    xarecover_do_commit_or_rollback(hton, (xarecover_complete_arg *) arg);
 
static my_bool xarecover_complete_and_count(void *member_arg,
...
  (void) plugin_foreach(NULL, xarecover_do_commit_or_rollback_handlerton,
                        MYSQL_STORAGE_ENGINE_PLUGIN, complete_params);
 
uint ha_recover_complete(HASH *commit_list, Binlog_offset *coord)
...
  (void) my_hash_iterate(commit_list, xarecover_complete_and_count, &complete);
 
bool Recovery_context::complete(MYSQL_BIN_LOG *log, HASH &xids)
...
    uint count_in_prepare=
      ha_recover_complete(&xids,
                          !do_truncate ? NULL :
                          (truncate_gtid.seq_no > 0 ?
                           &binlog_truncate_coord : &last_gtid_coord));
 
int TC_LOG_BINLOG::recover(LOG_INFO *linfo, const char *last_log_name,
...
#ifndef HAVE_REPLICATION
      if (ha_recover_complete(&xids))
#else
      if (ctx.complete(this, xids))
#endif
        goto err2;
 
MYSQL_BIN_LOG::do_binlog_recovery(const char *opt_name, bool do_xa_recovery)
...
[two occurrences of TC_LOG_BINLOG::recover()]

And it looks like the recovery requires replication which I am not
very familiar with, so let us consult someone who is. Elkin: what
is the simplest way to trigger a recovery with mtr that reaches some
of these functions like MYSQL_BIN_LOG::do_binlog_recovery() and
TC_LOG_BINLOG::recover()?

Comment by Andrei Elkin [ 2023-09-13 ]

ycp, these functions work in the normal binlogging server case. binlog.binlog_xa_recover could be instructive for you to understand low-level details. Specifically ha_recover_complete concludes the binlog-based server recovery with either committing of rolling back transactions in doubt (those that are present in binlog, but not committed yet in engine(s)).
Just in case, when the xid of the transaction is defined by the user the recovery may not work yet - see MDEV-21469 whose patch is in its completion phase.

Comment by Yuchen Pei [ 2023-11-07 ]

Hi elenst, I noticed that you changed the status to Open. Has a
test case been provided?

Comment by Elena Stepanova [ 2023-11-07 ]

Not that I know of; but it doesn't need to be set to "Needs Feedback" if it's already assigned to Valerii from whom you expect the action. This status doesn't do what you may expect it to do.

That said, if Valerii wants it to be in this status (e.g. because he is waiting for something from somebody to whom the ticket cannot be assigned), he can of course set the status himself, in this case it will make sense.

Generated at Thu Feb 08 09:52:55 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.