[MDEV-29549] Assertion `!(thd->rgi_slave && thd->rgi_slave->did_mark_start_commit) || thd->transaction->xid_state.is_explicit_XA()' failed in ha_rollback_trans Created: 2022-09-15  Updated: 2022-10-27

Status: Open
Project: MariaDB Server
Component/s: None
Affects Version/s: 10.6
Fix Version/s: 10.6

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Andrei Elkin
Resolution: Unresolved Votes: 0
Labels: None


 Description   

https://buildbot.askmonty.org/buildbot/builders/kvm-asan/builds/8934

10.6 40aa94df3

rpl.rpl_parallel_optimistic_xa 'innodb,mix' w2 [ fail ]
        Test ended at 2022-09-01 09:19:44
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_xa
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/buildbot/buildbot/build/mariadb-10.6.10/mysql-test/suite/rpl/t/rpl_parallel_optimistic_xa.test at line 85:
At line 43: query 'let $_result= `SELECT master_gtid_wait('$master_pos', $_slave_timeout)`' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
CREATE VIEW v_processlist  as SELECT * FROM performance_schema.threads where type = 'FOREGROUND';
connection master;
ALTER TABLE mysql.gtid_slave_pos ENGINE=InnoDB;
connection slave;
include/stop_slave.inc
SET @old_parallel_threads            = @@GLOBAL.slave_parallel_threads;
SET @@global.slave_parallel_threads  = 7;
SET @old_parallel_mode               = @@GLOBAL.slave_parallel_mode;
SET @@global.slave_parallel_mode     ='optimistic';
SET @old_gtid_cleanup_batch_size     = @@GLOBAL.gtid_cleanup_batch_size;
SET @@global.gtid_cleanup_batch_size = 1000000;
CHANGE MASTER TO master_use_gtid=slave_pos;
connection master;
CREATE TABLE t0 (a int,             b INT) ENGINE=InnoDB;
CREATE TABLE t1 (a int PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 0);
include/save_master_gtid.inc
connection slave;
include/start_slave.inc
include/sync_with_master_gtid.inc
 
More results from queries before failure can be found in /dev/shm/var/2/log/rpl_parallel_optimistic_xa.log
 
 
Server [mysqld.2 - pid: 41795, winpid: 41795, exit: 256] failed during test run
Server log from this test:
----------SERVER LOG START-----------
2022-09-01  9:19:28 250 [Note] Deleted Master_info file '/dev/shm/var/2/mysqld.2/data/master.info'.
2022-09-01  9:19:28 250 [Note] Deleted Master_info file '/dev/shm/var/2/mysqld.2/data/relay-log.info'.
2022-09-01  9:19:29 250 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
2022-09-01  9:19:29 250 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16020', master_log_file='', master_log_pos='4'. New state master_host='127.0.0.1', master_port='16020', master_log_file='master-bin.000001', master_log_pos='4'.
2022-09-01  9:19:29 252 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16020' in log 'master-bin.000001' at position 4
2022-09-01  9:19:29 252 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16020',replication started in log 'master-bin.000001' at position 4
2022-09-01  9:19:29 253 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 4, relay log './slave-relay-bin.000001' position: 4
2022-09-01  9:19:29 253 [Note] Error reading relay log event: slave SQL thread was killed
2022-09-01  9:19:29 253 [Note] Slave SQL thread exiting, replication stopped in log 'master-bin.000001' at position 1469, master: 127.0.0.1:16020
2022-09-01  9:19:29 252 [Note] Slave I/O thread exiting, read up to log 'master-bin.000001', position 1469, master 127.0.0.1:16020
2022-09-01  9:19:30 254 [Note] Master connection name: ''  Master_info_file: 'master.info'  Relay_info_file: 'relay-log.info'
2022-09-01  9:19:30 254 [Note] 'CHANGE MASTER TO executed'. Previous state master_host='127.0.0.1', master_port='16020', master_log_file='master-bin.000001', master_log_pos='1469'. New state master_host='127.0.0.1', master_port='16020', master_log_file='master-bin.000001', master_log_pos='1469'.
2022-09-01  9:19:30 254 [Note] Previous Using_Gtid=No. New Using_Gtid=Slave_Pos
2022-09-01  9:19:34 256 [Note] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16020' in log 'master-bin.000001' at position 1469
2022-09-01  9:19:34 256 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16020',replication starts at GTID position '0-1-4'
2022-09-01  9:19:34 257 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 1469, relay log './slave-relay-bin.000001' position: 4; GTID position '0-1-4'
mariadbd: /home/buildbot/buildbot/build/mariadb-10.6.10/sql/handler.cc:2148: int ha_rollback_trans(THD*, bool): Assertion `!(thd->rgi_slave && thd->rgi_slave->did_mark_start_commit) || thd->transaction->xid_state.is_explicit_XA()' failed.
220901  9:19:36 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.6.10-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=4
max_threads=153
thread_count=14
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63859 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x62b000c9b288
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fd920e95b50 thread_stack 0x100000
??:0(__interceptor_tcgetattr)[0x7fd934e22d30]
/home/buildbot/buildbot/build/mariadb-10.6.10/sql/mariadbd(my_print_stacktrace+0xec)[0x55867c9f1197]
mysys/stacktrace.c:212(my_print_stacktrace)[0x55867b4bb54a]
??:0(__restore_rt)[0x7fd934bd83c0]
??:0(gsignal)[0x7fd9346c418b]
??:0(abort)[0x7fd9346a3859]
/lib/x86_64-linux-gnu/libc.so.6(+0x25729)[0x7fd9346a3729]
??:0(__assert_fail)[0x7fd9346b4f36]
sql/handler.cc:2150(ha_rollback_trans(THD*, bool))[0x55867b4cc88c]
sql/transaction.cc:536(trans_rollback_stmt(THD*))[0x55867b0ec185]
sql/sql_base.cc:5330(open_and_lock_tables(THD*, DDL_options_st const&, TABLE_LIST*, bool, unsigned int, Prelocking_strategy*))[0x55867aab3d0f]
sql/sql_base.h:509(open_and_lock_tables(THD*, TABLE_LIST*, bool, unsigned int))[0x55867aa09ca6]
sql/rpl_gtid.cc:674(rpl_slave_state::record_gtid(THD*, rpl_gtid const*, unsigned long long, bool, bool, void**))[0x55867b20fb16]
sql/log_event_server.cc:3948(Xid_apply_log_event::do_record_gtid(THD*, rpl_group_info*, bool, void**))[0x55867b8525e2]
sql/log_event_server.cc:4038(Xid_apply_log_event::do_apply_event(rpl_group_info*))[0x55867b852fb8]
sql/log_event.h:1499(Log_event::apply_event(rpl_group_info*))[0x55867a9d2f1d]
sql/slave.cc:3881(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x55867a9b31ce]
sql/slave.cc:4068(apply_event_and_update_pos_for_parallel(Log_event*, THD*, rpl_group_info*))[0x55867a9b3f5a]
sql/rpl_parallel.cc:62(rpt_handle_event(rpl_parallel_thread::queued_event*, rpl_parallel_thread*))[0x55867b21f160]
sql/rpl_parallel.cc:1378(handle_rpl_parallel_thread)[0x55867b226dd8]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x55867bdda8b2]
??:0(start_thread)[0x7fd934bcc609]
??:0(clone)[0x7fd9347a0293]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x0): (null)
Connection ID (thread ID): 263
Status: KILL_CONNECTION
 
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on,not_null_range_scan=off
 
The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
information that should help you find out what is causing the crash.
 
We think the query pointer is invalid, but we will try to print it anyway. 
Query: 
 
Writing a core file...
Working directory at /dev/shm/var/2/mysqld.2/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             31505                31505                processes 
Max open files            1024                 1024                 files     
Max locked memory         67108864             67108864             bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       31505                31505                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/share/apport/apport %p %s %c %d %P %E
 
Kernel version: Linux version 5.4.0-26-generic (buildd@lcy01-amd64-029) (gcc version 9.3.0 (Ubuntu 9.3.0-10ubuntu2)) #30-Ubuntu SMP Mon Apr 20 16:58:30 UTC 2020
 
----------SERVER LOG END-------------
 
 
 - found 'core' (0/0)
Core generated by '/home/buildbot/buildbot/build/mariadb-10.6.10/sql/mariadbd'
Output from gdb follows. The first stack trace is from the failing thread.
The following stack traces are from all threads (so the failing one is
duplicated).
--------------------------
[New LWP 43032]
[New LWP 41796]
[New LWP 41844]
[New LWP 43029]
[New LWP 41808]
[New LWP 41835]
[New LWP 41834]
[New LWP 41814]
[New LWP 41837]
[New LWP 41845]
[New LWP 42595]
[New LWP 43022]
[New LWP 43024]
[New LWP 43028]
[New LWP 43025]
[New LWP 43034]
[New LWP 43023]
[New LWP 43021]
[New LWP 42598]
[New LWP 41843]
[New LWP 41841]
[New LWP 41812]
[New LWP 41828]
[New LWP 41838]
[New LWP 41806]
[New LWP 41836]
[New LWP 41839]
[New LWP 41840]
[New LWP 41842]
[New LWP 41853]
[New LWP 41863]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/home/buildbot/buildbot/build/mariadb-10.6.10/sql/mariadbd --defaults-group-suf'.
Program terminated with signal SIGABRT, Aborted.
#0  0x00007fd934bd4f08 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
[Current thread is 1 (Thread 0x7fd920e96700 (LWP 43032))]
#0  0x00007fd934bd4f08 in pthread_kill () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x000055867c9f1319 in my_write_core (sig=6) at /home/buildbot/buildbot/build/mariadb-10.6.10/mysys/stacktrace.c:424
#2  0x000055867b4bba8a in handle_fatal_signal (sig=6) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/signal_handler.cc:355
#3  <signal handler called>
#4  0x00007fd9346c418b in raise () from /lib/x86_64-linux-gnu/libc.so.6
#5  0x00007fd9346a3859 in abort () from /lib/x86_64-linux-gnu/libc.so.6
#6  0x00007fd9346a3729 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#7  0x00007fd9346b4f36 in __assert_fail () from /lib/x86_64-linux-gnu/libc.so.6
#8  0x000055867b4cc88c in ha_rollback_trans (thd=0x62b000c9b288, all=false) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/handler.cc:2148
#9  0x000055867b0ec185 in trans_rollback_stmt (thd=0x62b000c9b288) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/transaction.cc:535
#10 0x000055867aab3d0f in open_and_lock_tables (thd=0x62b000c9b288, options=..., tables=0x7fd920e94fa0, derived=false, flags=0, prelocking_strategy=0x7fd920e94d30) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/sql_base.cc:5329
#11 0x000055867aa09ca6 in open_and_lock_tables (thd=0x62b000c9b288, tables=0x7fd920e94fa0, derived=false, flags=0) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/sql_base.h:509
#12 0x000055867b20fb16 in rpl_slave_state::record_gtid (this=0x613000000900, thd=0x62b000c9b288, gtid=0x61d0010be4a0, sub_id=325, in_transaction=false, in_statement=false, out_hton=0x7fd920e95830) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/rpl_gtid.cc:674
#13 0x000055867b8525e2 in Xid_apply_log_event::do_record_gtid (this=0x619000296908, thd=0x62b000c9b288, rgi=0x61d0010be480, in_trans=false, out_hton=0x7fd920e95830) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/log_event_server.cc:3948
#14 0x000055867b852fb8 in Xid_apply_log_event::do_apply_event (this=0x619000296908, rgi=0x61d0010be480) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/log_event_server.cc:4038
#15 0x000055867a9d2f1d in Log_event::apply_event (this=0x619000296908, rgi=0x61d0010be480) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/log_event.h:1499
#16 0x000055867a9b31ce in apply_event_and_update_pos_apply (ev=0x619000296908, thd=0x62b000c9b288, rgi=0x61d0010be480, reason=0) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/slave.cc:3881
#17 0x000055867a9b3f5a in apply_event_and_update_pos_for_parallel (ev=0x619000296908, thd=0x62b000c9b288, rgi=0x61d0010be480) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/slave.cc:4067
#18 0x000055867b21f160 in rpt_handle_event (qev=0x61a000390708, rpt=0x6270000414e8) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/rpl_parallel.cc:62
#19 0x000055867b226dd8 in handle_rpl_parallel_thread (arg=0x6270000414e8) at /home/buildbot/buildbot/build/mariadb-10.6.10/sql/rpl_parallel.cc:1378
#20 0x000055867bdda8b2 in pfs_spawn_thread (arg=0x61800011ad08) at /home/buildbot/buildbot/build/mariadb-10.6.10/storage/perfschema/pfs.cc:2201
#21 0x00007fd934bcc609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#22 0x00007fd9347a0293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Logs and var dir at: https://drive.google.com/drive/folders/1lMToawOczyG-bzjnjYJW-3jVU9JE3pWn?usp=sharing


Generated at Thu Feb 08 10:09:28 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.