[MDEV-29928] Assertion `!(old_n_ref >> 31)' failed seen on BB for rpl.rpl_parallel_optimistic_xa Created: 2022-11-01  Updated: 2022-11-01  Resolved: 2022-11-01

Status: Closed
Project: MariaDB Server
Component/s: Replication, Storage Engine - InnoDB, Tests
Affects Version/s: 10.6
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Angelique Sklavounos (Inactive) Assignee: Marko Mäkelä
Resolution: Fixed Votes: 0
Labels: None


 Description   

This failure has been seen a few times recently since https://buildbot.askmonty.org/buildbot/builders/kvm-ubsan/builds/3149 (based on a search of set_skip_lock_inheritance in Failure Output)

https://es-jenkins.mariadb.net/job/Source-Build-Test/6059/

10.6 18a0f0c17

rpl.rpl_parallel_optimistic_xa 'innodb,row' w1 [ fail ]
        Test ended at 2022-10-29 00:36:34
 
CURRENT_TEST: rpl.rpl_parallel_optimistic_xa
mysqltest: In included file "./include/sync_with_master_gtid.inc": 
included from /home/jenkins/workspace/Source-Build-Test/src/mysql-test/suite/rpl/t/rpl_parallel_optimistic_xa.test at line 194:
At line 43: query 'let $_result= `SELECT master_gtid_wait('$master_pos', $_slave_timeout)`' failed: <Unknown> (2013): Lost connection to server during query
 
The result from queries just before the failure was:
< snip >
SET @@global.slave_parallel_threads  = 7;
SET @old_parallel_mode               = @@GLOBAL.slave_parallel_mode;
SET @@global.slave_parallel_mode     ='optimistic';
SET @old_gtid_cleanup_batch_size     = @@GLOBAL.gtid_cleanup_batch_size;
SET @@global.gtid_cleanup_batch_size = 1000000;
CHANGE MASTER TO master_use_gtid=slave_pos;
connection master;
CREATE TABLE t0 (a int,             b INT) ENGINE=InnoDB;
CREATE TABLE t1 (a int PRIMARY KEY, b INT) ENGINE=InnoDB;
INSERT INTO t1 VALUES (1, 0);
include/save_master_gtid.inc
connection slave;
include/start_slave.inc
include/sync_with_master_gtid.inc
include/stop_slave.inc
connection master;
include/save_master_gtid.inc
connection slave;
include/start_slave.inc
include/sync_with_master_gtid.inc

worker[1] > Restart [mysqld.2 - pid: 134834, winpid: 134834] - running with different options '--innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --binlog-format=row --log-bin=slave-bin' != '--binlog-format=row --innodb --innodb-cmpmem --innodb-cmp-per-index --innodb-trx --innodb-locks --innodb-lock-waits --innodb-metrics --innodb-buffer-pool-stats --innodb-buffer-page --innodb-buffer-page-lru --innodb-sys-columns --innodb-sys-fields --innodb-sys-foreign --innodb-sys-foreign-cols --innodb-sys-indexes --innodb-sys-tables --innodb-sys-virtual --log-bin=slave-bin'
ote] Slave I/O thread: Start asynchronous replication to master 'root@127.0.0.1:16020' in log 'master-bin.000001' at position 38458
2022-10-29  0:36:31 20 [Note] Slave I/O thread: connected to master 'root@127.0.0.1:16020',replication starts at GTID position '0-1-139'
2022-10-29  0:36:31 21 [Note] Slave SQL thread initialized, starting replication in log 'master-bin.000001' at position 38458, relay log './slave-relay-bin.000001' position: 4; GTID position '0-1-139'
mariadbd: /home/jenkins/workspace/Source-Build-Test/src/storage/innobase/include/trx0trx.h:1019: void trx_t::set_skip_lock_inheritance(): Assertion `!(old_n_ref >> 31)' failed.
221029  0:36:32 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.6.11-MariaDB-debug-log
key_buffer_size=1048576
read_buffer_size=131072
max_used_connections=4
max_threads=153
thread_count=13
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 63866 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x62b0001ff218
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f0b4a817b10 thread_stack 0x100000
??:0(__interceptor_tcgetattr)[0x7f0b5c187d30]
mysys/stacktrace.c:212(my_print_stacktrace)[0x561e861711c3]
sql/signal_handler.cc:236(handle_fatal_signal)[0x561e84c32fc8]
sigaction.c:0(__restore_rt)[0x7f0b5bfa33c0]
??:0(gsignal)[0x7f0b5ba8e18b]
??:0(abort)[0x7f0b5ba6d859]
/lib/x86_64-linux-gnu/libc.so.6(+0x25729)[0x7f0b5ba6d729]
??:0(__assert_fail)[0x7f0b5ba7ef36]
include/trx0trx.h:1020(trx_t::set_skip_lock_inheritance())[0x561e8590748e]
lock/lock0lock.cc:4120(lock_release_on_prepare_try(trx_t*))[0x561e858f5b56]
lock/lock0lock.cc:4128(lock_release_on_prepare(trx_t*))[0x561e858f5c26]
trx/trx0trx.cc:1908(trx_prepare(trx_t*))[0x561e85c66e7e]
trx/trx0trx.cc:1920(trx_prepare_for_mysql(trx_t*))[0x561e85c67019]
handler/ha_innodb.cc:17035(innobase_xa_prepare(handlerton*, THD*, bool))[0x561e85783956]
sql/handler.cc:1436(prepare_or_error(handlerton*, THD*, bool))[0x561e84c3fb83]
sql/handler.cc:1474(ha_prepare(THD*))[0x561e84c3fe8f]
sql/xa.cc:531(trans_xa_prepare(THD*))[0x561e84a9c71b]
sql/log_event_server.cc:4150(XA_prepare_log_event::do_commit())[0x561e84fcc4de]
sql/log_event_server.cc:4035(Xid_apply_log_event::do_apply_event(rpl_group_info*))[0x561e84fcb10f]
sql/log_event.h:1499(Log_event::apply_event(rpl_group_info*))[0x561e84146fbf]
sql/slave.cc:3883(apply_event_and_update_pos_apply(Log_event*, THD*, rpl_group_info*, int))[0x561e8412721e]
sql/slave.cc:4070(apply_event_and_update_pos_for_parallel(Log_event*, THD*, rpl_group_info*))[0x561e84127fc1]
sql/rpl_parallel.cc:62(rpt_handle_event(rpl_parallel_thread::queued_event*, rpl_parallel_thread*))[0x561e849975a6]
sql/rpl_parallel.cc:1384(handle_rpl_parallel_thread)[0x561e8499f232]
perfschema/pfs.cc:2203(pfs_spawn_thread)[0x561e85553078]
nptl/pthread_create.c:478(start_thread)[0x7f0b5bf97609]
??:0(clone)[0x7f0b5bb6a293]

Also seen with rpl.rpl_parallel_optimistic_xa_lsu_off.

Reproducible with ./mtr rpl.rpl_parallel_optimistic_xa,row --repeat 200.



 Comments   
Comment by Marko Mäkelä [ 2022-11-01 ]

This bug was already fixed in a release branch. It was not pushed to the 10.6 branch due to instructions given by sanja (he did not want the current head of 10.6 commit 18a0f0c17839e3662ac86a22975ae45886ae4cf6 to be included in the upcoming releases).

git log 18a0f0c17..origin/bb-10.6-release storage/innobase

The commit hash may change before the release, but a fix was applied:

commit 2f421688c66ca8c74b1812eaf4c04dac6f71ae0b
Author: Vlad Lesin <vlad_lesin@mail.ru>
Date:   Thu Oct 27 23:35:34 2022 +0300
 
    MDEV-28709 unexpected X lock on Supremum in READ COMMITTED
    
    Post-push fix. The flag of transaction which indicates that it's necessary
    to forbid gap lock inheritance after XA PREPARE could be inverted if
    lock_release_on_prepare_try() is invoked several times. The fix is to
    toggle it on lock_release_on_prepare() exit.

Generated at Thu Feb 08 10:12:20 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.