[MDEV-21597] MariaDB standalone server replication to MariaDB Galera Cluster crashing Created: 2020-01-30  Updated: 2022-10-27  Resolved: 2022-10-27

Status: Closed
Project: MariaDB Server
Component/s: Galera, Replication, Tests
Affects Version/s: 10.4.12
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Eric Ang
Assignee: Jan Lindström (Inactive)
Resolution: Fixed
Votes: 0
Labels: Galera, crash, replication
Environment: CentOS 7 (64bit), CentOS 8 (64bit)


Issue Links:
Relates
relates to MDEV-21025 Galera: Server 10.4 crashes with sign... Closed

 Description   

I have set up the following three servers:
Server A: MariaDB Galera Cluster
Server B: MariaDB Galera Cluster
Server C: MariaDB Standalone Server

Server A is replicating to Server C.
Server C is replicating back to Server B.

Updates made on Server A and Server B are fine, but when an update is made on Server C, Server B crashes with the following output:

2020-01-30 11:23:04 24 [Note] Slave SQL thread initialized, starting replication in log 'mysql-bin.000019' at position 259047, relay log './unicorn-relay-bin.000001' position: 4; GTID position '0-115-13488787,1-8-1419'
2020-01-30 11:23:04 23 [Note] Slave I/O thread: connected to master 'repl_user@sunny:3306',replication starts at GTID position '1-8-1419,0-115-13488787'

mysqld: /home/buildbot/buildbot/padding_for_CPACK_RPM_BUILD_SOURCE_DIRS_PREFIX/mariadb-10.4.12/wsrep-lib/src/transaction.cpp:123: int wsrep::transaction::start_transaction(const wsrep::transaction_id&): Assertion `active() == false' failed.
200130 11:23:55 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.4.12-MariaDB-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=3
max_threads=2002
thread_count=14
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 4536823 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x7faa30001378
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7faa7168adc0 thread_stack 0x49000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55d110522a1e]
/usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x55d10ffb6d8f]
sigaction.c:0(__restore_rt)[0x7fab788765f0]
:0(__GI_raise)[0x7fab76b47337]
:0(__GI_abort)[0x7fab76b48a28]
:0(__assert_fail_base)[0x7fab76b40156]
:0(_GI__assert_fail)[0x7fab76b40202]
/usr/sbin/mysqld(+0xebce9e)[0x55d1105b5e9e]
/usr/sbin/mysqld(_Z11trans_beginP3THDj+0x2eb)[0x55d10fe9aaeb]
/usr/sbin/mysqld(_ZN14Gtid_log_event14do_apply_eventEP14rpl_group_info+0x14e)[0x55d1100af01e]
/usr/sbin/mysqld(+0x5fe560)[0x55d10fcf7560]
/usr/sbin/mysqld(handle_slave_sql+0x1ba2)[0x55d10fd01152]
pthread_create.c:0(start_thread)[0x7fab7886ee65]
/lib64/libc.so.6(clone+0x6d)[0x7fab76c0f88d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x55d110ed7820): BEGIN
Connection ID (thread ID): 24
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=on,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on,condition_pushdown_for_subquery=on,rowid_filter=on,condition_pushdown_from_having=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /var/lib/mysql
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             31098                31098                processes
Max open files            16364                16364                files
Max locked memory         65536                65536                bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       31098                31098                signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us
Core pattern: core

2020-01-30 11:24:03 0 [Note] WSREP: Loading provider /usr/lib64/galera-4/libgalera_smm.so initial position: e171100d-322d-11e8-9957-624639ca8561:21127560



 Comments   
Comment by Eric Ang [ 2020-01-31 ]

Hi!

After more testing, I've found the steps to reproduce this issue.

On Server B, set the following:
replicate_ignore_table=eric.test

On Server C, run the following query:
DELETE FROM eric.test LIMIT 400;

Server B will then crash.

To summarize, Server A and Server B are set up as a Galera Cluster, while Server C is a standalone MariaDB server.
Server A is replicating to Server C and Server C is replicating back to Server B.
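
The reproduction steps above can be sketched as the statements below. This is only a sketch under assumptions: it assumes replication from Server C to Server B is already configured and running, that the table eric.test exists on Server C and contains rows, and that the filter is applied at runtime with SET GLOBAL. The report does not say how the filter was set, so it may equally have been placed in the server configuration file. The STOP SLAVE and START SLAVE statements are added here because the replication threads must be stopped before the filter variable can be changed.

```sql
-- On Server B (Galera node that replicates from Server C).
-- Assumption: the filter is changed at runtime rather than in my.cnf.
STOP SLAVE;
SET GLOBAL replicate_ignore_table = 'eric.test';
START SLAVE;

-- On Server C (standalone server), run the statement from the report.
DELETE FROM eric.test LIMIT 400;

-- Back on Server B: if the server is still up, check the replica state.
-- According to the report, Server B crashes at this point instead.
SHOW SLAVE STATUS;
```

If Server B crashes as reported, its error log should show the `active() == false` assertion from wsrep-lib/src/transaction.cpp quoted in the description.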

Comment by Eric Ang [ 2020-02-07 ]

Hi! Any updates regarding this issue?

Comment by Marko Mäkelä [ 2021-03-19 ]

I got a test failure locally on 10.4, with the same assertion expression:

galera.galera_bf_lock_wait 'innodb'      w57 [ fail ]
        Test ended at 2021-03-19 11:21:29
 
CURRENT_TEST: galera.galera_bf_lock_wait
mysqltest: At line 59: query 'do sleep($sleep_period)' failed: 2013: Lost connection to MySQL server during query
mysqld: /mariadb/10.4/wsrep-lib/src/transaction.cpp:123: int wsrep::transaction::start_transaction(const wsrep::transaction_id &): Assertion `active() == false' failed.
#7  0x00005604d6c942e6 in wsrep::transaction::start_transaction (this=0x7f5440008090, id=@0x7f54cc1acca8: {id_ = 5336}) at /mariadb/10.4/wsrep-lib/src/transaction.cpp:123
#8  0x00005604d6232fdc in wsrep::client_state::start_transaction (this=<optimized out>, this@entry=0x7f5440008028, id=@0x7f54cc1acca8: {id_ = 5336}) at /mariadb/10.4/wsrep-lib/include/wsrep/client_state.hpp:321
#9  0x00005604d642d743 in wsrep_start_transaction (thd=0x7f5440001e58, trx_id=0) at /mariadb/10.4/sql/wsrep_trans_observer.h:138
#10 wsrep_bf_abort (bf_thd=bf_thd@entry=0x7f54bc000d28, victim_thd=victim_thd@entry=0x7f5440001e58) at /mariadb/10.4/sql/wsrep_thd.cc:352
#11 0x00005604d64345ac in wsrep_thd_bf_abort (bf_thd=0x7f54bc000d28, victim_thd=0x7f5440001e58, signal=1 '\001') at /mariadb/10.4/sql/service_wsrep.cc:215
#12 0x00005604d66cbc9a in bg_wsrep_kill_trx (void_arg=0x7f54bc044790) at /mariadb/10.4/storage/innobase/handler/ha_innodb.cc:18779
#13 0x00005604d62586ce in handle_manager (arg=<optimized out>) at /mariadb/10.4/sql/sql_manager.cc:112
#14 0x00005604d6bcf016 in pfs_spawn_thread (arg=0x5604d9edc788) at /mariadb/10.4/storage/perfschema/pfs.cc:1869
#15 0x00007f54d9bfcea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#16 0x00007f54d924edef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

I will disable that test with a reference to this ticket. As part of the fix, the test must be re-enabled.
