[MDEV-22731] Galera: RQG: Assertion `trx_map_.size() == 0' failed in Wsdb on master node shutting down
Created: 2020-05-27  Updated: 2023-06-06  Resolved: 2023-06-06

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.2.32, 10.3.24
Fix Version/s: N/A

Type: Bug Priority: Major
Reporter: Stepan Patryshev (Inactive) Assignee: Seppo Jaakola
Resolution: Won't Fix Votes: 1
Labels: galera, rqg
Environment:

Galera 25.3.29(rb0f34b0) debug build; CentOS 7.


Attachments: 200526_RQG_crash_10.3.zip (Zip archive), galera_stress.yy, galera_stress.zz, mysql0_200526_10.3.err
Issue Links:
Relates
relates to MDEV-20914 Crash on galera.MW-328E Closed
relates to MDEV-21459 non-Galera deadlock causes leaked WSR... Closed

 Description   

During an RQG run, the master node crashes with signal 6 (SIGABRT) while shutting down, because the assertion `trx_map_.size() == 0' fails in galera::Wsdb::~Wsdb(). The crash reproduces on 10.2 and 10.3, in both Community Server (CS) and Enterprise Server (ES).

It may be related to MDEV-21459 and MDEV-20914, which hit the same assertion.

RQG: repository MariaDB/randgen, master branch, revision a11a5f60eff28214edd90b6994fd6c85e76ea256, using the galera_stress.yy grammar and the galera_stress.zz data-generation file.

Run:

perl ./runall-new.pl --grammar=conf/galera/galera_stress.yy --gendata=conf/galera/galera_stress.zz --duration=3600 --queries=200M --threads=8 --galera=mss --basedir=/home/stepan/mariadb/10.3 --vardir=/home/stepan/rqg/var "--mysqld=--wsrep-provider=/usr/lib/libgalera_smm_3.so" "--mysqld=--wsrep_sst_method=rsync" "--mysqld=--core" "--mysqld=--general-log" "--mysqld=--general-log-file=queries.log" "--mysqld=--log-output=file" "--mysqld=--wsrep-debug=none" "--mysqld=--wsrep-sync-wait=15" "--mysqld=--wsrep_retry_autocommit=0" "--mysqld=--wsrep_log_conflicts=1" "--mysqld=--wsrep_on=ON" "--mysqld=--default-storage-engine=innodb" "--mysqld=--sort_buffer_size=200M" "--mysqld=--innodb-autoinc-lock-mode=2" "--mysqld=--innodb-lock-wait-timeout=3"

Master node 0 server log:

10.3.24, 7476e8c7cdd73d60294126a2840baee97e7644b6, debug build

2020-05-26 13:34:14 0 [Note] WSREP: MemPool(LocalTrxHandle): hit ratio: 0.95857, misses: 5242, in use: 5109, in pool: 133
2020-05-26 13:34:14 0 [Note] WSREP: trx map:
7521 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 7521 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1590485833921506963)
217998 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 217998 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1590487397827572818)
338271 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 338271 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1590488300661635019)
112761 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 112761 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1590486634320976387)
7527 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUTING flags: 0 conn_id: -1 trx_id: 7527 seqnos (l: -1, g: -1, s: -1, d: -1, ts: 1590485833911239520)
375863 source: cf6d13ca-9f33-11ea-9bf2-1a7971187bca version: 4 local: 1 state: EXECUT
mysqld: galera/src/wsdb.cpp:54: galera::Wsdb::~Wsdb(): Assertion `trx_map_.size() == 0' failed.
200526 13:34:14 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.3.24-MariaDB-debug-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=9
max_threads=153
thread_count=0
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 31488565 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x49000
/home/stepan/mariadb/10.3/sql/mysqld(my_print_stacktrace+0x40)[0x5637b19128a4]
mysys/stacktrace.c:269(my_print_stacktrace)[0x5637b109f6d7]
sigaction.c:0(__restore_rt)[0x7f81e3cef5d0]
:0(__GI_raise)[0x7f81e1fdd207]
:0(__GI_abort)[0x7f81e1fde8f8]
:0(__assert_fail_base)[0x7f81e1fd6026]
:0(__GI___assert_fail)[0x7f81e1fd60d2]
src/wsdb.cpp:54(galera::Wsdb::~Wsdb())[0x7f81dfe59689]
src/replicator_smm.cpp:259(galera::ReplicatorSMM::~ReplicatorSMM())[0x7f81dfe85dec]
src/replicator_smm.cpp:279(galera::ReplicatorSMM::~ReplicatorSMM())[0x7f81dfe860d9]
src/wsrep_provider.cpp:104(galera_tear_down)[0x7f81dfe9288b]
wsrep/wsrep_loader.c:221(wsrep_unload)[0x5637b19a5ec8]
sql/wsrep_mysqld.cc:878(wsrep_deinit(bool))[0x5637b0fb08d4]
sql/mysqld.cc:2039(kill_server(void*))[0x5637b0c48f53]
sql/mysqld.cc:2068(kill_server_thread)[0x5637b0c48fb6]
pthread_create.c:0(start_thread)[0x7f81e3ce7dd5]
/lib64/libc.so.6(clone+0x6d)[0x7f81e20a4ead]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
Writing a core file...
Working directory at /home/stepan/rqg/var/node0/data
Resource Limits:
Limit                     Soft Limit           Hard Limit           Units     
Max cpu time              unlimited            unlimited            seconds   
Max file size             unlimited            unlimited            bytes     
Max data size             unlimited            unlimited            bytes     
Max stack size            8388608              unlimited            bytes     
Max core file size        unlimited            unlimited            bytes     
Max resident set          unlimited            unlimited            bytes     
Max processes             4096                 23005                processes 
Max open files            1024                 4096                 files     
Max locked memory         65536                65536                bytes     
Max address space         unlimited            unlimited            bytes     
Max file locks            unlimited            unlimited            locks     
Max pending signals       23005                23005                signals   
Max msgqueue size         819200               819200               bytes     
Max nice priority         0                    0                    
Max realtime priority     0                    0                    
Max realtime timeout      unlimited            unlimited            us        
Core pattern: |/usr/libexec/abrt-hook-ccpp %s %c %p %u %g %t e %P %I %h

All server logs and the RQG output are attached.



 Comments   
Comment by Jan Lindström [ 2023-06-06 ]

10.2 is EOL and 10.3 will be EOL soon.

Generated at Thu Feb 08 09:17:02 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.