[MDEV-25104] Galera crashes with "FSM: no such a transition COMMITTED -> CERTIFYING" Created: 2021-03-10  Updated: 2021-06-14  Resolved: 2021-05-31

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.14
Fix Version/s: N/A

Type: Bug
Priority: Critical
Reporter: Hartmut Holzgraefe
Assignee: Jan Lindström (Inactive)
Resolution: Incomplete
Votes: 0
Labels: need_feedback

Issue Links:
Relates
relates to MDEV-17243 Galera Server crashes with "WSREP: FS... Closed
relates to MDEV-17262 mysql crashed on galera server durin... Closed

 Description   

After successful SST, about 15 minutes later a node crashed with:

2021-03-10 10:31:11 304 [ERROR] WSREP: Internal library error: unexpected trx release state: source: 7d566fdd-3b90-11eb-81ce-0601934ff1f0 version: 5 local: 1 flags: 1 conn_id: 276 trx_id: 377455 tstamp: 1615361471363320381; state: EXECUTING:0->REPLICATING:661->CERTIFYING:3216 (FATAL)
	 at galera/src/wsrep_provider.cpp:galera_release():930
2021-03-10 10:31:11 276 [ERROR] WSREP: FSM: no such a transition COMMITTED -> CERTIFYING
210310 10:31:11 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 10.4.14-MariaDB
key_buffer_size=134217728
read_buffer_size=262144
max_used_connections=30
max_threads=1002
thread_count=97
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2464400 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x7e77780009a8
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
terminate called after throwing an instance of 'gu::Exception'
stack_bottom = 0x7e7a8c521cf0 thread_stack 0x40000
  what():  gu_mutex_destroy(): 16 (Device or resource busy)
	 at galerautils/src/gu_mutex.hpp:~Mutex():44

This looks similar to MDEV-17262 and MDEV-17245, but the actual

{COMMITTED -> CERTIFYING}

transition is a different one that I can't find any other prior report for.
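
When comparing this crash against the related reports, it may help to pull the transaction state history and the rejected transition out of the error log mechanically. The following is a minimal sketch, not part of the original report: the regular expressions are assumptions derived only from the two WSREP lines quoted above, the function names are hypothetical, and the meaning of the number after each state name is not stated in the log, so it is kept as an opaque integer.

```python
import re

# Assumption: patterns are modelled only on the two WSREP lines in this report.
STATE_HISTORY_RE = re.compile(r"state:\s*([A-Z_]+:\d+(?:->[A-Z_]+:\d+)*)")
FSM_ERROR_RE = re.compile(r"FSM: no such a transition (\w+) -> (\w+)")


def parse_state_history(line):
    """Return [(state, number), ...] from a 'state: A:0->B:661' fragment, or []."""
    match = STATE_HISTORY_RE.search(line)
    if not match:
        return []
    history = []
    for step in match.group(1).split("->"):
        name, number = step.split(":")
        history.append((name, int(number)))
    return history


def parse_fsm_error(line):
    """Return (from_state, to_state) for an FSM transition error line, or None."""
    match = FSM_ERROR_RE.search(line)
    return match.groups() if match else None


if __name__ == "__main__":
    release_line = (
        "2021-03-10 10:31:11 304 [ERROR] WSREP: Internal library error: "
        "unexpected trx release state: source: 7d566fdd-3b90-11eb-81ce-0601934ff1f0 "
        "version: 5 local: 1 flags: 1 conn_id: 276 trx_id: 377455 "
        "tstamp: 1615361471363320381; state: "
        "EXECUTING:0->REPLICATING:661->CERTIFYING:3216 (FATAL)"
    )
    fsm_line = (
        "2021-03-10 10:31:11 276 [ERROR] WSREP: FSM: no such a transition "
        "COMMITTED -> CERTIFYING"
    )
    print(parse_state_history(release_line))
    print(parse_fsm_error(fsm_line))
```

Run against the two lines from this report, the sketch separates the recorded history (ending in CERTIFYING) from the transition the FSM rejected (COMMITTED -> CERTIFYING), which makes it easier to check the pair against the logs attached to MDEV-17262 and the other linked issues.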



 Comments   
Comment by Jan Lindström (Inactive) [ 2021-04-26 ]

Hi, any idea how to reproduce this issue? Some steps would be needed to move forward on this issue.

Generated at Thu Feb 08 09:35:10 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.