MariaDB Server / MDEV-25104

Galera crashes with "FSM: no such a transition COMMITTED -> CERTIFYING"


      Description

      About 15 minutes after a successful SST, a node crashed with:

      2021-03-10 10:31:11 304 [ERROR] WSREP: Internal library error: unexpected trx release state: source: 7d566fdd-3b90-11eb-81ce-0601934ff1f0 version: 5 local: 1 flags: 1 conn_id: 276 trx_id: 377455 tstamp: 1615361471363320381; state: EXECUTING:0->REPLICATING:661->CERTIFYING:3216 (FATAL)
      	 at galera/src/wsrep_provider.cpp:galera_release():930
      2021-03-10 10:31:11 276 [ERROR] WSREP: FSM: no such a transition COMMITTED -> CERTIFYING
      210310 10:31:11 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed, 
      something is definitely wrong and this may fail.
       
      Server version: 10.4.14-MariaDB
      key_buffer_size=134217728
      read_buffer_size=262144
      max_used_connections=30
      max_threads=1002
      thread_count=97
      It is possible that mysqld could use up to 
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2464400 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x7e77780009a8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      terminate called after throwing an instance of 'gu::Exception'
      stack_bottom = 0x7e7a8c521cf0 thread_stack 0x40000
        what():  gu_mutex_destroy(): 16 (Device or resource busy)
      	 at galerautils/src/gu_mutex.hpp:~Mutex():44
      

      This looks similar to MDEV-17262 and MDEV-17245, but the actual transition, COMMITTED -> CERTIFYING, is a different one that I can't find any other prior report for.
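      To compare this crash against the logs in related reports, it may help to pull the two relevant facts out of the error lines mechanically: the state history recorded for the transaction, and the transition the FSM rejected. The following is a minimal sketch, not part of Galera or MariaDB. The function names are hypothetical, the log lines are copied from the description above, and the integer after each state name is kept as an opaque tag because the report does not say what it means.

```python
import re

# Log lines copied verbatim from the report above.
RELEASE_LINE = (
    "2021-03-10 10:31:11 304 [ERROR] WSREP: Internal library error: "
    "unexpected trx release state: source: 7d566fdd-3b90-11eb-81ce-0601934ff1f0 "
    "version: 5 local: 1 flags: 1 conn_id: 276 trx_id: 377455 "
    "tstamp: 1615361471363320381; "
    "state: EXECUTING:0->REPLICATING:661->CERTIFYING:3216 (FATAL)"
)
FSM_LINE = (
    "2021-03-10 10:31:11 276 [ERROR] WSREP: FSM: "
    "no such a transition COMMITTED -> CERTIFYING"
)

# Matches only a "state:" field that is followed by a NAME:NUMBER chain,
# so the earlier "release state: source:" text is skipped.
HISTORY_RE = re.compile(r"state: ((?:\w+:\d+)(?:->\w+:\d+)*)")
TRANSITION_RE = re.compile(r"no such a transition (\w+) -> (\w+)")


def parse_state_history(line):
    """Return [(state_name, tag), ...] from a trx state-history log line."""
    match = HISTORY_RE.search(line)
    if match is None:
        return []
    history = []
    for step in match.group(1).split("->"):
        name, _, tag = step.partition(":")
        history.append((name, int(tag)))
    return history


def parse_rejected_transition(line):
    """Return (from_state, to_state) from an FSM error line, or None."""
    match = TRANSITION_RE.search(line)
    if match is None:
        return None
    return match.group(1), match.group(2)


if __name__ == "__main__":
    print(parse_state_history(RELEASE_LINE))
    print(parse_rejected_transition(FSM_LINE))
```

      Run against the two lines above, this shows the recorded history ending in CERTIFYING while the rejected transition starts from COMMITTED. The two messages carry different thread ids (304 and 276), and 276 matches the conn_id in the first line.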


              People

              Assignee:
              jplindst Jan Lindström
              Reporter:
              hholzgra Hartmut Holzgraefe