Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Cannot Reproduce
    • Affects Version/s: 10.2.21, 10.3.25
    • Fix Version/s: N/A
    • Component/s: Galera, wsrep
    • Environment: Docker version 18.09.1, build 4c52b90
      Container version: 10.2.21-MariaDB-1:10.2.21

    Description

      We recently updated MariaDB from 10.2.17 to 10.2.21.
      The update went smoothly, without any errors, but since this minor version update we have had a crash at least once a day. The crash happens mostly on our main write node; the other two nodes are read nodes.

      The difference I see in the logs with the new version is this warning, which appears multiple times throughout the day:

      {"log":"2019-01-29  0:14:00 139950395401984 [Warning] WSREP: SQL statement was ineffective  thd: 2929856  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.682735534Z"}
      {"log":"schema: xxxxxx \n","stream":"stderr","time":"2019-01-28T22:14:00.682766101Z"}
      {"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.682771205Z"}
      {"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.68277568Z"}
      

      The crash happens after several of these warnings, like this:

      {"log":"2019-01-29  0:14:00 139950395401984 [Warning] WSREP: SQL statement was ineffective  thd: 2929856  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.682735534Z"}
      {"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:14:00.682766101Z"}
      {"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.682771205Z"}
      {"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.68277568Z"}
      {"log":"2019-01-29  0:14:00 139950400808704 [Warning] WSREP: SQL statement was ineffective  thd: 2929858  buf: 226\n","stream":"stderr","time":"2019-01-28T22:14:00.692498864Z"}
      {"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:14:00.692525167Z"}
      {"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:14:00.692534055Z"}
      {"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:14:00.692541387Z"}
      {"log":"2019-01-29  0:15:03 139950402430720 [Warning] WSREP: SQL statement was ineffective  thd: 2931591  buf: 214\n","stream":"stderr","time":"2019-01-28T22:15:03.76604226Z"}
      {"log":"schema: XXXXXX \n","stream":"stderr","time":"2019-01-28T22:15:03.766070584Z"}
      {"log":"QUERY: commit\n","stream":"stderr","time":"2019-01-28T22:15:03.766076151Z"}
      {"log":" =\u003e Skipping replication\n","stream":"stderr","time":"2019-01-28T22:15:03.76608061Z"}
      {"log":"2019-01-29  0:15:03 139950402430720 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -\u003e ROLLED_BACK\n","stream":"stderr","time":"2019-01-28T22:15:03.766085945Z"}
      {"log":"190129  0:15:03 [ERROR] mysqld got signal 6 ;\n","stream":"stderr","time":"2019-01-28T22:15:03.766242364Z"}
      

      After these errors, the last line before the crash is:

      {"log":"Fatal signal 11 while backtracing\n","stream":"stderr","time":"2019-01-28T22:15:14.47965349Z"}
      

      The node later recovers and reconnects to the cluster.

      I have looked up this error:

       WSREP: FSM: no such a transition ROLLED_BACK
      

      in issue: https://mariadb.atlassian.net/browse/MDEV-7217
      It seems pretty old, and I shouldn't be hitting it in this up-to-date version of MariaDB. Any suggestions are welcome, and I would be glad to provide any more information.
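The pattern described above (several "SQL statement was ineffective" warnings, then the FSM error) can be pulled out of a container log with a short script. This is a minimal sketch, assuming the log is in Docker's json-file format as in the excerpts above; the function name `scan` and the `window` parameter are illustrative and not part of any MariaDB or Docker tooling.

```python
import json
import re
from collections import deque

# Message fragments copied from the log excerpts in this report.
INEFFECTIVE = "WSREP: SQL statement was ineffective"
FSM_ERROR = re.compile(r"WSREP: FSM: no such a transition (\w+) -> (\w+)")
THD = re.compile(r"thd: (\d+)")


def scan(lines, window=10):
    """Yield one dict per FSM error, with the most recent 'ineffective' warnings.

    `lines` is any iterable of raw json-file records (one JSON object per line).
    `window` (an assumption of this sketch) caps how many warnings are remembered.
    """
    recent = deque(maxlen=window)  # (time, thd) of the latest warnings
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # not a json-file record, skip it
        if not isinstance(entry, dict):
            continue
        msg = entry.get("log", "")
        if INEFFECTIVE in msg:
            m = THD.search(msg)
            recent.append((entry.get("time"), m.group(1) if m else None))
            continue
        m = FSM_ERROR.search(msg)
        if m:
            yield {
                "time": entry.get("time"),
                "from": m.group(1),
                "to": m.group(2),
                "warnings": list(recent),
            }
```

A typical use would be `for hit in scan(open(path)): print(hit)`, where `path` points at the container's JSON log file; its location depends on the Docker logging configuration. Comparing the `thd` values of the preceding warnings with the thread that crashed may help when putting together a reproduction.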

      Attachments

        Activity

          Please provide the complete error log.

          elenst Elena Stepanova added a comment

          Please see the attachment.

          gkaragiorgi george koumantaris added a comment

          Hello,

          Just want to mention that this happens regularly. The errors are exactly the same.

          The same error:

          [ERROR] WSREP: FSM: no such a transition ROLLED_BACK

          followed by:

          [ERROR] mysqld got signal 6

          Other than that, it's the repeated warnings:

          [Warning] WSREP: SQL statement was ineffective

          gkaragiorgi george koumantaris added a comment

          We have a similar error on 10.2.25.

          There are several [Warning] WSREP: SQL statement was ineffective QUERY: commit warnings, followed by [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK.

          The log is attached.

          mariadb.10.2.25.log

          Middlegear Sergey Sokolov added a comment

          I suggest you upgrade to a more recent version of MariaDB 10.2 and the Galera library and retry. If this still repeats, I would need some instructions on how to reproduce it.

          jplindst Jan Lindström (Inactive) added a comment

          About a month ago we tried to upgrade our production environment to 10.2.36, but other problems forced us to roll back to 10.2.25. OK, I'll try to reproduce the problem in a test environment with the latest version of 10.2.

          Middlegear Sergey Sokolov added a comment

          Seeing something similar on 10.3.25:

          2021-05-03 14:04:39 1669694 [ERROR] WSREP: FSM: no such a transition ROLLED_BACK -> ROLLED_BACK
          210503 14:04:39 [ERROR] mysqld got signal 6 ;
          This could be because you hit a bug. It is also possible that this binary
          or one of the libraries it was linked against is corrupt, improperly built,
          or misconfigured. This error can also be caused by malfunctioning hardware.
           
          To report this bug, see https://mariadb.com/kb/en/reporting-bugs
           
          We will try our best to scrape up some info that will hopefully help
          diagnose the problem, but since we have already crashed,
          something is definitely wrong and this may fail.
           
          Server version: 10.3.25-MariaDB-log
          key_buffer_size=134217728
          read_buffer_size=131072
          max_used_connections=701
          max_threads=602
          thread_count=382
          It is possible that mysqld could use up to
          key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1461324 K  bytes of memory
          Hope that's ok; if not, decrease some variables in the equation.
           
          Thread pointer: 0x7f310845b2a8
          Attempting backtrace. You can use the following information to find out
          where mysqld died. If you see no messages after this, something went
          terribly wrong...
          stack_bottom = 0x7f31a29bad00 thread_stack 0x49000
          /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55562a271cde]
          /usr/sbin/mysqld(handle_fatal_signal+0x30f)[0x555629d0720f]
          sigaction.c:0(__restore_rt)[0x7f497ab0d5d0]
          :0(__GI_raise)[0x7f4978de02c7]
          :0(__GI_abort)[0x7f4978de19b8]
          /usr/lib64/galera/libgalera_smm.so(+0x1a281c)[0x7f4954b2d81c]
          src/fsm.hpp:104(galera::FSM<galera::TrxHandle::State, galera::TrxHandle::Transition, galera::EmptyGuard, galera::EmptyAction>::shift_to(galera::TrxHandle::State))[0x7f4954b23cc6]
          src/gu_atomic.hpp:59(gu::Atomic<long long>::operator++())[0x7f4954b34975]
          /usr/sbin/mysqld(_Z12wsrep_commitP10handlertonP3THDb+0xd2)[0x555629c72162]
          /usr/sbin/mysqld(+0x7b4475)[0x555629d08475]
          /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x4e6)[0x555629d0a906]
          /usr/sbin/mysqld(_Z12trans_commitP3THD+0x4a)[0x555629c116da]
          /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x2e76)[0x555629b24396]
          /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x36d)[0x555629b2a64d]
          /usr/sbin/mysqld(+0x4db110)[0x555629a2f110]
          /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x2eac)[0x555629b2ddac]
          /usr/sbin/mysqld(_Z10do_commandP3THD+0x11b)[0x555629b2e17b]
          /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x1d6)[0x555629c04a96]
          /usr/sbin/mysqld(handle_one_connection+0x3d)[0x555629c04bad]
          /usr/sbin/mysqld(+0xcce97d)[0x55562a22297d]
          pthread_create.c:0(start_thread)[0x7f497ab05dd5]
          2021-05-03 14:04:40 1669201 [Warning] Aborted connection 1669201 to db: 'catalogservice' user: 'catalogservice' host: 'maxscale01.java.jysk.netic.dk' (CLOSE_CONNECTION)
          /lib64/libc.so.6(clone+0x6d)[0x7f4978ea7f6d]
           
          Trying to get some variables.
          Some pointers may be invalid and cause the dump to abort.
          Query (0x7f310859a660): COMMIT
          Connection ID (thread ID): 1669694
          Status: NOT_KILLED
           
          Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on,split_materialized=on
           
          The manual page at https://mariadb.com/kb/en/how-to-produce-a-full-stack-trace-for-mysqld/ contains
          information that should help you find out what is causing the crash.
          Writing a core file...
          Working directory at /data/mysql/data
          Resource Limits:
          Limit                     Soft Limit           Hard Limit           Units
          Max cpu time              unlimited            unlimited            seconds
          Max file size             unlimited            unlimited            bytes
          Max data size             unlimited            unlimited            bytes
          Max stack size            8388608              unlimited            bytes
          Max core file size        0                    unlimited            bytes
          Max resident set          unlimited            unlimited            bytes
          Max processes             127405               127405               processes
          Max open files            16384                16384                files
          Max locked memory         65536                65536                bytes
          Max address space         unlimited            unlimited            bytes
          Max file locks            unlimited            unlimited            locks
          Max pending signals       127405               127405               signals
          Max msgqueue size         819200               819200               bytes
          Max nice priority         0                    0
          Max realtime priority     0                    0
          Max realtime timeout      unlimited            unlimited            us
          Core pattern: core
          

          lmk@netic.dk Lars Mikkelsen added a comment
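The backtrace in this comment shows abort() being reached from galera::FSM::shift_to (src/fsm.hpp:104) while a COMMIT was executing, which matches the "no such a transition ROLLED_BACK -> ROLLED_BACK" message followed by signal 6. The toy model below is only a sketch of that behaviour, not Galera's code: the class, the exception, and in particular the transition table are hypothetical and heavily simplified, and the sketch raises an exception where the real library aborts the process.

```python
class TransitionError(RuntimeError):
    """Raised by this sketch where the backtrace shows the process aborting."""


class FSM:
    """Toy state machine: only explicitly registered transitions are allowed."""

    def __init__(self, initial, transitions):
        self.state = initial
        self.transitions = set(transitions)

    def shift_to(self, new_state):
        if (self.state, new_state) not in self.transitions:
            # Message format copied from the error log in this report.
            raise TransitionError(
                f"FSM: no such a transition {self.state} -> {new_state}"
            )
        self.state = new_state


# Hypothetical, simplified transition table for illustration only;
# it is NOT the real table used for galera::TrxHandle.
TRANSITIONS = {
    ("EXECUTING", "COMMITTING"),
    ("EXECUTING", "MUST_ABORT"),
    ("MUST_ABORT", "ABORTING"),
    ("ABORTING", "ROLLED_BACK"),
    ("COMMITTING", "COMMITTED"),
}

trx = FSM("EXECUTING", TRANSITIONS)
for state in ("MUST_ABORT", "ABORTING", "ROLLED_BACK"):
    trx.shift_to(state)

# A second attempt to roll back the same transaction is not a registered
# transition, so the state machine refuses it.
try:
    trx.shift_to("ROLLED_BACK")
except TransitionError as exc:
    print(exc)
```

Under these assumptions, the crash would correspond to something moving a transaction to ROLLED_BACK twice, for example a COMMIT arriving for a transaction that had already been rolled back. The thread does not confirm the actual trigger.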

          Hello,
          Since I reported this bug, I have upgraded first to 10.4 and later to 10.5. I have not experienced any of the errors I was getting on 10.2. The dataset is the same, just much larger.
          Thank you.

          gkaragiorgi george koumantaris added a comment

          People

            Assignee: ramesh Ramesh Sivaraman
            Reporter: gkaragiorgi george koumantaris
            Votes: 3
            Watchers: 9
