MariaDB Server / MDEV-17243

Galera Server crashes with "WSREP: FSM: no such a transition ABORTING -> REPLICATING" on loading data

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.2.14, 10.1 (EOL)
    • Fix Version/s: 10.2.23, 10.3.14, 10.4.4
    • Component/s: Galera
    • Labels: None
    • Environment: 3x Master-Master servers; OS: Fedora 27

    Description

      Galera Server crashes with "WSREP: FSM: no such a transition ABORTING -> REPLICATING" on
      loading data

      The crash occurred during concurrent loading of several tables, after the previous session had been interrupted, the database dropped, the schema recreated, and the load restarted.

      2018-09-19 15:27:45 139794208356096 [ERROR] WSREP: FSM: no such a transition ABORTING -> REPLICATING
      180919 15:27:45 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.

      To report this bug, see https://mariadb.com/kb/en/reporting-bugs

      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.

      Server version: 10.2.14-MariaDB
      key_buffer_size=134217728
      read_buffer_size=131072
      max_used_connections=25
      max_threads=153
      thread_count=36
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467245 K bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.

      Thread pointer: 0x7f23d40008d8
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f2460216cd8 thread_stack 0x49000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55a0f3a1818e]
      /usr/sbin/mysqld(handle_fatal_signal+0x5a3)[0x55a0f349e5f3]
      /lib64/libpthread.so.0(+0x121c0)[0x7f252a6761c0]
      /lib64/libc.so.6(gsignal+0x110)[0x7f25285a6750]
      /lib64/libc.so.6(abort+0x151)[0x7f25285a7d31]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x190)[0x7f25246c6850]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM9replicateEPNS_9TrxHandleEP14wsrep_trx_meta+0x277)[0x7f25246bf457]
      /usr/lib64/galera/libgalera_smm.so(galera_pre_commit+0xb3)[0x7f25246df303]
      /usr/sbin/mysqld(wsrep_run_wsrep_commit+0x987)[0x55a0f3433147]
      /usr/sbin/mysqld(+0x5e9d73)[0x55a0f3433d73]
      /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x31f)[0x55a0f34a172f]
      /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x5d)[0x55a0f33e6f4d]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x3bf)[0x55a0f3307bff]
      /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_statebb+0x2f3)[0x55a0f3310543]
      /usr/sbin/mysqld(+0x4c6d16)[0x55a0f3310d16]
      /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcjbb+0x17b0)[0x55a0f3312ad0]
      /usr/sbin/mysqld(_Z10do_commandP3THD+0x230)[0x55a0f3313fa0]
      /usr/sbin/mysqld(_Z24do_handle_one_connectionP7CONNECT+0x20a)[0x55a0f33d830a]
      /usr/sbin/mysqld(handle_one_connection+0x3d)[0x55a0f33d84dd]
      /lib64/libpthread.so.0(+0x750b)[0x7f252a66b50b]
      /lib64/libc.so.6(clone+0x3f)[0x7f252866738f]

      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7f23d401ea30): LOAD DATA LOCAL INFILE '/root/QA/mariadb-columnstore-tpcds/insert-data-tables/data/tpcds_2000/item.tbl' INTO TABLE `item` FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 0 LINES
      Connection ID (thread ID): 428
      Status: NOT_KILLED

      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=on,condition_pushdown_for_derived=on

      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
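The backtrace shows the abort coming from `galera::FSM<...>::shift_to()`, called from `ReplicatorSMM::replicate()`: the transaction handle was already in `ABORTING` when replication tried to move it to `REPLICATING`, and the state machine has no such edge. The C++ sketch below is a simplified, hypothetical model of that guard. The `TrxState` values, the transition table in `make_example_fsm()` and the `TrxFsm` class are illustrative assumptions, not Galera's actual code. Only registered `(from, to)` pairs are legal, and any other shift is logged and followed by `abort()`, which is what shows up as "signal 6" in the log.

```cpp
#include <cstdio>
#include <cstdlib>
#include <set>
#include <utility>

// Simplified, hypothetical subset of transaction states. The real
// galera::TrxHandle::State enum differs; the names only mirror the log.
enum class TrxState { EXECUTING, REPLICATING, CERTIFYING, COMMITTED, ABORTING, ROLLED_BACK };

const char* to_string(TrxState s) {
    switch (s) {
        case TrxState::EXECUTING:   return "EXECUTING";
        case TrxState::REPLICATING: return "REPLICATING";
        case TrxState::CERTIFYING:  return "CERTIFYING";
        case TrxState::COMMITTED:   return "COMMITTED";
        case TrxState::ABORTING:    return "ABORTING";
        case TrxState::ROLLED_BACK: return "ROLLED_BACK";
    }
    return "UNKNOWN";
}

// Minimal FSM: only explicitly registered (from, to) pairs are legal.
class TrxFsm {
public:
    explicit TrxFsm(TrxState initial) : state_(initial) {}

    void allow(TrxState from, TrxState to) { allowed_.insert({from, to}); }

    TrxState state() const { return state_; }

    bool can_shift_to(TrxState to) const { return allowed_.count({state_, to}) != 0; }

    // An unregistered transition is treated as a fatal internal error:
    // log it and abort(), which raises SIGABRT (signal 6).
    void shift_to(TrxState to) {
        if (!can_shift_to(to)) {
            std::fprintf(stderr, "FSM: no such a transition %s -> %s\n",
                         to_string(state_), to_string(to));
            std::abort();
        }
        state_ = to;
    }

private:
    TrxState state_;
    std::set<std::pair<TrxState, TrxState>> allowed_;
};

// Illustrative transition table (an assumption, much smaller than Galera's).
TrxFsm make_example_fsm() {
    TrxFsm fsm(TrxState::EXECUTING);
    fsm.allow(TrxState::EXECUTING, TrxState::REPLICATING);
    fsm.allow(TrxState::REPLICATING, TrxState::CERTIFYING);
    fsm.allow(TrxState::CERTIFYING, TrxState::COMMITTED);
    fsm.allow(TrxState::EXECUTING, TrxState::ABORTING);
    fsm.allow(TrxState::ABORTING, TrxState::ROLLED_BACK);
    return fsm;
}
```

In this model, a handle that has already been moved to `ABORTING` can only proceed to `ROLLED_BACK`; calling `shift_to(TrxState::REPLICATING)` on it prints the same message as the log and aborts the process.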

          Activity

            Jan Lindström (Inactive) added a comment - On the surface, this looks similar to MDEV-17262.

            Julius Goryavsky added a comment - https://github.com/MariaDB/galera/pull/3

            The TrxMap structure does not account for the presence of two trx
            objects with the same trx_id (2^64 - 1, which is the default trx_id)
            belonging to two different connections.

            This eventually causes the same trx object to be shared between two
            different, unrelated connections, which causes a state inconsistency
            leading to a crash (a race condition).

            This problem could be solved by also taking the conn-id into account,
            but that would require an interface change. To avoid this, we should
            maintain a separate map of such trx objects keyed by gu_thread_id.
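The collision described in this comment can be illustrated with a small C++ model. Everything in it is hypothetical: `TrxHandle`, `TrxMapByIdOnly`, `TrxMapWithThreadMap`, `get_or_create_in()` and `ThreadKey` (a plain integer standing in for `gu_thread_id`) are invented for illustration and do not reproduce Galera's `TrxMap`. A map keyed only by `trx_id` hands the same object to every connection that still carries the default ID 2^64 - 1; routing default-ID handles to a second map keyed by thread gives each connection its own object, while transactions with a real `trx_id` are still looked up by ID.

```cpp
#include <cstdint>
#include <limits>
#include <map>
#include <memory>

// Default trx_id from the comment above: 2^64 - 1.
constexpr std::uint64_t kDefaultTrxId = std::numeric_limits<std::uint64_t>::max();

// Hypothetical stand-in for gu_thread_id (a plain integer, for testability).
using ThreadKey = std::uint64_t;

// Hypothetical stand-in for a per-transaction handle with mutable state.
struct TrxHandle {
    explicit TrxHandle(std::uint64_t id) : trx_id(id) {}
    std::uint64_t trx_id;
    int state = 0;  // placeholder for the handle's FSM state
};

using HandleMap = std::map<std::uint64_t, std::unique_ptr<TrxHandle>>;

// Return the handle stored under `key`, creating it on first use.
TrxHandle* get_or_create_in(HandleMap& map, std::uint64_t key, std::uint64_t trx_id) {
    std::unique_ptr<TrxHandle>& slot = map[key];
    if (!slot) slot = std::make_unique<TrxHandle>(trx_id);
    return slot.get();
}

// Problematic pattern: keyed only by trx_id, so every connection whose
// transaction still has the default ID gets the SAME handle.
class TrxMapByIdOnly {
public:
    TrxHandle* get_or_create(std::uint64_t trx_id) {
        return get_or_create_in(by_id_, trx_id, trx_id);
    }

private:
    HandleMap by_id_;
};

// Sketch of the proposed fix: default-ID handles live in a separate map
// keyed by thread, so each connection thread gets its own handle.
// Locking is omitted; a real map shared between threads needs it.
class TrxMapWithThreadMap {
public:
    TrxHandle* get_or_create(std::uint64_t trx_id, ThreadKey thread) {
        if (trx_id == kDefaultTrxId)
            return get_or_create_in(by_thread_, thread, trx_id);
        return get_or_create_in(by_id_, trx_id, trx_id);
    }

private:
    HandleMap by_id_;
    HandleMap by_thread_;  // ThreadKey -> handle
};
```

With `TrxMapByIdOnly`, a state change made through one connection's handle is visible through the other connection's handle, which is the kind of cross-connection inconsistency the comment blames for the crash. The thread key is passed explicitly here only to keep the sketch testable; a real implementation would presumably derive it from the calling thread, which is how such a map could avoid the interface change that keying on conn-id would require.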

            Jan Lindström (Inactive) added a comment - bar, please review the latest version or, if you already did, please mark both the PR and this issue accordingly.

            Alexander Barkov added a comment - jplindst, sorry, I can't review this change. I suggest asking someone more familiar with this code. Perhaps Sergey Vojtovich could review it.

            Jan Lindström (Inactive) added a comment - svoj, can you review this?

            People

              Assignee: Jan Lindström (Inactive)
              Reporter: Zdravelina Sokolovska (Inactive)
              Votes: 1
              Watchers: 5
