MariaDB Server / MDEV-10259

mysqld crash with certain statement length and order with Galera and encrypt-tmp-files=1

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 10.1.14, 10.1.19
    • Fix Version/s: 10.1.34, 10.2.16, 10.3.8
    • Component/s: Galera
    • Labels: None
    • Environment:
    • Sprint: 10.2.4-1, 10.2.12, 10.1.31, 10.2.13, 10.1.32

      Description

      We are running a three-node Galera cluster in both production and preproduction. We recently added data-at-rest encryption to our preproduction cluster. We then found that our standard task to mysqldump production data and load it into preproduction was causing mysqld to crash on the node that was performing the import. The error log showed:

      2016-06-20 15:57:33 140415040076544 [ERROR] mysqld: Error writing file '/apps/data/mysqld/mysql-bin' (errno: 0 "Internal error/check (Not system error)")
      2016-06-20 15:57:33 140415040076544 [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
      160620 15:57:33 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
      Server version: 10.1.14-MariaDB
      key_buffer_size=33554432
      read_buffer_size=131072
      max_used_connections=1
      max_threads=502
      thread_count=3
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1135382 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7fb4f726a008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fb4ec95b140 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fb5bd1ed97b]
      /usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7fb5bcd4c825]
      /lib64/libpthread.so.0(+0x391f40f7e0)[0x7fb5bc3517e0]
      /lib64/libc.so.6(gsignal+0x35)[0x7fb5ba7785e5]
      /lib64/libc.so.6(abort+0x175)[0x7fb5ba779dc5]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x180)[0x7fb5a71559f0]
      mysys/stacktrace.c:247(my_print_stacktrace)[0x7fb5a714940e]
      sql/signal_handler.cc:160(handle_fatal_signal)[0x7fb5a715c1bd]
      sql/wsrep_hton.cc:257(wsrep_rollback)[0x7fb5bcce923a]
      sql/wsrep_hton.cc:268(wsrep_rollback)[0x7fb5bcce9368]
      sql/handler.cc:1658(ha_rollback_trans(THD*, bool))[0x7fb5bcd4f41a]
      sql/handler.cc:1483(ha_commit_trans(THD*, bool))[0x7fb5bcd4f804]
      sql/transaction.cc:435(trans_commit_stmt(THD*))[0x7fb5bccaf258]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xb7c)[0x7fb5bcbda1bc]
      sql/sql_class.h:3332(THD::get_stmt_da())[0x7fb5bcbe28fd]
      sql/sql_parse.cc:7138(wsrep_mysql_parse)[0x7fb5bcbe29fc]
      sql/sql_parse.cc:1484(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7fb5bcbe4f3e]
      sql/sql_parse.cc:1109(do_command(THD*))[0x7fb5bcbe5b0b]
      sql/sql_connect.cc:1350(do_handle_one_connection(THD*))[0x7fb5bcca169f]
      sql/sql_connect.cc:1264(handle_one_connection)[0x7fb5bcca17f7]
      /lib64/libpthread.so.0(+0x391f407aa1)[0x7fb5bc349aa1]
      /lib64/libc.so.6(clone+0x6d)[0x7fb5ba82eaad]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fb598c56020): is an invalid pointer
      Connection ID (thread ID): 6
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      160620 15:57:33 mysqld_safe Number of processes running now: 0
      160620 15:57:33 mysqld_safe WSREP: not restarting wsrep node automatically
      160620 15:57:33 mysqld_safe mysqld from pid file /apps/data/mysqld/hostname.pid ended
      

      I have honed the import file down to a reproducible minimum of creating and populating two tables, attached as crash1b4.sql. With a fresh database, I can cause the same crash as simply as:

      # echo "drop database jimtest ; create database jimtest" | mysql
      # mysql jimtest < crash1b4.sql 
      ERROR 2013 (HY000) at line 23: Lost connection to MySQL server during query
      

      My observations so far:
      1. The statements in the file all succeed in isolation
      2. If you reverse the order of the two tables in the file, the import succeeds. This is attached as crash1b4-reordered.sql
      3. The crash seems to require the long INSERT line with many rows; shortening it allows the import to succeed
      4. Changing encrypt-tmp-files=1 to encrypt-tmp-files=0 in my.cnf and restarting mysqld allows the import to succeed
      5. Removing the node from the cluster (by removing all wsrep_* settings from my.cnf and restarting mysqld) allows the import to succeed, even with encrypt-tmp-files=1
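
      Observations 4 and 5 together suggest the crash needs both temporary-file encryption and an active Galera configuration. As a rough sketch, a node's option file could be checked for that combination before running the import. The helper name has_crash_combo is hypothetical and not part of MariaDB or Galera, and treating any wsrep_* line as "node is in the cluster" is an assumption that simply mirrors observation 5.

```shell
# Hypothetical helper (not part of MariaDB or Galera): succeeds only if the
# given option file contains both settings implicated by observations 4 and 5.
has_crash_combo() {
    cfg="$1"

    # Option files accept dashes or underscores in option names.
    # Only explicit values (1, ON, on) are recognised here; a bare
    # "encrypt-tmp-files" line with no value is not handled.
    grep -Eq '^[[:space:]]*encrypt[-_]tmp[-_]files[[:space:]]*=[[:space:]]*(1|ON|on)[[:space:]]*$' "$cfg" || return 1

    # Assumption: any uncommented wsrep_* line means the node is configured
    # for Galera. This is crude; wsrep_on=OFF would also match.
    grep -Eq '^[[:space:]]*wsrep[-_]' "$cfg" || return 1

    return 0
}
```

      For example, `has_crash_combo /etc/my.cnf && echo "both settings present"` would flag a node before the import is attempted; the path is only an illustration.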

      My my.cnf is attached, with host and domain names replaced for privacy.

        Attachments

        1. crash1b4.sql (310 kB)
        2. crash1b4-reordered.sql (310 kB)
        3. my.cnf (3 kB)
        4. mysql-error.log (14 kB)

              People

              Assignee: sachin.setiya.007 Sachin Setiya (Inactive)
              Reporter: jal25 Jim Lamb
              Votes: 2
              Watchers: 11
