mysqld crash with certain statement length and order with Galera and encrypt-tmp-files=1



      We run a three-node Galera cluster in both production and preproduction. We recently added data-at-rest encryption to the preproduction cluster. Since then, our routine job that dumps production data with mysqldump and loads it into preproduction crashes mysqld on the node performing the import. The error log shows:

      2016-06-20 15:57:33 140415040076544 [ERROR] mysqld: Error writing file '/apps/data/mysqld/mysql-bin' (errno: 0 "Internal error/check (Not system error)")
      2016-06-20 15:57:33 140415040076544 [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
      160620 15:57:33 [ERROR] mysqld got signal 6 ;
      Server version: 10.1.14-MariaDB
      key_buffer_size=33554432
      read_buffer_size=131072
      Thread pointer: 0x0x7fb4f726a008
      stack_bottom = 0x7fb4ec95b140 thread_stack 0x48400
      sql/handler.cc:1658(ha_rollback_trans(THD*, bool))[0x7fb5bcd4f41a]
      sql/handler.cc:1483(ha_commit_trans(THD*, bool))[0x7fb5bcd4f804]
      sql/sql_parse.cc:1484(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7fb5bcbe4f3e]
      Query (0x7fb598c56020): is an invalid pointer
      Connection ID (thread ID): 6
      Status: NOT_KILLED
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
      160620 15:57:33 mysqld_safe Number of processes running now: 0
      160620 15:57:33 mysqld_safe WSREP: not restarting wsrep node automatically
      160620 15:57:33 mysqld_safe mysqld from pid file /apps/data/mysqld/hostname.pid ended

      I have narrowed the import file down to a minimal reproducible case that creates and populates two tables, attached as crash1b4.sql. On a fresh database, the following is enough to trigger the same crash:

      # echo "drop database jimtest ; create database jimtest" | mysql
      # mysql jimtest < crash1b4.sql 
      ERROR 2013 (HY000) at line 23: Lost connection to MySQL server during query

      My observations so far:
      1. Each statement in the file succeeds when run in isolation.
      2. If the order of the two tables in the file is reversed, the import succeeds. The reordered file is attached as crash1b4-reordered.sql.
      3. The crash seems to require the long INSERT line with many rows; shortening it allows the import to succeed.
      4. Changing encrypt-tmp-files=1 to encrypt-tmp-files=0 in my.cnf and restarting mysqld allows the import to succeed.
      5. Removing the node from the cluster (by removing all wsrep_* settings from my.cnf and restarting mysqld) allows the import to succeed, even with encrypt-tmp-files=1.
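
      To make observations 4 and 5 concrete, here is a rough sketch of the settings involved. It is illustrative only: the wsrep_* values are placeholders, not the real ones, and the wsrep_* settings actually in use may differ. The configuration actually in use is the attached file.

      ```ini
      [mysqld]
      # Observation 4: the import crashes with 1 and succeeds with 0.
      encrypt-tmp-files=1

      # Observation 5: with all wsrep_* settings removed (standalone node),
      # the import succeeds even with encrypt-tmp-files=1.
      # Placeholder values below; the real ones are in the attached file.
      wsrep_on=ON
      wsrep_provider=/path/to/libgalera_smm.so
      wsrep_cluster_address=gcomm://node1,node2,node3
      ```

      In other words, the crash only shows up when encrypted temporary files and Galera replication are both enabled on the importing node.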

      My my.cnf is attached, with host and domain names replaced for privacy.


