Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-10259

mysqld crash with certain statement length and order with Galera and encrypt-tmp-files=1

Details

    • Bug
    • Status: Closed (View Workflow)
    • Critical
    • Resolution: Fixed
    • 10.1.14, 10.1.19
    • 10.1.34, 10.2.16, 10.3.8
    • Galera
    • None
    • 10.2.4-1, 10.2.12, 10.1.31, 10.2.13, 10.1.32

    Description

      We are running a three-node Galera cluster in both production and preproduction. We recently added data-at-rest encryption to our preproduction cluster. We then found that our standard task to mysqldump production data and load it into preproduction was causing mysqld to crash on the node that was performing the import. The error log showed:

      2016-06-20 15:57:33 140415040076544 [ERROR] mysqld: Error writing file '/apps/data/mysqld/mysql-bin' (errno: 0 "Internal error/check (Not system error)")
      2016-06-20 15:57:33 140415040076544 [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
      160620 15:57:33 [ERROR] mysqld got signal 6 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see https://mariadb.com/kb/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
      Server version: 10.1.14-MariaDBkey_buffer_size=33554432read_buffer_size=131072
      max_used_connections=1
      max_threads=502
      thread_count=3
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1135382 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7fb4f726a008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7fb4ec95b140 thread_stack 0x48400
      /usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fb5bd1ed97b]
      /usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7fb5bcd4c825]
      /lib64/libpthread.so.0(+0x391f40f7e0)[0x7fb5bc3517e0]
      /lib64/libc.so.6(gsignal+0x35)[0x7fb5ba7785e5]
      /lib64/libc.so.6(abort+0x175)[0x7fb5ba779dc5]
      /usr/lib64/galera/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x180)[0x7fb5a71559f0]
      mysys/stacktrace.c:247(my_print_stacktrace)[0x7fb5a714940e]
      sql/signal_handler.cc:160(handle_fatal_signal)[0x7fb5a715c1bd]
      sql/wsrep_hton.cc:257(wsrep_rollback)[0x7fb5bcce923a]
      sql/wsrep_hton.cc:268(wsrep_rollback)[0x7fb5bcce9368]
      sql/handler.cc:1658(ha_rollback_trans(THD*, bool))[0x7fb5bcd4f41a]
      sql/handler.cc:1483(ha_commit_trans(THD*, bool))[0x7fb5bcd4f804]
      sql/transaction.cc:435(trans_commit_stmt(THD*))[0x7fb5bccaf258]
      /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xb7c)[0x7fb5bcbda1bc]
      sql/sql_class.h:3332(THD::get_stmt_da())[0x7fb5bcbe28fd]
      sql/sql_parse.cc:7138(wsrep_mysql_parse)[0x7fb5bcbe29fc]
      sql/sql_parse.cc:1484(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7fb5bcbe4f3e]
      sql/sql_parse.cc:1109(do_command(THD*))[0x7fb5bcbe5b0b]
      sql/sql_connect.cc:1350(do_handle_one_connection(THD*))[0x7fb5bcca169f]
      sql/sql_connect.cc:1264(handle_one_connection)[0x7fb5bcca17f7]
      /lib64/libpthread.so.0(+0x391f407aa1)[0x7fb5bc349aa1]
      /lib64/libc.so.6(clone+0x6d)[0x7fb5ba82eaad]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x7fb598c56020): is an invalid pointer
      Connection ID (thread ID): 6
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      160620 15:57:33 mysqld_safe Number of processes running now: 0
      160620 15:57:33 mysqld_safe WSREP: not restarting wsrep node automatically
      160620 15:57:33 mysqld_safe mysqld from pid file /apps/data/mysqld/hostname.pid ended
      

      I have honed the import file down to a reproducible minimum of creating and populating two tables, attached as crash1b4.sql. With a fresh database, I can cause the same crash as simply as:

      # echo "drop database jimtest ; create database jimtest" | mysql
      # mysql jimtest < crash1b4.sql 
      ERROR 2013 (HY000) at line 23: Lost connection to MySQL server during query
      

      My observations so far:
      1. The statements in the file all succeed in isolation
      2. If you reverse the order of the two tables in the file, the import succeeds. This is attached as crash1b4-reordered.sql
      3. To force the crash we seem to need the long insert line with many rows; shortening this will allow the import to succeed
      4. Changing encrypt-tmp-files=1 to encrypt-tmp-files=0 in my.ini and restarting mysqld will allow the import to succeed
      5. Removing the node from the cluster (by removing all wsrep_* from my.ini and restarting mysqld) will allow the import to succeed, even with encrypt-tmp-files=1

      My my.cnf is attached, with host and domain names replaced for privacy.

      Attachments

        1. crash1b4.sql
          310 kB
        2. my.cnf
          3 kB
        3. crash1b4-reordered.sql
          310 kB
        4. mysql-error.log
          14 kB

        Issue Links

          Activity

            jal25 Jim Lamb created issue -
            jal25 Jim Lamb made changes -
            Field Original Value New Value
            Attachment mysql-error.log [ 42224 ]
            elenst Elena Stepanova made changes -
            Component/s Galera [ 10124 ]
            Fix Version/s 10.1 [ 16100 ]
            Assignee Nirbhay Choubey [ nirbhay_c ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Assignee Nirbhay Choubey [ nirbhay_c ] Sachin Setiya [ sachin.setiya.007 ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            Affects Version/s 10.1.19 [ 22111 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.2.4-1 [ 121 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Sprint 10.2.4-1 [ 121 ] 10.2.4-1, 10.2.12 [ 121, 216 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            serg Sergei Golubchik made changes -
            Sprint 10.2.4-1, 10.2.12 [ 121, 216 ] 10.2.4-1, 10.2.12, 10.1.31 [ 121, 216, 225 ]
            serg Sergei Golubchik made changes -
            Sprint 10.2.4-1, 10.2.12, 10.1.31 [ 121, 216, 225 ] 10.2.4-1, 10.2.12, 10.1.31, 10.2.13 [ 121, 216, 225, 228 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Status In Progress [ 3 ] Stalled [ 10000 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            serg Sergei Golubchik made changes -
            Sprint 10.2.4-1, 10.2.12, 10.1.31, 10.2.13 [ 121, 216, 225, 228 ] 10.2.4-1, 10.2.12, 10.1.31, 10.2.13, 10.1.32 [ 121, 216, 225, 228, 235 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Assignee Sachin Setiya [ sachin.setiya.007 ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Assignee Sergei Golubchik [ serg ] Sachin Setiya [ sachin.setiya.007 ]
            Status In Review [ 10002 ] Stalled [ 10000 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Status Stalled [ 10000 ] In Progress [ 3 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Assignee Sachin Setiya [ sachin.setiya.007 ] Sergei Golubchik [ serg ]
            Status In Progress [ 3 ] In Review [ 10002 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Comment [ Effects of patch before and after
            {noformat}

            diff --git a/unittest/sql/mf_iocache-t.cc b/unittest/sql/mf_iocache-t.cc
            index 31f98562521..ad5ea86175f 100644
            --- a/unittest/sql/mf_iocache-t.cc
            +++ b/unittest/sql/mf_iocache-t.cc
            @@ -187,10 +187,73 @@ void mdev9044()
               close_cached_file(&info);
             }
             
            +/* 2 Reads in cache makes second read to fail (only if first read uses temp file) */
            +void mdev10259()
            +{
            + int res;
            + uchar buf[200];
            + memset(buf, FILL, sizeof(buf));
            +
            + diag("MDEV-10259- mysqld crash with certain statement length and order with"
            + " Galera and encrypt-tmp-files=1");
            +
            + init_io_cache_encryption();
            +
            + res= open_cached_file(&info, 0, 0, CACHE_SIZE, 0);
            + ok(res == 0, "open_cached_file" INFO_TAIL);
            +
            + res= my_b_write(&info, buf, sizeof(buf));
            + ok(res == 0 && info.pos_in_file == 200, "200 write" INFO_TAIL);
            +
            + res= my_b_flush_io_cache(&info, 1);
            + ok(res == 0, "flush" INFO_TAIL);
            +
            + ulong saved_pos= my_b_tell(&info);
            + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0);
            + ok(res == 0, "reinit READ_CACHE" INFO_TAIL);
            +
            + res= my_b_fill(&info);
            + ok(res == 200, "fill" INFO_TAIL);
            +
            + res= my_b_fill(&info);
            + ok(res == 0, "fill" INFO_TAIL);
            +
            + res= my_b_fill(&info);
            + ok(res == 0, "fill" INFO_TAIL);
            +
            + res= reinit_io_cache(&info, WRITE_CACHE, saved_pos, 0, 0);
            + ok(res == 0, "reinit WRITE_CACHE" INFO_TAIL);
            + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0);
            + ok(res == 0, "reinit READ_CACHE" INFO_TAIL);
            + ok(1, "sachin my_b_bytes_in_cache %d", my_b_bytes_in_cache(&info));
            +
            + res= my_b_fill(&info);
            + ok(res == 200, "fill" INFO_TAIL);
            + ok(1, "%d" ,res);
            +
            + res= my_b_fill(&info);
            + ok(res == 0, "fill" INFO_TAIL);
            +
            + res= my_b_fill(&info);
            + ok(res == 0, "fill" INFO_TAIL);
            +
            + res= reinit_io_cache(&info, WRITE_CACHE, saved_pos, 0, 0);
            + ok(res == 0, "reinit WRITE_CACHE" INFO_TAIL);
            + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0);
            + ok(res == 0, "reinit READ_CACHE" INFO_TAIL);
            + ok(1, "sachin my_b_bytes_in_cache %d", my_b_bytes_in_cache(&info));
            +
            + res= my_b_read(&info, buf, sizeof(buf)) || data_bad(buf, sizeof(buf));
            + ok(res == 0 && info.pos_in_file == 200, "large read" INFO_TAIL);
            +
            + close_cached_file(&info);
            +
            +}
            +
             int main(int argc __attribute__((unused)),char *argv[])
             {
               MY_INIT(argv[0]);
            - plan(29);
            + plan(44);
             
               /* temp files with and without encryption */
               encrypt_tmp_files= 1;
            @@ -202,6 +265,10 @@ int main(int argc __attribute__((unused)),char *argv[])
               /* regression tests */
               mdev9044();
             
            + encrypt_tmp_files= 1;
            + mdev10259();
            + encrypt_tmp_files= 0;
            +
               my_end(0);
               return exit_status();
             }

            {noformat}

            {noformat}
            --- not_fix 2018-05-18 13:16:13.844046978 +0530
            +++ fix 2018-05-18 12:50:42.117491310 +0530
            @@ -37,17 +37,17 @@
             ok 32 - flush, pos_in_file = 200, pos_in_mem = 0
             ok 33 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0
             ok 34 - fill, pos_in_file = 0, pos_in_mem = 0
            -ok 35 - fill, pos_in_file = 200, pos_in_mem = 0
            -ok 36 - fill, pos_in_file = 200, pos_in_mem = 0
            -ok 37 - reinit WRITE_CACHE, pos_in_file = 200, pos_in_mem = 0
            +ok 35 - fill, pos_in_file = 0, pos_in_mem = 200
            +ok 36 - fill, pos_in_file = 0, pos_in_mem = 200
            +ok 37 - reinit WRITE_CACHE, pos_in_file = 0, pos_in_mem = 200
             ok 38 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0
            -ok 39 - sachin my_b_bytes_in_cache 0
            -ok 40 - fill, pos_in_file = 0, pos_in_mem = 0
            -ok 41 - 200
            -ok 42 - fill, pos_in_file = 200, pos_in_mem = 0
            -ok 43 - fill, pos_in_file = 200, pos_in_mem = 0
            -ok 44 - reinit WRITE_CACHE, pos_in_file = 200, pos_in_mem = 0
            +ok 39 - sachin my_b_bytes_in_cache 200
            +not ok 40 - fill, pos_in_file = 0, pos_in_mem = 200
            +ok 41 - 0
            +ok 42 - fill, pos_in_file = 0, pos_in_mem = 200
            +ok 43 - fill, pos_in_file = 0, pos_in_mem = 200
            +ok 44 - reinit WRITE_CACHE, pos_in_file = 0, pos_in_mem = 200
             ok 45 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0
            -ok 46 - sachin my_b_bytes_in_cache 0
            +ok 46 - sachin my_b_bytes_in_cache 200
             not ok 47 - large read, pos_in_file = 0, pos_in_mem = 200
             # 44 tests planned but 47 executed


            {noformat}
            ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Fix Version/s 10.1.34 [ 23100 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.1 [ 16100 ]
            Assignee Sergei Golubchik [ serg ] Sachin Setiya [ sachin.setiya.007 ]
            Resolution Fixed [ 1 ]
            Status In Review [ 10002 ] Closed [ 6 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2.16 [ 23110 ]
            Fix Version/s 10.3.8 [ 23113 ]
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 76138 ] MariaDB v4 [ 150537 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 111689

            People

              sachin.setiya.007 Sachin Setiya (Inactive)
              jal25 Jim Lamb
              Votes:
              2 Vote for this issue
              Watchers:
              11 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.