Details
-
Bug
-
Status: Closed (View Workflow)
-
Critical
-
Resolution: Fixed
-
10.1.14, 10.1.19
-
None
-
Red Hat Enterprise Linux Server release 6.8 (Santiago)
Linux hostname 2.6.32-573.12.1.el6.x86_64 #1 SMP Mon Nov 23 12:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux
RPMs all installed via yum from http://yum.mariadb.org/10.1/rhel6-amd64:
MariaDB-client-10.1.14-1.el6.x86_64
MariaDB-common-10.1.14-1.el6.x86_64
MariaDB-compat-10.1.14-1.el6.x86_64
MariaDB-server-10.1.14-1.el6.x86_64
galera-25.3.15-1.rhel6.el6.x86_64
Running on VMware ESX
3 dedicated DB hosts per cluster each with 4GB RAM.Red Hat Enterprise Linux Server release 6.8 (Santiago) Linux hostname 2.6.32-573.12.1.el6.x86_64 #1 SMP Mon Nov 23 12:55:32 EST 2015 x86_64 x86_64 x86_64 GNU/Linux RPMs all installed via yum from http://yum.mariadb.org/10.1/rhel6-amd64: MariaDB-client-10.1.14-1.el6.x86_64 MariaDB-common-10.1.14-1.el6.x86_64 MariaDB-compat-10.1.14-1.el6.x86_64 MariaDB-server-10.1.14-1.el6.x86_64 galera-25.3.15-1.rhel6.el6.x86_64 Running on VMware ESX 3 dedicated DB hosts per cluster each with 4GB RAM.
-
10.2.4-1, 10.2.12, 10.1.31, 10.2.13, 10.1.32
Description
We are running a three-node Galera cluster in both production and preproduction. We recently added data-at-rest encryption to our preproduction cluster. We then found that our standard task to mysqldump production data and load it into preproduction was causing mysqld to crash on the node that was performing the import. The error log showed:
2016-06-20 15:57:33 140415040076544 [ERROR] mysqld: Error writing file '/apps/data/mysqld/mysql-bin' (errno: 0 "Internal error/check (Not system error)")
|
2016-06-20 15:57:33 140415040076544 [ERROR] WSREP: FSM: no such a transition COMMITTING -> ROLLED_BACK
|
160620 15:57:33 [ERROR] mysqld got signal 6 ;
|
This could be because you hit a bug. It is also possible that this binary
|
or one of the libraries it was linked against is corrupt, improperly built,
|
or misconfigured. This error can also be caused by malfunctioning hardware.
|
|
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
|
We will try our best to scrape up some info that will hopefully help
|
diagnose the problem, but since we have already crashed,
|
something is definitely wrong and this may fail.
|
Server version: 10.1.14-MariaDBkey_buffer_size=33554432read_buffer_size=131072
|
max_used_connections=1
|
max_threads=502
|
thread_count=3
|
It is possible that mysqld could use up to
|
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1135382 K bytes of memory
|
Hope that's ok; if not, decrease some variables in the equation.
|
|
Thread pointer: 0x0x7fb4f726a008
|
Attempting backtrace. You can use the following information to find out
|
where mysqld died. If you see no messages after this, something went
|
terribly wrong...
|
stack_bottom = 0x7fb4ec95b140 thread_stack 0x48400
|
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0x7fb5bd1ed97b]
|
/usr/sbin/mysqld(handle_fatal_signal+0x475)[0x7fb5bcd4c825]
|
/lib64/libpthread.so.0(+0x391f40f7e0)[0x7fb5bc3517e0]
|
/lib64/libc.so.6(gsignal+0x35)[0x7fb5ba7785e5]
|
/lib64/libc.so.6(abort+0x175)[0x7fb5ba779dc5]
|
/usr/lib64/galera/libgalera_smm.so(_ZN6galera3FSMINS_9TrxHandle5StateENS1_10TransitionENS_10EmptyGuardENS_11EmptyActionEE8shift_toES2_+0x180)[0x7fb5a71559f0]
|
mysys/stacktrace.c:247(my_print_stacktrace)[0x7fb5a714940e]
|
sql/signal_handler.cc:160(handle_fatal_signal)[0x7fb5a715c1bd]
|
sql/wsrep_hton.cc:257(wsrep_rollback)[0x7fb5bcce923a]
|
sql/wsrep_hton.cc:268(wsrep_rollback)[0x7fb5bcce9368]
|
sql/handler.cc:1658(ha_rollback_trans(THD*, bool))[0x7fb5bcd4f41a]
|
sql/handler.cc:1483(ha_commit_trans(THD*, bool))[0x7fb5bcd4f804]
|
sql/transaction.cc:435(trans_commit_stmt(THD*))[0x7fb5bccaf258]
|
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xb7c)[0x7fb5bcbda1bc]
|
sql/sql_class.h:3332(THD::get_stmt_da())[0x7fb5bcbe28fd]
|
sql/sql_parse.cc:7138(wsrep_mysql_parse)[0x7fb5bcbe29fc]
|
sql/sql_parse.cc:1484(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7fb5bcbe4f3e]
|
sql/sql_parse.cc:1109(do_command(THD*))[0x7fb5bcbe5b0b]
|
sql/sql_connect.cc:1350(do_handle_one_connection(THD*))[0x7fb5bcca169f]
|
sql/sql_connect.cc:1264(handle_one_connection)[0x7fb5bcca17f7]
|
/lib64/libpthread.so.0(+0x391f407aa1)[0x7fb5bc349aa1]
|
/lib64/libc.so.6(clone+0x6d)[0x7fb5ba82eaad]
|
|
Trying to get some variables.
|
Some pointers may be invalid and cause the dump to abort.
|
Query (0x7fb598c56020): is an invalid pointer
|
Connection ID (thread ID): 6
|
Status: NOT_KILLED
|
|
Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
|
|
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
information that should help you find out what is causing the crash.
|
160620 15:57:33 mysqld_safe Number of processes running now: 0
|
160620 15:57:33 mysqld_safe WSREP: not restarting wsrep node automatically
|
160620 15:57:33 mysqld_safe mysqld from pid file /apps/data/mysqld/hostname.pid ended
|
I have honed the import file down to a reproducible minimum of creating and populating two tables, attached as crash1b4.sql. With a fresh database, I can cause the same crash as simply as:
# echo "drop database jimtest ; create database jimtest" | mysql
|
# mysql jimtest < crash1b4.sql
|
ERROR 2013 (HY000) at line 23: Lost connection to MySQL server during query
|
My observations so far:
1. The statements in the file all succeed in isolation
2. If you reverse the order of the two tables in the file, the import succeeds. This is attached as crash1b4-reordered.sql
3. To force the crash we seem to need the long insert line with many rows; shortening this will allow the import to succeed
4. Changing encrypt-tmp-files=1 to encrypt-tmp-files=0 in my.ini and restarting mysqld will allow the import to succeed
5. Removing the node from the cluster (by removing all wsrep_* from my.ini and restarting mysqld) will allow the import to succeed, even with encrypt-tmp-files=1
My my.cnf is attached, with host and domain names replaced for privacy.
Attachments
Issue Links
- blocks
-
MDEV-14868 MariaDB server crashes after using ROLLBACK TO when encrypt_tmp_files=ON
-
- Closed
-
- relates to
-
MDEV-10391 During async GTID replication Galera crashes after error writing to binlog
-
- Closed
-
Activity
Field | Original Value | New Value |
---|---|---|
Attachment | mysql-error.log [ 42224 ] |
Component/s | Galera [ 10124 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Assignee | Nirbhay Choubey [ nirbhay_c ] |
Assignee | Nirbhay Choubey [ nirbhay_c ] | Sachin Setiya [ sachin.setiya.007 ] |
Affects Version/s | 10.1.19 [ 22111 ] |
Sprint | 10.2.4-1 [ 121 ] |
Rank | Ranked lower |
Priority | Major [ 3 ] | Critical [ 2 ] |
Fix Version/s | 10.2 [ 14601 ] |
Sprint | 10.2.4-1 [ 121 ] | 10.2.4-1, 10.2.12 [ 121, 216 ] |
Status | Open [ 1 ] | In Progress [ 3 ] |
Sprint | 10.2.4-1, 10.2.12 [ 121, 216 ] | 10.2.4-1, 10.2.12, 10.1.31 [ 121, 216, 225 ] |
Sprint | 10.2.4-1, 10.2.12, 10.1.31 [ 121, 216, 225 ] | 10.2.4-1, 10.2.12, 10.1.31, 10.2.13 [ 121, 216, 225, 228 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Link |
This issue blocks |
Link |
This issue relates to |
Sprint | 10.2.4-1, 10.2.12, 10.1.31, 10.2.13 [ 121, 216, 225, 228 ] | 10.2.4-1, 10.2.12, 10.1.31, 10.2.13, 10.1.32 [ 121, 216, 225, 228, 235 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Assignee | Sergei Golubchik [ serg ] | Sachin Setiya [ sachin.setiya.007 ] |
Status | In Review [ 10002 ] | Stalled [ 10000 ] |
Status | Stalled [ 10000 ] | In Progress [ 3 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | Sergei Golubchik [ serg ] |
Status | In Progress [ 3 ] | In Review [ 10002 ] |
Comment |
[ Effects of patch before and after
{noformat} diff --git a/unittest/sql/mf_iocache-t.cc b/unittest/sql/mf_iocache-t.cc index 31f98562521..ad5ea86175f 100644 --- a/unittest/sql/mf_iocache-t.cc +++ b/unittest/sql/mf_iocache-t.cc @@ -187,10 +187,73 @@ void mdev9044() close_cached_file(&info); } +/* 2 Reads in cache makes second read to fail (only if first read uses temp file) */ +void mdev10259() +{ + int res; + uchar buf[200]; + memset(buf, FILL, sizeof(buf)); + + diag(" + " Galera and encrypt-tmp-files=1"); + + init_io_cache_encryption(); + + res= open_cached_file(&info, 0, 0, CACHE_SIZE, 0); + ok(res == 0, "open_cached_file" INFO_TAIL); + + res= my_b_write(&info, buf, sizeof(buf)); + ok(res == 0 && info.pos_in_file == 200, "200 write" INFO_TAIL); + + res= my_b_flush_io_cache(&info, 1); + ok(res == 0, "flush" INFO_TAIL); + + ulong saved_pos= my_b_tell(&info); + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0); + ok(res == 0, "reinit READ_CACHE" INFO_TAIL); + + res= my_b_fill(&info); + ok(res == 200, "fill" INFO_TAIL); + + res= my_b_fill(&info); + ok(res == 0, "fill" INFO_TAIL); + + res= my_b_fill(&info); + ok(res == 0, "fill" INFO_TAIL); + + res= reinit_io_cache(&info, WRITE_CACHE, saved_pos, 0, 0); + ok(res == 0, "reinit WRITE_CACHE" INFO_TAIL); + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0); + ok(res == 0, "reinit READ_CACHE" INFO_TAIL); + ok(1, "sachin my_b_bytes_in_cache %d", my_b_bytes_in_cache(&info)); + + res= my_b_fill(&info); + ok(res == 200, "fill" INFO_TAIL); + ok(1, "%d" ,res); + + res= my_b_fill(&info); + ok(res == 0, "fill" INFO_TAIL); + + res= my_b_fill(&info); + ok(res == 0, "fill" INFO_TAIL); + + res= reinit_io_cache(&info, WRITE_CACHE, saved_pos, 0, 0); + ok(res == 0, "reinit WRITE_CACHE" INFO_TAIL); + res= reinit_io_cache(&info, READ_CACHE, 0, 0, 0); + ok(res == 0, "reinit READ_CACHE" INFO_TAIL); + ok(1, "sachin my_b_bytes_in_cache %d", my_b_bytes_in_cache(&info)); + + res= my_b_read(&info, buf, sizeof(buf)) || data_bad(buf, sizeof(buf)); + ok(res == 0 && info.pos_in_file == 200, "large read" INFO_TAIL); + + close_cached_file(&info); + +} + int main(int argc __attribute__((unused)),char *argv[]) { MY_INIT(argv[0]); - plan(29); + plan(44); /* temp files with and without encryption */ encrypt_tmp_files= 1; @@ -202,6 +265,10 @@ int main(int argc __attribute__((unused)),char *argv[]) /* regression tests */ mdev9044(); + encrypt_tmp_files= 1; + mdev10259(); + encrypt_tmp_files= 0; + my_end(0); return exit_status(); } {noformat} {noformat} --- not_fix 2018-05-18 13:16:13.844046978 +0530 +++ fix 2018-05-18 12:50:42.117491310 +0530 @@ -37,17 +37,17 @@ ok 32 - flush, pos_in_file = 200, pos_in_mem = 0 ok 33 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0 ok 34 - fill, pos_in_file = 0, pos_in_mem = 0 -ok 35 - fill, pos_in_file = 200, pos_in_mem = 0 -ok 36 - fill, pos_in_file = 200, pos_in_mem = 0 -ok 37 - reinit WRITE_CACHE, pos_in_file = 200, pos_in_mem = 0 +ok 35 - fill, pos_in_file = 0, pos_in_mem = 200 +ok 36 - fill, pos_in_file = 0, pos_in_mem = 200 +ok 37 - reinit WRITE_CACHE, pos_in_file = 0, pos_in_mem = 200 ok 38 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0 -ok 39 - sachin my_b_bytes_in_cache 0 -ok 40 - fill, pos_in_file = 0, pos_in_mem = 0 -ok 41 - 200 -ok 42 - fill, pos_in_file = 200, pos_in_mem = 0 -ok 43 - fill, pos_in_file = 200, pos_in_mem = 0 -ok 44 - reinit WRITE_CACHE, pos_in_file = 200, pos_in_mem = 0 +ok 39 - sachin my_b_bytes_in_cache 200 +not ok 40 - fill, pos_in_file = 0, pos_in_mem = 200 +ok 41 - 0 +ok 42 - fill, pos_in_file = 0, pos_in_mem = 200 +ok 43 - fill, pos_in_file = 0, pos_in_mem = 200 +ok 44 - reinit WRITE_CACHE, pos_in_file = 0, pos_in_mem = 200 ok 45 - reinit READ_CACHE, pos_in_file = 0, pos_in_mem = 0 -ok 46 - sachin my_b_bytes_in_cache 0 +ok 46 - sachin my_b_bytes_in_cache 200 not ok 47 - large read, pos_in_file = 0, pos_in_mem = 200 # 44 tests planned but 47 executed {noformat} ] |
Fix Version/s | 10.1.34 [ 23100 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Assignee | Sergei Golubchik [ serg ] | Sachin Setiya [ sachin.setiya.007 ] |
Resolution | Fixed [ 1 ] | |
Status | In Review [ 10002 ] | Closed [ 6 ] |
Fix Version/s | 10.2.16 [ 23110 ] | |
Fix Version/s | 10.3.8 [ 23113 ] |
Workflow | MariaDB v3 [ 76138 ] | MariaDB v4 [ 150537 ] |
Zendesk Related Tickets | 111689 |
Hi I have horned down the cnf file,
Now we can simulate bug with this cnf file(running --start-and-exit and then running crash1b4.sql)
!include ../galera_2nodes.cnf
[mysqld]
file-key-management
log-bin