Details
Type: Bug
Status: Closed
Priority: Major
Resolution: Duplicate
10.0.12-galera
- RHEL 6.5
- MGC 10.0.12
Description
# my_print_defaults mysqld
--performance_schema=on
--datadir=/opt/data/mysql
--tmpdir=/tmp/
--log_error=/opt/data/mysql/error.log
--log_warnings=2
--query_cache_size=0
--query_cache_type=0
--log_bin
--log_slave_updates
--binlog_format=ROW
--innodb_log_file_size=128M
--innodb_buffer_pool_size=2G
--innodb_flush_method=O_DIRECT
--innodb_flush_log_at_trx_commit=1
--innodb_autoinc_lock_mode=2
--wsrep_cluster_name=GaleraPOC
--wsrep_cluster_address=gcomm://mariadb001,mariadb002,mariadb003
--wsrep_provider=/usr/lib64/galera/libgalera_smm.so
--wsrep_provider_options=gcache.size=512M
--wsrep_sst_method=xtrabackup-v2
--wsrep_sst_auth=sst:sstpass
--server_id=1
--wsrep_node_name=mariadb001
--wsrep_node_address=mariadb001
--wsrep_sst_receive_address=mariadb001
* 10.0.12-MariaDB-wsrep-log MariaDB Server, wsrep_25.10.r4002
* wsrep_provider_version=25.3.5(rXXXX)
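As a quick sanity check on the option dump above, the following Python sketch parses `my_print_defaults`-style output and compares it against two settings that Galera setups are generally expected to use (`binlog_format=ROW` and `innodb_autoinc_lock_mode=2`). The helper names `parse_defaults` and `galera_mismatches` are made up for this illustration, and the short list of checked settings is an assumption, not a complete list of Galera prerequisites. For the configuration in this report both checks pass, which suggests the crash is not caused by these two settings.

```python
# Sketch only: helper names are hypothetical, and REQUIRED is an assumed,
# deliberately short list of settings to compare against.
REQUIRED = {"binlog_format": "ROW", "innodb_autoinc_lock_mode": "2"}

# Subset of the my_print_defaults output from this report.
SAMPLE = """\
--log_bin
--log_slave_updates
--binlog_format=ROW
--innodb_autoinc_lock_mode=2
--wsrep_provider=/usr/lib64/galera/libgalera_smm.so
--wsrep_provider_options=gcache.size=512M
"""


def parse_defaults(text):
    """Map each --name[=value] line to name -> value (True for bare flags)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("--"):
            continue  # skip prompts, separators, and blank lines
        # Split on the first '=' only, so values may contain '=' themselves.
        name, sep, value = line[2:].partition("=")
        options[name.replace("-", "_")] = value if sep else True
    return options


def galera_mismatches(options):
    """Return the checked settings whose value differs from REQUIRED."""
    wrong = {}
    for name, want in REQUIRED.items():
        got = options.get(name)
        if not isinstance(got, str) or got.upper() != want:
            wrong[name] = got
    return wrong


if __name__ == "__main__":
    opts = parse_defaults(SAMPLE)
    print("mismatches:", galera_mismatches(opts))
```

The parser splits on the first `=` only, so nested values such as `gcache.size=512M` survive intact.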
# ./node1.sh
.............ERROR 2013 (HY000) at line 1: Lost connection to MySQL server during query
ERROR with query
# ./node2.sh
.......
The server on node1 crashes:
140828 15:34:49 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.0.12-MariaDB-wsrep-log
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=7
max_threads=153
thread_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467215 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f2d7ce8f008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f2e43e56ce0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2b)[0xb914fb]
/usr/sbin/mysqld(handle_fatal_signal+0x398)[0x743318]
/lib64/libpthread.so.0(+0xf710)[0x7f2e43b1c710]
/lib64/libc.so.6(+0x83742)[0x7f2e421cc742]
/usr/sbin/mysqld(_ZNK19rpl_sql_thread_info22cached_charset_compareEPc+0x20)[0x699bf0]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x837)[0x800817]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x525)[0x6f05e5]
/usr/lib64/galera/libgalera_smm.so(_ZNK6galera9TrxHandle5applyEPvPF15wsrep_cb_statusS1_PKvmjPK14wsrep_trx_metaERS6_+0xb1)[0x7f2e3eb4c2c1]
/usr/lib64/galera/libgalera_smm.so(+0x1aaf95)[0x7f2e3eb83f95]
/usr/lib64/galera/libgalera_smm.so(_ZN6galera13ReplicatorSMM10replay_trxEPNS_9TrxHandleEPv+0x12e)[0x7f2e3eb8485e]
/usr/lib64/galera/libgalera_smm.so(galera_replay_trx+0x5c)[0x7f2e3eb9845c]
/usr/sbin/mysqld(_Z24wsrep_replay_transactionP3THD+0x2de)[0x6f217e]
/usr/sbin/mysqld[0x5e1120]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x16d0)[0x5e28a0]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x132)[0x5e3072]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x54b)[0x6a190b]
/usr/sbin/mysqld(handle_one_connection+0x42)[0x6a1a02]
/lib64/libpthread.so.0(+0x79d1)[0x7f2e43b149d1]
/lib64/libc.so.6(clone+0x6d)[0x7f2e42231b5d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f2e1b631251): is an invalid pointer
Connection ID (thread ID): 331
Status: NOT_KILLED
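To make the backtrace above easier to read, here is a rough Python sketch that splits each frame into module, function, and address, and recovers the qualified function name from the mangled C++ symbols. The names `parse_frame` and `qualified_name` are made up for this illustration. The name extraction is a simplification that only reads the leading length-prefixed identifiers of an Itanium-style mangled name and ignores parameter types, templates, and substitutions; a real demangler such as `c++filt` is the better tool when it is available.

```python
import re

# One frame looks like: module(symbol+0xoffset)[0xaddress].
# The symbol, or the whole parenthesised part, may be missing.
FRAME_RE = re.compile(
    r"^(?P<module>[^(\[]+)"
    r"(?:\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\))?"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)
LENGTH_RE = re.compile(r"\d+")


def qualified_name(symbol):
    """Best-effort scope::function name from a mangled symbol (simplified)."""
    if not symbol.startswith("_Z"):
        return symbol  # plain C symbol, nothing to decode
    pos = 2
    nested = symbol.startswith("N", pos)
    if nested:
        pos += 1
        # Skip cv-qualifiers such as K (const member function).
        while pos < len(symbol) and symbol[pos] in "rVK":
            pos += 1
    parts = []
    while True:
        match = LENGTH_RE.match(symbol, pos)
        if match is None:
            break
        length = int(match.group())
        start = match.end()
        parts.append(symbol[start:start + length])
        pos = start + length
        if not nested:
            break  # a non-nested name has exactly one identifier
    return "::".join(parts) if parts else symbol


def parse_frame(line):
    """Return module/function/address for one frame line, or None."""
    match = FRAME_RE.match(line.strip())
    if match is None:
        return None
    symbol = match.group("symbol") or ""
    return {
        "module": match.group("module"),
        "function": qualified_name(symbol) if symbol else None,
        "address": match.group("address"),
    }


# A few frames copied from the crash log in this report.
FRAMES = """\
/usr/sbin/mysqld(_ZNK19rpl_sql_thread_info22cached_charset_compareEPc+0x20)[0x699bf0]
/usr/sbin/mysqld(_ZN15Query_log_event14do_apply_eventEP14rpl_group_infoPKcj+0x837)[0x800817]
/usr/sbin/mysqld(_Z14wsrep_apply_cbPvPKvmjPK14wsrep_trx_meta+0x525)[0x6f05e5]
/usr/lib64/galera/libgalera_smm.so(galera_replay_trx+0x5c)[0x7f2e3eb9845c]
/usr/sbin/mysqld(_Z24wsrep_replay_transactionP3THD+0x2de)[0x6f217e]
/usr/sbin/mysqld[0x5e1120]
"""

if __name__ == "__main__":
    for line in FRAMES.splitlines():
        frame = parse_frame(line)
        if frame is not None:
            print(frame["module"], frame["function"] or "?", frame["address"])
```

Read this way, the crash happens in `rpl_sql_thread_info::cached_charset_compare`, reached from `Query_log_event::do_apply_event` through `wsrep_apply_cb` while `wsrep_replay_transaction` is replaying a transaction.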
|
I spoke with Teemu from Codership, who said this is likely a bug specific to MariaDB.
I also tested this on:
- 5.5.38: no crash
- PXC 5.6.19: no crash
12:22 < gryp> I have crash of MGC-10.0.12: https://gist.github.com/grypyrg/22b0512cb59b5e32a538, seems to be similar to knielsen's fix of https://mariadb.atlassian.net/browse/MDEV-6156 . Anybody got any ideas?
12:27 < knielsen> gryp: The code path involved is completely different (Galera replication vs. MariaDB parallel replication). But it is possible that Galera has a similar bug (it also does parallel replication), might help the developers track it down...
12:29 < knielsen> Interesting that Galera seems to be calling Query_log_event::do_apply_event() ... I suppose this is to handle DDL perhaps, which shouldn't run in parallel
12:29 < knielsen> well, not that I know much of what Galera code does
12:29 < gryp> indeed. it's during some DDL's. I'm still working on getting a cleaner test case.
12:31 < gryp> I'm also running with wsrep_slave_threads=1 and the DDL is executed on the crashing node.
12:31 < gryp> knielsen: I'll talk some more with Codership, see what they think based on your feedback. TNx
12:33 < knielsen> gryp: there's more discussions of this bug on maria-developers@ (google should know), in case it helps understand the issue. Though it could be something different, hard to say
12:33 < gryp> Yep, have found that as well. tnx. (https://www.mail-archive.com/maria-developers@lists.launchpad.net/msg06846.html)