The user also saw this crash that may be related:
160203 7:48:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see http://kb.askmonty.org/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.10-MariaDB-1~trusty-log
key_buffer_size=134217728
read_buffer_size=2097152
max_used_connections=29
max_threads=102
thread_count=8
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 759823 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0x7f2f030dd008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f2f3bd98df0 thread_stack 0x48000
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f2f49924a6e]
/usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f2f4945316d]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f2f479ad340]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f2f49513edd]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f2f49513f71]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f2f495164c2]
/usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x12c)[0x7f2f493bc7dc]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1327)[0x7f2f492d3e57]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x20d)[0x7f2f492dc1cd]
/usr/sbin/mysqld(+0x4369d9)[0x7f2f492dc9d9]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e1b)[0x7f2f492deecb]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f2f492dfca9]
/usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f2f493a292a]
/usr/sbin/mysqld(handle_one_connection+0x40)[0x7f2f493a2b00]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f2f479a5182]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f2f470c847d]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7f2f04938020): is an invalid pointer
Connection ID (thread ID): 1283552
Status: NOT_KILLED

Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on

The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.

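The "could use up to" figure in reports like the one above is just arithmetic over the variables printed before it. A quick sketch of the computation (sort_buffer_size is not printed in the report, so a 2 MB value is assumed here; the result therefore will not match the 759823 K figure exactly):

```shell
# Recompute mysqld's worst-case memory estimate from the crash report above.
# sort_buffer_size is assumed (2 MB) since the report does not print it.
key_buffer_size=134217728
read_buffer_size=2097152
sort_buffer_size=2097152   # assumed, not from the report
max_threads=102
echo "$(( (key_buffer_size + (read_buffer_size + sort_buffer_size) * max_threads) / 1024 )) K"
```

Decreasing any variable on the right-hand side lowers the bound, which is all the "decrease some variables in the equation" hint means.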
Hi!
If these two crashes are related, I suspect the problem is that binlog_xid_count_list contains a NULL or otherwise invalid entry. Looking at MYSQL_BIN_LOG::do_checkpoint_request(), I think only the access to entry->binlog_id can trigger this crash, and TC_LOG_BINLOG::mark_xid_done() also accesses b->binlog_id. So binlog_xid_count_list may hold a dangling pointer to an xid_count_per_binlog object.
But since we have no core file, I can't verify this.

The crash was actually happening on a Galera cluster node, and we got another report with an identical stack trace, MDEV-10196, which also affects a Galera cluster. So the suspicion is that Galera might be an important factor here.
Reassigning to nirbhay_c to take a look.

Yet another one, MDEV-10276.

And another one, MDEV-10301.

This is a very serious issue! Last month we upgraded Galera from 10.0.x to 10.1.14 in our production environment, and it has already crashed three times, something we never saw on 10.0.x. Every crash is an inexplicable signal 11 error. There are several related issues that look the same. We want to wait for a fix version, but why is such a serious issue still open and unresolved after this long?

ypnuaa037 That's because we haven't been able to reproduce this issue on our end.
Would it be possible for you to run a debug build and report a better stack trace if the crash happens again?

Ubuntu 14.04.4
libc6:amd64 2.19-0ubuntu6.9
mariadb-server-10.1 10.1.14+maria-1~trusty
mariadb-server-core-10.1 10.1.14+maria-1~trusty
5-node MariaDB Galera cluster; crashes sometimes while log rotation is done.
syslog:
Dec 7 02:16:55 galera04 mysqld: 161207 2:16:55 [ERROR] mysqld got signal 11 ;
Dec 7 02:16:55 galera04 mysqld: This could be because you hit a bug. It is also possible that this binary
Dec 7 02:16:55 galera04 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Dec 7 02:16:55 galera04 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Dec 7 02:16:55 galera04 mysqld:
Dec 7 02:16:55 galera04 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Dec 7 02:16:55 galera04 mysqld:
Dec 7 02:16:55 galera04 mysqld: We will try our best to scrape up some info that will hopefully help
Dec 7 02:16:55 galera04 mysqld: diagnose the problem, but since we have already crashed,
Dec 7 02:16:55 galera04 mysqld: something is definitely wrong and this may fail.
Dec 7 02:16:55 galera04 mysqld:
Dec 7 02:16:55 galera04 mysqld: Server version: 10.1.14-MariaDB-1~trusty
Dec 7 02:16:55 galera04 mysqld: key_buffer_size=134217728
Dec 7 02:16:55 galera04 mysqld: read_buffer_size=2097152
Dec 7 02:16:55 galera04 mysqld: max_used_connections=14
Dec 7 02:16:55 galera04 mysqld: max_threads=262
Dec 7 02:16:55 galera04 mysqld: thread_count=19
Dec 7 02:16:55 galera04 mysqld: It is possible that mysqld could use up to
Dec 7 02:16:55 galera04 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2819298 K bytes of memory
Dec 7 02:16:55 galera04 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Caused by log rotation:
/var/log/mysql.log /var/log/mysql/mysql.log /var/log/mysql/mariadb-slow.log
{
    daily
    rotate 7
    missingok
    create 640 mysql adm
    compress
    sharedscripts
    postrotate
        test -x /usr/bin/mysqladmin || exit 0
        if [ -f `my_print_defaults --mysqld | grep -oP "pid-file=\K[^$]+"` ]; then
            # If this fails, check debian.conf!
            mysqladmin --defaults-file=/etc/mysql/debian.cnf flush-logs
        fi
    endscript
}

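The postrotate trigger above boils down to extracting the pid-file path from `my_print_defaults --mysqld` and calling `mysqladmin flush-logs`, which exercises the same binlog rotation code path as `FLUSH LOGS`. A sketch of the extraction step against canned output (the sample option lines are assumed, not taken from a real server):

```shell
# Simulate `my_print_defaults --mysqld` output and extract the pid-file path,
# exactly as the grep in the logrotate postrotate script does.
sample='--user=mysql
--pid-file=/var/run/mysqld/mysqld.pid
--socket=/var/run/mysqld/mysqld.sock'
pidfile=$(printf '%s\n' "$sample" | grep -oP "pid-file=\K[^$]+")
echo "$pidfile"   # /var/run/mysqld/mysqld.pid
```

The `\K` keeps only what follows `pid-file=`, so the script gets a bare path it can test with `-f`.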
Ubuntu 14.04 LTS
MariaDB 10.1.19
3 nodes
Main app running: Magento 1.9.3.1
Dec 18 13:48:22 mariadb1 mysqld: 161218 13:48:22 [ERROR] mysqld got signal 11 ;
Dec 18 13:48:22 mariadb1 mysqld: This could be because you hit a bug. It is also possible that this binary
Dec 18 13:48:22 mariadb1 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Dec 18 13:48:22 mariadb1 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: We will try our best to scrape up some info that will hopefully help
Dec 18 13:48:22 mariadb1 mysqld: diagnose the problem, but since we have already crashed,
Dec 18 13:48:22 mariadb1 mysqld: something is definitely wrong and this may fail.
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: Server version: 10.1.19-MariaDB-1~trusty
Dec 18 13:48:22 mariadb1 mysqld: key_buffer_size=134217728
Dec 18 13:48:22 mariadb1 mysqld: read_buffer_size=2097152
Dec 18 13:48:22 mariadb1 mysqld: max_used_connections=14
Dec 18 13:48:22 mariadb1 mysqld: max_threads=302
Dec 18 13:48:22 mariadb1 mysqld: thread_count=11
Dec 18 13:48:22 mariadb1 mysqld: It is possible that mysqld could use up to
Dec 18 13:48:22 mariadb1 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1992742 K bytes of memory
Dec 18 13:48:22 mariadb1 mysqld: Hope that's ok; if not, decrease some variables in the equation.
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: Thread pointer: 0x0x7efd58812008
Dec 18 13:48:22 mariadb1 mysqld: Attempting backtrace. You can use the following information to find out
Dec 18 13:48:22 mariadb1 mysqld: where mysqld died. If you see no messages after this, something went
Dec 18 13:48:22 mariadb1 mysqld: terribly wrong...
Dec 18 13:48:22 mariadb1 mysqld: stack_bottom = 0x7eff93b6c210 thread_stack 0x48400
Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7eff946dd67e]
Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x7eff94200ba5]
Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7eff92755330]
Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xe8)[0x7eff942c2e88]
Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(binlog_background_thread+0x2c5)[0x7eff942c3525]
Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7eff9274d184]
Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7eff91c6c37d]
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: Trying to get some variables.
Dec 18 13:48:22 mariadb1 mysqld: Some pointers may be invalid and cause the dump to abort.
Dec 18 13:48:22 mariadb1 mysqld: Query (0x0): is an invalid pointer
Dec 18 13:48:22 mariadb1 mysqld: Connection ID (thread ID): 3
Dec 18 13:48:22 mariadb1 mysqld: Status: NOT_KILLED
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Dec 18 13:48:22 mariadb1 mysqld: information that should help you find out what is causing the crash.
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld: We think the query pointer is invalid, but we will try to print it anyway.
Dec 18 13:48:22 mariadb1 mysqld: Query:
Dec 18 13:48:22 mariadb1 mysqld:
Dec 18 13:48:22 mariadb1 mysqld_safe: Number of processes running now: 0
Dec 18 13:48:22 mariadb1 mysqld_safe: WSREP: not restarting wsrep node automatically
Dec 18 13:48:22 mariadb1 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended

I don't know how to find the cause; it seems quite random. How can I help you fix this?

I did upload a core dump to MDEV-10809, which seems to be related to MDEV-9510.
Has anybody solved this issue?

Affected versions are at least (tested by us) 10.1.10, 10.1.14, 10.1.19 and 10.1.20.

@eVincent, GeoffMontee, plinux
Do you use HAProxy to do load balancing to the Galera cluster? How is your health check implemented?

@engel75: I manage the infrastructure mentioned by eVincent.
We use a simple "mysql-check", with a sort of static one-to-one binding between web servers and MariaDB servers, like so:
- web1 => mariadb1, with failover on mariadb2 & 3
- web2 => mariadb2, with failover on mariadb1 & 3
- web3 => mariadb3, with failover on mariadb1 & 2
Here is the HAProxy snippet:

haproxy.cfg
listen mariadb_cluster
    bind *:3307
    balance source
    mode tcp
    option tcpka
    option mysql-check user haproxy
    server mariadb1 mariadb1:3306 check
    server mariadb2 mariadb2:3306 check backup
    server mariadb3 mariadb3:3306 check backup

Benjamin
PS: downgrading to MariaDB 5.5 + Galera Cluster seemed to fix the issue (no crashes since the last startup of the cluster, more than 24h ago).

Thank you, Benjamin. I asked because after disabling the HAProxy health checks on one of our nodes, that node did not have a stuck "flush logs" this morning. I will test this with two nodes until Monday to prove it. Since mysql-check does not check whether a node is Synced, we use a shell script initiated by xinetd to check the status:
show global status where variable_name='wsrep_local_state';
We use 3 HAProxy load balancers ("clustered" via keepalived) to route traffic to the cluster. That's a lot of incoming checks. We might use a Perl daemon to handle those requests, as a kind of man in the middle.

The problem with flush logs does not appear to be caused by the health checks themselves; it seems to be load related, as only the nodes with HAProxy health checks enabled get stuck on flush logs.

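For reference, an xinetd-style check like the one described above typically maps wsrep_local_state to an HTTP status line that HAProxy's httpchk understands. A minimal sketch, with the decision factored into a function so it can be exercised without a live server (the mysql invocation in the comment is the assumed data source):

```shell
# Map a wsrep_local_state value to an HTTP status line for HAProxy's httpchk.
# wsrep_local_state 4 means Synced; anything else is taken out of rotation.
wsrep_http_status() {
  if [ "$1" = "4" ]; then
    echo "HTTP/1.1 200 OK"
  else
    echo "HTTP/1.1 503 Service Unavailable"
  fi
}

# In the real check, the state would come from the server, e.g.:
#   state=$(mysql -NBe "show global status where variable_name='wsrep_local_state'" | awk '{print $2}')
wsrep_http_status 4   # HTTP/1.1 200 OK
wsrep_http_status 2   # HTTP/1.1 503 Service Unavailable
```

Keeping the check read-only like this avoids the check itself issuing statements that touch the binlog code path.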
Is anybody working on this one? We are still forced to "clean up" the production Galera cluster every morning.
As "service mysql stop" gets stuck as well, we have to use "kill -9":

echo disable haproxy health check
sed -i -e "s/^#echo disabled/echo disabled/" -e "s/^#exit 0/exit 0/" /usr/local/bin/mysqlchk.sh

echo wait 10s
sleep 10

echo stop mysql
timeout 5s service mysql stop

echo send TERM
kill $(pidof mysqld)

echo wait 5s
sleep 5

echo send KILL
kill -9 $(pidof mysqld)

sleep 1

echo check mysql is gone
ps faux | grep mysql

sleep 2

echo start mysql
service mysql start

sleep 6s

echo check galera is synced
mysql -e "SHOW STATUS WHERE variable_name ='wsrep_local_state_comment';"

echo enable haproxy health check
sed -i -e "s/^echo disabled/#echo disabled/" -e "s/^exit 0/#exit 0/" /usr/local/bin/mysqlchk.sh

We are considering Percona XtraDB Cluster if this issue is not treated with the highest priority. Could anybody please let everyone know what's going on?

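The two `sed` lines in the script above toggle the check script between its normal contents and a short-circuit that reports the node as disabled. A sketch of the round trip against a throwaway copy (the mysqlchk.sh contents here are assumed for illustration):

```shell
# Round-trip the enable/disable toggle on a scratch copy of the check script.
tmp=$(mktemp)
printf '%s\n' '#echo disabled' '#exit 0' 'echo "real check here"' > "$tmp"

# Disable: uncomment the short-circuit so the check always reports "disabled".
sed -i -e "s/^#echo disabled/echo disabled/" -e "s/^#exit 0/exit 0/" "$tmp"
sed -n '1,2p' "$tmp"

# Enable: comment the short-circuit back out so the real check runs again.
sed -i -e "s/^echo disabled/#echo disabled/" -e "s/^exit 0/#exit 0/" "$tmp"
sed -n '1,2p' "$tmp"
rm -f "$tmp"
```

Anchoring the patterns with `^` is what makes the toggle idempotent: running the same `sed` twice cannot double-comment or double-uncomment a line.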
As a reminder, if downgrading to MariaDB 5.5 is an option, it is stable.

engel75 yes, I am looking into it. Unfortunately, the core dump that you submitted did not help, and so far I have not been able to repeat this issue on my end.

Dear nirbhay_c,
I am glad you are working on this issue. Today xtrabackup crashed one node of that cluster with an error that might be related to the same issue:
Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112419612416 [Note] WSREP: Member 3.0 (galera01-customerA-prod-back) desyncs itself from group
Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112419612416 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 123190807)
Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112468671232 [Note] WSREP: Provider paused at 202dc465-47c0-11e4-b99e-9b1f0ffcf741:123190807 (440)
Feb 6 09:50:23 galera01-customerA-prod mysqld: 170206 9:50:23 [ERROR] mysqld got signal 11 ;
Feb 6 09:50:23 galera01-customerA-prod mysqld: This could be because you hit a bug. It is also possible that this binary
Feb 6 09:50:23 galera01-customerA-prod mysqld: or one of the libraries it was linked against is corrupt, improperly built,
Feb 6 09:50:23 galera01-customerA-prod mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: We will try our best to scrape up some info that will hopefully help
Feb 6 09:50:23 galera01-customerA-prod mysqld: diagnose the problem, but since we have already crashed,
Feb 6 09:50:23 galera01-customerA-prod mysqld: something is definitely wrong and this may fail.
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: Server version: 10.1.20-MariaDB-1~trusty
Feb 6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size=134217728
Feb 6 09:50:23 galera01-customerA-prod mysqld: read_buffer_size=2097152
Feb 6 09:50:23 galera01-customerA-prod mysqld: max_used_connections=13
Feb 6 09:50:23 galera01-customerA-prod mysqld: max_threads=262
Feb 6 09:50:23 galera01-customerA-prod mysqld: thread_count=21
Feb 6 09:50:23 galera01-customerA-prod mysqld: It is possible that mysqld could use up to
Feb 6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2819312 K bytes of memory
Feb 6 09:50:23 galera01-customerA-prod mysqld: Hope that's ok; if not, decrease some variables in the equation.
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: Thread pointer: 0x0x7f6c47ad8008
Feb 6 09:50:23 galera01-customerA-prod mysqld: Attempting backtrace. You can use the following information to find out
Feb 6 09:50:23 galera01-customerA-prod mysqld: where mysqld died. If you see no messages after this, something went
Feb 6 09:50:23 galera01-customerA-prod mysqld: terribly wrong...
Feb 6 09:50:23 galera01-customerA-prod mysqld: stack_bottom = 0x7f6e79ec71f0 thread_stack 0x48400
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f6fbf90bc2e]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f6fbf42ea95]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f6fbd779330]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f6fbf4f0aad]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f6fbf4f0b41]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f6fbf4f31b2]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f6fbf396f10]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1314)[0x7f6fbf2a9cd4]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x331)[0x7f6fbf2b22a1]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f6fbf2b2ac9]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e2b)[0x7f6fbf2b4fcb]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f6fbf2b5dc9]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f6fbf37cefa]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f6fbf37d0d0]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f6fbd771184]
Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6fbcc9037d]
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: Trying to get some variables.
Feb 6 09:50:23 galera01-customerA-prod mysqld: Some pointers may be invalid and cause the dump to abort.
Feb 6 09:50:23 galera01-customerA-prod mysqld: Query (0x7f6c33021020): FLUSH BINARY LOGS
Feb 6 09:50:23 galera01-customerA-prod mysqld: Connection ID (thread ID): 284
Feb 6 09:50:23 galera01-customerA-prod mysqld: Status: NOT_KILLED
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
Feb 6 09:50:23 galera01-customerA-prod mysqld:
Feb 6 09:50:23 galera01-customerA-prod mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Feb 6 09:50:23 galera01-customerA-prod mysqld: information that should help you find out what is causing the crash.
Feb 6 09:50:23 galera01-customerA-prod mysqld: Writing a core file
Without disabling the node's HAProxy health check, I was not able to finish the backup successfully.
I've got two core dumps related to the issue:
-rw------- 1 mysql mysql 9.6G Feb 6 09:46 core.mysqld.49551.11.1486370801
-rw------- 1 mysql mysql 9.5G Feb 6 09:50 core.mysqld.9877.11.1486371023
Should I upload both?

I was hit by this last night - 1 out of 3 nodes in a Galera cluster.
At that time the cluster should not have been performing anything unusual and in fact should have been relatively quiescent.
Version 10.1.18 on Ubuntu 16.04.
There is nothing else in the logs before this time.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: 170224 1:26:33 [ERROR] mysqld got signal 11 ;
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: This could be because you hit a bug. It is also possible that this binary
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: or one of the libraries it was linked against is corrupt, improperly built,
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: or misconfigured. This error can also be caused by malfunctioning hardware.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: We will try our best to scrape up some info that will hopefully help
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: diagnose the problem, but since we have already crashed,
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: something is definitely wrong and this may fail.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Server version: 10.1.18-MariaDB-1~xenial
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: key_buffer_size=134217728
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: read_buffer_size=2097152
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: max_used_connections=13
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: max_threads=202
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: thread_count=7
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: It is possible that mysqld could use up to
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1376288 K bytes of memory
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Hope that's ok; if not, decrease some variables in the equation.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Thread pointer: 0x0x7f58c3849008
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Attempting backtrace. You can use the following information to find out
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: where mysqld died. If you see no messages after this, something went
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: terribly wrong...
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: stack_bottom = 0x7f595dc5a0a8 thread_stack 0x48400
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55638521562e]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x556384d643d5]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7f595c3cc3d0]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x556384e1931b]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(binlog_background_thread+0x2f3)[0x556384e198f3]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(+0x705fb1)[0x556384ed4fb1]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7f595c3c26fa]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f595ba6db5d]
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Trying to get some variables.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Some pointers may be invalid and cause the dump to abort.
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Query (0x0):
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Connection ID (thread ID): 3
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Status: NOT_KILLED
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: information that should help you find out what is causing the crash.
Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Unit entered failed state.
Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Failed with result 'exit-code'.

This has reoccurred twice since my last comment. As a means to automatically restart the process, I have resorted to setting the following in MariaDB's systemd config:
Restart=on-failure
Originally it is set to 'on-abort', but it seems that with a signal 11, which is what this bug produces, the default value will not restart the process. It's not pretty and it's not advised, but at least it provides an easy way to ensure the daemon is restarted upon hitting this bug.
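For anyone applying the same workaround, a drop-in override avoids editing the packaged unit file directly (the path and the RestartSec value below are illustrative; run `systemctl daemon-reload` afterwards):

```ini
# /etc/systemd/system/mariadb.service.d/restart.conf
[Service]
Restart=on-failure
RestartSec=5
```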

madpsy FYI, the systemd config related to restarts/aborts has been fixed in MDEV-10382 (10.1.20).

Reading some of the other related bugs, there are comments that disabling binary logs prevents the crashes (which makes sense). I thought binary logging was somewhat essential for Galera, so the simple question is:
Can binary logging be disabled, and if so, what does that mean for the cluster?
Thanks

See https://www.percona.com/blog/2015/07/16/bypassing-sst-pxc-binary-logs/
Not needed, but risky...

Thanks. It sounds to me like binary logs are only required for point-in-time backups, and are needed neither for Galera to operate nor for the xtrabackup SST method.
Assuming one has backups of individual DBs, is there still any risk involved in disabling the binlog? What I don't want to happen is an SST being triggered just because one of the Galera members restarts.

I'm one of the people who disabled binary logs in their MariaDB 10.x cluster to work around this bug.
It's worked fine for me; I use xtrabackup to take incremental backups on another server. I've done restore tests of these backups and I'm satisfied.

Thanks for the info. I've now disabled binary logging and will monitor. Of course, that's a workaround which happens to be satisfactory for some, but this bug really needs to be fixed (it's over a year old!).

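For reference, the workaround discussed above amounts to removing the binary log options from the server configuration. A sketch (section and option names per a typical MariaDB 10.1 Debian/Ubuntu setup; your file layout and paths may differ):

```ini
# /etc/mysql/my.cnf - comment out binary logging to work around the crash.
# Trade-off per the discussion above: this forgoes point-in-time recovery,
# while Galera replication itself and the xtrabackup SST method keep working.
[mysqld]
#log_bin          = /var/log/mysql/mariadb-bin
#log_bin_index    = /var/log/mysql/mariadb-bin.index
#expire_logs_days = 10
```

Restart the server after the change; the crashing rotate/checkpoint code path is only exercised while the binlog is enabled.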
We also had this bug twice in production on a 10.1.22 server (no replication/cluster). The mysqld service did not restart automatically. The second crash caused a table to become corrupted.
170626 10:22:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.

To report this bug, see https://mariadb.com/kb/en/reporting-bugs

We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.

Server version: 10.1.22-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=227
max_threads=1502
thread_count=133
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3430253 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.

Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x40000
*** buffer overflow detected ***: /usr/sbin/mysqld terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x37)[0x7fee0c5e4047]
/lib64/libc.so.6(+0x10d200)[0x7fee0c5e2200]
/lib64/libc.so.6(+0x10efb7)[0x7fee0c5e3fb7]
/usr/sbin/mysqld(my_addr_resolve+0x48)[0x7fee0f08ac28]
/usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x7fee0f0772a2]
/usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7fee0eb9a9d5]
/lib64/libpthread.so.0(+0xf370)[0x7fee0e1b5370]
/usr/sbin/mysqld(+0x88b37c)[0x7fee0ee6f37c]
/usr/sbin/mysqld(+0x8911c3)[0x7fee0ee751c3]
/usr/sbin/mysqld(+0x8ebbe8)[0x7fee0eecfbe8]
/usr/sbin/mysqld(+0x817d70)[0x7fee0edfbd70]
/lib64/libpthread.so.0(+0x7dc5)[0x7fee0e1addc5]
/lib64/libc.so.6(clone+0x6d)[0x7fee0c5cc73d]
======= Memory map: ========

Do you have any idea when this issue will be fixed?

Hi stijn,
Unfortunately we have not been able to reproduce the bug on our end. If it is possible, can you upload a debug core dump?

We are also now having this issue on Ubuntu 16.04, package version 10.1.25+maria-1~xenial from the MariaDB repositories, on a 3 node cluster.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: 170713 8:16:35 [ERROR] mysqld got signal 11 ;
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: This could be because you hit a bug. It is also possible that this binary
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: or one of the libraries it was linked against is corrupt, improperly built,
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: or misconfigured. This error can also be caused by malfunctioning hardware.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: We will try our best to scrape up some info that will hopefully help
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: diagnose the problem, but since we have already crashed,
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: something is definitely wrong and this may fail.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Server version: 10.1.25-MariaDB-1~xenial
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: key_buffer_size=268435456
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: read_buffer_size=2097152
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: max_used_connections=23
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: max_threads=258
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: thread_count=22
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: It is possible that mysqld could use up to
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1852580 K bytes of memory
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Hope that's ok; if not, decrease some variables in the equation.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Thread pointer: 0x7fa017c12008
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Attempting backtrace. You can use the following information to find out
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: where mysqld died. If you see no messages after this, something went
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: terribly wrong...
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: stack_bottom = 0x7fa3052bf0d8 thread_stack 0x48400
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55bb69dc6bde]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55bb69912415]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fa303a30390]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55bb699c779b]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(binlog_background_thread+0x2f3)[0x55bb699c7dd3]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fa303a266ba]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa3030d13dd]
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Trying to get some variables.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Some pointers may be invalid and cause the dump to abort.
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Query (0x0):
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Connection ID (thread ID): 3
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Status: NOT_KILLED
|
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
|
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: information that should help you find out what is causing the crash.
|
Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Main process exited, code=dumped, status=11/SEGV
|
Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Unit entered failed state.
|
Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Failed with result 'core-dump'.
|
Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
|
Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: Stopped MariaDB database server.
|
Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: Starting MariaDB database server...
|
Jul 13 08:16:41 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:41 140694126172416 [Note] /usr/sbin/mysqld (mysqld 10.1.25-MariaDB-1~xenial) starting as process 38882 ...
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Segmentation fault (core dumped)
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: WSREP: Failed to start mysqld for wsrep recovery: '2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Using mutexes to ref count buffer pool pages
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: The InnoDB memory heap is disabled
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Compressed tables use zlib 1.2.8
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Using Linux native AIO
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Using SSE crc32 instructions
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Initializing buffer pool, size = 10.0G
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Completed initialization of buffer pool
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Highest supported file format is Barracuda.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:42 140694126172416 [Note] InnoDB: Starting crash recovery from checkpoint LSN=12274681738
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:49 140694126172416 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Transaction 3771194 was in the XA prepared state.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: 1 transaction(s) which must be rolled back or cleaned up
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: in total 0 row operations to undo
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Trx id counter is 3771648
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:51 140694126172416 [Note] InnoDB: Starting final batch to recover 17 pages from redo log
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Last MySQL binlog file position 0 104878519, file name /var/log/mysql/mariadb-bin.000201
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] InnoDB: 128 rollback segment(s) are active.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140681746048768 [Note] InnoDB: Starting in background the rollback of recovered transactions
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140681746048768 [Note] InnoDB: Rollback of non-prepared transactions completed
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] InnoDB: Waiting for purge to start
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 12274782858
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] Plugin 'FEEDBACK' is disabled.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] Starting crash recovery...
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900 InnoDB: Starting recovery for XA transactions...
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900 InnoDB: Transaction 3771194 in prepared state after recovery
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900 InnoDB: Transaction contains changes to 1 rows
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900 InnoDB: 1 transactions in prepared state after recovery
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 8:16:55 140694126172416 [Note] Found 1 prepared transaction(s) in InnoDB
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 170713 8:16:55 [ERROR] mysqld got signal 11 ;
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: This could be because you hit a bug. It is also possible that this binary
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: or one of the libraries it was linked against is corrupt, improperly built,
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: or misconfigured. This error can also be caused by malfunctioning hardware.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: We will try our best to scrape up some info that will hopefully help
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: diagnose the problem, but since we have already crashed,
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: something is definitely wrong and this may fail.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Server version: 10.1.25-MariaDB-1~xenial
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: key_buffer_size=268435456
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: read_buffer_size=2097152
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: max_used_connections=0
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: max_threads=258
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: thread_count=0
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: It is possible that mysqld could use up to
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1852580 K bytes of memory
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Hope that's ok; if not, decrease some variables in the equation.
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Thread pointer: 0x0
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Attempting backtrace. You can use the following information to find out
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: where mysqld died. If you see no messages after this, something went
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: terribly wrong...
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: stack_bottom = 0x0 thread_stack 0x48400
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5604d1131bde]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x5604d0c7d415]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff5e5d3c390]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(wsrep_on+0x18)[0x5604d0c17988]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x805792)[0x5604d0eea792]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x805dc1)[0x5604d0eeadc1]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x599bff)[0x5604d0c7ebff]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x2cf)[0x5604d0afcdff]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z10ha_recoverP7st_hash+0xc6)[0x5604d0c80b06]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x58f)[0x5604d0d3359f]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x372)[0x5604d0d33ba2]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG4openEPKc+0x30)[0x5604d0d36da0]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x376eb5)[0x5604d0a5beb5]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1af9)[0x5604d0a5f149]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff5e52f6830]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_start+0x29)[0x5604d0a532a9]
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
|
Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: information that should help you find out what is causing the crash.'
|
Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Control process exited, code=exited status=1
|
Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: Failed to start MariaDB database server.
|
Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Unit entered failed state.
|
Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
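Backtraces like the ones above use a fixed `binary(symbol+offset)[address]` frame layout, so the mangled `_Z...` names can be pulled out mechanically and then demangled with `c++filt`. A small sketch (the regex and function name are mine, not from any MariaDB tool):

```python
import re

# One frame of a mysqld backtrace: binary(symbol+offset)[address].
# The symbol part may be empty (some frames show only a "+0x..." offset).
FRAME_RE = re.compile(
    r"^(?P<binary>\S+)\((?P<symbol>[^+)]*)\+?(?P<offset>0x[0-9a-f]+)?\)"
    r"\[(?P<address>0x[0-9a-f]+)\]$"
)

def parse_frame(line):
    """Split one backtrace frame into its parts, or return None."""
    m = FRAME_RE.match(line.strip())
    return m.groupdict() if m else None

frame = parse_frame(
    "/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55bb699c779b]"
)
# Feeding frame["symbol"] to `c++filt` yields
# MYSQL_BIN_LOG::mark_xid_done(unsigned long, bool).
print(frame["symbol"])
```

Frames with no symbol, such as `/usr/sbin/mysqld(+0x88b37c)[...]`, parse with an empty `symbol` and need the binary plus debug symbols to resolve.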
|
|
|
Hello, Tim.
Technically speaking, your report must be different from the original one, which mentions
binlog_background_thread.
But at the same time, yours should be easier to fix. It would be great to have a copy of your server
environment to explore what causes this crash at server recovery. It feels like this crash might be
repeatable, that is, it would show up again should you restart the server after the first crash.
Cheers,
Andrei
|
|
See my comment here: https://jira.mariadb.org/browse/MDEV-11053?focusedCommentId=97602&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-97602
This is affecting production badly as we cannot alter a table with pt-online-schema-change without having it randomly crash on us.
|
|
Happened again on two nodes after trying to alter a 10GB table:
Node1 roughly 5 minutes after starting pt-osc:
171014 21:44:55 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.28-MariaDB-1~xenial
key_buffer_size=16777216
read_buffer_size=524288
max_used_connections=2049
max_threads=4097
thread_count=314
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18937620 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f7aefcd9008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7fc8bde3fff8 thread_stack 0x48400
2017-10-14 21:44:57 140500174632704 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.57534S), skipping check
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5622aabd254e]
/usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x5622aa71b005]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fc9c649c390]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x92)[0x5622aa7d1332]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x5622aa7d13c1]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG23trx_group_commit_leaderEPNS_18group_commit_entryE+0x4ec)[0x5622aa7d421c]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG34write_transaction_to_binlog_eventsEPNS_18group_commit_entryE+0x133)[0x5622aa7d4593]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG27write_transaction_to_binlogEP3THDP17binlog_cache_mngrP9Log_eventbbb+0xd0)[0x5622aa7d4840]
/usr/sbin/mysqld(+0x659a06)[0x5622aa7d4a06]
/usr/sbin/mysqld(+0x65a582)[0x5622aa7d5582]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13log_and_orderEP3THDybbb+0x44)[0x5622aa7d56e4]
/usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x7ab)[0x5622aa71ebfb]
/usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x57)[0x5622aa66eb37]
/usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xa9c)[0x5622aa58526c]
/usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x311)[0x5622aa58dce1]
/usr/sbin/mysqld(+0x413635)[0x5622aa58e635]
/usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1ea5)[0x5622aa590d35]
/usr/sbin/mysqld(_Z10do_commandP3THD+0x139)[0x5622aa591c09]
/usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xff)[0x5622aa6a0c9f]
/usr/sbin/mysqld(+0x577805)[0x5622aa6f2805]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fc9c64926ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc9c5b3d3dd]
Node1 again shortly after recovering, deleting old triggers and restarting pt-osc alter:
171014 21:59:46 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.28-MariaDB-1~xenial
key_buffer_size=16777216
read_buffer_size=524288
max_used_connections=938
max_threads=4097
thread_count=315
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18937620 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f314dc12008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f66911c10d8 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55f6a68af54e]
/usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55f6a63f8005]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f66941ef390]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55f6a64ae08b]
/usr/sbin/mysqld(binlog_background_thread+0x35b)[0x55f6a64ae72b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f66941e56ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f66938903dd]
Node2 with no alter running but merely during SST to Node1 since Node1 wouldn't recover after 2nd crash:
WSREP_SST: [INFO] Rate-limiting SST to 60M (20171014 22:11:31.060)
WSREP_SST: [INFO] Sleeping before data transfer for SST (20171014 22:11:31.062)
2017-10-14 22:11:32 140016479106816 [Note] WSREP: (06100a81, 'tcp://0.0.0.0:4567') turning message relay requesting off
WSREP_SST: [INFO] Streaming the backup to joiner at 10.0.0.16 4444 (20171014 22:11:41.066)
WSREP_SST: [INFO] Evaluating innobackupex --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | snzip | pv -q -L $rlimit | socat -u stdio TCP:10.0.0.16:4444; RC=( ${PIPESTATUS[@]} ) (20171014 22:11:41.068)
171014 22:25:06 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
To report this bug, see https://mariadb.com/kb/en/reporting-bugs
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Server version: 10.1.28-MariaDB-1~xenial
key_buffer_size=16777216
read_buffer_size=524288
max_used_connections=1501
max_threads=4097
thread_count=375
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18926318 K bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x7f400c812008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7f5924bbd0d8 thread_stack 0x48400
/usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55e5180db54e]
/usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55e517c24005]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5927c04390]
/usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55e517cda08b]
/usr/sbin/mysqld(binlog_background_thread+0x35b)[0x55e517cda72b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f5927bfa6ba]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f59272a53dd]
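As an aside, the "could use up to ... K bytes" figures in these banners come from the formula the server prints alongside them. A quick sanity check against the first Node1 trace above, assuming a 4 MiB sort_buffer_size since that value is not printed:

```python
# Hedged sketch of the memory estimate mysqld prints in its crash banner.
# key/read buffer sizes and max_threads are from the first Node1 trace above;
# sort_buffer_size is NOT printed there, so the 4 MiB figure is an assumption.
key_buffer_size = 16777216            # bytes, from the trace
read_buffer_size = 524288             # bytes, from the trace
max_threads = 4097                    # from the trace
sort_buffer_size = 4 * 1024 * 1024    # assumed

estimate_k = (key_buffer_size
              + (read_buffer_size + sort_buffer_size) * max_threads) // 1024
print(estimate_k)  # in K, comparable to the 18937620 K the server reported
```

With that guess the estimate lands near, but not exactly on, the reported 18937620 K, the gap reflecting the unknown real sort_buffer_size.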
|
|
Good news: one cause of the crashes has been identified, and the patch should be out in a couple of days.
|
|
The patch is being tried by BB now:
git push https://github.com/MariaDB/server bb-10.1-mdev9510-andrei:bb-10.1-mdev9510-andrei
|
|
That is great news. I've gone over the patch and indeed it makes sense. I am curious, though, as I don't believe any of our applications actually disable sql_log_bin at the session level directly. Would this be something that Galera makes use of internally to isolate queries to specific nodes, or could it also be triggered by another condition? I know many of the crashes have been triggered by pt-online-schema-change, but I've gone over that too and don't see where sql_log_bin=0 would be getting set. Thank you!
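For reference, session-scoped binlog disabling, the case under discussion, looks like this in MariaDB (an illustration only; the thread does not show which component issues it in this setup):

```sql
-- Disable binary logging for THIS session only; other sessions keep logging.
SET SESSION sql_log_bin = 0;
-- ... maintenance statements that should not be written to the binlog ...
SET SESSION sql_log_bin = 1;
```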
|
|
I've also determined that the "random" timeframe before the crash gets triggered after the alter or SST begins coincides with when the binary logs get rotated after reaching their configured 1GB size, which can be anywhere from 3-5 to 10-15 minutes depending on write load...
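That window is consistent with simple arithmetic on a 1 GB max_binlog_size; a sketch with assumed write rates:

```python
# Back-of-the-envelope check of the rotation window described above.
# The 1 GB log size is from the comment; the write rates are assumptions.
MAX_BINLOG_SIZE_MB = 1024

def minutes_to_rotate(write_rate_mb_per_s):
    """Minutes until a fresh binlog reaches max_binlog_size at a steady rate."""
    return MAX_BINLOG_SIZE_MB / write_rate_mb_per_s / 60

for rate in (1.2, 3.4, 5.7):  # assumed MB/s: light vs heavy write load
    print(f"{rate:4.1f} MB/s -> {minutes_to_rotate(rate):5.1f} min")
```

At those rates the rotation interval spans roughly 3 to 14 minutes, matching the range reported.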
|
|
Had 2 crashes today and have decided to run with the patch; so far it's been running fine without incident. Will keep you updated.
|
|
Brendan, hello.
I read the absence of news from you these past days as no bad news.
Anyway, I am submitting the patch for review very shortly.
Thanks for helping!
Andrei
|
|
Thank you, indeed no news = good. This is promising so far.
I've done a few table alters over the weekend and restarted some nodes due to some InnoDB tunings, and haven't had any crashes since the patch. I will, however, continue to run this build for now to make sure all is good. Thanks again!
|
|
The patch has been polished a bit, and a regression test was made by reusing an existing one.
Here is the pull request:
https://github.com/MariaDB/server/pull/473
I hope sachin.setiya.007 will review it promptly once back to the office.
|
|
Hello, Sachin.
I see you're back to work on Thursday, Nov 2nd. I wonder if you could review the mdev-9510 patch during that 44th week (which would leave practically Thursday and Friday for it). The patch is for 10.1, and it's rather small.
Cheers,
Andrei
|
|
Okay to push. Sorry for the delay.
Regards
sachin
|
|
Changing the post-review status accordingly.
|
|
Pushed c7e38076f33.
|
|
Thanks everyone. I am still running the same build with the original patch and haven't had a single crash since. 100% fixed for sure.
|