MariaDB Server / MDEV-9510

Segmentation fault in binlog thread causes crash

Details

    • 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10, 10.1.29, 10.2.11

    Description

      A user saw this crash in MariaDB 10.1.10 on Ubuntu:

      160201 21:22:07 [ERROR] mysqld got signal 11 ;
      This could be because you hit a bug. It is also possible that this binary
      or one of the libraries it was linked against is corrupt, improperly built,
      or misconfigured. This error can also be caused by malfunctioning hardware.
       
      To report this bug, see http://kb.askmonty.org/en/reporting-bugs
       
      We will try our best to scrape up some info that will hopefully help
      diagnose the problem, but since we have already crashed,
      something is definitely wrong and this may fail.
       
      Server version: 10.1.10-MariaDB-1~trusty-log
      key_buffer_size=134217728
      read_buffer_size=2097152
      max_used_connections=30
      max_threads=102
      thread_count=15
      It is possible that mysqld could use up to
      key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 759823 K  bytes of memory
      Hope that's ok; if not, decrease some variables in the equation.
       
      Thread pointer: 0x0x7f1818883008
      Attempting backtrace. You can use the following information to find out
      where mysqld died. If you see no messages after this, something went
      terribly wrong...
      stack_bottom = 0x7f182b2dce10 thread_stack 0x48000
      /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f182acf3a6e]
      /usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f182a82216d]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f1828d7c340]
      /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xc7)[0x7f182a8e2cb7]
      /usr/sbin/mysqld(binlog_background_thread+0x2b5)[0x7f182a8e3275]
      /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f1828d74182]
      /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f182849747d]
       
      Trying to get some variables.
      Some pointers may be invalid and cause the dump to abort.
      Query (0x0): is an invalid pointer
      Connection ID (thread ID): 3
      Status: NOT_KILLED
       
      Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
       
      The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
      information that should help you find out what is causing the crash.
      

      I will try to get more information.

      Attachments

        Issue Links

          Activity

            GeoffMontee Geoff Montee (Inactive) created issue -
            GeoffMontee Geoff Montee (Inactive) added a comment -

            The user also saw this crash that may be related:

            160203  7:48:06 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see http://kb.askmonty.org/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed, 
            something is definitely wrong and this may fail.
             
            Server version: 10.1.10-MariaDB-1~trusty-log
            key_buffer_size=134217728
            read_buffer_size=2097152
            max_used_connections=29
            max_threads=102
            thread_count=8
            It is possible that mysqld could use up to 
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 759823 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x0x7f2f030dd008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f2f3bd98df0 thread_stack 0x48000
            /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f2f49924a6e]
            /usr/sbin/mysqld(handle_fatal_signal+0x38d)[0x7f2f4945316d]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7f2f479ad340]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f2f49513edd]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f2f49513f71]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f2f495164c2]
            /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x12c)[0x7f2f493bc7dc]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1327)[0x7f2f492d3e57]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x20d)[0x7f2f492dc1cd]
            /usr/sbin/mysqld(+0x4369d9)[0x7f2f492dc9d9]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e1b)[0x7f2f492deecb]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f2f492dfca9]
            /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f2f493a292a]
            /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f2f493a2b00]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x8182)[0x7f2f479a5182]
            /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f2f470c847d]
             
            Trying to get some variables.
            Some pointers may be invalid and cause the dump to abort.
            Query (0x7f2f04938020): is an invalid pointer
            Connection ID (thread ID): 1283552
            Status: NOT_KILLED
             
            Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on
             
            The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            information that should help you find out what is causing the crash.

            elenst Elena Stepanova added a comment - - edited

            support issue

            plinux Lixun Peng added a comment -

            Hi!

            If these two crashes are related, I think the problem may be that binlog_xid_count_list contains a NULL or otherwise invalid entry.
            I checked MYSQL_BIN_LOG::do_checkpoint_request(); I think only the "entry->binlog_id" access can trigger this crash, and TC_LOG_BINLOG::mark_xid_done() also accesses "b->binlog_id".
            So I suspect there is an invalid xid_count_per_binlog pointer in binlog_xid_count_list.
            But since we have no core file, I can't verify it.
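
            For illustration only: if a core file does become available, a gdb session along the lines of the sketch below could help check whether the list really does contain a bad entry. The member names follow the comment above; the expression for the global binlog object (mysql_bin_log) and the core file path are assumptions and may need adjusting to the actual build.

            #!/bin/bash
            # Hypothetical core-dump inspection; requires debug symbols matching the exact mysqld binary.
            CORE=/var/lib/mysql/core.mysqld.12345   # placeholder path to a captured core file
            gdb /usr/sbin/mysqld "$CORE" -batch \
                -ex 'thread apply all bt' \
                -ex 'print mysql_bin_log.binlog_xid_count_list'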

            elenst Elena Stepanova added a comment -

            The crash was actually happening on a Galera cluster node, and we got another report with a stack trace identical to the one above, MDEV-10196, which also affects a Galera cluster. So the suspicion is that Galera might be an important factor here.

            Reassigning to nirbhay_c to take a look.

            elenst Elena Stepanova added a comment -

            Yet another one, MDEV-10276.

            elenst Elena Stepanova added a comment -

            And another one, MDEV-10301

            ypnuaa037 YuanPeng added a comment -

            This is a very serious issue! Last month we upgraded Galera from 10.0.x to 10.1.14 in our production environment, and it has already crashed 3 times so far, which we never saw on 10.0.x. All crashes are inexplicable signal 11 errors. There are several related issues that look the same. We want to wait for a fixed version, but why is this serious issue still "open and unresolved" after this long?


            nirbhay_c Nirbhay Choubey (Inactive) added a comment -

            ypnuaa037 That's because we haven't been able to reproduce this issue on our end.
            Would it be possible for you to run a debug node and report a better stack trace in case
            the crash happens again?
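
            For other reporters hitting this, a rough sketch of how a node could be prepared so that the next crash yields a core file and a fully symbolized backtrace is shown below. The debug-symbol package names and the core file location vary between distributions and MariaDB repositories, so treat them as placeholders.

            #!/bin/bash
            # Sketch only: allow mysqld to write a core file and symbolize it after the next crash.
            # mysqld_safe passes --core-file-size to "ulimit -c"; started directly here for brevity.
            mysqld_safe --core-file-size=unlimited &

            # After the next signal 11, resolve the full backtrace from the core
            # (core location depends on kernel.core_pattern; /var/lib/mysql/core is a placeholder):
            gdb /usr/sbin/mysqld /var/lib/mysql/core -batch \
                -ex 'thread apply all bt full' > mysqld_backtrace.txt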

            engel75 Florian Engelmann added a comment -

            Ubuntu 14.04.4
            libc6:amd64 2.19-0ubuntu6.9
            mariadb-server-10.1 10.1.14+maria-1~trusty
            mariadb-server-core-10.1 10.1.14+maria-1~trusty

            5-node MariaDB Galera cluster - crashes sometimes while log rotation is done

            syslog:

            Dec 7 02:16:55 galera04 mysqld: 161207 2:16:55 [ERROR] mysqld got signal 11 ;
            Dec 7 02:16:55 galera04 mysqld: This could be because you hit a bug. It is also possible that this binary
            Dec 7 02:16:55 galera04 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
            Dec 7 02:16:55 galera04 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
            Dec 7 02:16:55 galera04 mysqld:
            Dec 7 02:16:55 galera04 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Dec 7 02:16:55 galera04 mysqld:
            Dec 7 02:16:55 galera04 mysqld: We will try our best to scrape up some info that will hopefully help
            Dec 7 02:16:55 galera04 mysqld: diagnose the problem, but since we have already crashed,
            Dec 7 02:16:55 galera04 mysqld: something is definitely wrong and this may fail.
            Dec 7 02:16:55 galera04 mysqld:
            Dec 7 02:16:55 galera04 mysqld: Server version: 10.1.14-MariaDB-1~trusty
            Dec 7 02:16:55 galera04 mysqld: key_buffer_size=134217728
            Dec 7 02:16:55 galera04 mysqld: read_buffer_size=2097152
            Dec 7 02:16:55 galera04 mysqld: max_used_connections=14
            Dec 7 02:16:55 galera04 mysqld: max_threads=262
            Dec 7 02:16:55 galera04 mysqld: thread_count=19
            Dec 7 02:16:55 galera04 mysqld: It is possible that mysqld could use up to
            Dec 7 02:16:55 galera04 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2819298 K bytes of memory
            Dec 7 02:16:55 galera04 mysqld: Hope that's ok; if not, decrease some variables in the equation.

            caused by logrotation:
            /var/log/mysql.log /var/log/mysql/mysql.log /var/log/mysql/mariadb-slow.log {
                    daily
                    rotate 7
                    missingok
                    create 640 mysql adm
                    compress
                    sharedscripts
                    postrotate
                            test -x /usr/bin/mysqladmin || exit 0
                            if [ -f `my_print_defaults --mysqld | grep -oP "pid-file=\K[^$]+"` ]; then
                                    # If this fails, check debian.conf!
                                    mysqladmin --defaults-file=/etc/mysql/debian.cnf flush-logs
                            fi
                    endscript
            }
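
            Since the crashes coincide with log rotation, one crude way to stress the same code path is to repeat what the postrotate step above does in a loop on an otherwise busy Galera node. This is only a sketch of a reproduction attempt, not a confirmed recipe:

            #!/bin/bash
            # Repeatedly flush the logs, mimicking logrotate's postrotate step from the config above.
            for i in $(seq 1 500); do
                mysqladmin --defaults-file=/etc/mysql/debian.cnf flush-logs
                sleep 1
            done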
            eVincent Vincent Enjalbert added a comment - - edited

            Ubuntu 14.04 LTS
            Mariadb 10.1.19
            3 nodes
            Main app running: Magento 1.9.3.1

            Dec 18 13:48:22 mariadb1 mysqld: 161218 13:48:22 [ERROR] mysqld got signal 11 ;
            Dec 18 13:48:22 mariadb1 mysqld: This could be because you hit a bug. It is also possible that this binary
            Dec 18 13:48:22 mariadb1 mysqld: or one of the libraries it was linked against is corrupt, improperly built,
            Dec 18 13:48:22 mariadb1 mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: We will try our best to scrape up some info that will hopefully help
            Dec 18 13:48:22 mariadb1 mysqld: diagnose the problem, but since we have already crashed,
            Dec 18 13:48:22 mariadb1 mysqld: something is definitely wrong and this may fail.
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: Server version: 10.1.19-MariaDB-1~trusty
            Dec 18 13:48:22 mariadb1 mysqld: key_buffer_size=134217728
            Dec 18 13:48:22 mariadb1 mysqld: read_buffer_size=2097152
            Dec 18 13:48:22 mariadb1 mysqld: max_used_connections=14
            Dec 18 13:48:22 mariadb1 mysqld: max_threads=302
            Dec 18 13:48:22 mariadb1 mysqld: thread_count=11
            Dec 18 13:48:22 mariadb1 mysqld: It is possible that mysqld could use up to
            Dec 18 13:48:22 mariadb1 mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1992742 K  bytes of memory
            Dec 18 13:48:22 mariadb1 mysqld: Hope that's ok; if not, decrease some variables in the equation.
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: Thread pointer: 0x0x7efd58812008
            Dec 18 13:48:22 mariadb1 mysqld: Attempting backtrace. You can use the following information to find out
            Dec 18 13:48:22 mariadb1 mysqld: where mysqld died. If you see no messages after this, something went
            Dec 18 13:48:22 mariadb1 mysqld: terribly wrong...
            Dec 18 13:48:22 mariadb1 mysqld: stack_bottom = 0x7eff93b6c210 thread_stack 0x48400
            Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7eff946dd67e]
            Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x7eff94200ba5]
            Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7eff92755330]
            Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0xe8)[0x7eff942c2e88]
            Dec 18 13:48:22 mariadb1 mysqld: /usr/sbin/mysqld(binlog_background_thread+0x2c5)[0x7eff942c3525]
            Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7eff9274d184]
            Dec 18 13:48:22 mariadb1 mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7eff91c6c37d]
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: Trying to get some variables.
            Dec 18 13:48:22 mariadb1 mysqld: Some pointers may be invalid and cause the dump to abort.
            Dec 18 13:48:22 mariadb1 mysqld: Query (0x0): is an invalid pointer
            Dec 18 13:48:22 mariadb1 mysqld: Connection ID (thread ID): 3
            Dec 18 13:48:22 mariadb1 mysqld: Status: NOT_KILLED
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Dec 18 13:48:22 mariadb1 mysqld: information that should help you find out what is causing the crash.
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld: We think the query pointer is invalid, but we will try to print it anyway.
            Dec 18 13:48:22 mariadb1 mysqld: Query:
            Dec 18 13:48:22 mariadb1 mysqld:
            Dec 18 13:48:22 mariadb1 mysqld_safe: Number of processes running now: 0
            Dec 18 13:48:22 mariadb1 mysqld_safe: WSREP: not restarting wsrep node automatically
            Dec 18 13:48:22 mariadb1 mysqld_safe: mysqld from pid file /var/run/mysqld/mysqld.pid ended
            

            I don't know how to find the cause; it seems quite random. How can I help you fix this?

            engel75 Florian Engelmann added a comment -

            I did upload a core dump to MDEV-10809 which seems to be related to MDEV-9510

            Has anybody solved this issue?

            engel75 Florian Engelmann added a comment -

            Affected versions are at least (tested by us) 10.1.10, 10.1.14, 10.1.19 and 10.1.20.

            engel75 Florian Engelmann added a comment -

            @eVincent, GeoffMontee, plinux

            Do you use HAproxy to do loadbalancing to the galera cluster? How is your healthcheck implemented?

            bmalynovytch Benjamin MALYNOVYTCH added a comment - - edited

            @engel75: I manage the infrastructure mentioned by eVincent.
            We use a simple "mysql-check", with a sort of static "one to one" binding between web servers and MariaDB servers, like so:

            • web1 => mariadb1 with failover on mariadb2 & 3
            • web2 => mariadb2 with failover on mariadb1 & 3
            • web3 => mariadb3 with failover on mariadb1 & 2

            Here the HAProxy snippet:

            haproxy.cfg

            listen mariadb_cluster
              bind *:3307
              balance source
              mode tcp
              option tcpka
              option mysql-check user haproxy
              server mariadb1 mariadb1:3306 check
              server mariadb2 mariadb2:3306 check backup
              server mariadb3 mariadb3:3306 check backup
            

            Benjamin

            PS: downgrading to MariaDB 5.5 + Galera Cluster seemed to fix the issue (no crashes since the last startup of the cluster, more than 24h ago)

            engel75 Florian Engelmann added a comment -

            Thank you Benjamin. I asked because after disabling the HAProxy health checks on one of our nodes, that node did not have a stuck "flush logs" this morning. I will test this with two nodes until Monday to prove it. As mysql-check does not check whether a node is "Synced", we use a shell script initiated by xinetd to check the status:

            show global status where variable_name='wsrep_local_state';

            We use 3 HAProxy load balancers (clustered via keepalived) to route traffic to the cluster. That's a lot of incoming checks. We might use a Perl daemon to handle those requests as a kind of "man in the middle".
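
            For context, an xinetd-driven check of the kind described here typically looks something like the sketch below. The actual /usr/local/bin/mysqlchk.sh is not attached to this issue, so the HTTP wrapper and the exact comparison are assumptions (wsrep_local_state = 4 means Synced):

            #!/bin/bash
            # Rough sketch of a wsrep health check served via xinetd; not the reporter's actual script.
            STATE=$(mysql -N -e "show global status where variable_name='wsrep_local_state';" | awk '{print $2}')
            if [ "$STATE" = "4" ]; then
                echo -e "HTTP/1.1 200 OK\r\n\r\nSynced"
            else
                echo -e "HTTP/1.1 503 Service Unavailable\r\n\r\nNot synced"
            fi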

            engel75 Florian Engelmann added a comment -

            The problem with flush logs does not seem to be caused by the health checks themselves; it seems to be load related, as only the nodes with HAProxy health checks enabled (and therefore receiving traffic) get stuck on flush logs.

            engel75 Florian Engelmann added a comment - - edited

            Is anybody working on this one? We are still forced to "clean up" the production Galera cluster every morning.

            As "service mysql stop" gets stuck as well, we have to use "kill -9":

            echo disable haproxy health check
            sed -i -e "s/^#echo disabled/echo disabled/" -e "s/^#exit 0/exit 0/" /usr/local/bin/mysqlchk.sh
             
            echo wait 10s
            sleep 10
             
            echo stop mysql
            timeout 5s service mysql stop
             
            echo send TERM
            kill $(pidof mysqld)
             
            echo wait 5s
            sleep 5
             
            echo send KILL
            kill -9 $(pidof mysqld)
             
            sleep 1
             
            echo check mysql is gone
            ps faux | grep mysql
             
             
            sleep 2
             
            echo start mysql
            service mysql start
             
            sleep 6s
             
            echo check galera is synced
            mysql -e "SHOW STATUS WHERE variable_name ='wsrep_local_state_comment';"
             
            echo enable haproxy health check
            sed -i -e "s/^echo disabled/#echo disabled/" -e "s/^exit 0/#exit 0/" /usr/local/bin/mysqlchk.sh
            

            We are considering switching to Percona XtraDB Cluster if this issue is not treated with the highest priority. Could anybody please let everyone know what's going on?

            bmalynovytch Benjamin MALYNOVYTCH added a comment -

            As a reminder, if downgrading to MariaDB 5.5 is an option, it is stable.

            nirbhay_c Nirbhay Choubey (Inactive) added a comment -

            engel75 yes, I am looking into it. Unfortunately, the core dump that you submitted did
            not help, and so far I have not been able to repeat this issue on my end.

            engel75 Florian Engelmann added a comment - - edited

            Dear nirbhay_c,

            I am glad you are working on this issue. Today xtrabackup crashed one node of that cluster with an error that might be related to the same issue:

            Feb  6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06  9:50:23 140112419612416 [Note] WSREP: Member 3.0 (galera01-customerA-prod-back) desyncs itself from group
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06  9:50:23 140112419612416 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 123190807)
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06  9:50:23 140112468671232 [Note] WSREP: Provider paused at 202dc465-47c0-11e4-b99e-9b1f0ffcf741:123190807 (440)
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 170206  9:50:23 [ERROR] mysqld got signal 11 ;
            Feb  6 09:50:23 galera01-customerA-prod mysqld: This could be because you hit a bug. It is also possible that this binary
            Feb  6 09:50:23 galera01-customerA-prod mysqld: or one of the libraries it was linked against is corrupt, improperly built,
            Feb  6 09:50:23 galera01-customerA-prod mysqld: or misconfigured. This error can also be caused by malfunctioning hardware.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: We will try our best to scrape up some info that will hopefully help
            Feb  6 09:50:23 galera01-customerA-prod mysqld: diagnose the problem, but since we have already crashed, 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: something is definitely wrong and this may fail.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Server version: 10.1.20-MariaDB-1~trusty
            Feb  6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size=134217728
            Feb  6 09:50:23 galera01-customerA-prod mysqld: read_buffer_size=2097152
            Feb  6 09:50:23 galera01-customerA-prod mysqld: max_used_connections=13
            Feb  6 09:50:23 galera01-customerA-prod mysqld: max_threads=262
            Feb  6 09:50:23 galera01-customerA-prod mysqld: thread_count=21
            Feb  6 09:50:23 galera01-customerA-prod mysqld: It is possible that mysqld could use up to 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2819312 K  bytes of memory
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Hope that's ok; if not, decrease some variables in the equation.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Thread pointer: 0x0x7f6c47ad8008
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Attempting backtrace. You can use the following information to find out
            Feb  6 09:50:23 galera01-customerA-prod mysqld: where mysqld died. If you see no messages after this, something went
            Feb  6 09:50:23 galera01-customerA-prod mysqld: terribly wrong...
            Feb  6 09:50:23 galera01-customerA-prod mysqld: stack_bottom = 0x7f6e79ec71f0 thread_stack 0x48400
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f6fbf90bc2e]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f6fbf42ea95]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f6fbd779330]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f6fbf4f0aad]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f6fbf4f0b41]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f6fbf4f31b2]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f6fbf396f10]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1314)[0x7f6fbf2a9cd4]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x331)[0x7f6fbf2b22a1]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f6fbf2b2ac9]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e2b)[0x7f6fbf2b4fcb]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f6fbf2b5dc9]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f6fbf37cefa]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f6fbf37d0d0]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f6fbd771184]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6fbcc9037d]
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Trying to get some variables.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Some pointers may be invalid and cause the dump to abort.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Query (0x7f6c33021020): FLUSH BINARY LOGS
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Connection ID (thread ID): 284
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Status: NOT_KILLED
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
            Feb  6 09:50:23 galera01-customerA-prod mysqld: 
            Feb  6 09:50:23 galera01-customerA-prod mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Feb  6 09:50:23 galera01-customerA-prod mysqld: information that should help you find out what is causing the crash.
            Feb  6 09:50:23 galera01-customerA-prod mysqld: Writing a core file
            

            Without disabling the node's HAProxy health check, I was not able to finish the backup successfully.
            I've got two core dumps related to the issue:

            -rw------- 1 mysql mysql 9.6G Feb 6 09:46 core.mysqld.49551.11.1486370801
            -rw------- 1 mysql mysql 9.5G Feb 6 09:50 core.mysqld.9877.11.1486371023

            Should I upload both?

            engel75 Florian Engelmann added a comment - - edited Dear nirbhay_c , I am glad you are working on this issue. Today xtrabackup crashed one node of that cluster with an error that might be related to the same issue: Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112419612416 [Note] WSREP: Member 3.0 (galera01-customerA-prod-back) desyncs itself from group Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112419612416 [Note] WSREP: Shifting SYNCED -> DONOR/DESYNCED (TO: 123190807) Feb 6 09:50:23 galera01-customerA-prod mysqld: 2017-02-06 9:50:23 140112468671232 [Note] WSREP: Provider paused at 202dc465-47c0-11e4-b99e-9b1f0ffcf741:123190807 (440) Feb 6 09:50:23 galera01-customerA-prod mysqld: 170206 9:50:23 [ERROR] mysqld got signal 11 ; Feb 6 09:50:23 galera01-customerA-prod mysqld: This could be because you hit a bug. It is also possible that this binary Feb 6 09:50:23 galera01-customerA-prod mysqld: or one of the libraries it was linked against is corrupt, improperly built, Feb 6 09:50:23 galera01-customerA-prod mysqld: or misconfigured. This error can also be caused by malfunctioning hardware. Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: To report this bug, see https://mariadb.com/kb/en/reporting-bugs Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: We will try our best to scrape up some info that will hopefully help Feb 6 09:50:23 galera01-customerA-prod mysqld: diagnose the problem, but since we have already crashed, Feb 6 09:50:23 galera01-customerA-prod mysqld: something is definitely wrong and this may fail. Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: Server version: 10.1.20-MariaDB-1~trusty Feb 6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size=134217728 Feb 6 09:50:23 galera01-customerA-prod mysqld: read_buffer_size=2097152 Feb 6 09:50:23 galera01-customerA-prod mysqld: max_used_connections=13 Feb 6 09:50:23 galera01-customerA-prod mysqld: max_threads=262 Feb 6 09:50:23 galera01-customerA-prod mysqld: thread_count=21 Feb 6 09:50:23 galera01-customerA-prod mysqld: It is possible that mysqld could use up to Feb 6 09:50:23 galera01-customerA-prod mysqld: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 2819312 K bytes of memory Feb 6 09:50:23 galera01-customerA-prod mysqld: Hope that's ok; if not, decrease some variables in the equation. Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: Thread pointer: 0x0x7f6c47ad8008 Feb 6 09:50:23 galera01-customerA-prod mysqld: Attempting backtrace. You can use the following information to find out Feb 6 09:50:23 galera01-customerA-prod mysqld: where mysqld died. If you see no messages after this, something went Feb 6 09:50:23 galera01-customerA-prod mysqld: terribly wrong... 
Feb 6 09:50:23 galera01-customerA-prod mysqld: stack_bottom = 0x7f6e79ec71f0 thread_stack 0x48400 Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x7f6fbf90bc2e] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7f6fbf42ea95] Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x10330)[0x7f6fbd779330] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x9d)[0x7f6fbf4f0aad] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x7f6fbf4f0b41] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG16rotate_and_purgeEb+0xc2)[0x7f6fbf4f31b2] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z20reload_acl_and_cacheP3THDyP10TABLE_LISTPi+0x130)[0x7f6fbf396f10] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0x1314)[0x7f6fbf2a9cd4] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x331)[0x7f6fbf2b22a1] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(+0x439ac9)[0x7f6fbf2b2ac9] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1e2b)[0x7f6fbf2b4fcb] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z10do_commandP3THD+0x169)[0x7f6fbf2b5dc9] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(_Z24do_handle_one_connectionP3THD+0x18a)[0x7f6fbf37cefa] Feb 6 09:50:23 galera01-customerA-prod mysqld: /usr/sbin/mysqld(handle_one_connection+0x40)[0x7f6fbf37d0d0] Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libpthread.so.0(+0x8184)[0x7f6fbd771184] Feb 6 09:50:23 galera01-customerA-prod mysqld: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f6fbcc9037d] Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: Trying to get some variables. Feb 6 09:50:23 galera01-customerA-prod mysqld: Some pointers may be invalid and cause the dump to abort. Feb 6 09:50:23 galera01-customerA-prod mysqld: Query (0x7f6c33021020): FLUSH BINARY LOGS Feb 6 09:50:23 galera01-customerA-prod mysqld: Connection ID (thread ID): 284 Feb 6 09:50:23 galera01-customerA-prod mysqld: Status: NOT_KILLED Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off Feb 6 09:50:23 galera01-customerA-prod mysqld: Feb 6 09:50:23 galera01-customerA-prod mysqld: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains Feb 6 09:50:23 galera01-customerA-prod mysqld: information that should help you find out what is causing the crash. 
            Feb 6 09:50:23 galera01-customerA-prod mysqld: Writing a core file

            Without disabling the node's HAProxy health check I was not able to finish the backup successfully. I've got two core dumps related to the issue:

            -rw------- 1 mysql mysql 9.6G Feb 6 09:46 core.mysqld.49551.11.1486370801
            -rw------- 1 mysql mysql 9.5G Feb 6 09:50 core.mysqld.9877.11.1486371023

            Should I upload both?
            madpsy Nathan Madpsy added a comment -

            I was hit by this last night - 1 out of 3 nodes in a Galera cluster.

            At that time the cluster was not doing anything unusual and in fact should have been relatively quiescent.

            Version 10.1.18 on Ubuntu 16.04

            There is nothing else in the logs before this time.

            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: 170224  1:26:33 [ERROR] mysqld got signal 11 ;
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: This could be because you hit a bug. It is also possible that this binary
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: or one of the libraries it was linked against is corrupt, improperly built,
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: or misconfigured. This error can also be caused by malfunctioning hardware.
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: We will try our best to scrape up some info that will hopefully help
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: diagnose the problem, but since we have already crashed,
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: something is definitely wrong and this may fail.
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Server version: 10.1.18-MariaDB-1~xenial
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: key_buffer_size=134217728
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: read_buffer_size=2097152
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: max_used_connections=13
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: max_threads=202
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: thread_count=7
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: It is possible that mysqld could use up to
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1376288 K  bytes of memory
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Hope that's ok; if not, decrease some variables in the equation.
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Thread pointer: 0x0x7f58c3849008
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Attempting backtrace. You can use the following information to find out
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: where mysqld died. If you see no messages after this, something went
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: terribly wrong...
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: stack_bottom = 0x7f595dc5a0a8 thread_stack 0x48400
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55638521562e]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(handle_fatal_signal+0x2d5)[0x556384d643d5]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x113d0)[0x7f595c3cc3d0]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x556384e1931b]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(binlog_background_thread+0x2f3)[0x556384e198f3]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /usr/sbin/mysqld(+0x705fb1)[0x556384ed4fb1]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76fa)[0x7f595c3c26fa]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f595ba6db5d]
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Trying to get some variables.
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Some pointers may be invalid and cause the dump to abort.
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Query (0x0):
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Connection ID (thread ID): 3
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Status: NOT_KILLED
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Feb 24 01:26:33 ams-prod1-db01 mysqld[13237]: information that should help you find out what is causing the crash.
            Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Main process exited, code=exited, status=1/FAILURE
            Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Unit entered failed state.
            Feb 24 01:26:33 ams-prod1-db01 systemd[1]: mariadb.service: Failed with result 'exit-code'.
            

            serg Sergei Golubchik made changes -
            Assignee Nirbhay Choubey [ nirbhay_c ]
            serg Sergei Golubchik made changes -
            Priority Major [ 3 ] Critical [ 2 ]
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Assignee Sachin Setiya [ sachin.setiya.007 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.19 [ 109 ] 10.1.19, 10.1.22 [ 109, 143 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            madpsy Nathan Madpsy added a comment -

            This has recurred twice since my last comment. As a means of automatically restarting the process I have resorted to setting the following in MariaDB's systemd config:

            Restart=on-failure

            By default it is set to 'on-abort', but it seems that with a signal 11, which is what this bug produces, the default value will not restart the process. It's not pretty and it's not advised, but at least it provides an easy way to ensure the daemon is restarted upon hitting this bug.
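            For anyone wanting to do the same, a minimal sketch of such an override, assuming the stock mariadb.service unit name and an illustrative drop-in path, could look like this:

                # Hypothetical drop-in override, e.g. /etc/systemd/system/mariadb.service.d/restart.conf
                # (the RestartSec value is illustrative).
                [Service]
                # Also restart when mysqld exits non-zero after its SIGSEGV handler runs,
                # not only on an uncaught abort signal.
                Restart=on-failure
                RestartSec=5s

                # Reload systemd so the override is picked up:
                #   systemctl daemon-reload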

            danblack Daniel Black made changes -
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Status Open [ 1 ] In Progress [ 3 ]
            danblack Daniel Black added a comment -

            madpsy FYI the systemd config related to Restarts/aborts has been fixed in MDEV-10382 (10.1.20)

            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.19, 10.1.22 [ 109, 143 ] 10.1.19, 10.1.22, 10.2.6-1 [ 109, 143, 146 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-1 [ 109, 143, 146 ] 10.1.19, 10.1.22, 10.2.6-2 [ 109, 143, 148 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            madpsy Nathan Madpsy added a comment -

            Reading some of the other related bugs, there are comments that disabling binary logs prevents the crashes (which makes sense). I thought binary logging was somewhat essential for Galera, so the simple question is:

            Can binary logging be disabled and if so what does that mean for the cluster?

            Thanks

            engel75 Florian Engelmann added a comment - see https://www.percona.com/blog/2015/07/16/bypassing-sst-pxc-binary-logs/ not needed but risky...
            madpsy Nathan Madpsy added a comment -

            Thanks. It sounds to me like binary logs are only required for point-in-time backups and are not needed for Galera to operate, nor for the xtrabackup SST method.

            Assuming one has backups of individual DBs, is there still any risk involved in disabling the binlog? What I don't want to happen is for an SST to be triggered just because one of the Galera members restarts.


            stemid Stefan Midjich added a comment -

            I'm one of the people who disabled binary logs in their MariaDB 10.x cluster to work around this bug.

            It's worked fine for me; I use xtrabackup to take incremental backups on another server. I've done restore tests of these backups and I'm satisfied.

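            For reference, a rough sketch of that kind of incremental backup cycle with the innobackupex wrapper from Percona XtraBackup 2.x; the directories, credentials and backup directory names below are placeholders, not anything prescribed by this issue:

                # Full base backup (e.g. weekly); the /backups/* paths are illustrative.
                innobackupex --user=backup --password=secret /backups/full

                # Incremental backup against the most recent full backup directory.
                innobackupex --user=backup --password=secret \
                    --incremental /backups/inc \
                    --incremental-basedir=/backups/full/<LATEST_FULL>

                # To restore: prepare the base with --redo-only, apply the (last)
                # incremental without it, then run the final prepare.
                innobackupex --apply-log --redo-only /backups/full/<LATEST_FULL>
                innobackupex --apply-log /backups/full/<LATEST_FULL> \
                    --incremental-dir=/backups/inc/<INCREMENTAL_DIR>
                innobackupex --apply-log /backups/full/<LATEST_FULL>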
            madpsy Nathan Madpsy added a comment -

            Thanks for the info. I've now disabled binary logging and will monitor. Of course that's a workaround which happens to be satisfactory for some, but this bug really needs to be fixed (it is over a year old!).
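            For completeness, a minimal sketch of what the workaround amounts to on a typical Debian/Ubuntu install; the file path and option values are illustrative, and as noted above it trades away binlog-based point-in-time recovery:

                # /etc/mysql/my.cnf (or a file under conf.d/) - illustrative only
                [mysqld]
                # Disabling the binary log is done by removing or commenting out the
                # log_bin settings; Galera replication itself does not use the binlog.
                #log_bin            = /var/log/mysql/mariadb-bin
                #log_bin_index      = /var/log/mysql/mariadb-bin.index
                #expire_logs_days   = 10

                # A server restart is required for the change to take effect.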

            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-2 [ 109, 143, 148 ] 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3 [ 109, 143, 148, 150 ]
            GeoffMontee Geoff Montee (Inactive) made changes -
            Affects Version/s 10.1.20 [ 22112 ]
            danblack Daniel Black made changes -
            stijn Stijn added a comment -

            We also had this bug twice in production for a 10.1.22 server (no replication/cluster). The mysqld service did not restart automatically. The second crash caused a table to become corrupted.

            170626 10:22:06 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.
             
            To report this bug, see https://mariadb.com/kb/en/reporting-bugs
             
            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.
             
            Server version: 10.1.22-MariaDB
            key_buffer_size=134217728
            read_buffer_size=131072
            max_used_connections=227
            max_threads=1502
            thread_count=133
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 3430253 K  bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.
             
            Thread pointer: 0x0
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x0 thread_stack 0x40000
            *** buffer overflow detected ***: /usr/sbin/mysqld terminated
            ======= Backtrace: =========
            /lib64/libc.so.6(__fortify_fail+0x37)[0x7fee0c5e4047]
            /lib64/libc.so.6(+0x10d200)[0x7fee0c5e2200]
            /lib64/libc.so.6(+0x10efb7)[0x7fee0c5e3fb7]
            /usr/sbin/mysqld(my_addr_resolve+0x48)[0x7fee0f08ac28]
            /usr/sbin/mysqld(my_print_stacktrace+0x1c2)[0x7fee0f0772a2]
            /usr/sbin/mysqld(handle_fatal_signal+0x305)[0x7fee0eb9a9d5]
            /lib64/libpthread.so.0(+0xf370)[0x7fee0e1b5370]
            /usr/sbin/mysqld(+0x88b37c)[0x7fee0ee6f37c]
            /usr/sbin/mysqld(+0x8911c3)[0x7fee0ee751c3]
            /usr/sbin/mysqld(+0x8ebbe8)[0x7fee0eecfbe8]
            /usr/sbin/mysqld(+0x817d70)[0x7fee0edfbd70]
            /lib64/libpthread.so.0(+0x7dc5)[0x7fee0e1addc5]
            /lib64/libc.so.6(clone+0x6d)[0x7fee0c5cc73d]
            ======= Memory map: ========
            

            Do you have any idea when this issue will be fixed?


            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            Hi stijn,

            Unfortunately we are not able to reproduce the bug on our side. If it is possible, can you upload a core dump from a debug build?

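            For anyone able to help with that, a rough sketch of how such a core can be turned into a full backtrace; the paths are placeholders, and the debug symbol package matching the exact server build should be installed first:

                # Open the core file against the exact mysqld binary that produced it.
                gdb /usr/sbin/mysqld /path/to/core.mysqld.<pid>

                # Inside gdb, capture the full trace of every thread to a file:
                (gdb) set pagination off
                (gdb) set logging file mysqld_full_bt.txt
                (gdb) set logging on
                (gdb) thread apply all bt full
                (gdb) set logging off
                (gdb) quit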
            ratzpo Rasmus Johansson (Inactive) made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3 [ 109, 143, 148, 150 ] 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2 [ 109, 143, 148, 150, 174 ]
            ratzpo Rasmus Johansson (Inactive) made changes -
            Rank Ranked lower
            sachin.setiya.007 Sachin Setiya (Inactive) made changes -
            Assignee Sachin Setiya [ sachin.setiya.007 ] Andrei Elkin [ elkin ]
            timw_suqld Tim White added a comment -

            We are also now having this issue on Ubuntu 16.04, package version 10.1.25+maria-1~xenial from the MariaDB repositories, on a 3 node cluster.

            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: 170713  8:16:35 [ERROR] mysqld got signal 11 ;
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: This could be because you hit a bug. It is also possible that this binary
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: or one of the libraries it was linked against is corrupt, improperly built,
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: or misconfigured. This error can also be caused by malfunctioning hardware.
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: We will try our best to scrape up some info that will hopefully help
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: diagnose the problem, but since we have already crashed,
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: something is definitely wrong and this may fail.
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Server version: 10.1.25-MariaDB-1~xenial
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: key_buffer_size=268435456
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: read_buffer_size=2097152
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: max_used_connections=23
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: max_threads=258
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: thread_count=22
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: It is possible that mysqld could use up to
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1852580 K  bytes of memory
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Hope that's ok; if not, decrease some variables in the equation.
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Thread pointer: 0x7fa017c12008
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Attempting backtrace. You can use the following information to find out
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: where mysqld died. If you see no messages after this, something went
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: terribly wrong...
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: stack_bottom = 0x7fa3052bf0d8 thread_stack 0x48400
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55bb69dc6bde]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55bb69912415]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fa303a30390]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55bb699c779b]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /usr/sbin/mysqld(binlog_background_thread+0x2f3)[0x55bb699c7dd3]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fa303a266ba]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fa3030d13dd]
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Trying to get some variables.
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Some pointers may be invalid and cause the dump to abort.
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Query (0x0):
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Connection ID (thread ID): 3
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Status: NOT_KILLED
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: Optimizer switch: index_merge=on,index_merge_union=on,index_merge_sort_union=on,index_merge_intersection=on,index_merge_sort_intersection=off,engine_condition_pushdown=off,index_condition_pushdown=on,derived_merge=on,derived_with_keys=on,firstmatch=on,loosescan=on,materialization=on,in_to_exists=on,semijoin=on,partial_match_rowid_merge=on,partial_match_table_scan=on,subquery_cache=on,mrr=off,mrr_cost_based=off,mrr_sort_keys=off,outer_join_with_cache=on,semijoin_with_cache=on,join_cache_incremental=on,join_cache_hashed=on,join_cache_bka=on,optimize_join_buffer_size=off,table_elimination=on,extended_keys=on,exists_to_in=on,orderby_uses_equalities=off
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Jul 13 08:16:35 suq-mariadb-2-2 mysqld[2400]: information that should help you find out what is causing the crash.
            Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Main process exited, code=dumped, status=11/SEGV
            Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Unit entered failed state.
            Jul 13 08:16:35 suq-mariadb-2-2 systemd[1]: mariadb.service: Failed with result 'core-dump'.
            Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Service hold-off time over, scheduling restart.
            Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: Stopped MariaDB database server.
            Jul 13 08:16:40 suq-mariadb-2-2 systemd[1]: Starting MariaDB database server...
            Jul 13 08:16:41 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:41 140694126172416 [Note] /usr/sbin/mysqld (mysqld 10.1.25-MariaDB-1~xenial) starting as process 38882 ...
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Segmentation fault (core dumped)
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: WSREP: Failed to start mysqld for wsrep recovery: '2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Using mutexes to ref count buffer pool pages
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: The InnoDB memory heap is disabled
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: GCC builtin __atomic_thread_fence() is used for memory barrier
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Compressed tables use zlib 1.2.8
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Using Linux native AIO
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Using SSE crc32 instructions
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Initializing buffer pool, size = 10.0G
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Completed initialization of buffer pool
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Highest supported file format is Barracuda.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:42 140694126172416 [Note] InnoDB: Starting crash recovery from checkpoint LSN=12274681738
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:49 140694126172416 [Note] InnoDB: Restoring possible half-written data pages from the doublewrite buffer...
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Transaction 3771194 was in the XA prepared state.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: 1 transaction(s) which must be rolled back or cleaned up
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: in total 0 row operations to undo
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Trx id counter is 3771648
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:51 140694126172416 [Note] InnoDB: Starting final batch to recover 17 pages from redo log
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: InnoDB: Last MySQL binlog file position 0 104878519, file name /var/log/mysql/mariadb-bin.000201
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] InnoDB: 128 rollback segment(s) are active.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140681746048768 [Note] InnoDB: Starting in background the rollback of recovered transactions
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140681746048768 [Note] InnoDB: Rollback of non-prepared transactions completed
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] InnoDB: Waiting for purge to start
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.36-82.0 started; log sequence number 12274782858
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Warning] InnoDB: Skipping buffer pool dump/restore during wsrep recovery.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] Plugin 'FEEDBACK' is disabled.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] Recovering after a crash using /var/log/mysql/mariadb-bin
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] Starting crash recovery...
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900  InnoDB: Starting recovery for XA transactions...
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900  InnoDB: Transaction 3771194 in prepared state after recovery
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900  InnoDB: Transaction contains changes to 1 rows
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13 08:16:55 7ff5e7699900  InnoDB: 1 transactions in prepared state after recovery
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 2017-07-13  8:16:55 140694126172416 [Note] Found 1 prepared transaction(s) in InnoDB
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: 170713  8:16:55 [ERROR] mysqld got signal 11 ;
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: This could be because you hit a bug. It is also possible that this binary
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: or one of the libraries it was linked against is corrupt, improperly built,
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: or misconfigured. This error can also be caused by malfunctioning hardware.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: To report this bug, see https://mariadb.com/kb/en/reporting-bugs
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: We will try our best to scrape up some info that will hopefully help
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: diagnose the problem, but since we have already crashed,
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: something is definitely wrong and this may fail.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Server version: 10.1.25-MariaDB-1~xenial
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: key_buffer_size=268435456
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: read_buffer_size=2097152
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: max_used_connections=0
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: max_threads=258
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: thread_count=0
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: It is possible that mysqld could use up to
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 1852580 K  bytes of memory
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Hope that's ok; if not, decrease some variables in the equation.
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Thread pointer: 0x0
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: Attempting backtrace. You can use the following information to find out
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: where mysqld died. If you see no messages after this, something went
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: terribly wrong...
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: stack_bottom = 0x0 thread_stack 0x48400
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5604d1131bde]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x5604d0c7d415]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7ff5e5d3c390]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(wsrep_on+0x18)[0x5604d0c17988]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x805792)[0x5604d0eea792]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x805dc1)[0x5604d0eeadc1]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x599bff)[0x5604d0c7ebff]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z24plugin_foreach_with_maskP3THDPFcS0_P13st_plugin_intPvEijS3_+0x2cf)[0x5604d0afcdff]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z10ha_recoverP7st_hash+0xc6)[0x5604d0c80b06]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG7recoverEP11st_log_infoPKcP11st_io_cacheP28Format_description_log_eventb+0x58f)[0x5604d0d3359f]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG18do_binlog_recoveryEPKcb+0x372)[0x5604d0d33ba2]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG4openEPKc+0x30)[0x5604d0d36da0]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(+0x376eb5)[0x5604d0a5beb5]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_Z11mysqld_mainiPPc+0x1af9)[0x5604d0a5f149]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7ff5e52f6830]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: /usr/sbin/mysqld(_start+0x29)[0x5604d0a532a9]
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
            Jul 13 08:17:40 suq-mariadb-2-2 mysqld[38628]: information that should help you find out what is causing the crash.'
            Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Control process exited, code=exited status=1
            Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: Failed to start MariaDB database server.
            Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Unit entered failed state.
            Jul 13 08:17:40 suq-mariadb-2-2 systemd[1]: mariadb.service: Failed with result 'exit-code'.
            

            Elkin Andrei Elkin added a comment -

            Hello, Tim.

            Technically speaking your report must be different from the original one, which mentions
            binlog_background_thread.
            But at the same time, yours should be easier to fix. It would be great to have a copy of your server environment
            to explore what causes this crash at server recovery. It feels like this crash might be repeatable, that is, it would
            show up again should you restart the server after the first crash.

            Cheers,

            Andrei

            spikestabber Brendan P added a comment -

            See my comment here: https://jira.mariadb.org/browse/MDEV-11053?focusedCommentId=97602&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-97602

            This is affecting production badly as we cannot alter a table with pt-online-schema-change without having it randomly crash on us.

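            For context, the alters in question are ordinary pt-online-schema-change runs; a hypothetical invocation of that kind, where the database, table and ALTER clause are made up purely for illustration, would be:

                # Hypothetical pt-online-schema-change run of the sort that hits this
                # crash pattern; D=<database>, t=<table> and the ALTER clause are placeholders.
                pt-online-schema-change \
                    --alter "ADD COLUMN notes TEXT" \
                    D=appdb,t=orders \
                    --execute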
            anikitin Andrii Nikitin (Inactive) made changes -
            GeoffMontee Geoff Montee (Inactive) made changes -
            Affects Version/s 10.1.23 [ 22503 ]
            serg Sergei Golubchik made changes -
            Affects Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Fix Version/s 10.2 [ 14601 ]
            serg Sergei Golubchik made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2 [ 109, 143, 148, 150, 174 ] 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10 [ 109, 143, 148, 150, 174, 183 ]
            spikestabber Brendan P added a comment -

            Happened again on two nodes after trying to alter a 10GB table:

            Node1 roughly 5 minutes after starting pt-osc:
            171014 21:44:55 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see https://mariadb.com/kb/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 10.1.28-MariaDB-1~xenial
            key_buffer_size=16777216
            read_buffer_size=524288
            max_used_connections=2049
            max_threads=4097
            thread_count=314
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18937620 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x7f7aefcd9008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7fc8bde3fff8 thread_stack 0x48400
            2017-10-14 21:44:57 140500174632704 [Warning] WSREP: last inactive check more than PT1.5S ago (PT1.57534S), skipping check
            /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x5622aabd254e]
            /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x5622aa71b005]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7fc9c649c390]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG21do_checkpoint_requestEm+0x92)[0x5622aa7d1332]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG20checkpoint_and_purgeEm+0x11)[0x5622aa7d13c1]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG23trx_group_commit_leaderEPNS_18group_commit_entryE+0x4ec)[0x5622aa7d421c]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG34write_transaction_to_binlog_eventsEPNS_18group_commit_entryE+0x133)[0x5622aa7d4593]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG27write_transaction_to_binlogEP3THDP17binlog_cache_mngrP9Log_eventbbb+0xd0)[0x5622aa7d4840]
            /usr/sbin/mysqld(+0x659a06)[0x5622aa7d4a06]
            /usr/sbin/mysqld(+0x65a582)[0x5622aa7d5582]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13log_and_orderEP3THDybbb+0x44)[0x5622aa7d56e4]
            /usr/sbin/mysqld(_Z15ha_commit_transP3THDb+0x7ab)[0x5622aa71ebfb]
            /usr/sbin/mysqld(_Z17trans_commit_stmtP3THD+0x57)[0x5622aa66eb37]
            /usr/sbin/mysqld(_Z21mysql_execute_commandP3THD+0xa9c)[0x5622aa58526c]
            /usr/sbin/mysqld(_Z11mysql_parseP3THDPcjP12Parser_state+0x311)[0x5622aa58dce1]
            /usr/sbin/mysqld(+0x413635)[0x5622aa58e635]
            /usr/sbin/mysqld(_Z16dispatch_command19enum_server_commandP3THDPcj+0x1ea5)[0x5622aa590d35]
            /usr/sbin/mysqld(_Z10do_commandP3THD+0x139)[0x5622aa591c09]
            /usr/sbin/mysqld(_Z26threadpool_process_requestP3THD+0xff)[0x5622aa6a0c9f]
            /usr/sbin/mysqld(+0x577805)[0x5622aa6f2805]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7fc9c64926ba]
            /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7fc9c5b3d3dd]

            Node1 again shortly after recovering, deleting old triggers and restarting pt-osc alter:
            171014 21:59:46 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see https://mariadb.com/kb/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 10.1.28-MariaDB-1~xenial
            key_buffer_size=16777216
            read_buffer_size=524288
            max_used_connections=938
            max_threads=4097
            thread_count=315
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18937620 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x7f314dc12008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f66911c10d8 thread_stack 0x48400
            /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55f6a68af54e]
            /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55f6a63f8005]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f66941ef390]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55f6a64ae08b]
            /usr/sbin/mysqld(binlog_background_thread+0x35b)[0x55f6a64ae72b]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f66941e56ba]
            /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f66938903dd]

            Node2, with no alter running, merely during SST to Node1 (since Node1 wouldn't recover after the 2nd crash):
            WSREP_SST: [INFO] Rate-limiting SST to 60M (20171014 22:11:31.060)
            WSREP_SST: [INFO] Sleeping before data transfer for SST (20171014 22:11:31.062)
            2017-10-14 22:11:32 140016479106816 [Note] WSREP: (06100a81, 'tcp://0.0.0.0:4567') turning message relay requesting off
            WSREP_SST: [INFO] Streaming the backup to joiner at 10.0.0.16 4444 (20171014 22:11:41.066)
            WSREP_SST: [INFO] Evaluating innobackupex --no-version-check $tmpopts $INNOEXTRA --galera-info --stream=$sfmt $itmpdir 2>${DATA}/innobackup.backup.log | snzip | pv -q -L $rlimit | socat -u stdio TCP:10.0.0.16:4444; RC=( ${PIPESTATUS[@]} ) (20171014 22:11:41.068)
            171014 22:25:06 [ERROR] mysqld got signal 11 ;
            This could be because you hit a bug. It is also possible that this binary
            or one of the libraries it was linked against is corrupt, improperly built,
            or misconfigured. This error can also be caused by malfunctioning hardware.

            To report this bug, see https://mariadb.com/kb/en/reporting-bugs

            We will try our best to scrape up some info that will hopefully help
            diagnose the problem, but since we have already crashed,
            something is definitely wrong and this may fail.

            Server version: 10.1.28-MariaDB-1~xenial
            key_buffer_size=16777216
            read_buffer_size=524288
            max_used_connections=1501
            max_threads=4097
            thread_count=375
            It is possible that mysqld could use up to
            key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 18926318 K bytes of memory
            Hope that's ok; if not, decrease some variables in the equation.

            Thread pointer: 0x7f400c812008
            Attempting backtrace. You can use the following information to find out
            where mysqld died. If you see no messages after this, something went
            terribly wrong...
            stack_bottom = 0x7f5924bbd0d8 thread_stack 0x48400
            /usr/sbin/mysqld(my_print_stacktrace+0x2e)[0x55e5180db54e]
            /usr/sbin/mysqld(handle_fatal_signal+0x2f5)[0x55e517c24005]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x11390)[0x7f5927c04390]
            /usr/sbin/mysqld(_ZN13MYSQL_BIN_LOG13mark_xid_doneEmb+0x8b)[0x55e517cda08b]
            /usr/sbin/mysqld(binlog_background_thread+0x35b)[0x55e517cda72b]
            /lib/x86_64-linux-gnu/libpthread.so.0(+0x76ba)[0x7f5927bfa6ba]
            /lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f59272a53dd]

            Elkin Andrei Elkin added a comment -

            Good news. One cause of the crashes has been identified, and the patch should be out in a couple of days.

            Elkin Andrei Elkin added a comment -

            The patch is now being tried by the buildbot (BB):

            git push https://github.com/MariaDB/server bb-10.1-mdev9510-andrei:bb-10.1-mdev9510-andrei

            spikestabber Brendan P added a comment -

            That is great news. I've gone over the patch and it indeed makes sense. I am curious, though: I don't believe any of our applications directly disable sql_log_bin at the session level, so would this be something that Galera uses internally to isolate queries to specific nodes? Or could this also be triggered by another condition? I know most of the crashes have been triggered by pt-online-schema-change, but I've gone over that as well and don't see where sql_log_bin=0 would be getting set. Thank you!

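            For context, "session-only disabling of sql_log_bin" refers to the switch sketched below. This is only an illustration of the mechanism the patch touches, with a hypothetical table and statement; it is not taken from the reporter's workload or from pt-online-schema-change itself.

            {code:sql}
            -- Illustration only: disable binary logging for the current session,
            -- so the following statements are not written to the binary log.
            SET SESSION sql_log_bin = 0;

            -- Hypothetical local-only maintenance statement:
            DELETE FROM test.scratch_table WHERE created_at < NOW() - INTERVAL 7 DAY;

            -- Re-enable binary logging for the rest of the session.
            SET SESSION sql_log_bin = 1;

            -- The setting is per-session; its current value can be checked with:
            SELECT @@session.sql_log_bin;
            {code}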
            spikestabber Brendan P added a comment -

            I've also determined that the "random" interval between starting the alter or SST and the crash coincides with the binary logs being rotated after reaching their configured 1GB size, which can be anywhere from 3-5 minutes to 10-15 minutes depending on write load.

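            For anyone trying to correlate the crash window with binlog rotation, the rotation threshold and the current log files can be inspected with the generic statements below (a sketch, not specific to this cluster); a rotation can also be forced instead of waiting for the 1GB limit to be reached.

            {code:sql}
            -- The size at which the server rotates to a new binary log file;
            -- a 1GB setting corresponds to max_binlog_size=1073741824.
            SHOW VARIABLES LIKE 'max_binlog_size';

            -- List the current binary log files and their sizes; a new entry
            -- appears after each rotation.
            SHOW BINARY LOGS;

            -- Force a rotation immediately, which may help reproduce the timing
            -- described above without waiting for the size limit.
            FLUSH BINARY LOGS;
            {code}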
            spikestabber Brendan P added a comment -

            Had 2 crashes today and decided to run with the patch. So far it has been running fine without incident; I will keep you updated.

            Elkin Andrei Elkin added a comment -

            Brendan, hello.

            I read your silence over these past days as no bad news.
            Anyway, I am submitting the patch for review very shortly.

            Thanks for helping!

            Andrei

            Elkin Andrei Elkin made changes -
            spikestabber Brendan P added a comment -

            Thank you, indeed no news = good news. This is promising so far.

            I've done a few table alters over the weekend and restarted some nodes due to some InnoDB tunings, and haven't had any crashes since applying the patch. I will, however, continue to run this build for now to make sure all is good. Thanks again!

            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin added a comment -

            The patch has been polished a bit, and the regression test was made by reusing an existing one.
            Here is the pull request:
            https://github.com/MariaDB/server/pull/473
            I hope sachin.setiya.007 will review it promptly once back in the office.

            Elkin Andrei Elkin added a comment -

            Hello, Sachin.

            I see you're back to work on Thursday, Nov 2nd. I wonder if you could review the MDEV-9510 patch during that week (week 44), which would practically leave Thursday and Friday for it. The patch is for 10.1, and it's rather small.

            Cheers,

            Andrei

            Elkin Andrei Elkin made changes -
            Status In Progress [ 3 ] In Review [ 10002 ]
            serg Sergei Golubchik made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10 [ 109, 143, 148, 150, 174, 183 ] 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10, 10.1.29 [ 109, 143, 148, 150, 174, 183, 202 ]

            sachin.setiya.007 Sachin Setiya (Inactive) added a comment -

            Okay to push. Sorry for the delay.

            Regards
            sachin

            Elkin Andrei Elkin added a comment -

            Changing the post-review status accordingly.

            Elkin Andrei Elkin made changes -
            Status In Review [ 10002 ] Stalled [ 10000 ]
            Elkin Andrei Elkin made changes -
            Elkin Andrei Elkin made changes -
            serg Sergei Golubchik made changes -
            Sprint 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10, 10.1.29 [ 109, 143, 148, 150, 174, 183, 202 ] 10.1.19, 10.1.22, 10.2.6-2, 10.2.6-3, 10.3.1-2, 10.2.10, 10.1.29, 10.2.11 [ 109, 143, 148, 150, 174, 183, 202, 203 ]
            serg Sergei Golubchik made changes -
            serg Sergei Golubchik made changes -
            Elkin Andrei Elkin added a comment -

            Pushed c7e38076f33.

            Elkin Andrei Elkin made changes -
            Fix Version/s 10.1.30 [ 22637 ]
            Fix Version/s 10.2.11 [ 22634 ]
            Fix Version/s 10.2 [ 14601 ]
            Fix Version/s 10.1 [ 16100 ]
            Resolution Fixed [ 1 ]
            Status Stalled [ 10000 ] Closed [ 6 ]
            spikestabber Brendan P added a comment -

            Thanks everyone. I am still running the same build with the original patch and have not had a single crash since. 100% fixed for sure.

            elenst Elena Stepanova made changes -
            serg Sergei Golubchik made changes -
            danblack Daniel Black made changes -
            ralf.gebhardt Ralf Gebhardt made changes -
            serg Sergei Golubchik made changes -
            Workflow MariaDB v3 [ 73866 ] MariaDB v4 [ 150073 ]
            mariadb-jira-automation Jira Automation (IT) made changes -
            Zendesk Related Tickets 199143 120351 104924 171005 125314 136735 180752

            People

              Elkin Andrei Elkin
              GeoffMontee Geoff Montee (Inactive)
              Votes: 14
              Watchers: 35
