[MDEV-11734] mysqld got signal 7 Created: 2017-01-06  Updated: 2020-08-25  Resolved: 2018-10-12

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.0.23, 10.0.23-galera
Fix Version/s: 10.0.24-galera

Type: Bug Priority: Major
Reporter: Saharsh Shah Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment: Production

 Description   

170104 14:43:01 [Note] WSREP: Created page /db/mysql_data/gcache.page.000065 of size 247762356 bytes
170104 14:43:02 [ERROR] mysqld got signal 7 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.0.23-MariaDB-wsrep-log
key_buffer_size=8388608
read_buffer_size=249856
max_used_connections=25
max_threads=153
thread_count=4
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 58382 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x0 thread_stack 0x20000
mysys/stacktrace.c:247(my_print_stacktrace)[0xbfb27e]
sql/signal_handler.cc:153(handle_fatal_signal)[0x73faec]
/lib64/libpthread.so.0[0x355dc0f710]
/lib64/libc.so.6(memcpy+0x310)[0x355d889980]
/usr/local/mysql/lib/libgalera_smm.so(_Z22gcs_defrag_handle_fragP10gcs_defragPK12gcs_act_fragP7gcs_actb+0x126)[0x7f203a48c496]
/usr/local/mysql/lib/libgalera_smm.so(_Z13gcs_core_recvP8gcs_coreP12gcs_act_rcvdx+0x51c)[0x7f203a4936bc]
/usr/local/mysql/lib/libgalera_smm.so(+0x1eda80)[0x7f203a499a80]
/lib64/libpthread.so.0[0x355dc079d1]
/lib64/libc.so.6(clone+0x6d)[0x355d8e89dd]
The manual page at http://dev.mysql.com/doc/mysql/en/crashing.html contains
information that should help you find out what is causing the crash.
170104 14:43:04 mysqld_safe Number of processes running now: 0
170104 14:43:04 mysqld_safe WSREP: not restarting wsrep node automatically
170104 14:43:04 mysqld_safe mysqld from pid file /db/mysql_procfiles/mysql.pid ended



 Comments   
Comment by Daniel Black [ 2017-02-26 ]

It looks like the fault (SIGBUS) occurred in a hand-compiled Galera shared library. Which Galera version are you using? What processor do you have? What state was the cluster in before the crash? Can you provide your configuration files?

Using gdb, can you attach the output of:

gdb /usr/local/mysql/lib/libgalera_smm.so -batch -ex 'disassemble /m gcs_defrag_handle_frag'

Comment by Richard Stracke [ 2018-01-02 ]

A similar stack trace can be found in Galera issue #324 (https://github.com/codership/galera/issues/324).

There the issue is caused by insufficient disk space.

An interesting comment from Alexey:

I can confirm that it is possible to create a page that would exceed the remaining physical size of the storage, and signal 7 would be generated when there is an attempt to write beyond it.

Ok, so the potential fix for this bug would be to check for available disk space and return an error if it is not enough for the page file. This may work relatively reliably if access to the partition is limited to gcache. However, the problem is the error return.
 
What may be (naively) expected in this case is that the transaction whose writeset could not be allocated is rolled back and things continue as usual. Unfortunately it is not so. The problem is that the writeset is cached AFTER having been replicated, and as a result the node does not know its fate on other nodes. So it must assume the worst and terminate.
 
So this solution ends up being only marginally better than a bus error signal, since the node leaves the cluster gracefully. However, the bus error is postponed until that part of the page is actually accessed and as such may never happen, giving the node a chance to survive.
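
The free-space check described in the quote above could be sketched roughly as follows. This is an illustration in Python, not Galera's actual code: `can_allocate_page`, `create_page_file` and `safety_margin` are hypothetical names, and the sketch assumes the page file lives on a partition that other processes rarely write to, since the free-space figure can go stale right after it is read.

```python
import errno
import os
import shutil


def can_allocate_page(directory: str, page_size: int, safety_margin: int = 0) -> bool:
    """Return True if `directory` reports enough free space for a page file.

    Hypothetical helper (not a Galera function). The answer is only a
    snapshot: another process may consume the space afterwards.
    """
    if page_size <= 0 or safety_margin < 0:
        raise ValueError("page_size must be positive, safety_margin non-negative")
    free_bytes = shutil.disk_usage(directory).free
    return free_bytes >= page_size + safety_margin


def create_page_file(path: str, page_size: int) -> None:
    """Create a page file of `page_size` bytes, or raise OSError(ENOSPC)."""
    directory = os.path.dirname(os.path.abspath(path))
    if not can_allocate_page(directory, page_size):
        raise OSError(errno.ENOSPC, "not enough disk space for page file", path)
    with open(path, "wb") as f:
        if hasattr(os, "posix_fallocate"):
            # Reserve real disk blocks up front, so a later write through a
            # memory mapping cannot land on a block that does not exist.
            os.posix_fallocate(f.fileno(), 0, page_size)
        else:
            # Fallback where posix_fallocate is unavailable: this only sets
            # the length and may leave the file sparse.
            f.truncate(page_size)
```

As the quoted comment points out, an error returned at this point does not let the node simply roll back the transaction: the writeset has already been replicated, so the node would still have to leave the cluster, only more gracefully than with a bus error.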

It is fixed in Galera 25.3.12 --> MariaDB 10.0.24.

Comment by Jan Lindström (Inactive) [ 2018-10-12 ]

1. The stack trace found in both tickets (MDEV-11734 and https://github.com/codership/galera/issues/324) is the same.
2. Comments mention that this is fixed in Galera 25.3.12 --> MariaDB 10.0.24.
3. This is a 3-year-old bug, and the fix is in the Galera library: "Galera crash on gcache.page. file creation when disk space filled" (Issue #324, codership/galera).
