[MDEV-9961] Garbd segfaults when using a configuration file on RH6 Created: 2016-04-21  Updated: 2017-11-29  Resolved: 2016-11-18

Status: Closed
Project: MariaDB Server
Component/s: Galera Arbitrator garbd
Affects Version/s: 10.0.24-galera
Fix Version/s: 10.1.20, 10.2.3, 5.5.54-galera, 10.0.29-galera

Type: Bug Priority: Major
Reporter: Art van Scheppingen Assignee: Nirbhay Choubey (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment:

Redhat 6.7


Attachments: garbd_strace.txt
Issue Links:
Relates
relates to MDEV-14530 garbd binary from linux generic "glib... Closed

 Description   

On a Redhat 6.7 installation, garbd segfaults when it is started with a configuration file. On Redhat 7.1 we don't see this issue.

[user@n1 ~]$ sudo /usr/bin/garbd --cfg /etc/garbd.cnf
[user@n1 ~]$ echo $?
139
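
An exit status of 139 is consistent with a crash rather than a normal error exit: shells conventionally report 128 plus the signal number when a process is killed by a signal, and 139 - 128 = 11, which is SIGSEGV on Linux. A minimal sketch of that decoding follows; the function name describe_exit_status is made up for illustration and is not part of garbd.

```shell
# Hypothetical helper: explain an exit status as reported by `echo $?`.
describe_exit_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (status - 128)".
        sig=$((status - 128))
        echo "terminated by signal $sig (SIG$(kill -l "$sig"))"
    else
        echo "exited normally with status $status"
    fi
}

describe_exit_status 139
```

This matches the SIGSEGV that strace reports further down.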

Nothing gets logged to the garbd log file.
If we pass all the options from the configuration file manually on the command line, like this, everything works:

sudo /usr/bin/garbd  -g my_wsrep_cluster -a "gcomm://10.10.16.12:4567,10.10.16.13:4567" -o "gmcast.listen_addr=tcp://0.0.0.0:4567" -l /var/log/garbd.log

The configuration files are generated by ClusterControl for both OSes and contain exactly the same information:

[sudo@n1 ~]$ sudo cat /etc/garbd.cnf
address = gcomm://10.10.16.12:4567,10.10.16.13:4567
group = my_wsrep_cluster
options = gmcast.listen_addr=tcp://0.0.0.0:4567
log = /var/log/garbd.log

The garbd.log file also exists and has the right permissions: when we start garbd with all the command-line parameters, it works fine.
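
Since the command-line form works while --cfg crashes, one possible stopgap on an affected host is to translate the file into the equivalent command line. The sketch below assumes the file contains only the four keys shown in this report, and that they map to -a, -g, -o and -l as in the working invocation above. The helper name garbd_cmd_from_cfg is hypothetical. It only prints the command; it does not run garbd.

```shell
# Hypothetical helper: print a garbd command line equivalent to a
# "key = value" config file. Only the four keys from this report are handled.
garbd_cmd_from_cfg() {
    cfg_file=$1
    cmd="/usr/bin/garbd"
    # Split each line at the first '='; the rest of the line stays in $value.
    while IFS='=' read -r key value || [ -n "$key" ]; do
        # Trim whitespace around the key and the value.
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        case $key in
            address) cmd="$cmd -a \"$value\"" ;;
            group)   cmd="$cmd -g \"$value\"" ;;
            options) cmd="$cmd -o \"$value\"" ;;
            log)     cmd="$cmd -l \"$value\"" ;;
            *)       ;;  # skip blank lines and any key not shown in this report
        esac
    done < "$cfg_file"
    printf '%s\n' "$cmd"
}
```

Calling garbd_cmd_from_cfg /etc/garbd.cnf prints the command for review before it is run by hand or from an init script.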

Using strace gave us the following last few lines:

open("/etc/garbd.cnf", O_RDONLY)        = 3
read(3, "address=gcomm://10.10.16.12:4567"..., 8191) = 143
read(3, "", 8191)                       = 0
close(3)                                = 0
open("/var/log/garbd.log", O_WRONLY|O_CREAT|O_APPEND, 0666) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=11312, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5bcc8b9000
fstat(3, {st_mode=S_IFREG|0644, st_size=11312, ...}) = 0
lseek(3, 11312, SEEK_SET)               = 11312
gettimeofday({1461197582, 39668}, NULL) = 0
open("/etc/localtime", O_RDONLY)        = 4
fstat(4, {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
fstat(4, {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5bcc8b8000
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 118
lseek(4, -62, SEEK_CUR)                 = 56
read(4, "TZif2\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\1\0\0\0\1\0\0\0\0"..., 4096) = 62
close(4)                                = 0
munmap(0x7f5bcc8b8000, 4096)            = 0
write(3, "2016-04-21 00:13:02.039  INFO: C"..., 69) = 69
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_MAPERR, si_addr=0xfffffffffffffff8} ---
+++ killed by SIGSEGV +++

I have added the full strace to the report.

The package we have installed:
galera-25.3.15-1.rhel6.el6.x86_64



 Comments   
Comment by Daniel Black [ 2016-04-21 ]

I'd suggest setting 'ulimit -c unlimited' in the shell before running garbd and then using "gdb `which garbd` core" to get a backtrace ("thread apply all bt full").

When you do use strace, use -s99 so that the text strings aren't limited to something too small.

Comment by Art van Scheppingen [ 2016-04-21 ]

Thanks for the suggestion.
This is the backtrace I get:
(gdb) thread apply all bt full

Thread 1 (Thread 0x7fc412b177e0 (LWP 30096)):
#0 0x0000000000428a05 in __gnu_cxx::__exchange_and_add_dispatch ()
No symbol table info available.
#1 0x0000000000437fbd in garb::Config::Config(int, char**) ()
No symbol table info available.
#2 0x0000000000429fe6 in garb::main(int, char**) ()
No symbol table info available.
#3 0x000000000042a21d in main ()
No symbol table info available.

Do you need the core dump I created as well?

Comment by David Kedves [ 2016-06-01 ]

I got exactly the same crash with this package [ http://ftp.vim.org/db/mariadb/yum/10.1/centos6-amd64/rpms/galera-25.3.15-1.rhel6.el6.x86_64.rpm ]
whenever any config file (--cfg) is used. I am fairly sure this is a build issue, because when I rebuilt garbd from the sources of the same version
[ http://archive.mariadb.org/mariadb-10.1.13-old/galera-25.3.15/src/galera-25.3.15.tar.gz ] the issue no longer occurred.

So my suggestion is: could you please rebuild the package (and update the yum repos) on a recent centos6/rhel6 system?
(Maybe this bug happens because an older g++ was involved, I don't know...)

Comment by Geoff Montee (Inactive) [ 2016-11-03 ]

I submitted the same bug to Codership a while ago:

https://github.com/codership/galera/issues/430

I did not realize that this bug may be specific to MariaDB's galera packages.

Comment by Nirbhay Choubey (Inactive) [ 2016-11-18 ]

Fixed in Galera-25.3.19.
