[MDEV-4344] Galera: Server crashes on setting wsrep_cluster_address with old version of libgcc
Created: 2013-03-31  Updated: 2023-11-03

Status: Open
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 5.5.29-galera
Fix Version/s: N/A

Type: Bug
Priority: Major
Reporter: Elena Stepanova
Assignee: Ramesh Sivaraman
Resolution: Unresolved
Votes: 0
Labels: galera, need_verification
Environment:

CentOS release 5.3 i386 libgcc 4.1.2-44.el5


Issue Links:
Relates
relates to MDEV-4242 Galera in buildbot: SELinux in CentOS... Closed

Description

We have a package installation test, which does the following:

  • install MariaDB-Galera server, MariaDB client and Galera library;
  • start server with the default configuration (that is, without any wsrep* options);
  • run

    mysql -uroot -e 'set global wsrep_provider="/usr/lib/galera/libgalera_smm.so"; set global wsrep_cluster_address="gcomm://"'
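To see which of the two statements takes the server down, the last step can be split into separate client calls with a liveness check after each. The following is only a sketch, assuming passwordless root access over the local socket as in the command above; `run_and_check` and `repro` are hypothetical helper names, not part of the package test. If mysqld_safe restarts the server quickly, the ping may succeed again, so the error log remains the authority.

```shell
#!/bin/sh
# Provider path taken from the report; adjust for other layouts.
PROVIDER=/usr/lib/galera/libgalera_smm.so

# Run one statement as root, then report whether the server still answers.
run_and_check() {
    mysql -uroot -e "$1"
    if mysqladmin -uroot ping >/dev/null 2>&1; then
        echo "alive after: $1"
    else
        echo "server down after: $1"
        return 1
    fi
}

# Issue the two SET statements one at a time, stopping at the first failure.
repro() {
    run_and_check "set global wsrep_provider=\"$PROVIDER\"" &&
        run_and_check 'set global wsrep_cluster_address="gcomm://"'
}
```

Calling `repro` on a freshly started server prints one line per statement and returns non-zero as soon as the server stops responding.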

When the test is run on CentOS 5.3 i386, the server crashes on the second SET statement.
The problem is 100% reproducible, both on our pre-built RPMs and on a custom debug build.
The debug build does not help much, though: the core dump comes out corrupted, and the stack trace printed in the error log is mostly unresolved. Either way, the crash starts with "terminate called after throwing an instance of 'gu::NotFound'":

130331 19:54:04 [Note] WSREP: Start replication
130331 19:54:04 [Note] WSREP: Setting initial position to 00000000-0000-0000-0000-000000000000:-1
130331 19:54:04 [Note] WSREP: protonet asio version 0
130331 19:54:04 [Note] WSREP: backend: asio
terminate called after throwing an instance of 'gu::NotFound'
130331 19:54:04 [ERROR] mysqld got signal 6 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed, 
something is definitely wrong and this may fail.
 
Server version: 5.5.29-MariaDB-debug
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to 
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 465455 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0xa15c068
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x93b34384 thread_stack 0x48000
mysys/stacktrace.c:246(my_print_stacktrace)[0x88f646b]
sql/signal_handler.cc:155(handle_fatal_signal)[0x83dbdc9]
??:0(??)[0xca7420]
??:0(??)[0x34f691]
??:0(??)[0x2884c0]
??:0(??)[0x285f25]
??:0(??)[0x285f62]
??:0(??)[0x28609a]
??:0(??)[0x7f1ca6]
??:0(??)[0x85a8fc]
??:0(??)[0x87a262]
??:0(??)[0x89844e]
??:0(??)[0x8e475b]
??:0(??)[0x8e1430]
??:0(??)[0x8d83bb]
??:0(??)[0x8dcc9c]
??:0(??)[0x927544]
??:0(??)[0x9431ae]
sql/wsrep_mysqld.cc:679(wsrep_start_replication())[0x836ed54]
sql/wsrep_var.cc:333(wsrep_cluster_address_update(sys_var*, THD*, enum_var_type))[0x837810a]
sql/set_var.cc:200(sys_var::update(THD*, set_var*))[0x817e373]
sql/set_var.cc:670(set_var::update(THD*))[0x817ff1a]
sql/set_var.cc:574(sql_set_variables(THD*, List<set_var_base>*))[0x817f0c1]
sql/sql_parse.cc:3540(mysql_execute_command(THD*))[0x821d27e]
sql/sql_parse.cc:6318(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x8222774]
sql/sql_parse.cc:6155(wsrep_mysql_parse)[0x8223362]
sql/sql_parse.cc:1250(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x8224882]
sql/sql_parse.cc:891(do_command(THD*))[0x8226834]
sql/sql_connect.cc:1291(do_handle_one_connection(THD*))[0x83118c1]
sql/sql_connect.cc:1200(handle_one_connection)[0x8311a29]
??:0(??)[0x77249b]
??:0(??)[0x3f642e]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0xa1668c0): set global wsrep_cluster_address="gcomm://"
Connection ID (thread ID): 2
Status: NOT_KILLED
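
Because most frames in the trace are unresolved, the two most useful lines are the "terminate called" message and the signal number. A minimal sketch for pulling both out of an error log follows; the function names `crash_exception` and `crash_signal` are hypothetical, and the patterns are based only on the log lines quoted in this report.

```shell
#!/bin/sh
# Print the C++ exception type named in a "terminate called" line, if any.
crash_exception() {
    sed -n "s/^terminate called after throwing an instance of '\([^']*\)'.*/\1/p" "$1"
}

# Print the signal number from a "mysqld got signal N ;" line, if any.
crash_signal() {
    sed -n 's/.*mysqld got signal \([0-9][0-9]*\) ;.*/\1/p' "$1"
}
```

Each function takes the path to an error log as its only argument. A log with no "terminate called" line simply produces no output from `crash_exception`.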
 

Further investigation shows that the crash goes away after the currently installed libgcc 4.1.2-44.el5 is upgraded to 4.1.2-54.el5.
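
When triaging other CentOS 5 hosts, the installed libgcc can be compared against the two data points above. This is a sketch only: `classify_libgcc` is a hypothetical helper, and it deliberately says nothing about releases other than the two named in this description, since they were not tested.

```shell
#!/bin/sh
# Map a libgcc VERSION-RELEASE string to what this report observed for it.
classify_libgcc() {
    case "$1" in
        4.1.2-44.el5) echo "crash reported" ;;
        4.1.2-54.el5) echo "no crash reported" ;;
        *)            echo "not covered by this report" ;;
    esac
}

# Example call on an RPM-based host (may print one line per installed
# architecture on multilib systems):
#   classify_libgcc "$(rpm -q --qf '%{VERSION}-%{RELEASE}\n' libgcc)"
```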

I cannot positively determine whether it's a libgcc problem or a server/wsrep/galera bug, so I am passing it to Codership to decide.



Comments
Comment by Daniel Black [ 2015-09-19 ]

From fale on IRC:

libgcc-4.8.3-9.el7.x86_64 (latest as of today)

Running SET GLOBAL wsrep_cluster_address='gcomm://'; gives:

150919 10:50:36 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
2015-09-19 10:50:36 140724381264000 [Note] /usr/sbin/mysqld (mysqld 10.1.7-MariaDB) starting as process 2533 ...
2015-09-19 10:50:36 140724381264000 [Warning] You need to use --log-bin to make --binlog-format work.
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: The InnoDB memory heap is disabled
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Memory barrier is not used
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Compressed tables use zlib 1.2.7
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Using Linux native AIO
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Using CPU crc32 instructions
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2015-09-19 10:50:36 140724381264000 [Note] InnoDB: Completed initialization of buffer pool
2015-09-19 10:50:37 140724381264000 [Note] InnoDB: Highest supported file format is Barracuda.
2015-09-19 10:50:37 140724381264000 [Note] InnoDB: 128 rollback segment(s) are active.
2015-09-19 10:50:37 140724381264000 [Note] InnoDB: Waiting for purge to start
2015-09-19 10:50:37 140724381264000 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-73.1 started; log sequence number 1617291
2015-09-19 10:50:37 140724381264000 [Note] Plugin 'FEEDBACK' is disabled.
2015-09-19 10:50:37 140723680675584 [Note] InnoDB: Dumping buffer pool(s) not yet started
2015-09-19 10:50:37 140724381264000 [Note] Server socket created on IP: '0.0.0.0'.
2015-09-19 10:50:37 140724381264000 [Note] Event Scheduler: Loaded 0 events
2015-09-19 10:50:37 140724381264000 [Note] /usr/sbin/mysqld: ready for connections.
Version: '10.1.7-MariaDB'  socket: '/var/lib/mysql/mysql.sock'  port: 3306  MariaDB Server
2015-09-19 11:04:41 140724380821248 [Note] WSREP: Stop replication
2015-09-19 11:04:41 140724380821248 [Note] WSREP: Provider was not loaded, in stop replication
2015-09-19 11:04:41 140724380821248 [Note] WSREP: Start replication
150919 11:04:41 [ERROR] mysqld got signal 11 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
 
To report this bug, see http://kb.askmonty.org/en/reporting-bugs
 
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
 
Server version: 10.1.7-MariaDB
key_buffer_size=134217728
read_buffer_size=131072
max_used_connections=1
max_threads=153
thread_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 467102 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
 
Thread pointer: 0x0x7ffcccebc008
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0x7ffcf2badd30 thread_stack 0x48000
mysys/stacktrace.c:247(my_print_stacktrace)[0x7ffcf368002e]
sql/signal_handler.cc:160(handle_fatal_signal)[0x7ffcf31b2d4d]
/lib64/libpthread.so.0(+0xf130)[0x7ffcf27ff130]
sql/wsrep_mysqld.cc:916(wsrep_start_replication())[0x7ffcf3153d53]
sql/wsrep_var.cc:372(wsrep_cluster_address_update(sys_var*, THD*, enum_var_type))[0x7ffcf31600da]
sql/sys_vars_shared.h:74(~AutoWLock)[0x7ffcf2fba9bc]
sql/set_var.cc:796(set_var::update(THD*))[0x7ffcf2fbaba7]
sql/sql_list.h:456(base_list_iterator::next_fast())[0x7ffcf2fbbb69]
sql/sql_parse.cc:4278(mysql_execute_command(THD*))[0x7ffcf303e16f]
sql/sql_parse.cc:7228(mysql_parse(THD*, char*, unsigned int, Parser_state*))[0x7ffcf30415ce]
sql/sql_parse.cc:1488(dispatch_command(enum_server_command, THD*, char*, unsigned int))[0x7ffcf3044a7b]
sql/sql_parse.cc:1112(do_command(THD*))[0x7ffcf30452f9]
sql/sql_connect.cc:1350(do_handle_one_connection(THD*))[0x7ffcf310664a]
sql/sql_connect.cc:1264(handle_one_connection)[0x7ffcf3106820]
/lib64/libpthread.so.0(+0x7df5)[0x7ffcf27f7df5]
/lib64/libc.so.6(clone+0x6d)[0x7ffcf0c1c1ad]
 
Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0x7ffcccf8d020): is an invalid pointer
Connection ID (thread ID): 5
Status: NOT_KILLED

Update: wsrep_on was not ON when the address was set.
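
To check that condition before repeating the test on 10.1, the variable can be read first and the address set only when it is ON. A sketch, assuming passwordless root access over the local socket; `wsrep_on_value` and `set_address_if_wsrep_on` are hypothetical helper names, and the guard is a way to separate the two cases, not a fix.

```shell
#!/bin/sh
# Print the value of the global wsrep_on variable (ON or OFF).
wsrep_on_value() {
    mysql -uroot -N -B -e "SHOW GLOBAL VARIABLES LIKE 'wsrep_on'" | awk '{print $2}'
}

# Set the cluster address only when wsrep_on is ON; otherwise report and skip.
set_address_if_wsrep_on() {
    if [ "$(wsrep_on_value)" = "ON" ]; then
        mysql -uroot -e "SET GLOBAL wsrep_cluster_address='gcomm://'"
    else
        echo "wsrep_on is not ON; skipping SET GLOBAL wsrep_cluster_address" >&2
        return 1
    fi
}
```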

Comment by Nirbhay Choubey (Inactive) [ 2015-10-27 ]

The two traces do not look similar.

I tried to reproduce this on CentOS-5 on a debug build, but all went fine.

MariaDB [(none)]> select version();
+----------------------+
| version()            |
+----------------------+
| 10.1.8-MariaDB-debug |
+----------------------+
1 row in set (0.00 sec)
 
MariaDB [(none)]> \! uname -a
Linux vm-centos5-amd64 2.6.18-128.el5 #1 SMP Wed Jan 21 10:41:14 EST 2009 x86_64 x86_64 x86_64 GNU/Linux
MariaDB [(none)]> \! gcc --version
gcc (GCC) 4.1.2 20080704 (Red Hat 4.1.2-46)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
 
MariaDB [(none)]> SET GLOBAL wsrep_cluster_address='gcomm://';
Query OK, 0 rows affected (0.00 sec)

Comment by Nirbhay Choubey (Inactive) [ 2015-10-27 ]

I do see an abort within the Galera library when it is loaded.

2015-10-26 23:46:39 47640025622320 [Note] WSREP: CRC-32C: using "slicing-by-8" algorithm.
terminate called after throwing an instance of 'gu::NotSet'
151026 23:46:39 [ERROR] mysqld got signal 6 ;
