[MDEV-20505] Server crash on startup because of bad wsrep configuration Created: 2019-09-05  Updated: 2019-09-12  Resolved: 2019-09-12

Status: Closed
Project: MariaDB Server
Component/s: Galera
Affects Version/s: 10.4.6, 10.4.7
Fix Version/s: 10.4.9

Type: Bug Priority: Major
Reporter: Michal Schorm Assignee: Jan Lindström (Inactive)
Resolution: Fixed Votes: 0
Labels: None
Environment: Fedora

Attachments: File 10.4.7.coredump.tar.gz    

Description

Hello,
some behaviour changed between 10.3 and 10.4 (and also between 10.4.5 and 10.4.6, if I recall correctly), which leads to a server crash (SIGSEGV) on startup.

In 10.3, the server starts correctly. It prints some lines to the systemd journal saying that wsrep is not properly configured, but the server runs fine.
That is how the server has always behaved, and how I would like it to keep behaving.

MariaDB 10.3.17, journal entries:

Sep 05 09:28:42 host-0-0-0-0 mysql-prepare-db-dir[20455]: 2019-09-05  9:28:42 0 [ERROR] WSREP: rsync SST method requires wsrep_cluster_address to be configured on startup.
Sep 05 09:28:45 host-0-0-0-0 mysqld[20555]: 2019-09-05  9:28:45 0 [ERROR] WSREP: rsync SST method requires wsrep_cluster_address to be configured on startup.


I used the exact same configuration for both 10.3 and 10.4:
(Default Fedora Configuration)

# /usr/libexec/mysqld --print-defaults
/usr/libexec/mysqld would have been started with the following arguments:
--binlog_format=ROW --default-storage-engine=innodb --innodb_autoinc_lock_mode=2 --bind-address=0.0.0.0 --wsrep_on=1 --wsrep_provider=/usr/lib64/galera/libgalera_smm.so --wsrep_cluster_name=my_wsrep_cluster --wsrep_slave_threads=1 --wsrep_certify_nonPK=1 --wsrep_max_ws_rows=0 --wsrep_max_ws_size=2147483647 --wsrep_debug=0 --wsrep_convert_LOCK_to_trx=0 --wsrep_retry_autocommit=1 --wsrep_auto_increment_control=1 --wsrep_drupal_282555_workaround=0 --wsrep_causal_reads=0 --wsrep_notify_cmd= --wsrep_sst_method=rsync --wsrep_sst_auth=root: --datadir=/var/lib/mysql --socket=/var/lib/mysql/mysql.sock --log-error=/var/log/mariadb/mariadb.log --pid-file=/run/mariadb/mariadb.pid 


Version 10.4.7, however, crashes before it can even write a log.
Systemd journal entries:

2019-09-05  8:51:57 0 [ERROR] WSREP: rsync SST method requires wsrep_cluster_address to be configured on startup.
2019-09-05  8:51:57 0 [ERROR] Aborting

Info from gdb:

Program received signal SIGSEGV, Segmentation fault.
0x0000555555bfd5b2 in wsrep::server_state::state (this=0x0) at /usr/src/debug/mariadb-10.4.7-1.debug.000.fc30.x86_64/wsrep-lib/include/wsrep/server_state.hpp:524
handle_fatal_signal (sig=65536) at /usr/src/debug/mariadb-10.4.7-1.debug.000.fc30.x86_64/sql/signal_handler.cc:103

Attached coredump (23MB extracted) generated by: "coredumpctl -1 dump --output /tmp/10.4.7.coredump" and compressed.

The crash is 100% reproducible, so I can provide more info, if you specify what you want to know.

EDIT:
It is probably good to add that when the server is properly configured (either with "wsrep_on=0" or with properly configured replication), it works fine.
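
For reference, here is a minimal sketch of the two working setups mentioned above, written as an option-file fragment. The option names are the ones from the --print-defaults output, plus wsrep_cluster_address, which the error message asks for. The `[mysqld]` section placement and the gcomm:// node names are assumptions for illustration only (the hosts are hypothetical), not values from my system.

```ini
[mysqld]
# Variant A: disable wsrep entirely; the server starts as a standalone instance.
wsrep_on=0

# Variant B (alternative to A): keep wsrep enabled and give it a cluster address,
# which the rsync SST method requires at startup.
# The node names below are hypothetical placeholders.
#wsrep_on=1
#wsrep_provider=/usr/lib64/galera/libgalera_smm.so
#wsrep_cluster_address=gcomm://node1.example,node2.example
#wsrep_sst_method=rsync
```

The failing case from this report is the combination of wsrep_on=1 and wsrep_sst_method=rsync with no wsrep_cluster_address set.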

