[MDEV-27682] Bundled wsrep_notify.sh causes mariadbd to freeze during start
Created: 2022-01-30  Updated: 2022-11-22  Resolved: 2022-10-04

Status: Closed
Project: MariaDB Server
Component/s: Galera SST
Affects Version/s: 10.5.12
Fix Version/s: 10.3.37, 10.4.27, 10.5.18, 10.6.11, 10.7.7, 10.8.6, 10.9.4, 10.10.2, 10.11.1

Type: Bug
Priority: Critical
Reporter: Michal Kozlowski
Assignee: Julius Goryavsky
Resolution: Fixed
Votes: 1
Labels: None
Environment:

Debian 11 5.10.0-11-amd64 #1 SMP Debian 5.10.92-1 (2022-01-18) x86_64 GNU/Linux


Attachments: journal.log, mysql-error.log

 Description   

Starting the server with galera_new_cluster gets stuck at this point:

mysql      29610  0.6  2.5 1159880 99828 ?       Ssl  02:48   0:00 /usr/sbin/mariadbd --wsrep-new-cluster --wsrep_start_position=09e652bc-8169-11ec-848c-dfaaaea09e0a:999
mysql      29638  0.0  0.0   2420   528 ?        S    02:48   0:00  \_ sh -c /usr/local/bin/wsrep_notify.sh --status initialized
mysql      29639  0.0  0.0   2420   524 ?        S    02:48   0:00      \_ /bin/sh -eu /usr/local/bin/wsrep_notify.sh --status initialized
mysql      29641  0.0  0.1  20104  7828 ?        S    02:48   0:00          \_ mysql -B

root@frytka:~# strace -p 29610
strace: Process 29610 attached
wait4(29638,

There are connection errors in error.log on every attempt to execute wsrep_notify.sh:

2022-01-30  2:48:48 1 [Note] WSREP: Server status change disconnected -> connected
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (111)
2022-01-30  2:48:48 1 [ERROR] WSREP: Process completed with error: /usr/local/bin/wsrep_notify.sh --status connected: 1 (Operation not permitted)
2022-01-30  2:48:48 1 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/local/bin/wsrep_notify.sh --status connected"
 
2022-01-30  2:48:48 1 [Note] WSREP: Server status change connected -> joiner
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (111)
2022-01-30  2:48:48 1 [ERROR] WSREP: Process completed with error: /usr/local/bin/wsrep_notify.sh --status joiner: 1 (Operation not permitted)
2022-01-30  2:48:48 1 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/local/bin/wsrep_notify.sh --status joiner"
 
2022-01-30  2:48:48 1 [Note] WSREP: Server status change joiner -> initializing
ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/run/mysqld/mysqld.sock' (111)
2022-01-30  2:48:48 1 [ERROR] WSREP: Process completed with error: /usr/local/bin/wsrep_notify.sh --status initializing: 1 (Operation not permitted)
2022-01-30  2:48:48 1 [ERROR] WSREP: Notification command failed: 1 (Operation not permitted): "/usr/local/bin/wsrep_notify.sh --status initializing"

But:
1: While mariadbd is in the frozen state, I can't connect to it either.
2: After killing mariadbd (SIGKILL) and restarting it without wsrep_notify_cmd, everything works normally.
3: I modified the script to connect as the service account (mysql) via the socket, but had the same issue when trying to use TCP to localhost.
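
The strace output above shows mariadbd blocked in wait4() on the notify script while the script's mysql client never returns. As a minimal sketch of one possible mitigation (not the fix that was committed for this issue, and not confirmed by this report to avoid the freeze), the client call can be bounded with timeout(1) from GNU coreutils. NOTIFY_CLIENT, NOTIFY_TIMEOUT and run_bounded are hypothetical names introduced only for illustration.

```shell
#!/bin/sh
# Sketch only: bound the time a notification may block the server.
# NOTIFY_CLIENT and NOTIFY_TIMEOUT are hypothetical variable names,
# not MariaDB or Galera settings.
NOTIFY_CLIENT="${NOTIFY_CLIENT:-mysql -B --connect-timeout=5}"
NOTIFY_TIMEOUT="${NOTIFY_TIMEOUT:-10}"

run_bounded() {
    # SQL is read from stdin and passed to the client. timeout(1) kills the
    # client after NOTIFY_TIMEOUT seconds. $NOTIFY_CLIENT is left unquoted
    # on purpose so that it splits into command and options.
    if ! timeout "$NOTIFY_TIMEOUT" $NOTIFY_CLIENT; then
        echo "wsrep_notify: client failed or timed out" >&2
    fi
    # Always report success so that a lost notification is not treated
    # as a script failure by the caller (a design assumption).
    return 0
}
```

A script would then pipe its generated SQL into run_bounded instead of calling mysql directly. Returning 0 unconditionally trades visibility of failed notifications for a bounded wait, which may or may not be acceptable in a given deployment.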

Config I'm running:

[sst]
encrypt=3
sst-log-archive-dir=/var/log/mysql/
sst-log-archive=0
tkey = /etc/mysql/ssl/key.pem
tcert = /etc/mysql/ssl/cert.pem
tca = /etc/mysql/ssl/ca.pem
ssl-mode=VERIFY_CA
 
[galera]
wsrep_on                 = ON
wsrep_cluster_name       = "clustername"
wsrep_provider           = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address    = gcomm://host02.domain.com,host01.domain.com,host03.domain.com
binlog_format            = row
default_storage_engine   = InnoDB
innodb_autoinc_lock_mode = 2
innodb_doublewrite = 1
 
wsrep_notify_cmd=/usr/local/bin/wsrep_notify.sh
 
wsrep_provider_options="socket.ssl_cert=/etc/mysql/ssl/cert.pem;socket.ssl_key=/etc/mysql/ssl/key.pem;socket.ssl_ca=/etc/mysql/ssl/ca.pem"
#wsrep_sst_method = rsync_wan
wsrep_sst_method = mariabackup
wsrep_sst_auth = mysql:
 
 
wsrep_node_address = host01.domain.com



 Comments   
Comment by Nilnandan Joshi [ 2022-04-05 ]

I'm facing almost the same issue. When I set wsrep_notify_cmd=/home/mh/instances/10508/wsrep_notify.sh in my.cnf and try to bootstrap the very first node, the server starts but then hangs.

2021-12-18  2:40:39 0 [Note] InnoDB: 128 rollback segments are active.
2021-12-18  2:40:39 0 [Note] InnoDB: Creating shared tablespace for temporary tables
2021-12-18  2:40:39 0 [Note] InnoDB: Setting file './ibtmp1' size to 12 MB. Physically writing the file full; Please wait ...
2021-12-18  2:40:39 0 [Note] InnoDB: File './ibtmp1' size is now 12 MB.
2021-12-18  2:40:39 0 [Note] InnoDB: 10.5.8 started; log sequence number 556538; transaction id 1404
2021-12-18  2:40:39 0 [Note] Plugin 'FEEDBACK' is disabled.
2021-12-18  2:40:39 0 [Note] InnoDB: Loading buffer pool(s) from /home/mh/instances/mariadb-10.5.8-linux-x86_64.10508/data/ib_buffer_pool
2021-12-18  2:40:39 0 [Note] InnoDB: Buffer pool(s) load completed at 211218  2:40:39
2021-12-18  2:40:39 0 [Note] Server socket created on IP: '::'.
2021-12-18  2:40:39 0 [Note] WSREP: wsrep_init_schema_and_SR 0x0
2021-12-18  2:40:39 0 [Note] WSREP: Server initialized
2021-12-18  2:40:39 0 [Note] WSREP: Server status change initializing -> initialized
2021-12-18  2:40:39 2 [Note] WSREP: Bootstrapping a new cluster, setting initial position to 00000000-0000-0000-0000-000000000000:-1
2021-12-18  2:40:39 5 [Note] WSREP: Recovered cluster id 5e868df3-5fd5-11ec-beab-eef60befafd4
2021-12-18  2:40:39 2 [Note] WSREP: Server status change initialized -> joined

I can't even log in to the server.

[root@nilcentos7 10508]# mysql -h127.0.0.1 -uneel -pnil@123 -P 10508
^C
[root@nilcentos7 10508]# 

I created a user as shown below and added its details to the wsrep_notify.sh script.

MariaDB [(none)]> GRANT ALL PRIVILEGES ON *.* TO 'neel'@'localhost' IDENTIFIED BY 'neel@123';
Query OK, 0 rows affected (0.032 sec)

My script has the changes below. (I created the neel@localhost user on the MariaDB server.)

USER=neel
PASS=nil@123
HOST=127.0.0.1
PORT=10508
SCHEMA="mtr_wsrep_notify"
MEMB_TABLE="$SCHEMA.membership"
STATUS_TABLE="$SCHEMA.status"
..
..
.
case $STATUS in
    "joined" | "donor" | "synced")
        $COM | mysql -B -u$USER -p$PASS -h$HOST -P$PORT
        ;;
    *)
        exit 0
        ;;
esac
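
The case statement above depends on $STATUS, which comes from the `--status <value>` argument visible in the process listing in the description. A minimal sketch of that extraction and of the status filter follows; get_status and wants_update are hypothetical helper names, not functions from the bundled script.

```shell
#!/bin/sh
# Sketch: extract the value of --status from the arguments passed to the
# notify script, then decide whether a database update is attempted.
# get_status and wants_update are hypothetical helper names.

get_status() {
    while [ $# -gt 0 ]; do
        case "$1" in
            --status)
                # The value follows as the next argument.
                [ $# -ge 2 ] && printf '%s\n' "$2"
                return 0
                ;;
        esac
        shift
    done
    # No --status argument was given.
    return 1
}

wants_update() {
    # Same status list as the case statement in the modified script.
    case "$1" in
        joined|donor|synced) return 0 ;;
        *) return 1 ;;
    esac
}
```

With this filter, `initialized` is skipped but `joined` still invokes the client. The log above stops right after `initialized -> joined`, which suggests that filtering by status does not by itself avoid the hang.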

Comment by Jan Lindström (Inactive) [ 2022-09-28 ]

ok to push

Comment by Julius Goryavsky [ 2022-10-04 ]

Fixed, https://github.com/MariaDB/server/commit/19f0b96d53dec47d7b8680c44997afba2ed7431e

Comment by Marko Mäkelä [ 2022-10-06 ]

sysprg, this patch caused conflicts on merge to 10.4, and the test galera.galera_var_notify_ssl_ipv6 started to hang in both local environments where I tested it. Also, some test (name unknown, but hopefully it is that one) started to hang on several buildbot workers. Please fix.

In the future, please provide branches for newer versions if there are conflicts. 10.3 uses Galera 3 and newer versions use Galera 4. That can make a huge difference.
