When running with drop privileges on a RHEL9 host, mariadb WSREP reliance on `lsof` may cause SST to take a while before starting. The culprit is wsrep scripts preference for lsof. While ss works alright on these containers, lsof hangs forever, until the timeout defined here:
After docker-compose up donor the logs for docker-compose up joiner are attached without serious delay.
Can you include your logs of the SST?
Did yo manually change to sockstat or ss and observer the quicker sst?
Daniel Black
added a comment - After docker-compose up donor the logs for docker-compose up joiner are attached without serious delay.
Can you include your logs of the SST?
Did yo manually change to sockstat or ss and observer the quicker sst?
I tested a container built without lsof (literally rm'ing it), startup time is immediate.
I also observe the RPM dependencies of MariaDB are a bit curious. We require both `lsof` and `iproute` (which contains `ss`) packages, despite `lsof` taking precedence over `ss`.
The proposed PR still leaves the `ss` dependency as it is but makes `lsof` optional. By doing that, the actual script behaviour which can somehow be described as:
if lsof is available, use it
else
if sockstat is available, use it
else
if ss is available, use it
else
complain
Will be respected.
Andre F de M
added a comment - from reading on lsof inside of containers the issue seems to be related to the docker host and the need to run the container in privileged mode.
https://unix.stackexchange.com/questions/722948/lsof-is-frozen-inside-the-container
issues with ulimits
https://github.com/docker/for-linux/issues/73
I tested a container built without lsof (literally rm'ing it), startup time is immediate.
I also observe the RPM dependencies of MariaDB are a bit curious. We require both `lsof` and `iproute` (which contains `ss`) packages, despite `lsof` taking precedence over `ss`.
The proposed PR still leaves the `ss` dependency as it is but makes `lsof` optional. By doing that, the actual script behaviour which can somehow be described as:
if lsof is available, use it
else
if sockstat is available, use it
else
if ss is available, use it
else
complain
Will be respected.
mysql 712 0.0 0.0 9264 6400 ? S 01:35 0:00 socat -u openssl-listen:4444,reuseaddr,<SSL_DETAILS>>
The behaviours are consistent during 1st join and re-join (reason one of the logs is a rejoin and the other is a join).
Note: the names may be a bit messed in terms of who is joiner vs who is donor since I am not hardcoding donors via `wsrep_sst_donor`, instead. I left one of the nodes as "donor" but don't take that too strictly. Just making it easier to read there are 3 nodes running when the SST happened.
Andre F de M
added a comment - Without LSOF:
As you can see on logs without `lsof` SST starts immediately after the joiner comes up.
With LSOF:
The container starts, however, as soon we see:
WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u openssl-listen:4444,reuseaddr,<SSL-DETAILS> stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20240525 01:35:39.721)
2024-05-25 1:35:42 0 [Note] WSREP: (1758f2de-8acb, 'ssl://0.0.0.0:4567') turning message relay requesting off
2024-05-25 1:36:09 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:36:39 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:37:09 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:37:39 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:38:09 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:38:39 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:39:09 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:39:39 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:40:09 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
2024-05-25 1:40:21 2 [Note] WSREP: Prepared SST request: mariabackup|joiner:d1a530298e61c2b29fc62b2f9d7d926a@joiner:4444/xtrabackup_sst//1
Checking the container you see:
mysql 427 0.0 0.0 5652 3840 ? S 01:35 0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner
mysql 703 0.0 0.0 5652 2280 ? S 01:35 0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner
mysql 705 100 0.0 9728 3200 ? R 01:35 0:43 lsof -Pnl -i :4444
mysql 706 0.0 0.0 3888 1280 ? S 01:35 0:00 grep -q -E ^(socat|nc)[^[:space:]]*[[:space:]]+[0-9]+[[:space:]].*\(LISTEN\)
mysql 709 0.0 0.0 5320 2560 ? S 01:35 0:00 /usr/bin/coreutils --coreutils-prog-shebang=timeout /usr//bin/timeout -k 310 300 socat -u openssl-lis
mysql 710 0.0 0.0 76672 1280 ? Sl 01:35 0:00 /usr//bin/mbstream -x
mysql 712 0.0 0.0 9264 6400 ? S 01:35 0:00 socat -u openssl-listen:4444,reuseaddr,<SSL_DETAILS>>
The behaviours are consistent during 1st join and re-join (reason one of the logs is a rejoin and the other is a join).
Note: the names may be a bit messed in terms of who is joiner vs who is donor since I am not hardcoding donors via `wsrep_sst_donor`, instead. I left one of the nodes as "donor" but don't take that too strictly. Just making it easier to read there are 3 nodes running when the SST happened.
Julius Goryavsky
added a comment - Thanks, first part of the fix (dependency correction) is merged with the head revision: https://github.com/MariaDB/server/commit/83040474dcb6e9468bb0a7e42f3a31dc6e156d1a
The regex expressions used for lsof also cause problems with rsync, which is fixed in https://jira.mariadb.org/browse/MDEV-30686 (MDEV-30686 is not directly related to this ticket, although research into this ticket has led to a solution for MDEV-30686).
Julius Goryavsky
added a comment - The regex expressions used for lsof also cause problems with rsync, which is fixed in https://jira.mariadb.org/browse/MDEV-30686 ( MDEV-30686 is not directly related to this ticket, although research into this ticket has led to a solution for MDEV-30686 ).
After docker-compose up donor the logs for docker-compose up joiner are attached without serious delay.
Can you include your logs of the SST?
Did yo manually change to sockstat or ss and observer the quicker sst?