Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-34234

Delayed SST when running on unprivileged containers on RHEL9

Details

    Description

      When running with drop privileges on a RHEL9 host, mariadb WSREP reliance on `lsof` may cause SST to take a while before starting. The culprit is wsrep scripts preference for lsof. While ss works alright on these containers, lsof hangs forever, until the timeout defined here:

      https://github.com/MariaDB/server/blob/44b23bb18476b9f2f3d88fabab41f206f3531f98/scripts/wsrep_sst_mariabackup.sh#L802-L819

      is complete.

      Given logic exists to use `lsof`, `sockstat` or `ss`

      https://github.com/MariaDB/server/blob/44974a0788c80ba6b309fc0c05f68f101a40293a/scripts/wsrep_sst_common.sh#L1210-L1228

      CMAKE should be adjusted to list `lsof` as a soft dependency.

      Attachments

        1. docker-compose.yml
          2 kB
        2. joiner.log
          23 kB
        3. sst_logs_with_lsof.log
          24 kB
        4. sst_logs_without_lsof.log
          26 kB

        Activity

          danblack Daniel Black added a comment -

          After docker-compose up donor the logs for docker-compose up joiner are attached without serious delay.

          Can you include your logs of the SST?

          Did yo manually change to sockstat or ss and observer the quicker sst?

          danblack Daniel Black added a comment - After docker-compose up donor the logs for docker-compose up joiner are attached without serious delay. Can you include your logs of the SST? Did yo manually change to sockstat or ss and observer the quicker sst?
          trixpan Andre F de M added a comment -

          from reading on lsof inside of containers the issue seems to be related to the docker host and the need to run the container in privileged mode.

          https://unix.stackexchange.com/questions/722948/lsof-is-frozen-inside-the-container

          issues with ulimits

          https://github.com/docker/for-linux/issues/73

          I tested a container built without lsof (literally rm'ing it), startup time is immediate.

          I also observe the RPM dependencies of MariaDB are a bit curious. We require both `lsof` and `iproute` (which contains `ss`) packages, despite `lsof` taking precedence over `ss`.

          The proposed PR still leaves the `ss` dependency as it is but makes `lsof` optional. By doing that, the actual script behaviour which can somehow be described as:

          if lsof is available, use it
          else
          if sockstat is available, use it
          else
          if ss is available, use it
          else
          complain

          Will be respected.

          trixpan Andre F de M added a comment - from reading on lsof inside of containers the issue seems to be related to the docker host and the need to run the container in privileged mode. https://unix.stackexchange.com/questions/722948/lsof-is-frozen-inside-the-container issues with ulimits https://github.com/docker/for-linux/issues/73 I tested a container built without lsof (literally rm'ing it), startup time is immediate. I also observe the RPM dependencies of MariaDB are a bit curious. We require both `lsof` and `iproute` (which contains `ss`) packages, despite `lsof` taking precedence over `ss`. The proposed PR still leaves the `ss` dependency as it is but makes `lsof` optional. By doing that, the actual script behaviour which can somehow be described as: if lsof is available, use it else if sockstat is available, use it else if ss is available, use it else complain Will be respected.
          trixpan Andre F de M added a comment -

          Without LSOF:
          As you can see on logs without `lsof` SST starts immediately after the joiner comes up.

          With LSOF:
          The container starts, however, as soon we see:

          WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u openssl-listen:4444,reuseaddr,<SSL-DETAILS> stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20240525 01:35:39.721)
          2024-05-25  1:35:42 0 [Note] WSREP: (1758f2de-8acb, 'ssl://0.0.0.0:4567') turning message relay requesting off
          2024-05-25  1:36:09 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:36:39 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:37:09 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:37:39 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:38:09 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:38:39 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:39:09 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:39:39 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:40:09 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed
          2024-05-25  1:40:21 2 [Note] WSREP: Prepared SST request: mariabackup|joiner:d1a530298e61c2b29fc62b2f9d7d926a@joiner:4444/xtrabackup_sst//1
          

          Checking the container you see:

          mysql        427  0.0  0.0   5652  3840 ?        S    01:35   0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner
          mysql        703  0.0  0.0   5652  2280 ?        S    01:35   0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner
          mysql        705  100  0.0   9728  3200 ?        R    01:35   0:43 lsof -Pnl -i :4444
          mysql        706  0.0  0.0   3888  1280 ?        S    01:35   0:00 grep -q -E ^(socat|nc)[^[:space:]]*[[:space:]]+[0-9]+[[:space:]].*\(LISTEN\)
          mysql        709  0.0  0.0   5320  2560 ?        S    01:35   0:00 /usr/bin/coreutils --coreutils-prog-shebang=timeout /usr//bin/timeout -k 310 300 socat -u openssl-lis
          mysql        710  0.0  0.0  76672  1280 ?        Sl   01:35   0:00 /usr//bin/mbstream -x
          mysql        712  0.0  0.0   9264  6400 ?        S    01:35   0:00 socat -u openssl-listen:4444,reuseaddr,<SSL_DETAILS>>
          

          The behaviours are consistent during 1st join and re-join (reason one of the logs is a rejoin and the other is a join).

          Note: the names may be a bit messed in terms of who is joiner vs who is donor since I am not hardcoding donors via `wsrep_sst_donor`, instead. I left one of the nodes as "donor" but don't take that too strictly. Just making it easier to read there are 3 nodes running when the SST happened.

          trixpan Andre F de M added a comment - Without LSOF: As you can see on logs without `lsof` SST starts immediately after the joiner comes up. With LSOF: The container starts, however, as soon we see: WSREP_SST: [INFO] Evaluating timeout -k 310 300 socat -u openssl-listen:4444,reuseaddr,<SSL-DETAILS> stdio | '/usr//bin/mbstream' -x; RC=( ${PIPESTATUS[@]} ) (20240525 01:35:39.721) 2024-05-25 1:35:42 0 [Note] WSREP: (1758f2de-8acb, 'ssl://0.0.0.0:4567') turning message relay requesting off 2024-05-25 1:36:09 0 [Note] WSREP: Joiner waited 30 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:36:39 0 [Note] WSREP: Joiner waited 60 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:37:09 0 [Note] WSREP: Joiner waited 90 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:37:39 0 [Note] WSREP: Joiner waited 120 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:38:09 0 [Note] WSREP: Joiner waited 150 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:38:39 0 [Note] WSREP: Joiner waited 180 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:39:09 0 [Note] WSREP: Joiner waited 210 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:39:39 0 [Note] WSREP: Joiner waited 240 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:40:09 0 [Note] WSREP: Joiner waited 270 sec, extending systemd startup timeout as SSTis not completed 2024-05-25 1:40:21 2 [Note] WSREP: Prepared SST request: mariabackup|joiner:d1a530298e61c2b29fc62b2f9d7d926a@joiner:4444/xtrabackup_sst//1 Checking the container you see: mysql 427 0.0 0.0 5652 3840 ? S 01:35 0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner mysql 703 0.0 0.0 5652 2280 ? S 01:35 0:00 /usr/bin/bash /usr//bin/wsrep_sst_mariabackup --role joiner --address joiner mysql 705 100 0.0 9728 3200 ? R 01:35 0:43 lsof -Pnl -i :4444 mysql 706 0.0 0.0 3888 1280 ? S 01:35 0:00 grep -q -E ^(socat|nc)[^[:space:]]*[[:space:]]+[0-9]+[[:space:]].*\(LISTEN\) mysql 709 0.0 0.0 5320 2560 ? S 01:35 0:00 /usr/bin/coreutils --coreutils-prog-shebang=timeout /usr//bin/timeout -k 310 300 socat -u openssl-lis mysql 710 0.0 0.0 76672 1280 ? Sl 01:35 0:00 /usr//bin/mbstream -x mysql 712 0.0 0.0 9264 6400 ? S 01:35 0:00 socat -u openssl-listen:4444,reuseaddr,<SSL_DETAILS>> The behaviours are consistent during 1st join and re-join (reason one of the logs is a rejoin and the other is a join). Note: the names may be a bit messed in terms of who is joiner vs who is donor since I am not hardcoding donors via `wsrep_sst_donor`, instead. I left one of the nodes as "donor" but don't take that too strictly. Just making it easier to read there are 3 nodes running when the SST happened.

          Thanks, first part of the fix (dependency correction) is merged with the head revision: https://github.com/MariaDB/server/commit/83040474dcb6e9468bb0a7e42f3a31dc6e156d1a

          sysprg Julius Goryavsky added a comment - Thanks, first part of the fix (dependency correction) is merged with the head revision: https://github.com/MariaDB/server/commit/83040474dcb6e9468bb0a7e42f3a31dc6e156d1a

          The regex expressions used for lsof also cause problems with rsync, which is fixed in https://jira.mariadb.org/browse/MDEV-30686 (MDEV-30686 is not directly related to this ticket, although research into this ticket has led to a solution for MDEV-30686).

          sysprg Julius Goryavsky added a comment - The regex expressions used for lsof also cause problems with rsync, which is fixed in https://jira.mariadb.org/browse/MDEV-30686 ( MDEV-30686 is not directly related to this ticket, although research into this ticket has led to a solution for MDEV-30686 ).
          sysprg Julius Goryavsky added a comment - Fixed, https://github.com/MariaDB/server/commit/642195d255619854d13f20b9937b604dbbe2a5fb

          People

            sysprg Julius Goryavsky
            trixpan Andre F de M
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved:

              Git Integration

                Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.