MariaDB Server / MDEV-25434

mariadb container to have HEALTHCHECK


        Activity

          abychko Alexey Bychko (Inactive) added a comment -

          What is the preferred health check? I could run something like SELECT VERSION(), but I think we can implement something smarter here.
          danblack Daniel Black added a comment -

          The docs mention that the output of the check is recorded, so InnoDB recovery and/or Galera status would be useful here. I'm not sure either of those would push it into unhealthy, but their output might be useful.

          danblack Daniel Black added a comment - edited

          Also, the purpose is to ensure that the container has started, so be careful not to return 'healthy' during the database initialization stages of the entrypoint.
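One way to avoid reporting 'healthy' during initialization is a TCP-only probe. This is a minimal sketch, assuming (as the current docker-entrypoint.sh appears to do) that the temporary server used for initialization is started with --skip-networking, so nothing listens on TCP until the final server is up. The helper name port_open and the default port 3306 are illustrative, and the probe relies on bash being built with /dev/tcp support, as it is on Debian/Ubuntu.

```shell
#!/usr/bin/env bash
# Sketch: TCP-only readiness probe. Assumes the entrypoint's temporary
# initialization server runs with --skip-networking, so no TCP listener
# exists until the final server starts. port_open is a hypothetical helper.

port_open() {
    local host=${1:-127.0.0.1} port=${2:-3306}
    # bash's /dev/tcp/HOST/PORT redirection opens a TCP connection.
    # Running it in a child bash under timeout bounds the wait, and the
    # descriptor is closed when the child exits.
    timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# Example: exit 0 (healthy) only once the server accepts TCP connections.
# port_open 127.0.0.1 3306 || exit 1
```

An open port only shows that the server is listening; it says nothing about whether logins succeed or InnoDB has finished starting, which is where the deeper checks discussed later come in.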

          danblack Daniel Black added a comment -

          Also think about how this would integrate with mysql_upgrade in a container (https://github.com/MariaDB/mariadb-docker/issues/350), so that healthy is returned only after an upgrade or a fresh install has completed.
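A sketch of tying health to the upgrade state, assuming that mysql_upgrade (mariadb-upgrade) records the version it ran for in a mysql_upgrade_info file in the data directory. The function upgrade_done and the DATADIR default are hypothetical; the caller would pass in the running server's version, for example the result of SELECT VERSION().

```shell
#!/usr/bin/env bash
# Sketch: report healthy only after the data directory matches the running
# server version. Assumes mysql_upgrade/mariadb-upgrade writes the version
# it upgraded to into $DATADIR/mysql_upgrade_info.
DATADIR=${DATADIR:-/var/lib/mysql}   # hypothetical default location

# Print the leading X.Y.Z of a version string, e.g.
# "10.6.5-MariaDB-1:10.6.5+maria~focal" gives 10.6.5.
version_prefix() {
    grep -oE '^[0-9]+\.[0-9]+\.[0-9]+' <<<"$1"
}

# upgrade_done SERVER_VERSION: succeed when the recorded upgrade version
# has the same X.Y.Z as the running server.
upgrade_done() {
    local want got info=$DATADIR/mysql_upgrade_info
    [[ -r $info ]] || return 1            # no upgrade/initialization recorded yet
    got=$(version_prefix "$(tr -d '\0' <"$info")")   # drop any NUL bytes
    want=$(version_prefix "$1")
    [[ -n $want && $got == "$want" ]]
}
```

Comparing only X.Y.Z ignores distribution suffixes in VERSION(); whether a patch-level difference should count as "not yet upgraded" is a policy choice.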

          danblack Daniel Black added a comment - edited

          https://github.com/docker-library/healthcheck/blob/master/mysql/docker-healthcheck may be a good starting point. Worth reading: https://github.com/MariaDB/mariadb-docker/issues/94

          abychko Alexey Bychko (Inactive) added a comment -

          A comment from GitHub:

          I do not feel that generalized healthchecks on the official images are really that useful.

          • users will have their own idea of what is "healthy"
          • it does not actually test that the service is listening to connections outside of localhost (see https://github.com/docker-library/healthcheck for some examples that do more than what's proposed here, including attempting to check whether the service is listening remotely)
          • some of the Official Images even purposely start in a localhost only mode for database initialization and then kill and start the main service with full network availability
          • after upgrading their images, current users will have extra unexpected load on their systems for healthchecks they don't necessarily need/want and may be unaware of
          danblack Daniel Black added a comment -

          So there's no way we can do a release with a HEALTHCHECK directive; however, including the script in the container without enabling HEALTHCHECK should still be valuable (and better than what users could write themselves), and consistent with the currently accepted official images.

          The GitHub comment largely corresponds to https://github.com/docker-library/faq#healthcheck. Answering its points:

          • many users will have their own idea of what "healthy" means

          Fair call. Let's provide a healthcheck script that performs a basic exposed-port check (as readiness), which can serve as the basis of a user's own check should they choose to use it.

          • and credentials change over time making generic health checks hard to define

          With basic readiness as a port-based check, it probably won't need credentials. If you go for a full connection, see the extra-port and dedicated-user suggestion below. If a user enables HEALTHCHECK or uses docker/podman run --health-cmd=, they are responsible for ensuring those credentials still exist (or for the consequences of the script failing with an error).

          • after upgrading their images, current users will have extra unexpected load on their systems for healthchecks they don't necessarily need/want and may be unaware of

          I don't expect this to add much load, and without a HEALTHCHECK directive it won't be run at all.

          • Kubernetes does not use Docker's health checks (opting instead for separate liveness and readiness probes); sometimes things like databases will take too long to initialize, and a defined health check will often cause the orchestration system to prematurely kill the container (docker-library/mysql#439 for instance)

          To account for the Kubernetes case, let's make the script accept a --liveness option (defaulting to just "readiness") that adds the extra checks described in the Kubernetes documentation: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
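A sketch of the proposed option handling, where readiness (a port check) is the default and --liveness additionally requires the server to answer over TCP. The function names are hypothetical. Exit codes follow Docker's HEALTHCHECK convention: 0 is healthy, 1 is unhealthy, and 2 is reserved by Docker, so usage errors also return 1.

```shell
#!/usr/bin/env bash
# Hypothetical option handling: "readiness" is the default and --liveness
# adds a check that needs the server to respond.
# Exit codes: 0 healthy, 1 unhealthy (Docker reserves 2).

readiness_check() {
    # TCP probe of the server port via bash's /dev/tcp redirection.
    timeout 2 bash -c "exec 3<>/dev/tcp/127.0.0.1/${PORT:-3306}" 2>/dev/null
}

liveness_check() {
    # Ask the server to respond over TCP.
    mysqladmin --protocol=tcp ping >/dev/null 2>&1
}

healthcheck() {
    local mode=readiness arg
    for arg in "$@"; do
        case $arg in
        --readiness) mode=readiness ;;
        --liveness) mode=liveness ;;
        *) echo "unknown option: $arg" >&2; return 1 ;;
        esac
    done
    readiness_check || return 1
    if [[ $mode == liveness ]]; then
        liveness_check || return 1
    fi
    return 0
}

# healthcheck "$@"; exit $?
```

In Kubernetes the same script could back both a readinessProbe and a livenessProbe exec command, differing only in the option passed.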

          Given that the liveness (and maybe readiness) test would require a database connection, we should probably include extra-port/extra_max_connections in the default config (https://github.com/MariaDB/mariadb-docker/blob/master/Dockerfile.template#L116) and use that port for the connection, so liveness doesn't fail when max_connections is exhausted. Adding an extra user in the entrypoint with minimal privileges would hopefully be immune to the authentication changes that usually happen; if this user doesn't exist, that is reasonable grounds for the liveness check to simply error out.
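A sketch of the dedicated-user idea, assuming the server is configured with extra_port (for example extra_port=3307 in a server option group) and that the entrypoint can run SQL during initialization. The account name healthcheck, the port 3307 and the option-file layout are illustrative assumptions, not what any released image does.

```shell
#!/usr/bin/env bash
# Sketch: dedicated, minimally privileged healthcheck account reached via
# the server's extra_port. Account name, port and file path are assumptions.
HC_USER=healthcheck          # hypothetical account name
HC_PORT=${HC_PORT:-3307}     # must match extra_port in the server config

# 32 random alphanumeric characters; avoids quoting issues in SQL and .cnf.
random_password() {
    LC_ALL=C tr -dc 'A-Za-z0-9' </dev/urandom | head -c 32
}

# SQL for the entrypoint to run once. A new account has only USAGE,
# which is enough to connect and run mysqladmin status/ping.
healthcheck_user_sql() {
    printf "CREATE USER IF NOT EXISTS '%s'@'127.0.0.1' IDENTIFIED BY '%s';\n" \
        "$HC_USER" "$1"
}

# Client option file for the check, used as --defaults-extra-file=FILE.
write_healthcheck_cnf() {
    local file=$1 password=$2
    (
        umask 077   # new file readable by its owner only
        cat >"$file" <<EOF
[client]
user=$HC_USER
password=$password
host=127.0.0.1
port=$HC_PORT
protocol=tcp
EOF
    )
}
```

The check would then run something like mysqladmin --defaults-extra-file=FILE status; if the account has been dropped, the connection fails and the liveness check errors out, as the comment suggests.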

          Looking forward to seeing what you come up with in terms of a liveness test. I'd be happy for the script to accept a variety of options if you have liveness tests in mind that shouldn't always run.

          Please update the documentation on how to use this: https://github.com/docker-library/docs/tree/master/mariadb


          elenst Elena Stepanova added a comment -

          A fairly obvious generic minimal healthcheck for a MariaDB server could be a mysqladmin status call.
          It produces a short status line that doesn't require much expertise to inspect, can easily be compared with previous results, and fails when the server is unreachable.

          $ bin/mysqladmin status --protocol=tcp -uc
          Uptime: 895  Threads: 6  Questions: 40  Slow queries: 0  Opens: 20  Flush tables: 1  Open tables: 13  Queries per second avg: 0.044

          It does require an existing user account, but an unprivileged one: the USAGE privilege is enough, which makes it an easier option than some other status-related queries.
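Comparing a status line with the previous result can be as simple as extracting fields by name. A minimal sketch using the field names from the sample output; status_field and restarted are hypothetical helpers, and "Uptime went down" is just one example of a comparison worth making.

```shell
#!/usr/bin/env bash
# Sketch: pull numbers out of a `mysqladmin status` line so successive runs
# can be compared. Field names are those shown in the sample output.

# status_field LINE NAME: print the value following "NAME: " in LINE.
status_field() {
    sed -n "s/.*$2: \([0-9.]*\).*/\1/p" <<<"$1"
}

# restarted PREVIOUS CURRENT: succeed if Uptime went down between the two
# samples, i.e. the server restarted in the meantime.
restarted() {
    local before after
    before=$(status_field "$1" Uptime)
    after=$(status_field "$2" Uptime)
    [[ -n $before && -n $after ]] || return 1
    (( after < before ))
}

# Typical use: line=$(mysqladmin status --protocol=tcp) || exit 1
```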
          danblack Daniel Black added a comment - Created https://github.com/MariaDB/mariadb-docker/pull/408, in review with faust.
          danblack Daniel Black added a comment -

          Found some odd behaviour with mysqladmin (MDEV-27731), and because I hadn't implemented a --user argument, it would have hit users quickly. I might add it later. mysqladmin ping is also available and produces less output.

          $ git diff
          diff --git a/healthcheck.sh b/healthcheck.sh
          index 32dfec9..3e77781 100755
          --- a/healthcheck.sh
          +++ b/healthcheck.sh
          @@ -69,6 +69,18 @@ connect()
                  return 0
           }
           
          +# PING
          +#
          +# Ping using a tcp
          +ping()
          +{
          +       mysqladmin ${nodefaults:+--no-defaults} \
          +               ${def['file']:+--defaults-file=${def['file']}} \
          +               ${def['extra_file']:+--defaults-extra-file=${def['extra_file']}}  \
          +               ${def['group_suffix']:+--defaults-group-suffix=${def['group_suffix']}}  \
          +               --protocol tcp ping
          +}
          +
           # INNODB_BUFFER_POOL_LOADED
           #
           # Tests the load of the innodb buffer pool as been complete
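The ping() in the diff relies on ${var:+word}, which expands to word only when var is set and non-empty, so unset options simply disappear. The unquoted expansions split paths containing spaces, though. A sketch of the same function building its arguments in an array; def and nodefaults are declared here only to make the block self-contained.

```shell
#!/usr/bin/env bash
# Sketch: ping() from the diff, with optional arguments collected in an
# array so each one stays a single word even if a path contains spaces.
declare -A def=()   # option values, e.g. def[extra_file]=/path/to/file.cnf
nodefaults=         # non-empty to pass --no-defaults

ping() {
    local -a args=()
    # The --defaults options must come first on the mysqladmin command line.
    [[ -n $nodefaults ]] && args+=(--no-defaults)
    [[ -n ${def[file]:-} ]] && args+=("--defaults-file=${def[file]}")
    [[ -n ${def[extra_file]:-} ]] && args+=("--defaults-extra-file=${def[extra_file]}")
    [[ -n ${def[group_suffix]:-} ]] && args+=("--defaults-group-suffix=${def[group_suffix]}")
    mysqladmin "${args[@]}" --protocol tcp ping
}
```

The mysqladmin documentation states that ping exits 0 whenever the server is running, even if the connection is refused with "Access denied". A ping-based check therefore does not depend on credentials, but it also does not prove that a healthcheck user can log in.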
          


          marko Marko Mäkelä added a comment -

          When it comes to InnoDB, I think it might make sense to implement a systemd watchdog (MDEV-24669): a dead man’s switch that would be actuated every now and then, and that would be blocked if dict_sys.mutex or dict_sys.latch cannot be acquired for a long time.

          InnoDB health might also be computed from some performance metrics, but I do not have specific ideas. The checkpoint age (log bytes written since the latest checkpoint) or the number of dirty pages in the buffer pool may not be reasonable measures if idle page flushing (MDEV-24949) is not enabled.

          Perhaps there could be a ‘health formula’ that could be re(de)fined by the DBA to match the deployment? Something like a user-defined function that would return a single number?
          danblack Daniel Black added a comment -

          Noted. There's some talk of partial integration of systemd features into container runtimes, but it's still only fairly conceptual at the moment.

          I was considering accepting a generic query as an argument to the script, leaving it to whatever function the DBA wants. Users aren't tied to using the provided healthcheck script.
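Combining the "health formula" idea with a generic query argument could look like the following sketch. A DBA-supplied query returns a single number, and the server is treated as healthy when that number reaches a threshold. query_health and the CLIENT array are hypothetical; mariadb's --batch, --skip-column-names and -e options are used to get a bare value.

```shell
#!/usr/bin/env bash
# Sketch of a DBA-defined "health formula": run a query returning a single
# number; healthy when it is at least the given threshold.
# query_health and CLIENT are hypothetical names.
CLIENT=(mariadb --batch --skip-column-names)   # add --defaults-extra-file=... as needed

query_health() {
    local query=$1 min=$2 value
    value=$("${CLIENT[@]}" -e "$query") || return 1
    # Accept exactly one integer or decimal value (multi-row output fails).
    if ! [[ $value =~ ^-?[0-9]+(\.[0-9]+)?$ ]]; then
        echo "unexpected result: $value" >&2
        return 1
    fi
    # awk compares decimals, which bash arithmetic cannot.
    awk -v v="$value" -v m="$min" 'BEGIN { exit !(v >= m) }'
}

# Example: query_health 'SELECT 1' 1
```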


          People

            danblack Daniel Black
            danblack Daniel Black
            Votes: 1
            Watchers: 6
