Issue Links
- relates to MDEV-24669 implement innodb_fatal_semaphore_wait_threshold as systemd watchdog task (Open)
- relates to MDEV-25670 mariadb container needs to run mysql_upgrade (Closed)
Activity
The docs mention something like the output being recorded, so InnoDB recovery and/or Galera status would be useful here. I'm not sure either of those will push it into unhealthy, but their output may still be useful.
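For the Galera case, that presumably means surfacing something like the wsrep state in the recorded output (an illustrative query only; credentials omitted):

# Record Galera readiness in the healthcheck output; wsrep_ready is ON
# once the node is synced and able to accept queries.
mariadb --batch --skip-column-names \
	--execute="SHOW GLOBAL STATUS LIKE 'wsrep_ready'"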
Also, the purpose is to ensure that the container has started, so be careful not to return 'healthy' during the db initialization entrypoint stages.
Also think about how this would integrate with mysql_upgrade in a container (https://github.com/MariaDB/mariadb-docker/issues/350) such that 'healthy' is returned after an upgrade or a fresh install.
https://github.com/docker-library/healthcheck/blob/master/mysql/docker-healthcheck may be a good starting point.
Worth reading: https://github.com/MariaDB/mariadb-docker/issues/94
A comment from GitHub:
I do not feel that generalized healthchecks on the official images are really that useful:
- users will have their own idea of what is "healthy"
- it does not actually test that the service is listening to connections outside of localhost (see https://github.com/docker-library/healthcheck for some examples that do more than what's proposed here, including attempting to check whether the service is listening remotely)
- some of the Official Images even purposely start in a localhost-only mode for database initialization and then kill and start the main service with full network availability
- after upgrading their images, current users will have extra unexpected load on their systems for healthchecks they don't necessarily need/want and may be unaware of
So there's no way we can do a release with a HEALTHCHECK directive; however, including the script in the container without HEALTHCHECK enabled should still be valuable (and better than what users can write themselves), and consistent with currently accepted official images.
The GitHub comment generally corresponds to https://github.com/docker-library/faq#healthcheck. Answering the points:
- many users will have their own idea of what "healthy" means
Fair call; let's provide a healthcheck script that can perform a basic exposed-port check (as readiness), which should form the basis of a user check should they choose to use it (see the sketch after this list).
- and credentials change over time making generic health checks hard to define
With the basic readiness check being port-based, it probably won't need credentials. But for a full connection, see extra-port and the dedicated user below. If a user enables HEALTHCHECK or docker/podman run --health-cmd= then they can be responsible for ensuring those are still in place (or accept the consequence of the script failing and erroring).
- after upgrading their images, current users will have extra unexpected load on their systems for healthchecks they don't necessarily need/want and may be unaware of
I don't expect this would be too much load, but without a HEALTHCHECK directive this won't be used.
- Kubernetes does not use Docker's health checks (opting instead for separate liveness and readiness probes); sometimes things like databases will take too long to initialize, and a defined health check will often cause the orchestration system to prematurely kill the container (docker-library/mysql#439 for instance)
To account for the Kubernetes case, let's make the script accept a --liveness option (defaulting to just "readiness") that provides those additional checks according to the Kubernetes documentation - https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/ .
Given the liveness (and maybe readiness) test would require a db connection, we should probably include extra-port/extra_max_connections in the default config (https://github.com/MariaDB/mariadb-docker/blob/master/Dockerfile.template#L116) and use that for the connection, so we don't fail liveness when we run out of max_connections. Adding an extra user in the entrypoint with a minimal set of privileges would hopefully be immune to the authentication changes that usually go on; if this user doesn't exist, simply erroring out is reasonable grounds to fail the liveness check. A rough sketch of how these pieces could fit together follows this list.
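Purely to illustrate the shape described above, a minimal sketch (the option name, port number, user name, and credential path are all placeholders, not a final design):

#!/bin/bash
# Hypothetical dispatch: readiness stays cheap (TCP connect only), while
# --liveness makes a real client connection over the reserved extra port.
mode=readiness
[ "$1" = --liveness ] && mode=liveness

# Assumed to be present in the image's default server config:
#   [mariadbd]
#   extra-port=3307
#   extra_max_connections=5
extra_port=3307

# The exit status of the chosen check is the health result.
case "$mode" in
	readiness)
		# Is anything listening at all? No credentials needed.
		(exec 3<>/dev/tcp/127.0.0.1/3306) 2>/dev/null
		;;
	liveness)
		# Full connection as a hypothetical minimally privileged
		# 'healthcheck' user; the extra port keeps this working even
		# when ordinary max_connections is exhausted.
		mariadb --protocol=tcp --port="$extra_port" \
			--user=healthcheck --password="$(cat /run/secrets/hc_pw)" \
			--execute='SELECT 1' >/dev/null
		;;
esac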
Looking forward to seeing what you come up with in terms of a liveness test. I'd be happy for the script to accept a variety of options if you come up with various liveness tests that you may not want to always be there.
Please update documentation on how to use this: https://github.com/docker-library/docs/tree/master/mariadb
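For instance, the documentation could show the run-time opt-in mentioned above along these lines (the healthcheck.sh name and its --connect option are assumptions; the --health-* flags are standard docker run options):

# Opt-in healthcheck at container start; no HEALTHCHECK directive baked
# into the image is required for this to work.
docker run -d --name mariadb \
	-e MARIADB_ROOT_PASSWORD=secret \
	--health-cmd='healthcheck.sh --connect' \
	--health-interval=10s \
	--health-retries=3 \
	mariadb:latest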
A fairly obvious generic minimal healthcheck for a MariaDB server could be a mysqladmin status call.
It creates a short status line which doesn't require too much expertise to inspect and can be easily compared to the previous results, and it fails when the server is unreachable.
$ bin/mysqladmin status --protocol=tcp -uc
Uptime: 895 Threads: 6 Questions: 40 Slow queries: 0 Opens: 20 Flush tables: 1 Open tables: 13 Queries per second avg: 0.044
It does require an existing user account, but an unprivileged one; just USAGE is enough, which makes it an easier option than some other status-related queries.
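For example, such an account could be created with nothing beyond the implicit USAGE grant (the user name and password are illustrative only):

# An account with no privileges beyond the implicit USAGE grant is
# already enough for mysqladmin status (and ping).
mariadb --user=root --password="$MARIADB_ROOT_PASSWORD" <<'SQL'
CREATE USER IF NOT EXISTS 'healthcheck'@'localhost' IDENTIFIED BY 'hc-password';
SQL

# The status probe then runs unprivileged:
mysqladmin status --protocol=tcp --user=healthcheck --password='hc-password'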
Created https://github.com/MariaDB/mariadb-docker/pull/408; in review with faust.
Found some odd behaviour with mysqladmin in MDEV-27731, and because I hadn't implemented a --user arg it would have hit users quickly. Might add it later. mysqladmin ping is also there and has less output.
$ git diff
diff --git a/healthcheck.sh b/healthcheck.sh
index 32dfec9..3e77781 100755
--- a/healthcheck.sh
+++ b/healthcheck.sh
@@ -69,6 +69,18 @@ connect()
 	return 0
 }
 
+# PING
+#
+# Ping using a tcp connection
+ping()
+{
+	mysqladmin ${nodefaults:+--no-defaults} \
+		${def['file']:+--defaults-file=${def['file']}} \
+		${def['extra_file']:+--defaults-extra-file=${def['extra_file']}} \
+		${def['group_suffix']:+--defaults-group-suffix=${def['group_suffix']}} \
+		--protocol tcp ping
+}
+
 # INNODB_BUFFER_POOL_LOADED
 #
 # Tests that the load of the innodb buffer pool has been completed
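Assuming the script dispatches options to functions the same way it does for connect() (an assumption; this option isn't merged yet), the new check would then be invoked as:

healthcheck.sh --ping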
When it comes to InnoDB, I think that it might make sense to implement a systemd watchdog (MDEV-24669), a dead man’s switch that would be actuated every now and then, and would be blocked if dict_sys.mutex or dict_sys.latch cannot be acquired for a long time.
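As a loose external approximation of that idea (this is not the MDEV-24669 design, which would notify from inside the server; the unit settings and probe query here are assumptions):

# Dead man's switch sketch: pet the systemd watchdog only while a trivial
# query still completes. Assumes a service unit with WatchdogSec= set and
# NotifyAccess=all so that a helper process may send the notification.
while true; do
	if mariadb --connect-timeout=5 --execute='SELECT 1' >/dev/null 2>&1; then
		systemd-notify WATCHDOG=1
	fi
	sleep 10
done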
The InnoDB health might also be computed from some performance metrics, but I do not have specific ideas. The checkpoint age (log bytes written since the latest checkpoint) or the number of dirty pages in the buffer pool may not be reasonable measures if idle page flushing (MDEV-24949) is not enabled.
Perhaps there could be a ‘health formula’ that could be re(de)fined by the DBA to match the deployment? Something like a user-defined function that would return a single number?
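As a very rough sketch of what such a formula could look like if expressed in plain SQL rather than a user-defined function (the choice of metric is arbitrary):

# Reduce server status to a single 'health number': here simply the
# dirty-page ratio of the buffer pool, scaled to 0-100 (illustrative only).
mariadb --batch --skip-column-names --execute="
	SELECT ROUND(100 * dirty.variable_value / total.variable_value)
	FROM information_schema.global_status dirty,
	     information_schema.global_status total
	WHERE dirty.variable_name = 'Innodb_buffer_pool_pages_dirty'
	  AND total.variable_name = 'Innodb_buffer_pool_pages_total'"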
Noted. There's some talk of partial integration of systemd features into container runtimes, but it's still fairly conceptual at the moment.
I was considering accepting a generic query as an argument to the script, to leave it open to whatever function the DBA wants. Users aren't currently tied to using the provided healthcheck script.
What is the preferred health check? I can run something like SELECT VERSION(), but I think we can implement something smarter here.