Uploaded image for project: 'MariaDB Server'
  1. MariaDB Server
  2. MDEV-25887

"Got notification message from PID xxxx, but reception only permitted for main PID yyyy" in systemd during SST

Details

    Description

      During SST, sometimes MariaDB service systemd failed with

      "Got notification message from PID xxxx, but reception only permitted for main PID yyyy"

      Found an interesting page, which face the same issue:

      https://community.centminmod.com/threads/mariadb-failed-to-restart-at-first-install.18780/

      Root Cause Analysis here is:

      1.The issue may happen when reaching system limits in term of available PIDs, causing PID recycling to be used.

      2. systemd is not always able to find the cgroup the systemd notification was issued from, typically when services spawn many short-living processes.

      3. In such case, systemd collects the PID of the process that sent the notification, browses the list of processes it monitors and delivers the notification to the service that once had a process with that same PID attached to it, which may be wrong in case PIDs were recycled.
      Usually, this has no impact, because usual systemd notification is just I'm alive.
      In case a service defined in systemd with the Type=Notify and NotifyAccess=All settings shuts down quickly before its STOPPING=1 notification was processed by systemd, then systemd may deliver the notification to another service and make that service shut down instead.

      So maybe
      NotifyAccess=all
      is a solution, but I can't estimate a possible drawback.

      Attachments

        Issue Links

          Activity

            p4guru George L added a comment - - edited

            Seems this is sometimes happening still on CentOS 7.9 with Mariadb official YUM repo even with 10.3.35. Maybe this assessment is applicable https://serverfault.com/a/835756/34814

            To the background, the SD_NOTIFY was implemented recently (to allow systemd to follow the service status), but the notification is send wrongly also from the per-user processes, which causes this message from systemd.

            The fix is already prepared and testing build is available in the bug comments. I will fix that soon as mentioned in the comments. Sorry for the noise.

            and journalctl output for mariadb service after an attempted restart

            Logs begin at Tue 2022-06-07 18:03:06 UTC, end at Tue 2022-06-07 20:26:33 UTC. --
            Jun 07 18:14:06 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:14:06 hostname mysqld[14583]: 2022-06-07 18:14:06 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 14583 ...
            Jun 07 18:14:07 hostname systemd[1]: Started MariaDB 10.3.35 database server.
            Jun 07 18:14:19 hostname systemd[1]: Stopping MariaDB 10.3.35 database server...
            Jun 07 18:14:20 hostname systemd[1]: Stopped MariaDB 10.3.35 database server.
            Jun 07 18:14:20 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:14:21 hostname mysqld[15497]: 2022-06-07 18:14:21 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 15497 ...
            Jun 07 18:14:22 hostname systemd[1]: Started MariaDB 10.3.35 database server.
            Jun 07 18:14:22 hostname systemd[1]: Stopping MariaDB 10.3.35 database server...
            Jun 07 18:14:24 hostname systemd[1]: Stopped MariaDB 10.3.35 database server.
            Jun 07 18:14:34 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:14:34 hostname mysqld[15809]: 2022-06-07 18:14:34 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 15809 ...
            Jun 07 18:14:35 hostname systemd[1]: Started MariaDB 10.3.35 database server.
            Jun 07 18:14:41 hostname systemd[1]: Stopping MariaDB 10.3.35 database server...
            Jun 07 18:14:43 hostname systemd[1]: Stopped MariaDB 10.3.35 database server.
            Jun 07 18:14:43 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:14:44 hostname mysqld[16354]: 2022-06-07 18:14:44 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 16354 ...
            Jun 07 18:14:44 hostname systemd[1]: Started MariaDB 10.3.35 database server.
            Jun 07 18:21:25 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:21:25 hostname mysqld[56541]: 2022-06-07 18:21:25 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 56541 ...
            Jun 07 18:21:59 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:22:00 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:22:00 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known
            Jun 07 18:22:00 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server.
            Jun 07 18:22:00 hostname systemd[1]: Unit mariadb.service entered failed state.
            Jun 07 18:22:00 hostname systemd[1]: mariadb.service failed.
            Jun 07 18:22:04 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:22:04 hostname mysqld[56935]: 2022-06-07 18:22:04 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 56935 ...
            Jun 07 18:22:06 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
            Jun 07 18:22:06 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server.
            Jun 07 18:22:06 hostname systemd[1]: Unit mariadb.service entered failed state.
            Jun 07 18:22:06 hostname systemd[1]: mariadb.service failed.
            Jun 07 18:25:44 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:25:44 hostname mysqld[58444]: 2022-06-07 18:25:44 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 58444 ...
            Jun 07 18:25:45 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
            Jun 07 18:25:45 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server.
            Jun 07 18:25:45 hostname systemd[1]: Unit mariadb.service entered failed state.
            Jun 07 18:25:45 hostname systemd[1]: mariadb.service failed.
            Jun 07 18:30:31 hostname systemd[1]: Starting MariaDB 10.3.35 database server...
            Jun 07 18:30:32 hostname mysqld[58794]: 2022-06-07 18:30:32 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 58794 ...
            Jun 07 18:30:33 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE
            Jun 07 18:30:33 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server.
            Jun 07 18:30:33 hostname systemd[1]: Unit mariadb.service entered failed state.
            Jun 07 18:30:33 hostname systemd[1]: mariadb.service failed.
            

            p4guru George L added a comment - - edited Seems this is sometimes happening still on CentOS 7.9 with Mariadb official YUM repo even with 10.3.35. Maybe this assessment is applicable https://serverfault.com/a/835756/34814 To the background, the SD_NOTIFY was implemented recently (to allow systemd to follow the service status), but the notification is send wrongly also from the per-user processes, which causes this message from systemd. The fix is already prepared and testing build is available in the bug comments. I will fix that soon as mentioned in the comments. Sorry for the noise. and journalctl output for mariadb service after an attempted restart Logs begin at Tue 2022-06-07 18:03:06 UTC, end at Tue 2022-06-07 20:26:33 UTC. -- Jun 07 18:14:06 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:14:06 hostname mysqld[14583]: 2022-06-07 18:14:06 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 14583 ... Jun 07 18:14:07 hostname systemd[1]: Started MariaDB 10.3.35 database server. Jun 07 18:14:19 hostname systemd[1]: Stopping MariaDB 10.3.35 database server... Jun 07 18:14:20 hostname systemd[1]: Stopped MariaDB 10.3.35 database server. Jun 07 18:14:20 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:14:21 hostname mysqld[15497]: 2022-06-07 18:14:21 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 15497 ... Jun 07 18:14:22 hostname systemd[1]: Started MariaDB 10.3.35 database server. Jun 07 18:14:22 hostname systemd[1]: Stopping MariaDB 10.3.35 database server... Jun 07 18:14:24 hostname systemd[1]: Stopped MariaDB 10.3.35 database server. Jun 07 18:14:34 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:14:34 hostname mysqld[15809]: 2022-06-07 18:14:34 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 15809 ... Jun 07 18:14:35 hostname systemd[1]: Started MariaDB 10.3.35 database server. Jun 07 18:14:41 hostname systemd[1]: Stopping MariaDB 10.3.35 database server... Jun 07 18:14:43 hostname systemd[1]: Stopped MariaDB 10.3.35 database server. Jun 07 18:14:43 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:14:44 hostname mysqld[16354]: 2022-06-07 18:14:44 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 16354 ... Jun 07 18:14:44 hostname systemd[1]: Started MariaDB 10.3.35 database server. Jun 07 18:21:25 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:21:25 hostname mysqld[56541]: 2022-06-07 18:21:25 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 56541 ... Jun 07 18:21:59 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:21:59 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:22:00 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:22:00 hostname systemd[1]: Got notification message from PID 16354, but reception only permitted for main PID which is currently not known Jun 07 18:22:00 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server. Jun 07 18:22:00 hostname systemd[1]: Unit mariadb.service entered failed state. Jun 07 18:22:00 hostname systemd[1]: mariadb.service failed. Jun 07 18:22:04 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:22:04 hostname mysqld[56935]: 2022-06-07 18:22:04 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 56935 ... Jun 07 18:22:06 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE Jun 07 18:22:06 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server. Jun 07 18:22:06 hostname systemd[1]: Unit mariadb.service entered failed state. Jun 07 18:22:06 hostname systemd[1]: mariadb.service failed. Jun 07 18:25:44 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:25:44 hostname mysqld[58444]: 2022-06-07 18:25:44 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 58444 ... Jun 07 18:25:45 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE Jun 07 18:25:45 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server. Jun 07 18:25:45 hostname systemd[1]: Unit mariadb.service entered failed state. Jun 07 18:25:45 hostname systemd[1]: mariadb.service failed. Jun 07 18:30:31 hostname systemd[1]: Starting MariaDB 10.3.35 database server... Jun 07 18:30:32 hostname mysqld[58794]: 2022-06-07 18:30:32 0 [Note] /usr/sbin/mysqld (mysqld 10.3.35-MariaDB) starting as process 58794 ... Jun 07 18:30:33 hostname systemd[1]: mariadb.service: main process exited, code=exited, status=1/FAILURE Jun 07 18:30:33 hostname systemd[1]: Failed to start MariaDB 10.3.35 database server. Jun 07 18:30:33 hostname systemd[1]: Unit mariadb.service entered failed state. Jun 07 18:30:33 hostname systemd[1]: mariadb.service failed.
            danblack Daniel Black added a comment -

            Richard, you right that its a notify issue.

            Its a fairly serious one as any SST longer than the default system timeout of 90 seconds is going to fail.

            danblack Daniel Black added a comment - Richard , you right that its a notify issue. Its a fairly serious one as any SST longer than the default system timeout of 90 seconds is going to fail.

            Correct me if I'm wrong, but it looks like this fix cures the symptom of the erroneous message, but does not solve the real problem.

            The message is intended to tell systemd that an SST is in progress and that it shouldn't consider Maria as failing to start. The message is routed to the wrong place, causing the warning. Removing the message completely does fix the warning, but there's still the serious problem of a Maria that is receiving an SST being shut down after 90 seconds by systemd for failing to start.

            I've had to override my mariadb.service and give it hours to start up before timing out. This isn't a great solution because unless there's an SST going on, it should start up very quickly.

            xan@biblionix.com Xan Charbonnet added a comment - Correct me if I'm wrong, but it looks like this fix cures the symptom of the erroneous message, but does not solve the real problem. The message is intended to tell systemd that an SST is in progress and that it shouldn't consider Maria as failing to start. The message is routed to the wrong place, causing the warning. Removing the message completely does fix the warning, but there's still the serious problem of a Maria that is receiving an SST being shut down after 90 seconds by systemd for failing to start. I've had to override my mariadb.service and give it hours to start up before timing out. This isn't a great solution because unless there's an SST going on, it should start up very quickly.

            the fix solves the case of giving systemd spurious PIDs that systemd should not have been getting at all, in any circumstances.

            The fact that systemd thinks something is taking too long — that's a different issue, if you think it's a bug, please, create a bug report for this and we'll look into that.

            serg Sergei Golubchik added a comment - the fix solves the case of giving systemd spurious PIDs that systemd should not have been getting at all, in any circumstances. The fact that systemd thinks something is taking too long — that's a different issue, if you think it's a bug, please, create a bug report for this and we'll look into that.

            People

              serg Sergei Golubchik
              Richard Richard Stracke
              Votes:
              7 Vote for this issue
              Watchers:
              15 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved:

                Git Integration

                  Error rendering 'com.xiplink.jira.git.jira_git_plugin:git-issue-webpanel'. Please contact your Jira administrators.