  1. MariaDB Server
  2. MDEV-9202

Systemd timeout is not sufficient for larger servers

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • 10.1.8, 10.1.9, 10.1(EOL)
    • N/A
    • Documentation
    • None
    • Debian Jessie
    • 10.1.13

    Description

      On larger servers (thousands of tables) the start process can take a very long time (about 10 minutes on our server). With previous init systems this wasn't much of a problem.

      With systemd it is:

      • Startup is always reported as failed, although the server does come up after its normal start time
      • apt-get operations fail!
      • When upgrading, the systemd service file is overwritten, removing any timeout settings, so startup fails again

      Adding a large TimeoutStartSec fixes the problem but, as noted, it gets overwritten on every upgrade.
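
      As a sketch of the workaround mentioned above (this is not taken from the report itself), a systemd drop-in file is one way to keep the setting across upgrades, since drop-ins under /etc/systemd/system/ live outside the packaged unit file and should not be replaced when the package is upgraded. The unit name mariadb.service, the file name timeout.conf, and the 900-second value are assumptions for illustration; choose a value comfortably above the observed start time (about 10 minutes here).

      ```ini
      # /etc/systemd/system/mariadb.service.d/timeout.conf
      # (hypothetical file name; assumes the unit is named mariadb.service)
      [Service]
      # Assumed value: 15 minutes, above the ~10 minute start time reported here
      TimeoutStartSec=900
      ```

      After creating the file, run `systemctl daemon-reload` so systemd picks up the drop-in. `systemctl cat mariadb.service` should then list the packaged unit followed by the drop-in.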

          Activity

            svoj Sergey Vojtovich added a comment -

            serg, please review the fix for this bug.

            serg Sergei Golubchik added a comment -

            I'd prefer to keep the default timeout. The default is supposed to be good for most cases, and users should be able to adjust it as needed. See, for example, MDEV-8509 and MDEV-5068. With systemd one apparently can adjust the timeout (see above). I think the issue can be closed as is.

            stephane@skysql.com VAROQUI Stephane added a comment -

            Hi,

            If a node deadlock is caused by a bug, please kill -9 and open a bug.
            Using a system feature to hide our own mess cannot be a justification for making the regular SST feature fail for most cluster users.
            Is it possible to revisit this decision now that most old bugs are fixed?

            /svar

            GieltjE Michiel Hazelhof added a comment -

            It looks like MDEV-6113 will be merged for at least 10.2, which would greatly reduce the need for such settings.
            But the fact remains that for larger servers the default timeout is on the short side (when a server boots it usually has more things to do than just start the database).

            Personally I'd recommend setting it to 3 minutes for 10.2+. The default behaviour appears to be to wait for a full MariaDB boot instead of a hard kill. Given this behaviour, we only have to pick a sensible timeout before warning the user that something might be wrong (the server will continue to boot and exit after it has done so; in a recovery situation a simple restart should suffice; for larger servers the override can be used).

            danblack Daniel Black added a comment -

            RFE upstream in an attempt to get a workable solution: https://github.com/systemd/systemd/issues/5868

            People

              svoj Sergey Vojtovich
              GieltjE Michiel Hazelhof
