Description
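When Galera is enabled, MariaDB's systemd service executes the "galera_recovery" script as an ExecStartPre operation. See the following:
https://github.com/MariaDB/server/blob/ce8716a1ed786ff971b5e15c88385d50b649ec7f/support-files/mariadb.service.in#L71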
The MariaDB systemd service has a default TimeoutStartSec value of 90 seconds, so if this ExecStartPre step takes longer than that, startup can fail with a timeout. For example, see the following failure from a syslog:
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
Sep 13 15:49:58 server1 systemd[1]: Failed to start MariaDB 10.2.16 database server.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Unit entered failed state.
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Failed with result 'timeout'.
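As a quick check that the failure lines up with the 90-second TimeoutStartSec default, the gap between the "Starting" line and the "Start-pre operation timed out" line can be computed from the syslog timestamps. The following is a minimal sketch, not part of any MariaDB tooling: the function names are hypothetical, and because syslog stamps carry no year, the `year` argument is a placeholder assumption that does not affect the difference here.

```python
from datetime import datetime

# The two relevant lines from the syslog excerpt in this report.
LOG = """\
Sep 13 15:48:28 server1 systemd[1]: Starting MariaDB 10.2.16 database server...
Sep 13 15:49:58 server1 systemd[1]: mariadb.service: Start-pre operation timed out. Terminating.
"""


def syslog_time(line: str, year: int = 2018) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Syslog omits the year, so `year` is only a placeholder (assumption).
    """
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def start_pre_elapsed(log: str) -> float:
    """Seconds between the 'Starting' line and the start-pre timeout line."""
    lines = log.splitlines()
    started = next(l for l in lines if "Starting MariaDB" in l)
    timed_out = next(l for l in lines if "Start-pre operation timed out" in l)
    return (syslog_time(timed_out) - syslog_time(started)).total_seconds()


print(start_pre_elapsed(LOG))  # 90.0
```

The result is exactly 90 seconds, which suggests the timeout was never extended while the ExecStartPre step was running.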
galera_recovery has to perform server startup, so this step can take a while, especially if the server previously crashed and has to perform crash recovery. However, systemd timeouts should have been extended during server startup as part of MDEV-14705. Despite that, server versions with the fix for MDEV-14705 still see timeouts during ExecStartPre. Were some important long-running startup functions missed?
See also MDEV-17571 as another case where systemd timeout extensions didn't seem to work as intended.
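For readers unfamiliar with the mechanism named in the MDEV-14705 title, a timeout extension is requested by sending an `EXTEND_TIMEOUT_USEC=<microseconds>` datagram to the socket that systemd passes in the `NOTIFY_SOCKET` environment variable. The sketch below only illustrates that protocol; it is not the server's implementation, and the function names are hypothetical.

```python
import os
import socket


def extend_timeout_message(extra_seconds: float) -> bytes:
    """Build the sd_notify payload asking systemd for more time."""
    usec = int(extra_seconds * 1_000_000)
    return f"EXTEND_TIMEOUT_USEC={usec}".encode("ascii")


def notify_extend_timeout(extra_seconds: float) -> bool:
    """Send the extension request; return False when not run under systemd."""
    path = os.environ.get("NOTIFY_SOCKET")
    if not path:
        return False  # no notification socket, nothing to do
    if path.startswith("@"):
        # A leading '@' denotes a Linux abstract-namespace socket.
        path = "\0" + path[1:]
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.sendto(extend_timeout_message(extra_seconds), path)
    return True
```

A long-running step would call something like `notify_extend_timeout(30)` periodically, before the previously granted time runs out. Whether systemd honors such a message from a process other than the service's main process depends on the unit's NotifyAccess= setting, which may be relevant to an ExecStartPre step.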
Issue Links
- relates to
  - MDEV-9202 Systemd timeout is not sufficient for larger servers (Closed)
  - MDEV-9520 xtrabackup-v2 to support systemd node provisioning (Closed)
  - MDEV-15607 mysqld crashed few after node is being joined with sst (Closed)
  - MDEV-17003 service_manager_extend_timeout() being called too often (Closed)
  - MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
  - MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Link | | This issue relates to |
Link | | This issue relates to |
Link | | This issue relates to |
Link | | This issue relates to |
Link | | This issue relates to |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Assignee | Rasmus Johansson [ ratzpo ] |
Description |
Status | Open [ 1 ] | In Progress [ 3 ] |
Status | In Progress [ 3 ] | Stalled [ 10000 ] |
Assignee | Rasmus Johansson [ ratzpo ] | Jan Lindström [ jplindst ] |
Assignee | Jan Lindström [ jplindst ] | Rasmus Johansson [ ratzpo ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Fix Version/s | 10.1 [ 16100 ] | |
Fix Version/s | 10.3 [ 22126 ] | |
Fix Version/s | 10.4 [ 22408 ] | |
Resolution | Duplicate [ 3 ] | |
Status | Stalled [ 10000 ] | Closed [ 6 ] |
Workflow | MariaDB v3 [ 91110 ] | MariaDB v4 [ 155321 ] |
Zendesk Related Tickets | 127027 156845 114054 |