Details

Type: Task
Status: Closed
Priority: Critical
Resolution: Fixed
Fix Version/s: 10.1.33, 10.2.15, 10.3.6
Component/s: Storage Engine - InnoDB
Labels:

Sprint:
10.3.6-1

Description

~~MDEV-9202~~ and ~~MDEV-8509~~ show cases where the systemd timeout isn't sufficient to perform initialization/shutdown.

Since https://github.com/systemd/systemd/commit/a327431bd168b2f327f3cd422379e213c643f2a5, released in systemd v236, a Type=notify service can advise the systemd service manager that it is still working, so that the service does not time out.

The use of EXTEND_TIMEOUT_USEC= on older systemd versions has no effect and is therefore compatible.

This needs to be included in (feel free to correct/extend):

buffer pool dump - buf_dump (storage/innobase/buf/buf0dump.cc)
redo log recovery - log_group_read_log_seg (storage/innobase/log/log0log.cc)
undo recovery - recv_recover_page_func (storage/innobase/log/log0recv.cc)
change buffer?
merge buffer?

I was planning on making the 15 seconds of recv_sys_t->report() more general with respect to interval, and use a define INNODB_REPORT_INTERVAL (include/univ.i?) as the basis for this form of watchdog. I'd send notify messages of INNODB_REPORT_INTERVAL * 2 as an acceptable margin.
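
A minimal shell sketch of the watchdog idea described above, not the InnoDB implementation: a progress hook that fires at most once per report interval and asks systemd for twice that interval. The names REPORT_INTERVAL, NOTIFY_CMD and report_progress are hypothetical, and whether systemd accepts messages sent by a helper process depends on the unit's NotifyAccess= setting.

```shell
# Hypothetical names throughout; this only mirrors the idea in the text.
REPORT_INTERVAL=15                          # seconds between reports (the 15 s mentioned above)
NOTIFY_CMD=${NOTIFY_CMD:-systemd-notify}    # overridable, e.g. with "echo" for a dry run
last_report=0                               # epoch seconds of the last notification

# Call from inside a long-running loop, passing the current epoch seconds.
report_progress() {
    local now=$1
    if [ $((now - last_report)) -ge "$REPORT_INTERVAL" ]; then
        last_report=$now
        # Request twice the report interval, expressed in microseconds.
        $NOTIFY_CMD "EXTEND_TIMEOUT_USEC=$((REPORT_INTERVAL * 2 * 1000000))"
    fi
}
```

A loop would call `report_progress "$(date +%s)"` on every iteration; the rate limiting keeps the number of notifications bounded no matter how fast the loop spins.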

Anywhere else or other suggestions marko, jplindst?

galera SST scripts - donor and recipient

Any other server/engine slow points to account for?

Target 10.3 and then look at a backport?

Attachments

Issue Links

causes

MDEV-16149 Failing assertion: node->modification_counter == node->flush_counter with innodb_flush_method=O_DSYNC

Closed

MDEV-16150 Mariadb 10.3.6 Failing assertion on Docker

Closed

MDEV-17003 service_manager_extend_timeout() being called too often

Closed

relates to

MDEV-11027 InnoDB log recovery is too noisy

Closed

MDEV-12323 Rollback progress log messages during crash recovery are intermixed with unrelated log messages

Closed

MDEV-12352 InnoDB shutdown should not be blocked by a large transaction rollback

Closed

MDEV-15554 InnoDB page_cleaner shutdown sometimes hangs

Closed

MDEV-15832 With innodb_fast_shutdown=3, skip the rollback of connected transactions

Closed

MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs

Closed

MDEV-17934 Make systemd timeout behavior more compatible with longer Galera recovery times

Closed

MDEV-18224 MTR's internal check of the test case 'innodb.recovery_shutdown' failed due to extra #sql-ib*.ibd files

Closed

MDEV-9202 Systemd timeout is not sufficient for larger servers

Closed

MDEV-11035 Restore removed disallow-writes for Galera

Closed

MDEV-15606 Galera can't perform SST in 10.2.13 if systemd in use due to timeout at startup

Closed

MDEV-15607 mysqld crashed few after node is being joined with sst

Closed

Activity

Daniel Black created issue - 2017-12-19 01:05

Daniel Black made changes - 2017-12-19 01:05

Field	Original Value	New Value
Link		This issue relates to ~~MDEV-9202~~ [ ~~MDEV-9202~~ ]

Daniel Black made changes - 2017-12-19 01:08

Description

Daniel Black made changes - 2018-01-27 02:28

Fix Version/s		10.2 [ 14601 ]
Fix Version/s		10.1 [ 16100 ]

Daniel Black made changes - 2018-01-27 02:29

Labels

contribution foundation patch

Sergey Vojtovich made changes - 2018-01-29 10:36

Assignee

Marko Mäkelä [ marko ]

Sergey Vojtovich made changes - 2018-03-13 13:32

Priority

Major [ 3 ]

Critical [ 2 ]

Sergey Vojtovich added a comment - 2018-03-13 13:32

Overdue PR.

Marko Mäkelä made changes - 2018-03-14 10:04

Link

This issue relates to ~~MDEV-11027~~ [ ~~MDEV-11027~~ ]

Marko Mäkelä added a comment - 2018-03-14 10:04

Startup and shutdown are quite different. I would consider them separately.

Startup

For startup, the biggest time consumers should be redo log apply and the buffer pool load. For the redo log apply, we cannot know in advance how much work there is to be done. ~~MDEV-11027~~ introduced progress reporting for this, including a call to sd_notifyf().

Undo recovery (rolling back incomplete transactions) takes place in a background thread. Starting with ~~MDEV-12323~~, we do report progress for that as well via sd_notifyf(). If the binlog is used, then the tc_heuristic_recover option can come into play, internally initiating XA COMMIT or XA ROLLBACK operations in some thread. I do not know if that would block user connections from starting at all. Commit is fast in InnoDB, while the rollback speed is proportional to the number of modified rows and the number of indexes.

Shutdown

In shutdown, there have been multiple hangs. Most recently, I fixed ~~MDEV-15554~~.

There are different innodb_fast_shutdown modes. By default (innodb_fast_shutdown=1), a fast shutdown is done: all dirty pages are flushed from the buffer pool to the data files and a log checkpoint is created.

Some progress reporting for the flush phase would be nice. We know exactly how many dirty pages there are, so the progress measure would be linear.
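
As a rough illustration of the linear progress measure suggested above (a sketch only, not what the server does), a status line could be derived from the number of pages flushed so far. The function name flush_status and the message wording are assumptions; STATUS= is the standard sd_notify field for free-form status text.

```shell
# Hypothetical helper: format a STATUS= line from pages flushed so far.
flush_status() {
    local flushed=$1 total=$2
    local pct=100                      # nothing to flush counts as complete
    if [ "$total" -gt 0 ]; then
        pct=$((flushed * 100 / total)) # linear: share of dirty pages written
    fi
    printf 'STATUS=Flushing dirty pages: %d%% (%d/%d)\n' "$pct" "$flushed" "$total"
}
```

The result could be passed on with something like `systemd-notify "$(flush_status 250 1000)"`.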

There is also the crash-like innodb_fast_shutdown=2 which is supposed to complete instantly.

The slowest shutdown is innodb_fast_shutdown=0. It will:

* Wait for all transactions to complete (starting with ~~MDEV-12352~~, normal shutdown would abort the rollback of recovered transactions)
* Purge the history of all completed transactions (well, except when it fails to do so due to ~~MDEV-11802~~)
* Empty the change buffer (formerly known as the insert buffer) by merging changes to secondary index leaf pages.

Only after these tasks, it becomes feasible to flush the dirty pages into the data files and to create the final redo log checkpoint.

Marko Mäkelä made changes - 2018-03-14 10:04

Link

This issue relates to ~~MDEV-15554~~ [ ~~MDEV-15554~~ ]

Marko Mäkelä made changes - 2018-03-14 10:04

Link

This issue relates to ~~MDEV-12352~~ [ ~~MDEV-12352~~ ]

Marko Mäkelä made changes - 2018-03-14 10:04

Link

This issue relates to ~~MDEV-12323~~ [ ~~MDEV-12323~~ ]

Marko Mäkelä made changes - 2018-03-14 10:06

Sprint

10.1.32 [ 235 ]

Marko Mäkelä made changes - 2018-03-14 10:06

Status

Open [ 1 ]

In Progress [ 3 ]

Daniel Black added a comment - 2018-03-14 10:52

Quick note: for the purposes of systemd timing, startup runs from process creation until just before the first connection is accepted.

Startup time ends: https://github.com/MariaDB/server/blob/10.3/sql/mysqld.cc#L6680
Shutdown time starts: https://github.com/MariaDB/server/blob/10.3/sql/mysqld.cc#L6882 and runs until the final exit.

On startup I haven't included the buffer pool load because, as I understand it, it continues in the background while the server is processing queries.

I have included the buffer pool save on shutdown too.

The exact time need not be known, only that a loop iteration can progress within an estimated maximum time.

Rick Pizzi (Inactive) added a comment - 2018-03-20 12:45

@marko the startup has to consider Galera SST time - can be hours on larger datasets

Marko Mäkelä made changes - 2018-03-20 14:33

Sprint

10.1.32 [ 235 ]

10.3.6 [ 237 ]

Marko Mäkelä made changes - 2018-03-20 18:35

Link

This issue relates to ~~MDEV-15607~~ [ ~~MDEV-15607~~ ]

Daniel Black added a comment - 2018-03-20 23:04 - edited

rpizzi, SST scripts will be able to communicate just as easily as the server. The tricky bit is identifying the slower bits of the SST and acting on them. For rsync, for example, I was thinking of enabling progress monitoring and adding a function that takes any output on stdout and issues EXTEND_TIMEOUT_USEC=... at that point.

Something like:

if systemd-notify --booted && [ -n "${NOTIFY_SOCKET}" ]
then
  rsync_output_extendtimeout() {
    # some rate limiting / variable timeout progress
    while read -r input
    do
      # the value is in microseconds
      systemd-notify EXTEND_TIMEOUT_USEC=100000
    done
  }
else
  # not running under systemd: just drain the progress output
  rsync_output_extendtimeout() { cat > /dev/null; }
fi

...

rsync --progress | rsync_output_extendtimeout

...

Edit: $NOTIFY_SOCKET is a unix socket or an abstract socket. Probably best just to use systemd-notify and check if $NOTIFY_SOCKET is set.

Assistance in identifying other slow SST components and how to identify progress would be much appreciated.

Rick Pizzi (Inactive) added a comment - 2018-03-21 08:26

I believe the script wsrep_sst_xtrabackup-v2 (or the mariabackup equivalent) is just spawned via fork/exec. In that case the parent process could run the loop you mentioned until the child process completes.
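
The parent-side loop suggested here could look roughly like the following. This is a shell illustration of the pattern only; in the server the parent is mysqld itself, so the real thing would be C++. The names run_with_timeout_extension, NOTIFY_CMD, POLL_SECS and EXTEND_SECS are hypothetical, and the default values are placeholders rather than recommendations.

```shell
# Hypothetical names; defaults are placeholders.
NOTIFY_CMD=${NOTIFY_CMD:-systemd-notify}   # overridable, e.g. with "echo" for a dry run
POLL_SECS=${POLL_SECS:-10}                 # how often to check on the child
EXTEND_SECS=${EXTEND_SECS:-30}             # extension requested per check

# Run a command in the background and keep extending the systemd
# timeout for as long as it is alive; return the command's exit status.
run_with_timeout_extension() {
    "$@" &
    local pid=$!
    while kill -0 "$pid" 2>/dev/null; do
        $NOTIFY_CMD "EXTEND_TIMEOUT_USEC=$((EXTEND_SECS * 1000000))"
        sleep "$POLL_SECS"
    done
    wait "$pid"
}
```

Keeping EXTEND_SECS comfortably larger than POLL_SECS leaves a margin so that the extension does not lapse between two checks.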

Rick Pizzi (Inactive) added a comment - 2018-03-21 08:57

Looks like this SST issue isn't new, just overlooked.
See item #8 of https://mariadb.com/kb/en/library/upgrading-from-mariadb-galera-cluster-100-to-mariadb-101/

Daniel Black made changes - 2018-03-23 11:59

Link

This issue relates to ~~MDEV-15606~~ [ ~~MDEV-15606~~ ]

Marko Mäkelä added a comment - 2018-04-06 07:27

Some cleanup, and testing

Marko Mäkelä made changes - 2018-04-06 07:27

issue.field.resolutiondate

2018-04-06 07:27:36.0

2018-04-06 07:27:36.08

Marko Mäkelä made changes - 2018-04-06 07:27

Fix Version/s		10.1.33 [ 22909 ]
Fix Version/s		10.2.15 [ 23006 ]
Fix Version/s		10.3.6 [ 23003 ]
Fix Version/s	10.2 [ 14601 ]
Fix Version/s	10.1 [ 16100 ]
Fix Version/s	10.3 [ 22126 ]
Resolution		Fixed [ 1 ]
Status	In Progress [ 3 ]	Closed [ 6 ]

Marko Mäkelä added a comment - 2018-04-06 07:38

rpizzi, I added the timeout extension to InnoDB and XtraDB. Please file a separate ticket for the Galera snapshot transfer, maybe linked to ~~MDEV-11035~~ or some other ticket on innodb_disallow_writes.

Marko Mäkelä made changes - 2018-04-08 06:39

Link

This issue relates to ~~MDEV-11035~~ [ ~~MDEV-11035~~ ]

Marko Mäkelä added a comment - 2018-04-08 06:42

rpizzi, fraggeln in ~~MDEV-11035~~ mentioned systemd timeout on startup, related to Galera rsync snapshot transfer failure. I wonder if this fix would address those problems, or if some further change is needed. (If yes, please file a separate ticket, as I suggested earlier.)

Marko Mäkelä made changes - 2018-04-10 05:24

Link

This issue relates to ~~MDEV-15832~~ [ ~~MDEV-15832~~ ]

Rick Pizzi (Inactive) added a comment - 2018-04-11 15:30

"Please file a separate ticket for the Galera snapshot transfer"
already there.... https://jira.mariadb.org/browse/MDEV-15606

Marko Mäkelä made changes - 2018-05-14 05:53

Link

This issue causes ~~MDEV-16149~~ [ ~~MDEV-16149~~ ]

Marko Mäkelä made changes - 2018-05-14 05:53

Link

This issue causes ~~MDEV-16150~~ [ ~~MDEV-16150~~ ]

Marko Mäkelä made changes - 2018-08-16 14:08

Link

This issue causes ~~MDEV-17003~~ [ ~~MDEV-17003~~ ]

Geoff Montee (Inactive) made changes - 2018-12-06 22:45

Link

This issue relates to ~~MDEV-17571~~ [ ~~MDEV-17571~~ ]

Geoff Montee (Inactive) made changes - 2018-12-07 21:49

Link

This issue relates to ~~MDEV-17934~~ [ ~~MDEV-17934~~ ]

Marko Mäkelä made changes - 2019-01-14 07:24

Link

This issue relates to ~~MDEV-18224~~ [ ~~MDEV-18224~~ ]

Marko Mäkelä made changes - 2020-03-25 07:13

Link

This issue relates to ~~MDEV-17571~~ [ ~~MDEV-17571~~ ]

Marko Mäkelä added a comment - 2020-03-25 07:13

It looks like the Galera issues were ultimately fixed in ~~MDEV-17571~~.

Sergei Golubchik made changes - 2021-12-06 21:23

Workflow

MariaDB v3 [ 84505 ]

MariaDB v4 [ 133422 ]

People

Assignee: Marko Mäkelä

Reporter: Daniel Black

Votes: 1

Watchers: 8

Dates

Created: 2017-12-19 01:05

Updated: 2020-03-25 07:13

Resolved: 2018-04-06 07:27

MariaDB Server

systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts
