[MDEV-15606] Galera can't perform SST in 10.2.13 if systemd in use due to timeout at startup - Jira

Details

Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Affects Version/s: 10.2.13, 10.2.14, 10.3.6, 10.1(EOL)
Fix Version/s: N/A
Component/s: Configuration
Labels: None
Environment: CentOS Linux release 7.4.1708 (Core)

Description

The second node cannot join the first node because systemd kills the SST once the default startup timeout expires.

Running systemctl show mariadb.service | grep Timeout shows a startup timeout of 1m 30s, but an SST can last hours with a large dataset, slow disks, or a slow network.

In fact, it is common for an SST to take several hours in production.

Setting TimeoutSec=0 in the [Service] section of the mariadb.service unit file fixes the problem.

As it stands, Galera Cluster cannot be deployed with 10.2.13 on CentOS 7 unless the above workaround is in place.
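
The workaround can also be applied as a systemd drop-in instead of editing the unit file itself. The following is a minimal sketch, not the fix shipped by MariaDB: the helper name write_timeout_dropin and the file name timeout.conf are illustrative assumptions, and the value to pass (0 as in this report, or infinity) depends on the systemd version in use.

```shell
# write_timeout_dropin UNIT VALUE [BASEDIR]
#   UNIT    unit name, e.g. mariadb.service
#   VALUE   timeout value, e.g. 0 or infinity
#   BASEDIR defaults to /etc/systemd/system; override it for a dry run
# The helper name and the drop-in file name are hypothetical.
write_timeout_dropin() {
    unit=$1
    value=$2
    basedir=${3:-/etc/systemd/system}
    dropin_dir="$basedir/$unit.d"

    mkdir -p "$dropin_dir" || return 1
    # The directive only takes effect inside the [Service] section.
    printf '[Service]\nTimeoutSec=%s\n' "$value" > "$dropin_dir/timeout.conf"
}
```

Run as root, write_timeout_dropin mariadb.service 0 followed by systemctl daemon-reload applies the setting; systemctl show mariadb.service | grep Timeout then shows whether it took effect.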

Attachments

Issue Links

relates to

MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts

Closed

MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs

Closed

Activity


Rick Pizzi (Inactive) created issue - 2018-03-20 13:38

Rick Pizzi (Inactive) made changes - 2018-03-20 13:39

Field	Original Value	New Value
Description	The second node can't join the first node because SST will get killed by systemd after the default timeout hits. systemctl show mariadb.service \| grep Timeout will show timeout set to 1m 30s for startup, but an SST can last hours with large dataset and/or slow disks and/or slow networks. In fact, it is common for an SST to take several hours in production. Setting TimeoutSec=0 under Services in the mariadb.service config file under systemd fixes the problem. Right now, it is impossible to deploy Galera Cluster under 10.2.13 and CentOS 7	The second node can't join the first node because SST will get killed by systemd after the default timeout hits. systemctl show mariadb.service \| grep Timeout will show timeout set to 1m 30s for startup, but an SST can last hours with large dataset and/or slow disks and/or slow networks. In fact, it is common for an SST to take several hours in production. Setting TimeoutSec=0 under Services in the mariadb.service config file under systemd fixes the problem. Right now, it is impossible to deploy Galera Cluster under 10.2.13 and CentOS 7 unless the above workaround is in place.

Zdravelina Sokolovska (Inactive) added a comment - 2018-03-20 16:14

The same issue was observed with a data set of ~12G when the 3rd node was joining.
The SST failed with wsrep_sst_method=mariabackup and also with rsync.
joiner: => Rate:[ 39MiB/s] Avg:[32.9MiB/s] Elapsed:0:01:20
WSREP_SST: [ERROR] Removing /var/lib/mysql//.sst/xtrabackup_galera_info file due to signal (20180320 16:13:50.761)
WSREP_SST: [ERROR] Cleanup after exit with status:143 (20180320 16:13:50.765)
2018-03-20 16:13:50 140406339643136 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.193' --datadir '/var/lib/mysql/' --parent '13420' '' : 4 (Interrupted system call)
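
The exit status 143 in the log above is consistent with the SST script having been terminated by SIGTERM: by shell convention, a process killed by signal N is reported with status 128 + N, and SIGTERM is signal 15. That the signal came from systemd's start timeout is an inference from this report, not something the log states. A minimal sketch of the convention (the function name is illustrative):

```shell
# exit_status_to_signal STATUS
# Prints the signal number implied by a shell exit status above 128;
# fails without output if the status does not denote a signal.
exit_status_to_signal() {
    status=$1
    [ "$status" -gt 128 ] || return 1
    echo $((status - 128))
}

# Demonstration: a child shell sends itself SIGTERM, and the parent
# reports the resulting exit status.
sh -c 'kill -TERM $$' || echo "child exit status: $?"
```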

Aurélien LEQUOY added a comment - 2018-03-20 17:38

Read this: https://jira.mariadb.org/browse/MDEV-15383?focusedCommentId=108624&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108624

Aurélien LEQUOY added a comment - 2018-03-20 18:05 - edited

I am not sure you can complete an SST even without that, unless you kept your older client version; this version, "libmariadbclient18 10.2.13", will break your IST and SST.

Aurélien LEQUOY added a comment - 2018-03-22 15:06 - edited

I can confirm this bug on Debian 9.4 as well: I ran an SST with a 1 TB node.

[....] Starting mysql (via systemctl): mysql.serviceJob for mariadb.service failed because a timeout was exceeded.

I added

TimeoutSec=0
to /etc/systemd/system/mysqld.service:

echo 'TimeoutSec=0' >> /etc/systemd/system/mysqld.service

systemctl daemon-reload

Elena Stepanova made changes - 2018-03-23 09:34

Fix Version/s		10.2 [ 14601 ]
Assignee		Sachin Setiya [ sachin.setiya.007 ]
Priority	Blocker [ 1 ]	Critical [ 2 ]

Daniel Black made changes - 2018-03-23 11:59

Link

This issue relates to ~~MDEV-14705~~ [ ~~MDEV-14705~~ ]

Rick Pizzi (Inactive) added a comment - 2018-04-11 15:28

Guys, this needs a fix. I just got bitten by this on a newly installed 10.2.14... please...

Rick Pizzi (Inactive) made changes - 2018-04-11 15:29

Affects Version/s

10.2.14 [ 22911 ]

Alex Vorona added a comment - 2018-04-12 05:56

The same problem affects version 10.1.

Zdravelina Sokolovska (Inactive) made changes - 2018-05-03 09:34

Affects Version/s

10.1 [ 16100 ]

Zdravelina Sokolovska (Inactive) made changes - 2018-05-03 09:34

Affects Version/s

10.3.6 [ 23003 ]

Wayne Workman added a comment - 2018-06-12 15:53

These are the same: MDEV-16425 and MDEV-15606.

Sachin Setiya (Inactive) made changes - 2018-07-18 13:05

Assignee

Sachin Setiya [ sachin.setiya.007 ]

Sachin Setiya (Inactive) made changes - 2018-07-18 13:26

Assignee

Sachin Setiya [ sachin.setiya.007 ]

Sachin Setiya (Inactive) made changes - 2018-07-18 13:27

Assignee

Sachin Setiya [ sachin.setiya.007 ]

Seppo Jaakola [ seppo ]

Jan Lindström (Inactive) added a comment - 2018-09-12 10:52

~~MDEV-15607~~ should fix this issue.

Jan Lindström (Inactive) made changes - 2018-09-12 10:52

Fix Version/s		N/A [ 14700 ]
Fix Version/s	10.2 [ 14601 ]
Assignee	Seppo Jaakola [ seppo ]	Jan Lindström [ jplindst ]
Resolution		Duplicate [ 3 ]
Status	Open [ 1 ]	Closed [ 6 ]

brianr added a comment - 2018-10-22 13:49 - edited

This will not work any longer:

echo 'TimeoutSec=0' >> /etc/systemd/system/mysqld.service
systemctl daemon-reload

systemd now apparently ignores "TimeoutSec=0" silently; it only reacts to "TimeoutSec=infinity".

DAHMIKT
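
Whichever value is used, the timeout that systemd actually applied can be checked rather than assumed. A minimal sketch, assuming that systemctl show reports a disabled start timeout as TimeoutStartUSec=infinity (worth confirming on the systemd version in use); the helper name is illustrative:

```shell
# start_timeout_disabled
# Reads `systemctl show UNIT` output on stdin and succeeds only if it
# contains the exact line TimeoutStartUSec=infinity.
# Hypothetical helper; the expected property format is an assumption.
start_timeout_disabled() {
    grep -qx 'TimeoutStartUSec=infinity'
}
```

For example, systemctl show mariadb.service | start_timeout_disabled && echo "start timeout disabled" prints the message only when the override is in effect.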

Geoff Montee (Inactive) made changes - 2018-12-06 22:07

Link

This issue relates to ~~MDEV-17571~~ [ ~~MDEV-17571~~ ]

Sergei Golubchik made changes - 2021-12-06 21:46

Workflow

MariaDB v3 [ 86106 ]

MariaDB v4 [ 153984 ]

People

Assignee: Jan Lindström (Inactive)

Reporter: Rick Pizzi (Inactive)

Votes: 4

Watchers: 14

Dates

Created: 2018-03-20 13:38

Updated: 2018-12-06 22:07

Resolved: 2018-09-12 10:52
