Details
- Type: Bug
- Status: Closed
- Priority: Critical
- Resolution: Duplicate
- Affects Version/s: 10.2.13, 10.2.14, 10.3.6, 10.1(EOL)
- None
- Environment: CentOS Linux release 7.4.1708 (Core)
Description
The second node cannot join the first node because systemd kills the SST once the default startup timeout expires.
Running systemctl show mariadb.service | grep Timeout shows the startup timeout set to 1m 30s, but an SST can last hours with a large dataset, slow disks, or a slow network.
In fact, it is common for an SST to take several hours in production.
Setting TimeoutSec=0 in the [Service] section of the mariadb.service unit file works around the problem.
Right now, it is impossible to deploy Galera Cluster with 10.2.13 on CentOS 7 unless the above workaround is in place.
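The workaround described above can be sketched as a systemd drop-in rather than an edit to the packaged unit file. This is a minimal sketch under assumptions: the helper name write_timeout_dropin and the file name timeout.conf are hypothetical, and the drop-in directory shown in the comments is the usual systemd override location, not something stated in this report. The TimeoutSec=0 value is the one from the report.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that sets TimeoutSec=0 for mariadb.service.
# write_timeout_dropin is a hypothetical helper; the target directory is an
# argument so the function can be tried against a scratch directory first.
write_timeout_dropin() {
    dropin_dir="$1"
    mkdir -p "$dropin_dir" || return 1
    # TimeoutSec belongs in the [Service] section of the unit.
    cat > "$dropin_dir/timeout.conf" <<'EOF'
[Service]
TimeoutSec=0
EOF
}

# On a real node (as root), the steps would be roughly:
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d
#   systemctl daemon-reload
#   systemctl show mariadb.service | grep Timeout
```

A drop-in is used here because it survives package upgrades that replace the shipped unit file. Re-running the systemctl show command from the description is a quick way to check that the new timeout was picked up.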
Attachments
Issue Links
- relates to
  - MDEV-14705 systemd: EXTEND_TIMEOUT_USEC= to avoid startup and shutdown timeouts (Closed)
  - MDEV-17571 Make systemd timeout behavior more compatible with long Galera SSTs (Closed)
Activity
Field | Original Value | New Value |
---|---|---|
Description | The second node can't join the first node because SST will get killed by systemd after the default timeout hits. systemctl show mariadb.service | grep Timeout will show timeout set to 1m 30s for startup, but an SST can last hours with large dataset and/or slow disks and/or slow networks. In fact, it is common for an SST to take several hours in production. Setting TimeoutSec=0 under Services in the mariadb.service config file under systemd fixes the problem. Right now, it is impossible to deploy Galera Cluster under 10.2.13 and CentOS 7 | The second node can't join the first node because SST will get killed by systemd after the default timeout hits. systemctl show mariadb.service | grep Timeout will show timeout set to 1m 30s for startup, but an SST can last hours with large dataset and/or slow disks and/or slow networks. In fact, it is common for an SST to take several hours in production. Setting TimeoutSec=0 under Services in the mariadb.service config file under systemd fixes the problem. Right now, it is impossible to deploy Galera Cluster under 10.2.13 and CentOS 7 unless the above workaround is in place. |
Fix Version/s | 10.2 [ 14601 ] | |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | |
Priority | Blocker [ 1 ] | Critical [ 2 ] |
Link | This issue relates to |
Affects Version/s | 10.2.14 [ 22911 ] |
Affects Version/s | 10.1 [ 16100 ] |
Affects Version/s | 10.3.6 [ 23003 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] |
Assignee | Sachin Setiya [ sachin.setiya.007 ] | Seppo Jaakola [ seppo ] |
Fix Version/s | N/A [ 14700 ] | |
Fix Version/s | 10.2 [ 14601 ] | |
Assignee | Seppo Jaakola [ seppo ] | Jan Lindström [ jplindst ] |
Resolution | Duplicate [ 3 ] | |
Status | Open [ 1 ] | Closed [ 6 ] |
Link | This issue relates to |
Workflow | MariaDB v3 [ 86106 ] | MariaDB v4 [ 153984 ] |
The same issue was observed with a data set of ~12G when a 3rd node was joining.
SST failed with wsrep_sst_method=mariabackup, and also when set to rsync.
joiner: => Rate:[ 39MiB/s] Avg:[32.9MiB/s] Elapsed:0:01:20
WSREP_SST: [ERROR] Removing /var/lib/mysql//.sst/xtrabackup_galera_info file due to signal (20180320 16:13:50.761)
WSREP_SST: [ERROR] Cleanup after exit with status:143 (20180320 16:13:50.765)
2018-03-20 16:13:50 140406339643136 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.193' --datadir '/var/lib/mysql/' --parent '13420' '' : 4 (Interrupted system call)
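The exit status 143 in the log above can be decoded with a little arithmetic: by shell convention a status above 128 means the process was terminated by signal (status - 128), so 143 corresponds to signal 15, SIGTERM. SIGTERM is systemd's default stop signal, and the joiner had been running for 0:01:20, so this is consistent with the service being stopped at the 1m 30s start timeout, although the log alone does not show who sent the signal. A minimal sketch, with signal_from_status as a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: map a shell-style exit status to a signal name.
# Statuses above 128 conventionally mean "terminated by signal (status - 128)".
signal_from_status() {
    status="$1"
    if [ "$status" -gt 128 ]; then
        # kill -l <number> prints the name of that signal.
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

# Status reported by wsrep_sst_mariabackup in the log excerpt.
signal_from_status 143
```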