MariaDB Server / MDEV-15606

Galera can't perform SST in 10.2.13 if systemd in use due to timeout at startup

Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Duplicate
    • Affects Version/s: 10.2.13, 10.2.14, 10.3.6, 10.1(EOL)
    • Fix Version/s: N/A
    • Component/s: Configuration
    • Labels: None
    • Environment: CentOS Linux release 7.4.1708 (Core)

    Description

      A second node cannot join the first node because systemd kills the SST once the default startup timeout expires.

      Running systemctl show mariadb.service | grep Timeout shows a startup timeout of 1m 30s, but an SST can last for hours with a large dataset, slow disks, or a slow network.

      In fact, it is common for an SST to take several hours in production.

      Setting TimeoutSec=0 in the [Service] section of the mariadb.service unit file fixes the problem.

      Right now, it is impossible to deploy Galera Cluster on 10.2.13 and CentOS 7 unless this workaround is in place.
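
One way to apply the workaround without editing the packaged unit file is a systemd drop-in. The following is only a sketch under stated assumptions, not the fix shipped by MariaDB: the helper name write_timeout_dropin and the drop-in file name timeout.conf are hypothetical, and the unit is assumed to be mariadb.service. It sets TimeoutStartSec, the startup half of TimeoutSec, and takes the value as a parameter because the value that disables the timeout may differ between systemd versions.

```shell
#!/bin/sh
# Sketch: write a systemd drop-in that overrides the startup timeout.
# Assumptions: the unit is mariadb.service; timeout.conf is a hypothetical file name.

write_timeout_dropin() {
    # $1 = drop-in directory, $2 = timeout value (for example 0 or infinity)
    dropin_dir=$1
    timeout_value=$2
    mkdir -p "$dropin_dir" || return 1
    printf '[Service]\nTimeoutStartSec=%s\n' "$timeout_value" > "$dropin_dir/timeout.conf"
}

# Typical use as root on the joiner node (not executed by this sketch):
#   write_timeout_dropin /etc/systemd/system/mariadb.service.d infinity
#   systemctl daemon-reload
#   systemctl show mariadb.service -p TimeoutStartUSec
```

A drop-in keeps the override separate from the packaged unit file, so a package upgrade that replaces the unit file should not discard it.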

          Activity

            winstone Zdravelina Sokolovska (Inactive) added a comment -

            The same issue was observed with a data set of ~12G when a 3rd node was joining.
            SST failed with wsrep_sst_method=mariabackup, and also when the method was set to rsync.
            joiner: => Rate:[ 39MiB/s] Avg:[32.9MiB/s] Elapsed:0:01:20
            WSREP_SST: [ERROR] Removing /var/lib/mysql//.sst/xtrabackup_galera_info file due to signal (20180320 16:13:50.761)
            WSREP_SST: [ERROR] Cleanup after exit with status:143 (20180320 16:13:50.765)
            2018-03-20 16:13:50 140406339643136 [ERROR] WSREP: Process completed with error: wsrep_sst_mariabackup --role 'joiner' --address '192.168.104.193' --datadir '/var/lib/mysql/' --parent '13420' '' : 4 (Interrupted system call)
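
The exit status 143 in the log above can be decoded with a short shell sketch. By shell convention, a status above 128 means the process was ended by signal number (status - 128). The helper name signal_from_status is hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch: map a shell exit status above 128 to the signal that ended the process.

signal_from_status() {
    status=$1
    if [ "$status" -gt 128 ]; then
        # Statuses above 128 conventionally mean "killed by signal (status - 128)".
        kill -l "$((status - 128))"
    else
        echo "none"
    fi
}

signal_from_status 143   # prints TERM
```

Status 143 maps to SIGTERM (signal 15), which is consistent with the SST script being terminated from outside rather than failing on its own.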
            Aurelien_LEQUOY Aurélien LEQUOY added a comment - Read this: https://jira.mariadb.org/browse/MDEV-15383?focusedCommentId=108624&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-108624
            Aurelien_LEQUOY Aurélien LEQUOY added a comment - - edited

            I am not sure you can perform an SST even without that, unless you kept your previous version of the client: the version "libmariadbclient18 10.2.13" will break your IST and SST.

            Aurelien_LEQUOY Aurélien LEQUOY added a comment - - edited

            I can confirm this bug on Debian 9.4 as well: I performed an SST with a node holding 1 TB of data.

            [....] Starting mysql (via systemctl): mysql.service
            Job for mariadb.service failed because a timeout was exceeded.

            I added TimeoutSec=0 to /etc/systemd/system/mysqld.service:

            echo 'TimeoutSec=0' >> /etc/systemd/system/mysqld.service
            systemctl daemon-reload
            


            rpizzi Rick Pizzi (Inactive) added a comment -

            Guys, this needs a fix. I was just bitten by this on a newly installed 10.2.14... please...
            av Alex Vorona added a comment -

            The same problem affects version 10.1.
            wworkman@neces.com Wayne Workman added a comment -

            These are the same: MDEV-16425 and MDEV-15606.

            jplindst Jan Lindström (Inactive) added a comment -

            MDEV-15607 should fix this issue.
            brianryberg brianr added a comment - - edited

            This will not work any longer:

            echo 'TimeoutSec=0' >> /etc/systemd/system/mysqld.service
            systemctl daemon-reload

            systemd apparently now silently ignores "TimeoutSec=0" and only reacts to "TimeoutSec=infinity".

            DAHMIKT

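
Because a rejected or ignored timeout value produces no error, it can help to check the effective setting after systemctl daemon-reload. The sketch below assumes that systemctl show reports a disabled start timeout as TimeoutStartUSec=infinity. The helper name start_timeout_disabled is hypothetical, and the unit name mysqld.service is taken from the comments above.

```shell
#!/bin/sh
# Sketch: decide from "systemctl show" output whether the start timeout is disabled.
# Assumption: a disabled timeout is reported as "TimeoutStartUSec=infinity".

start_timeout_disabled() {
    # $1 = a line such as "TimeoutStartUSec=1min 30s" or "TimeoutStartUSec=infinity"
    case "${1#TimeoutStartUSec=}" in
        infinity) return 0 ;;
        *)        return 1 ;;
    esac
}

# Typical use on the joiner node (not executed by this sketch):
#   line=$(systemctl show mysqld.service -p TimeoutStartUSec)
#   if start_timeout_disabled "$line"; then
#       echo "start timeout disabled"
#   else
#       echo "start timeout still in effect: $line"
#   fi
```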

            People

              Assignee: jplindst Jan Lindström (Inactive)
              Reporter: rpizzi Rick Pizzi (Inactive)
              Votes: 4
              Watchers: 14
