  1. MariaDB Server
  2. MDEV-15891

SIGHUP during rsync SST causes SST to fail

Details

    Description

      This is similar to MDEV-14282, but it seems slightly different.

      If a SIGHUP is received during an rsync SST, then the SST will fail:

      Hangup
      WSREP_SST: [INFO] Joiner cleanup. rsync PID: 11637 (20180411 18:48:23.814)
      WSREP_SST: [INFO] Joiner cleanup done. (20180411 18:48:24.318)
      2018-04-11 18:48:24 140510861715200 [Warning] WSREP: 0.0 (144-70-2-195.domain.com): State transfer to 1.0 (144-70-17-147.domain.com) failed: -255 (Unknown error 255)
      2018-04-11 18:48:24 140510861715200 [ERROR] WSREP: gcs/src/gcs_group.cpp:gcs_group_handle_join_msg():737: Will never receive state. Need to abort.
      2018-04-11 18:48:24 140510861715200 [Note] WSREP: gcomm: terminating thread
      2018-04-11 18:48:24 140510861715200 [Note] WSREP: gcomm: joining thread
      2018-04-11 18:48:24 140510832359168 [ERROR] WSREP: Process completed with error: wsrep_sst_rsync --role 'joiner' --address 'server1.domain.com' --datadir '/app/mysql/galera/' --defaults-file '/app/mysql/config/galera.cnf' --parent '11591' --binlog '/app/mysql/galera/server1-binlog' : 32 (Broken pipe)
      2018-04-11 18:48:24 140510832359168 [ERROR] WSREP: Failed to read uuid:seqno and wsrep_gtid_domain_id from joiner script.
      2018-04-11 18:48:24 140521767868672 [ERROR] WSREP: SST failed: 32 (Broken pipe)
      

      Jenkins and/or Ansible seem to send SIGHUP signals for some reason, and that is when this issue occurs.

          Activity

            serg Sergei Golubchik added a comment - would running mysqld under nohup be a workaround?
            seppo Seppo Jaakola added a comment -

            SST processes are configured with the following signals enabled:

            /* make sure the following signals are not ignored in child process */
            sigset_t default_signals;
            sigemptyset(&default_signals);
            sigaddset(&default_signals, SIGHUP);
            sigaddset(&default_signals, SIGINT);
            sigaddset(&default_signals, SIGQUIT);
            sigaddset(&default_signals, SIGPIPE);
            sigaddset(&default_signals, SIGTERM);
            sigaddset(&default_signals, SIGCHLD);

            These can be changed, of course, but what actual problem does it cause if the SST process can be interrupted by SIGHUP?
            Note that the related MDEV-14282 has had its priority lowered to "Minor", while this one has "Critical" priority.
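
For illustration only, here is a minimal sketch of what such a signal set does when it is handed to a spawned child. It assumes the set is applied with posix_spawnattr_setsigdefault() and the POSIX_SPAWN_SETSIGDEF flag; the snippet above does not show that call, so treat it as an assumption rather than as the server's actual code. The helper name signal_that_killed_child and the use of "sleep 30" as a stand-in for the SST script are hypothetical. The parent ignores SIGHUP, as it would when started under nohup, and the sketch reports which signal ends the child.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>

extern char **environ;

/* Hypothetical helper (not MariaDB code): spawn "sleep 30" as a stand-in
 * for the SST script, optionally resetting SIGHUP to its default action in
 * the child, send the child a SIGHUP, and return the signal that terminated
 * it (or -1 on error). */
static int signal_that_killed_child(int reset_sighup)
{
    posix_spawnattr_t attr;
    sigset_t default_signals;
    char *argv[] = { "sleep", "30", NULL };
    struct timespec delay = { 0, 200 * 1000 * 1000 }; /* 200 ms */
    pid_t pid;
    int status;
    int err;

    /* Ignore SIGHUP in the parent, as nohup would arrange. */
    signal(SIGHUP, SIG_IGN);

    if (posix_spawnattr_init(&attr) != 0)
        return -1;
    if (reset_sighup) {
        /* Same idea as the quoted snippet, reduced to SIGHUP only. */
        sigemptyset(&default_signals);
        sigaddset(&default_signals, SIGHUP);
        posix_spawnattr_setsigdefault(&attr, &default_signals);
        posix_spawnattr_setflags(&attr, POSIX_SPAWN_SETSIGDEF);
    }

    err = posix_spawnp(&pid, "sleep", NULL, &attr, argv, environ);
    posix_spawnattr_destroy(&attr);
    if (err != 0)
        return -1;

    nanosleep(&delay, NULL);   /* give the child time to start */
    kill(pid, SIGHUP);         /* the signal under test */
    nanosleep(&delay, NULL);
    kill(pid, SIGTERM);        /* ends a child that ignored SIGHUP */

    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling signal_that_killed_child(1) returns SIGHUP, because the child's SIGHUP disposition was reset to the default even though the parent ignores the signal. Calling signal_that_killed_child(0) returns SIGTERM, because an ignored SIGHUP is inherited across exec and the child only exits on the later SIGTERM. If the server does reset SIGHUP for the SST child in this way, running mysqld under nohup would not by itself protect the SST script.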

            GeoffMontee Geoff Montee (Inactive) added a comment -

            seppo,

            The main problem that we've seen is that Ansible seems to raise SIGHUPs at unexpected times, so if a DBA starts a Galera node with Ansible and the node performs an SST, SST failures are common due to SIGHUPs. I do not know why Ansible raises the signal in the first place.

            janlindstrom Jan Lindström added a comment - 10.1 is EOL.

            People

              seppo Seppo Jaakola
              GeoffMontee Geoff Montee (Inactive)