  MariaDB Foundation Development / MDBF-415

Monitor SSH status for libvirt workers


Details

    Description

      When the libvirt master starts, it opens an SSH connection to the worker machine for each defined worker. If that connection drops for any reason, builds fail. The master does not handle the dropped connection at all, so a master restart is needed.

      Ways to reproduce:
      1. List the running processes on hz-bbm1; there should be several entries like:

      buildma+ 3576703 3576701  0 14:45 ?        00:00:00 ssh -p 65001 -l buildbot -T -e none -- 100.64.100.20 sh -c 'which virt-ssh-helper 1>/dev/null 2>&1; if test $? = 0; then     virt-ssh-helper 'qemu:///system'; else    if 'nc' -q 2>&1 | grep "requires an argument" >/dev/null 2>&1; then ARG=-q0;else ARG=;fi;'nc' $ARG -U /var/run/libvirt/libvirt-sock; fi'
      

      2. Restart libvirt on the worker machine (hz-bbw5).
      3. Repeat step 1; there should no longer be any active SSH connection.
      4. Restart the master.

      Ideas for monitoring:
      faust, would it be possible to monitor whether there is at least one SSH connection similar to the one above?
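
One possible shape for such a check, as a minimal sketch rather than a tested monitoring plugin. It assumes the tunnels can be recognised by an `ssh` command line that mentions `virt-ssh-helper`, as in the process entry quoted above. The function names `count_libvirt_ssh` and `check_libvirt_ssh` are hypothetical, and the 0/2 return codes assume a Nagios-style convention that the monitoring system may or may not use.

```shell
#!/bin/sh
# Count libvirt SSH tunnels in a process listing read from stdin.
# Assumption: a tunnel is an ssh command line that mentions virt-ssh-helper.
count_libvirt_ssh() {
    # grep -c prints the number of matching lines; it exits 1 when the
    # count is 0, so "|| true" keeps the function's status at 0.
    grep -c -E '(^|[[:space:]])ssh[[:space:]].*virt-ssh-helper' || true
}

# Report OK (return 0) if at least $1 tunnels exist (default 1),
# otherwise report CRITICAL (return 2).
check_libvirt_ssh() {
    min=${1:-1}
    n=$(ps -eo args | count_libvirt_ssh)
    if [ "$n" -ge "$min" ]; then
        echo "OK: $n libvirt ssh connection(s)"
        return 0
    fi
    echo "CRITICAL: $n libvirt ssh connection(s), expected at least $min"
    return 2
}
```

Calling `check_libvirt_ssh 1` on hz-bbm1 would then flag the state reached in step 3. The pattern requires whitespace directly after `ssh`, so `sshd` processes and the `grep` process itself are not counted.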

            People

              faust Faustin Lammler
              vladbogo Vlad Bogolin

              Dates

                Created:
                Updated:
                Resolved:

                Time Tracking

                  Estimated: Not Specified
                  Remaining: 0d
                  Logged: 3h