MariaDB Foundation Development / MDBF-989

Refactor buildbot internals to drop docker-latent-worker

Details

    Description

      Scope

      This task covers the refactoring work required to move from DockerLatentWorker to an equivalent of "RunVM", only for Docker containers.

      Motivation

      The use of DockerLatentWorker as a base for defining environments in buildbot was a bad design decision. There are numerous limitations with this approach:

      • It is impossible to define a single DockerLatentWorker as an "environment" (for instance Debian 12 Build Environment), then use it across multiple physical machines.
        • To get this to work, one must define "environment" clones for each physical machine we allocate to use "DockerLatentWorkers". That is why we currently have:
          • aarch64-bbw[1-6]-docker-debian-11
          • aarch64-bbw[1-6]-docker-debian-12
            If one more physical machine is added to buildbot, all these DockerLatentWorkers must be cloned with a 7 in their name.
      • Given the limitation from above, using LatentWorkers creates a long list of "worker" machines in the buildbot interface, one worker for each environment that actually can run on any particular physical machine. This list offers no real value and causes confusion. It also makes it very hard to:
        • Isolate a physical machine temporarily from the build system (for other uses, such as performance testing).
        • Dynamically scale resources up and down without rewriting the configuration.
      • Balancing resources becomes problematic: one must use buildbot Locks, but locks are not shared across multiple masters. Different masters can therefore over-allocate builds to the same machine, causing builds to time out.
      • It also makes it hard to properly prioritise certain builds, unless they also get allocated to specific physical machines (as is the case for branch-protection builders).
      • Changing environments requires the use of "Trigger" steps. This forces the creation of more builders whenever an environment change is needed during a test step.
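
      To make the cloning problem from the first bullet concrete, here is a minimal sketch that enumerates the worker names implied by the naming pattern above. The function name is hypothetical and the machine and environment lists are limited to the examples quoted in this description; the real configuration contains more environments.

```python
from itertools import product


def latent_worker_names(machines, environments):
    """One worker definition is needed per (machine, environment) pair."""
    return [f"{m}-docker-{e}" for m, e in product(machines, environments)]


# Only the examples quoted in the description; the real lists are longer.
machines = [f"aarch64-bbw{i}" for i in range(1, 7)]
environments = ["debian-11", "debian-12"]

workers = latent_worker_names(machines, environments)
print(len(workers))  # number of definitions grows as machines * environments
```

      Adding a seventh machine to the list adds one more definition per environment, which is the growth this task aims to remove.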

      Technical changes

      This task will require the addition of a "wrapper" class RunInDocker that wraps any particular shell command inside a:
      docker run <env-parameters> <command>
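
      As a rough illustration of the wrapping idea, the sketch below builds the argument list for such a docker run invocation. It is not the planned RunInDocker class and does not use any buildbot API; the function name, its parameters, and the image name are assumptions made for the example.

```python
import shlex


def wrap_in_docker(command, image, env=None, volumes=None, name=None):
    """Return an argv list that runs `command` inside a throwaway container.

    All parameter names are illustrative; they are not taken from the task.
    """
    argv = ["docker", "run", "--rm"]
    if name:
        argv += ["--name", name]
    # Sort so that the generated command line is deterministic.
    for key, value in sorted((env or {}).items()):
        argv += ["-e", f"{key}={value}"]
    for host_path, container_path in sorted((volumes or {}).items()):
        argv += ["-v", f"{host_path}:{container_path}"]
    # Run the original command through a shell so that pipes and && still work.
    argv += [image, "sh", "-c", command]
    return argv


if __name__ == "__main__":
    argv = wrap_in_docker(
        "cmake . && make -j4",
        image="example/debian-12-build",  # hypothetical image name
        env={"CCACHE_DIR": "/ccache"},
        volumes={"/srv/buildbot/ccache": "/ccache"},  # hypothetical paths
        name="debian-12_example",
    )
    print(shlex.join(argv))
```

      Returning a list rather than a single string avoids a second round of shell quoting on the worker; only the wrapped command itself is interpreted by a shell, and that happens inside the container.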

      This task does not cover migrating all builders to use this class. For evaluating completeness, there will be two builders that will be migrated to this class first:

      • Latest Debian at time of implementation (Debian 12 or 13)
      • Latest RHEL at time of implementation (RHEL 9 or RHEL 10)

      File sharing between test steps

      With this architecture, the container environment is not preserved between test steps, so another method of sharing files between steps is needed.

      • We will use specific folders on the physical machines, mounted as volumes in all builder test steps.
      • For certain cases there will be an autofs system mount to share build artifacts across builders, much as happens now when pushing packages to https://ci.mariadb.org.

      New challenges introduced by dropping DockerLatentWorker

      Cleanup pre and post builder runs

      We have the following guarantee given buildbot's internal builder architecture:

      • A given builder cannot run more than one build at a time on the same physical machine. Thus containers can have a particular name prefix to identify the corresponding builder.
        • For example, for amd64-debian-12, we can use container names:
          debian-12_<uuid>

      Given this, the solution is:

      • When a test starts we clean up any potentially hanging containers by prefix matching their name.
      • When a test ends (or is canceled gracefully), we clean up any remaining running containers by prefix matching their name.
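
      The naming scheme and the two cleanup points above could look roughly like the following sketch. The function names are hypothetical. Matching is done in Python on the output of docker ps rather than with a docker-side filter, purely to keep the example simple.

```python
import subprocess
import uuid


def container_name(prefix):
    """Build a name such as debian-12_<uuid> for one test step."""
    return f"{prefix}_{uuid.uuid4().hex}"


def stale_containers(all_names, prefix):
    """Select the containers that belong to this builder by name prefix."""
    # The trailing underscore keeps "debian-1" from matching "debian-12_...".
    return [n for n in all_names if n.startswith(prefix + "_")]


def cleanup(prefix):
    """Force-remove every container, running or stopped, with this prefix."""
    listing = subprocess.run(
        ["docker", "ps", "-a", "--format", "{{.Names}}"],
        check=True, capture_output=True, text=True,
    )
    names = stale_containers(listing.stdout.split(), prefix)
    if names:
        subprocess.run(["docker", "rm", "-f", *names], check=True)
    return names
```

      Calling cleanup("debian-12") both before the first step and after the last one covers containers left behind by a build that was killed rather than canceled gracefully.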

      Not overwriting other build files

      • Shared artifacts between test steps will use a unique path, referencing the builder name and build number.
      • Cleanup of previous artifacts will be done similarly to how containers are cleaned and removed.
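
      A minimal sketch of such a path scheme follows, assuming a per-machine root folder that is mounted into the containers. The layout <root>/<builder name>/<build number> and the function names are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def artifact_dir(root, builder_name, build_number):
    """Unique path per build: <root>/<builder name>/<build number>."""
    return Path(root) / builder_name / str(build_number)


def prepare_artifact_dir(root, builder_name, build_number):
    """Drop leftovers from earlier builds of this builder, then create a fresh dir."""
    builder_root = Path(root) / builder_name
    if builder_root.exists():
        for old in builder_root.iterdir():
            if old.name == str(build_number):
                continue
            if old.is_dir():
                shutil.rmtree(old)
            else:
                old.unlink()
    target = artifact_dir(root, builder_name, build_number)
    target.mkdir(parents=True, exist_ok=True)
    return target
```

      Removing everything else under the builder's folder relies on the guarantee stated earlier that a builder does not run in parallel with itself on the same physical machine.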

      Changes in requirements on physical machines environment

      Compared to the previous setup, a worker machine must have the following installed:

      1. Docker daemon, with the socket accessible to the buildbot worker process. (Docker was previously required as well, but was controlled via the buildbot master)
      2. Buildbot worker process running (python3 virtualenv running internally, possibly twisted too)
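
      The first requirement can be checked from the worker's own user account with a small preflight script, sketched below. The socket path /var/run/docker.sock is Docker's usual default and is an assumption here, as is the function name; a machine configured differently would need another path.

```python
import os
import shutil


def preflight(socket_path="/var/run/docker.sock"):
    """Return a list of problems that would stop the worker from using Docker."""
    problems = []
    if shutil.which("docker") is None:
        problems.append("docker CLI not found on PATH")
    if not os.path.exists(socket_path):
        problems.append(f"{socket_path} does not exist")
    elif not os.access(socket_path, os.R_OK | os.W_OK):
        problems.append(f"{socket_path} is not readable and writable by this user")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

      An empty list means the user running the buildbot worker process can reach the daemon's socket; it does not prove that the daemon itself is healthy.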

            People

              rvarzaru Varzaru Razvan-Liviu
              cvicentiu Vicențiu Ciorbaru

                Time Tracking

                  Original Estimate: 15d
                  Remaining Estimate: 15d
                  Time Spent: Not Specified