[MDEV-7439] Power8 builders running out of disk space Created: 2015-01-12  Updated: 2015-09-13  Resolved: 2015-09-13

Status: Closed
Project: MariaDB Server
Component/s: Tests
Affects Version/s: N/A
Fix Version/s: N/A

Type: Bug Priority: Critical
Reporter: Sergey Vojtovich Assignee: Elena Stepanova
Resolution: Fixed Votes: 0
Labels: None

Issue Links:
Relates
relates to MDEV-6478 MariaDB on Power8 Closed

 Description   

Power8 VMs have very little available disk space, around 30-50 GB (minus the space taken by the OS). Each of them keeps 3 build directories: release, debug and packages. Each directory is up to 7 GB, which adds up to 21 GB.

This makes builds hit out-of-disk-space errors frequently. Please add a step to the bb configuration to clean up the build dir when a build is completed, or solve this problem in any other reasonable way.

Also, please remove the "xtra" step; it is covered by "xtra-big" anyway.

There are also AT7.x and AT8.x installed on p8-rhel7, which take almost 5 GB. Is it worth removing one version?
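The space budget above can be checked with a few lines of Python. This is only a sketch: the helper names `free_gib` and `has_room_for_builds` are hypothetical, the defaults (3 directories of 7 GB each) are taken from the description, and "Gb" is assumed to mean gibibytes.

```python
import shutil

GIB = 1024 ** 3  # assumption: "Gb" in the description is read as gibibytes


def free_gib(path: str) -> float:
    """Return the free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room_for_builds(path: str, build_dirs: int = 3,
                        gib_per_dir: float = 7.0) -> bool:
    """True if `build_dirs` directories of `gib_per_dir` GiB each would fit."""
    return free_gib(path) >= build_dirs * gib_per_dir


# With the defaults this asks whether 3 * 7 = 21 GiB are still free.
print(has_room_for_builds("."))
```

A check like this could run before a build starts, so that a full disk is reported up front instead of as a failure in the middle of compilation.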



 Comments   
Comment by Elena Stepanova [ 2015-09-09 ]

I've made the following changes to get it working for now:

1) Added a step to all p8_* factories:

f_p8_rhel6_bintar.addStep(RemoveDirectory(
        name="remove_build",
        dir=WithProperties("%(distdirname)s"),
        alwaysRun=True))

It is to remove the <builder>/build subdir after the test.
It is the very last step, after all archiving, uploads etc., so it should be safe; and it's set to be executed always, even when the build fails.
Before that, the directory was only removed at the beginning of the corresponding test.
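
To illustrate what "executed always" means here, below is a simplified model of step scheduling. It is not Buildbot's real implementation: the `Step` class and `run_build` function are hypothetical, and the model assumes that every failed step halts the build (in Buildbot this depends on per-step settings).

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    name: str
    action: Callable[[], bool]   # returns True on success
    always_run: bool = False     # models the alwaysRun=True flag


def run_build(steps: List[Step]) -> List[str]:
    """Run steps in order; once a step fails, run only always_run steps."""
    executed = []
    failed = False
    for step in steps:
        if failed and not step.always_run:
            continue  # skipped because an earlier step failed
        executed.append(step.name)
        if not step.action():
            failed = True
    return executed


steps = [
    Step("compile", lambda: False),   # simulate a failed compile
    Step("test", lambda: True),
    Step("remove_build", lambda: True, always_run=True),
]
print(run_build(steps))  # prints ['compile', 'remove_build']
```

The cleanup step still runs after the failed compile, while the ordinary test step is skipped. A build that is killed abruptly never reaches the loop's later iterations, so the cleanup is not guaranteed in that case.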

2) Added the following auxiliary step to p8_* package factories:

f_p8_trusty_deb.addStep(SetPropertyFromCommand(
        property="distdirname",
        command=["sh", "-c", WithProperties("pwd")],
        ))

It is to facilitate the previous change. I suppose it's needed since the bintar factories had it already.

3) For each p8_* factory, added git clean in getCompileStep before running cmake:

    getCompileStep(["sh", "-c", "git clean -dfX && export PATH=/opt/at8.0/bin:$PATH && cmake . -DCMAKE_BUILD_TYPE=Debug -DMYSQL_MAINTAINER_MODE=ON && make -j4 package"],
...

Explanation:
For each builder (but not for each branch!) there is a persistent folder <common_path>/<builder_name>/source. When a build for a particular revision is run, it goes to the folder, resets it to the required revision, fetches, etc. When it's done, it copies the entire <common_path>/<builder_name>/source dir to <common_path>/<builder_name>/build and builds there.
So, technically, the source tree should be free of all build artifacts. Apparently that is not always the case; e.g. somebody might build there manually. If that happens, the folder will contain previously built binaries and CMakeCache.txt, all of which can conflict with the current build attempt. If the conflict is severe, e.g. CMakeCache.txt belongs to a different version, it's actually not that bad, because the build just fails right away with a proper message. It gets more complicated when the folder contains unexpected binaries. For example, somebody ran a build in source with default options, so all engines were built as dynamic libraries; then the buildbot build is configured without the Connect engine and really builds without it, but ha_connect.so still exists in the folder and gets picked up by MTR, which causes all kinds of oddities. Both situations have actually happened already. So, adding git clean is supposed to protect against this.
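
As an illustration of the kind of leftovers in question, here is a small sketch that scans a source tree for them. It is not part of the buildbot configuration: the function `find_stale_artifacts` and the pattern list are hypothetical, and the patterns are only examples (CMakeCache.txt and ha_connect.so are the two cases mentioned above).

```python
import fnmatch
import os
from typing import List

# Hypothetical example patterns for build leftovers; extend as needed.
STALE_PATTERNS = ["CMakeCache.txt", "*.so", "*.o", "*.a"]


def find_stale_artifacts(source_dir: str) -> List[str]:
    """Return sorted paths (relative to source_dir) that look like build output."""
    found = []
    for root, dirs, files in os.walk(source_dir):
        if ".git" in dirs:
            dirs.remove(".git")  # do not descend into repository metadata
        for name in files:
            if any(fnmatch.fnmatch(name, pattern) for pattern in STALE_PATTERNS):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, source_dir))
    return sorted(found)
```

A non-empty result for <common_path>/<builder_name>/source would mean that somebody has built in the source folder. The git clean call in step 3) removes such files only if they are ignored by git, since -X restricts the cleanup to ignored files.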

4) Commented out xtra test, as requested.

Comment by Elena Stepanova [ 2015-09-09 ]

Step 3) above is necessary anyway, step 4) was requested, and step 2) is innocent; but I consider step 1) a temporary solution.
Even though it's set to alwaysRun, it's not completely safe – if a few builds in a row are interrupted abruptly, they can still leave stuff there, and we'll get the space problem again.
I think currently the space usage there is generally excessive.

Let's take power8-vlp01 as an example.

It is set to run one build at a time, which is natural, since the builders there are not VMs.
It runs three builders: p8-rhel6-bintar, p8-rhel6-bintar-debug and p8-rhel6-rpm, for a number of branches.
For each builder, it has a "home" folder (a.k.a. builddir, configured in the buildbot config) /home/buildbot/maria-slave/power8-vlp01-bintar, /home/buildbot/maria-slave/power8-vlp01-bintar-debug, /home/buildbot/maria-slave/power8-vlp03-rpm.

Each folder contains a persistent cloned copy of the source tree ( /home/buildbot/maria-slave/power8-vlp01-bintar/source etc.).
When a particular builder is active, it goes into its own builddir, removes the stale build subfolder, updates its own source folder, copies it into its own new build folder, builds there, runs the tests, archives and uploads if necessary, exits.
This makes no sense: since they run one at a time, they might just as well use one common builddir. That would solve the problem of having six source trees and three binary dirs on the machine simultaneously – there would be only one persistent source dir, and one build dir which would be re-created on each build anyway. Then step 1) above would be unnecessary.
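
The per-builder cycle described above (drop the stale build subfolder, then copy source into a fresh build folder) can be sketched roughly as follows. The function name `prepare_build_dir` is hypothetical, and the sketch leaves out updating the source tree, building, testing and uploading.

```python
import shutil
from pathlib import Path


def prepare_build_dir(builddir: Path) -> Path:
    """Recreate <builddir>/build as a fresh copy of <builddir>/source."""
    source = builddir / "source"
    build = builddir / "build"
    if build.exists():
        shutil.rmtree(build)        # remove the stale build subfolder
    shutil.copytree(source, build)  # copy the whole source tree
    return build
```

Because every builder runs this against its own builddir, each one keeps its own source copy and, until its next run, its own build copy on disk.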

So, I suggest that after we get the current urgent releases out, we do the following (first on power8-vlp01):

  • copy /home/buildbot/maria-slave/power8-vlp01-bintar into /home/buildbot/maria-slave/power8-vlp01 (it should still contain the source subfolder as it does now);
  • reconfigure p8-rhel6-* builders to use builddir: "power8-vlp01" instead of builddir: "power8-vlp01-bintar" etc. as they do now;
  • let it run once for each builder to see it works;
  • remove /home/buildbot/maria-slave/power8-vlp01-* folders;
  • repeat the exercise for other p8 slaves;
  • remove the step 1) that I added earlier.

dbart, svoj, any objections?

Comment by Sergey Vojtovich [ 2015-09-09 ]

No objections from my side.

Comment by Daniel Bartholomew [ 2015-09-11 ]

No objections from me.

Comment by Elena Stepanova [ 2015-09-13 ]

Sadly, the one-dir approach turned out to be impossible; buildbot just does not allow it:

2015-09-13 02:35:15+0300 [-] duplicate builder builddir 'power8-vlp01'
2015-09-13 02:35:15+0300 [-] duplicate builder builddir 'power8-vlp01'
2015-09-13 02:35:15+0300 [-] reconfig aborted without making any changes

So, I'm keeping the previously described (and introduced) solution with removing the builddir after a test; let's see how it goes.
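
For reference, the condition that the reconfig rejected can be detected ahead of time with a check along these lines. This is only a sketch: `find_duplicate_builddirs` is a hypothetical helper, and the builders are modelled as plain dicts with "name" and "builddir" keys rather than as real buildbot objects.

```python
from collections import Counter
from typing import Dict, List


def find_duplicate_builddirs(builders: List[Dict[str, str]]) -> List[str]:
    """Return the builddir values that are used by more than one builder."""
    counts = Counter(builder["builddir"] for builder in builders)
    return sorted(d for d, n in counts.items() if n > 1)


# The attempted configuration: three builders sharing one builddir.
builders = [
    {"name": "p8-rhel6-bintar", "builddir": "power8-vlp01"},
    {"name": "p8-rhel6-bintar-debug", "builddir": "power8-vlp01"},
    {"name": "p8-rhel6-rpm", "builddir": "power8-vlp01"},
]
print(find_duplicate_builddirs(builders))  # prints ['power8-vlp01']
```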
