[MDBF-333] Buildbot: Save entire var/ contents when test fails Created: 2022-02-08  Updated: 2022-11-21  Resolved: 2022-11-21

Status: Closed
Project: MariaDB Foundation Development
Component/s: Buildbot
Affects Version/s: N/A
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Vicențiu Ciorbaru Assignee: Vlad Bogolin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0d
Time Spent: 1d
Original Estimate: Not Specified


 Description   

To help debug race conditions that happen sporadically, we need the contents of the var/ directory to be saved.

As this folder can be rather big, save it only when the run has failed. We can discuss smarter approaches later if space is a big issue.

Identify a time frame for how long these files need to be stored.

Comments:
Note that tests are now run with --max-save-core=1 --max-save-datadir=1
Saving one core is probably good enough. Having even one failed test directory is much better than nothing, but it would be good to have at least 3 or 4. This will of course put some constraints on shared memory, which needs to be considered.
A time frame of 48 hours should be enough or, even better, a round robin of as many tests as fit into a given size, like 10G or so.
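
The retention idea above could be sketched roughly as follows. This is only an illustration, not what Buildbot actually does: the function name prune_archives, the flat archive directory layout, and the choice to apply both the 48-hour limit and the size cap together are assumptions, and "10G" is read here as 10 GiB.

```python
import time
from pathlib import Path
from typing import Optional

MAX_AGE_SECONDS = 48 * 3600      # 48-hour time frame from the comment
MAX_TOTAL_BYTES = 10 * 1024**3   # "10G or so", assumed to mean 10 GiB


def prune_archives(archive_dir: Path, now: Optional[float] = None,
                   max_age: float = MAX_AGE_SECONDS,
                   max_total: int = MAX_TOTAL_BYTES) -> list:
    """Hypothetical helper: delete archives older than max_age, then the
    oldest remaining ones until the total size fits within max_total.
    Returns the names of the deleted files, in deletion order."""
    now = time.time() if now is None else now
    # Oldest first, so the round robin evicts in arrival order.
    files = sorted((p for p in archive_dir.iterdir() if p.is_file()),
                   key=lambda p: p.stat().st_mtime)
    deleted = []
    survivors = []
    for path in files:
        if now - path.stat().st_mtime > max_age:
            path.unlink()
            deleted.append(path.name)
        else:
            survivors.append(path)
    total = sum(p.stat().st_size for p in survivors)
    for path in survivors:
        if total <= max_total:
            break
        total -= path.stat().st_size
        path.unlink()
        deleted.append(path.name)
    return deleted
```

Such a function would be run after each new archive is stored. Since workers differ in storage capacity, max_total would probably have to be set per worker.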



 Comments   
Comment by Elena Stepanova [ 2022-02-08 ]

There are some things in vardir which can be removed before you archive it. They are either copies of files from elsewhere, or just useless.
My probably incomplete and possibly incorrect list currently is:

  • std_data
  • plugins (<- for source builds)
  • run
  • tmp
  • install.db
  • mysqld.N (<- these look important, but in reality if the test failed, they would be under log/testname/, not in vardir directly)
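
A minimal sketch of archiving the vardir without the entries listed above. The names archive_vardir and should_skip, the tar.gz format, and the var/ prefix inside the archive are assumptions made for illustration. The skip list is taken from this comment and, as noted there, may be incomplete or incorrect.

```python
import re
import tarfile
from pathlib import Path

# Top-level vardir entries from the list above (possibly incomplete).
SKIP_NAMES = {"std_data", "plugins", "run", "tmp", "install.db"}
# mysqld.N directories that sit directly under the vardir.
SKIP_PATTERN = re.compile(r"^mysqld\.\d+$")


def should_skip(name: str) -> bool:
    """Return True if a top-level vardir entry should be left out."""
    return name in SKIP_NAMES or SKIP_PATTERN.match(name) is not None


def archive_vardir(vardir: Path, archive: Path) -> list:
    """Hypothetical helper: write a gzip-compressed tar of vardir,
    leaving out the skipped top-level entries.
    Returns the sorted names of the top-level entries that were archived."""
    kept = []
    with tarfile.open(str(archive), "w:gz") as tar:
        for entry in sorted(vardir.iterdir()):
            if should_skip(entry.name):
                continue
            # tar.add recurses into directories by default.
            tar.add(str(entry), arcname=f"var/{entry.name}")
            kept.append(entry.name)
    return kept
```

Only top-level names are filtered, so anything under log/testname/ (including per-test mysqld.N directories) is still archived.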
Comment by Vlad Bogolin [ 2022-06-28 ]

One example of a required file is:

mysql-test/var/log/mariabackup.huge_lsn

Since this can be quite large, we probably need to define a separate service that uploads the files to make them available. Potential flow:

1. Buildbot creates the archive and stores it on the worker side.
2. A separate service uploads the archives to be available to download.
3. On the builder page, there needs to be a link to the appropriate archive.

Problem: since the upload will not happen during the actual build step, there will be a time window in which the link to download the archive does not work yet.

Comment by Faustin Lammler [ 2022-06-29 ]

vladbogo I suggest we try to skip step 2. Since all workers are connected to the BB master (WireGuard), it should not be too difficult to make the archives available from ci.mariadb.org (or somewhere else) with an nginx reverse proxy (or HAProxy).

So that would be:
1. Buildbot creates the archive and stores it on the worker side;
2. On the builder page, there needs to be a link to the appropriate archive;
3. Define the retention for these archives (warning: not all workers have the same storage capacity).

Generated at Thu Feb 08 03:37:07 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.