[MDBF-326] buildbot - kernel diversity in host machines of latent workers Created: 2022-01-27  Updated: 2022-02-11

Status: Open
Project: MariaDB Foundation Development
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Major
Reporter: Daniel Black Assignee: Vlad Bogolin
Resolution: Unresolved Votes: 0
Labels: None
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: PNG File screenshot_20220128_101229.png    

 Description   

The current use of mainly Debian-based distros as the hosts of the Docker latent workers means that all distro builds run on essentially the same kernel.

This lack of diversity can miss important aspects of a user's workload, e.g. a Red Hat package user isn't going to use a Debian kernel.

We need more variety in host distros to ensure a test base that represents user workloads.

The lack of variety introduces major testing gaps, for the same reasons as MDBF-321.



 Comments   
Comment by Daniel Black [ 2022-01-27 ]

Also include the kernel version in the logs; the output of uname -a is sufficient.
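As a rough illustration of the logging idea, a build step could print the kernel details itself. This is a minimal sketch using only Python's standard library; the function name kernel_summary is hypothetical and not part of the buildbot configuration. Because containers share the host kernel, the release reported inside a container is the host's, while the node name is the container's own hostname.

```python
import platform


def kernel_summary() -> str:
    """Return a one-line kernel description, similar in spirit to `uname -a`."""
    u = platform.uname()
    # system, node, release, version and machine mirror the main uname fields.
    return f"{u.system} {u.node} {u.release} {u.version} {u.machine}"


if __name__ == "__main__":
    # Emit the line early in the build log so failures can be tied to a kernel.
    print(f"kernel: {kernel_summary()}")
```

Running the plain uname -a command as the first step of each build would serve the same purpose.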

Comment by Faustin Lammler [ 2022-01-27 ]

Hi Daniel!
Could you please explain a bit more?
I don't quite understand why it is a problem to build our images on Debian-based kernels, but I may be missing some knowledge of how containerization works.

Thanks

Comment by Daniel Black [ 2022-01-27 ]

Containers share a kernel layer underneath (which I'm sure you knew). In most cases the container's libc and a few other user-space library wrappers are flexible enough to cope with whatever kernel version is available.

For RHEL, Ubuntu and other distros there is a different kernel. We'd like to be sure that MariaDB is working with those kernels as well as the Debian kernel.

MDEV-26674 was found in a non-stable Debian kernel (and Fedora kernels). Being able to see kernel version differences on test case failures makes it possible to identify kernel bugs sooner.

We may also accidentally use a kernel interface that is incomplete or not available on older versions.
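To make the older-kernel concern concrete, a test harness could compare the running kernel release against the minimum version an interface needs. The sketch below is only an assumption about how such a check might look: the helper names and the (5, 1, 0) threshold are illustrative, and because distro kernels (RHEL in particular) backport features, a version comparison is a rough hint rather than a reliable feature test.

```python
import platform
import re


def parse_kernel_release(release: str) -> tuple:
    """Extract (major, minor, patch) from a release such as '5.10.0-11-amd64'."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    # A missing patch component (e.g. '5.15') is treated as 0.
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def kernel_at_least(release: str, minimum: tuple) -> bool:
    """True if the release's numeric version is >= minimum (a 3-tuple)."""
    return parse_kernel_release(release) >= minimum


if __name__ == "__main__":
    # Illustrative threshold only; substitute the version the interface needs.
    print(platform.release(), kernel_at_least(platform.release(), (5, 1, 0)))
```

Actually running the tests on an older host kernel remains the stronger check, which is the point of this ticket.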

Without testing more kernels we're relying on the variety of developers' systems, and on their attention, to notice differences in test case behaviour. Because those kernel versions are scattered and not visible in one place, failures are harder to correlate.

Without variety in buildbot (bb), we end up getting this testing on our users' systems.

This isn't a request to pull down workers; just keep it in mind as new hardware is added or deployed, or during major renovations.

So for the most coverage, think of adding:

  • Something old (RHEL7 - how many more years?)
  • Something new - Fedora or even Debian unstable
  • Something in common use - RHEL8
  • And Ubuntu because it's commonly used (LTS - their other releases move too quickly)

Comment by Faustin Lammler [ 2022-01-28 ]

vladbogo, to clarify, and as discussed with Daniel: the problem is not where we create the CI images (a GitHub Actions runner, which is Ubuntu). The diversity problem mentioned here concerns our workers, and I agree that it would be good to have more diversity in the host OSes that run our Docker builders.

Our Ansible deployment repo already fully handles Debian-based distributions and handles most of the deployment on Red Hat-based distributions.
So this does not represent a big overhead the next time we add a new worker.

danblack, keep in mind that we are also limited by cloud providers/sponsors and the OSes that they provide for their bare metal machines (and I don't want to spend much time installing something that is not offered; that, on the other hand, is time consuming).

See, for instance, what HZ proposes above. If this need becomes more urgent, we can reinstall one of our HZ builders with one of those OSes. That's for the AMD64 part; for PPC and ARM we already have more diversity.

Generated at Thu Feb 08 03:37:04 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.