[MDBF-300] Buildbot multi-master and load balancing Created: 2021-12-21  Updated: 2022-08-09  Resolved: 2022-08-09

Status: Closed
Project: MariaDB Foundation Development
Component/s: None
Affects Version/s: N/A
Fix Version/s: N/A

Type: Task Priority: Major
Reporter: Vlad Bogolin Assignee: Vlad Bogolin
Resolution: Fixed Votes: 0
Labels: None
Remaining Estimate: 0d
Time Spent: 9.5d
Original Estimate: 3d

Attachments: PNG File Screenshot 2022-03-08 at 10.24.57.png     File zabbix_alert.py    
Issue Links:
PartOf
includes MDBF-452 Re-factor master Galera Closed
is part of MDBF-41 Milestone 5: Desirable fixes Open

 Description   

Due to the increased number of builders, Buildbot needs to run multiple masters. However, each master has access to all worker machines, so a load-balancing scheme is needed to ensure that the worker machines are not overloaded. Since the workers used are latent, there is no process running on each worker machine. So, a way of communicating with each machine is needed in order to get its current load and decide whether a new build can be started. Since we already have Zabbix monitoring in place, it makes sense to use Zabbix to get the load.

Step 1. Change the master.cfg to easily support multi-master.
Step 2. Use the Zabbix Python API to add load balancing.
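
A minimal sketch of how Step 2 could gate new builds, assuming Buildbot's `canStartBuild` builder hook (called with the builder, the worker-for-builder object, and the build request) is the integration point. The names `make_can_start_build`, `get_load` and `max_load` are hypothetical, and `get_load` stands in for whatever Zabbix lookup ends up being used; this is not the code from the linked commit.

```python
from types import SimpleNamespace


def make_can_start_build(get_load, max_load):
    """Return a callable usable as a builder's canStartBuild hook.

    get_load: hypothetical callable mapping a worker name to its current load.
    max_load: hypothetical threshold at or above which no new build is started.
    """

    def can_start_build(builder, wfb, request):
        # Assumption: the worker name is reachable as wfb.worker.workername.
        load = get_load(wfb.worker.workername)
        return load < max_load

    return can_start_build


# Stand-in for the object Buildbot would pass as `wfb`.
fake_wfb = SimpleNamespace(worker=SimpleNamespace(workername="worker-1"))
gate = make_can_start_build(lambda name: 3.2, max_load=8.0)
print(gate(None, fake_wfb, None))
```

Because the load lookup is injected, every master can share the same gate and only the source of the load value needs to change.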



 Comments   
Comment by Vlad Bogolin [ 2022-01-11 ]

The first steps were implemented in https://github.com/MariaDB/mariadb.org-tools/commit/ec81862bfd38bc3a1e90cc7ec3918a990dfa1f3a#diff-fac968c018924bf8d32deb3a4dc17bd2ba2eebb8da7d04aeddc70bd6c2b722ee. The next steps are to get more familiar with the Zabbix API and launch multiple Docker master processes.

Comment by Faustin Lammler [ 2022-02-22 ]

Here are the items that you can use:
https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/templates/os/linux_active?at=refs%2Fheads%2Frelease%2F5.4

If I am correct, the current version uses the 5-minute load average (https://github.com/fauust/mariadb.org-tools/blob/master/buildbot.mariadb.org/utils.py#L71-L79).

On Zabbix it would be:

system.cpu.load[all,avg5]

Whether this is the best metric to use is another question. It may be a good start, and we could then try to tune it better. It may also depend on the builder.
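
As an illustration, the item above could be read through the Zabbix JSON-RPC API with an `item.get` request. The sketch below only builds the request body and parses a response, so it runs without a server. The helper names, the host name, and the token are hypothetical, and passing the token in the `auth` field is an assumption about the Zabbix version in use.

```python
import json


def build_item_get_request(hostname, item_key, auth_token, request_id=1):
    """Build a Zabbix JSON-RPC `item.get` request body for one item on one host."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "output": ["lastvalue"],       # only the latest value is needed
            "host": hostname,              # restrict to items of this host
            "filter": {"key_": item_key},  # exact match on the item key
        },
        "auth": auth_token,  # assumption: token is sent in the request body
        "id": request_id,
    }


def parse_last_value(response):
    """Return the item's last value as a float, or None if no item matched."""
    items = response.get("result", [])
    if not items:
        return None
    return float(items[0]["lastvalue"])


# Hypothetical host name and token, for illustration only.
request = build_item_get_request(
    "example-worker", "system.cpu.load[all,avg5]", "example-token"
)
print(json.dumps(request, indent=2))
```

The same helper would work for other item keys, so trying a different metric later only means changing the key string.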

Comment by Faustin Lammler [ 2022-02-22 ]

There is another metric that could be a good one to use, IMO:

system.cpu.util[,iowait]

This is a very good indicator that the machine is loaded (it generally moves along with the loadavg though).

Comment by Faustin Lammler [ 2022-03-22 ]

vladbogo, remember that we should test and prepare for the scenario where the Zabbix server is not available.
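
One possible way to handle that scenario is to wrap the load lookup so that a failed or empty lookup yields a configured fallback value instead of raising inside the master. This is only a sketch: `load_or_fallback`, `fetch_load` and `fallback_load` are hypothetical names, and whether to fall back to "idle" (keep building) or to "overloaded" (stop building) is an open choice that the ticket does not settle.

```python
def load_or_fallback(fetch_load, hostname, fallback_load=0.0):
    """Return the load for hostname, or fallback_load if it cannot be fetched.

    fetch_load: hypothetical callable that queries Zabbix and may raise
    OSError (connection refused, timeout) or ValueError (malformed reply),
    or return None when no matching item exists.
    """
    try:
        value = fetch_load(hostname)
    except (OSError, ValueError):
        return fallback_load
    return fallback_load if value is None else value


def unreachable(hostname):
    # Simulates the Zabbix server being down.
    raise ConnectionError("zabbix server not reachable")


# fallback_load=0.0 keeps builds running; float("inf") would block them.
print(load_or_fallback(unreachable, "example-worker"))
print(load_or_fallback(lambda host: 2.5, "example-worker"))
```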

Generated at Thu Feb 08 03:36:53 UTC 2024 using Jira 8.20.16#820016-sha1:9d11dbea5f4be3d4cc21f03a88dd11d8c8687422.