[MCOL-6060] Update process of running multiburza - Jira

Details

Type: Task
Status: Closed (View Workflow)
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: N/A
Component/s: burza
Labels: None
Epic Link: CI
Sprint: 2025-6, 2025-9

Description

Please update how burza is run in buildbot, using the Ansible deployment description as a reference: https://github.com/mariadb-corporation/burza/blob/main/burza/deploy/
Most of the changes apply only to multinode mode (have a look at multinode.yml).

Starting Ray before running burza

The only change common to both monoburza and multiburza is that we must start Ray before running burza and stop it afterwards. This should help us avoid the giant errors that were occurring even after successful burza runs.
So in monoburza:

poetry run ray start --head --port=6379 --disable-usage-stats

<wait for ray status --address=127.0.0.1:6379 to return 0>

poetry run burza run-tests...

poetry run ray stop --force
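
The wait step above is only described, not spelled out. A minimal sketch of how it might look in a buildbot shell step follows. The helper names `wait_until` and `run_monoburza`, the 30 attempts and the 1 second delay are assumptions for illustration, and so is running `ray status` through `poetry run`. The burza arguments are elided in the description, so they are simply passed through.

```shell
# Poll a command until it exits 0, giving up after a fixed number of attempts.
# Usage: wait_until ATTEMPTS DELAY_SECONDS CMD [ARGS...]
wait_until() {
    _wu_attempts=$1
    _wu_delay=$2
    shift 2
    _wu_i=0
    while [ "$_wu_i" -lt "$_wu_attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        _wu_i=$((_wu_i + 1))
        sleep "$_wu_delay"
    done
    return 1
}

# Monoburza sequence: start Ray, wait for it, run burza, always stop Ray.
# Arguments are forwarded to "burza run-tests" unchanged.
run_monoburza() {
    poetry run ray start --head --port=6379 --disable-usage-stats || return 1
    # 30 attempts, 1 second apart: assumed values, tune as needed.
    if wait_until 30 1 poetry run ray status --address=127.0.0.1:6379; then
        poetry run burza run-tests "$@"
        _rm_rc=$?
    else
        echo "Ray did not become ready in time" >&2
        _rm_rc=1
    fi
    # Stop Ray even if burza failed, then report burza's exit code.
    poetry run ray stop --force
    return "$_rm_rc"
}
```

Stopping Ray unconditionally keeps a failed burza run from leaving a stale Ray head behind for the next build.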

In multiburza we must also add this option to the ray start command on every node:

--resources="{\"node_name:<node_name>\": 1}"
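
The resource value is JSON nested inside a shell string, so the quoting is easy to get wrong. One possible way to keep it in one place is sketched below. The helper names `node_resources_json` and `start_ray_head` are hypothetical, and combining the option with the monoburza head flags is an assumption based on the commands in this ticket.

```shell
# Print the JSON value for Ray's --resources option for a given node name.
node_resources_json() {
    printf '{"node_name:%s": 1}\n' "$1"
}

# Start the Ray head with its custom node_name resource.
# Defined only; nothing is started until the function is called.
start_ray_head() {
    poetry run ray start --head --port=6379 --disable-usage-stats \
        --resources="$(node_resources_json head)"
}
```

Calling `node_resources_json replica1` prints the value a replica would pass instead.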

RAY_NODE_NAME is now CLUSTER_NODE_NAME

It turns out that Ray assigns its own meaning to the RAY_NODE_NAME environment variable, so we must now pass CLUSTER_NODE_NAME instead (head, replica1, replica2, ...).
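
A buildbot step could fail early when the old variable is still set. The following is only a sketch: both function names are hypothetical, and the rule that valid names are exactly "head" or "replica" followed by digits is an assumption drawn from the examples in this ticket.

```shell
# Return 0 if the name is "head" or "replica<digits>", 1 otherwise.
# The naming pattern is an assumption based on the examples in this ticket.
is_valid_cluster_node_name() {
    case "$1" in
        head) return 0 ;;
        replica | replica*[!0-9]*) return 1 ;;
        replica[0-9]*) return 0 ;;
        *) return 1 ;;
    esac
}

# Fail if the old variable is still in use or the new one is missing or invalid.
check_node_name_env() {
    if [ -n "${RAY_NODE_NAME:-}" ]; then
        echo "RAY_NODE_NAME is interpreted by Ray itself; pass CLUSTER_NODE_NAME instead" >&2
        return 1
    fi
    if ! is_valid_cluster_node_name "${CLUSTER_NODE_NAME:-}"; then
        echo "CLUSTER_NODE_NAME must be head or replicaN, got '${CLUSTER_NODE_NAME:-}'" >&2
        return 1
    fi
}
```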

Options to set on head

 CLUSTER_NODE_NAME: "head"

 SECONDARY_NODE_NAMES: "replica1,replica2"

 RESTART_WITH_MCS_CLUSTER: "true"
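
If the number of replicas varies between builders, the head options could be generated rather than hard-coded. This is a sketch only: `secondary_node_names` and `head_options` are hypothetical helpers, and the replica1..replicaN naming is assumed from the example value above. How the output is fed to burza is not covered here.

```shell
# Print "replica1,replica2,...,replicaN" for N replicas (empty line for N=0).
secondary_node_names() {
    _snn_count=$1
    _snn_i=1
    _snn_out=""
    while [ "$_snn_i" -le "$_snn_count" ]; do
        _snn_out="${_snn_out:+$_snn_out,}replica$_snn_i"
        _snn_i=$((_snn_i + 1))
    done
    printf '%s\n' "$_snn_out"
}

# Print the head options in the same key: "value" form used in this ticket.
# Usage: head_options NUMBER_OF_REPLICAS
head_options() {
    printf 'CLUSTER_NODE_NAME: "head"\n'
    printf 'SECONDARY_NODE_NAMES: "%s"\n' "$(secondary_node_names "$1")"
    printf 'RESTART_WITH_MCS_CLUSTER: "true"\n'
}
```

`head_options 2` reproduces the three options listed above.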

Options to set on replicas

 CLUSTER_NODE_NAME: "replicaN"

 TEST_RUNNER: "freeloader_test_runner"

 DATA_POINT_GENERATORS: "cpu_load,memory_stats"

 # All exporters/report generators are run on primary

 REPORT_GENERATORS: []

 EXPORTERS: []

 # Replicas have a reduced set of MCS services

 CPU_LOAD_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"

 MEM_USAGE_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"

 # Cluster is restarted by primary

 RESTART_DB_BEFORE_TEST_CASE: "false"
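
Only CLUSTER_NODE_NAME differs between replicas, so the block can be produced from a template. A minimal sketch, assuming a hypothetical helper `replica_options` that takes the replica index and prints the options exactly as listed above:

```shell
# Print the replica options for replica number $1.
# Values are copied from this ticket; only the node name is substituted.
replica_options() {
    cat <<EOF
CLUSTER_NODE_NAME: "replica$1"
TEST_RUNNER: "freeloader_test_runner"
DATA_POINT_GENERATORS: "cpu_load,memory_stats"
# All exporters/report generators are run on primary
REPORT_GENERATORS: []
EXPORTERS: []
# Replicas have a reduced set of MCS services
CPU_LOAD_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"
MEM_USAGE_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"
# Cluster is restarted by primary
RESTART_DB_BEFORE_TEST_CASE: "false"
EOF
}
```

For example, `replica_options 2` prints the options for replica2.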

Starting Ray on replicas

poetry run ray start \
--address=<head_ip:6379> \
--resources="{\"node_name:<replicaN>\": 1}"
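
A possible wrapper for the replica step is sketched below. The function name `start_ray_replica` and the `DRY_RUN` switch are hypothetical. The dry-run mode only prints the command, so the quoting can be checked without a running head node.

```shell
# Start Ray on replica N and join the head at HEAD_IP:6379.
# Usage: start_ray_replica HEAD_IP N
# With DRY_RUN=1 (hypothetical switch) the command is printed, not executed.
start_ray_replica() {
    _srr_head_ip=$1
    _srr_name="replica$2"
    _srr_resources=$(printf '{"node_name:%s": 1}' "$_srr_name")
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf "poetry run ray start --address=%s:6379 --resources='%s'\n" \
            "$_srr_head_ip" "$_srr_resources"
        return 0
    fi
    poetry run ray start \
        --address="$_srr_head_ip:6379" \
        --resources="$_srr_resources"
}
```

The head must already be running when the replica joins, so this step would come after the head's ray start has succeeded.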

Issue Links

is blocked by

MCOL-6089 Wrong routing on the Noble Hetzner machines (Closed)

relates to

MCOL-5845 Deployment of Burza in multinode mode (Closed)

MCOL-6024 Deployment of burza using columnstore-ansible-aws (Closed)

People

Assignee: Timofey Turenko

Reporter: Alexander Presniakov

Votes: 0

Watchers: 2

Dates

Created: 2025-06-12 16:43

Updated: 2025-09-05 12:38

Resolved: 2025-09-05 12:38
