MariaDB ColumnStore / MCOL-6060

Update the process of running multiburza


Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: N/A
    • Component/s: burza
    • Labels: None
    • Sprint: 2025-6, 2025-9

    Description

      Please update how burza is run in buildbot, using the Ansible deployment description as a reference: https://github.com/mariadb-corporation/burza/blob/main/burza/deploy/
      Most of the changes concern only multinode mode (see multinode.yml).

      Starting Ray before running burza

      The only change common to both monoburza and multiburza is that Ray must be started before running burza and stopped after burza finishes. This avoids the large error outputs that were appearing even after successful burza runs.
      So, in monoburza:

      poetry run ray start --head --port=6379 --disable-usage-stats
      until poetry run ray status --address=127.0.0.1:6379 >/dev/null 2>&1; do sleep 1; done  # wait until ray status returns 0
      poetry run burza run-tests...
      poetry run ray stop --force
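The monoburza sequence can be wrapped in a small POSIX shell sketch. The sketch also stops Ray when burza fails, not only after a successful run. The following are illustrative assumptions, not part of the buildbot configuration:
- the helper names wait_for and run_monoburza;
- the 60-second readiness timeout;
- forwarding all arguments to burza run-tests.

```shell
#!/bin/sh
# Sketch: start Ray, wait until "ray status" succeeds, run burza, always stop Ray.
# wait_for, run_monoburza and the 60-second timeout are hypothetical choices.

RAY_LOCAL_ADDRESS="127.0.0.1:6379"

# Re-run a command once per second until it exits with 0 or the timeout
# (in seconds) expires. Returns 0 on success, 1 on timeout.
wait_for() {
  local timeout=$1
  shift
  local waited=0
  until "$@" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Start a single-node Ray cluster, run burza, then stop Ray whatever the result.
# All arguments are forwarded to "burza run-tests" (an assumption).
run_monoburza() {
  poetry run ray start --head --port=6379 --disable-usage-stats || return 1
  local status=0
  if wait_for 60 poetry run ray status --address="$RAY_LOCAL_ADDRESS"; then
    poetry run burza run-tests "$@" || status=$?
  else
    echo "Ray did not become ready within 60 seconds" >&2
    status=1
  fi
  # Stop Ray even if burza failed, so Ray does not outlive the run.
  poetry run ray stop --force
  return "$status"
}
```

In a buildbot step this would be called as run_monoburza followed by the usual run-tests arguments. The step's exit status is burza's own, or 1 if Ray never became ready.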
      

      In multiburza, the following option must also be added to the ray start command on every node:

      --resources="{\"node_name:<node_name>\": 1}"

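The --resources value is a JSON object, and it is easy to break with nested shell quoting. A minimal sketch that builds it from a node name is shown below. ray_node_resources is a hypothetical helper. It rejects names that contain a quote or backslash, because those would need JSON escaping.

```shell
#!/bin/sh
# Build the JSON value for "ray start --resources" that tags a node with the
# custom resource node_name:<name>. ray_node_resources is a hypothetical name.

ray_node_resources() {
  if [ -z "$1" ]; then
    echo "usage: ray_node_resources <node_name>" >&2
    return 1
  fi
  case $1 in
    *'"'* | *'\'*)
      # A double quote or backslash would break the JSON string.
      echo "invalid node name: $1" >&2
      return 1
      ;;
  esac
  printf '{"node_name:%s": 1}\n' "$1"
}
```

For example: poetry run ray start --head --port=6379 --disable-usage-stats --resources="$(ray_node_resources head)".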
      RAY_NODE_NAME is now CLUSTER_NODE_NAME

      It turns out that Ray assigns its own meaning to the environment variable RAY_NODE_NAME, so the node name (head, replica1, replica2, ...) must now be passed in CLUSTER_NODE_NAME instead.
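A buildbot step could check the rename before starting burza. This is a minimal guard, assuming the node name arrives through the environment. require_cluster_node_name is hypothetical. It only inspects the two variables named in this ticket and accepts names following the head/replicaN scheme.

```shell
#!/bin/sh
# Guard for the RAY_NODE_NAME -> CLUSTER_NODE_NAME rename.
# require_cluster_node_name is a hypothetical helper name.

require_cluster_node_name() {
  if [ -n "${RAY_NODE_NAME:-}" ]; then
    # Ray interprets RAY_NODE_NAME itself; a leftover value is likely a mistake.
    echo "warning: RAY_NODE_NAME is set; burza now reads CLUSTER_NODE_NAME" >&2
  fi
  if [ -z "${CLUSTER_NODE_NAME:-}" ]; then
    echo "error: CLUSTER_NODE_NAME must be set (head, replica1, replica2, ...)" >&2
    return 1
  fi
  case $CLUSTER_NODE_NAME in
    head | replica[1-9]*) return 0 ;;
    *)
      echo "error: unexpected CLUSTER_NODE_NAME '$CLUSTER_NODE_NAME'" >&2
      return 1
      ;;
  esac
}
```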

      Options to set on head

       CLUSTER_NODE_NAME: "head"
       SECONDARY_NODE_NAMES: "replica1,replica2"
       RESTART_WITH_MCS_CLUSTER: "true"
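This sketch assumes the head options are passed to burza as environment variables; the ticket lists them in YAML-like KEY: "value" form. Under that assumption, a step could export them as shown. export_head_env is hypothetical. It builds SECONDARY_NODE_NAMES from a replica count, so the list always matches the replicaN naming scheme.

```shell
#!/bin/sh
# Export the head-node options, generating SECONDARY_NODE_NAMES from a count.
# Passing options via the environment and export_head_env are assumptions.

export_head_env() {
  local replica_count=$1
  local names=""
  local i=1
  while [ "$i" -le "$replica_count" ]; do
    # Append "replicaN", comma-separated after the first entry.
    names="${names:+$names,}replica$i"
    i=$((i + 1))
  done
  export CLUSTER_NODE_NAME="head"
  export SECONDARY_NODE_NAMES="$names"
  export RESTART_WITH_MCS_CLUSTER="true"
}
```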
      

      Options to set on replicas

       CLUSTER_NODE_NAME: "replicaN"
       TEST_RUNNER: "freeloader_test_runner"
       DATA_POINT_GENERATORS: "cpu_load,memory_stats"
       # All exporters and report generators run on the primary
       REPORT_GENERATORS: []
       EXPORTERS: []
       # Replicas have a reduced set of MCS services
       CPU_LOAD_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"
       MEM_USAGE_PROCESS_NAMES: "PrimProc,StorageManager,WriteEngineServer,workernode"
       # Cluster is restarted by primary
       RESTART_DB_BEFORE_TEST_CASE: "false"
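The replica options can be exported the same way. This sketch assumes that burza reads these options from the environment and parses the string "[]" as an empty list. Both points should be checked against burza's settings loader. export_replica_env is a hypothetical helper that takes the replica number N.

```shell
#!/bin/sh
# Export the replica-node options for replica number N.
# Assumes burza reads them from the environment and parses "[]" as an empty list.
# export_replica_env is a hypothetical helper name.

export_replica_env() {
  case $1 in
    '' | *[!0-9]*)
      echo "usage: export_replica_env <replica_number>" >&2
      return 1
      ;;
  esac
  # Reduced set of MCS services that run on replicas.
  local mcs_procs="PrimProc,StorageManager,WriteEngineServer,workernode"
  export CLUSTER_NODE_NAME="replica$1"
  export TEST_RUNNER="freeloader_test_runner"
  export DATA_POINT_GENERATORS="cpu_load,memory_stats"
  # Exporters and report generators run only on the primary.
  export REPORT_GENERATORS="[]"
  export EXPORTERS="[]"
  export CPU_LOAD_PROCESS_NAMES="$mcs_procs"
  export MEM_USAGE_PROCESS_NAMES="$mcs_procs"
  # The primary restarts the cluster, so replicas do not.
  export RESTART_DB_BEFORE_TEST_CASE="false"
}
```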
      

      Starting Ray on replicas

      poetry run ray start \
        --address=<head_ip>:6379 \
        --resources="{\"node_name:<replicaN>\": 1}"

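The replica command can be driven by CLUSTER_NODE_NAME, so the Ray resource tag and burza's node name cannot drift apart. This is a minimal sketch. start_ray_replica and its DRY_RUN switch are hypothetical. The head IP is passed as an argument, and port 6379 matches the --port used when starting Ray on the head.

```shell
#!/bin/sh
# Start Ray on a replica, tagging it with node_name:$CLUSTER_NODE_NAME.
# start_ray_replica and DRY_RUN are hypothetical; with DRY_RUN=1 the command is
# printed (space-joined, without shell quoting) instead of executed.

start_ray_replica() {
  local head_ip=$1
  if [ -z "$head_ip" ] || [ -z "${CLUSTER_NODE_NAME:-}" ]; then
    echo "usage: set CLUSTER_NODE_NAME, then run: start_ray_replica <head_ip>" >&2
    return 1
  fi
  # Connect to the head's Ray port and add the custom node_name resource.
  set -- poetry run ray start \
    --address="$head_ip:6379" \
    --resources="{\"node_name:$CLUSTER_NODE_NAME\": 1}"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```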
    People

      Timofey Turenko (tturenko)
      Alexander Presniakov (AlexanderPresniakov)