    Ray: Distributed Computing For All, Part 2

By Awais · January 26, 2026

    Welcome to the second instalment in my two-part series on the Ray library, a Python framework created by Anyscale for distributed and parallel computing. Part 1 covered how to parallelise CPU-intensive Python jobs on your local PC by distributing the workload across all available cores, resulting in marked improvements in runtime. I’ll leave a link to Part 1 at the end of this article.

    This part deals with a similar theme, except we take distributing Python workloads to the next level by using Ray to parallelise them across multi-server clusters in the cloud.

    If you’ve come to this without having read Part 1, the TL;DR of Ray is that it is an open-source distributed computing framework designed to make it easy to scale Python programs from a laptop to a cluster with minimal code changes. That alone should hopefully be enough to pique your interest. In my own test, on my desktop PC, I took a straightforward, relatively simple Python program that finds prime numbers and reduced its runtime by a factor of 10 by adding just four lines of code.
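For readers who skipped Part 1, here is a sketch of roughly what the serial, pre-Ray version of that prime-counting program looks like (my own reconstruction, not the exact Part 1 listing); the Ray-ified version appears in full later in this article.

```python
import math

def is_prime(n: int) -> bool:
    """Trial division up to sqrt(n), skipping even numbers."""
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True

def count_primes(a: int, b: int) -> int:
    """Count primes in the half-open range [a, b)."""
    return sum(1 for n in range(a, b) if is_prime(n))

print(count_primes(10, 50))  # 11 primes: 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
```

Run serially over tens of millions of numbers, a loop like this keeps exactly one core busy, which is the inefficiency Ray's four-line change eliminates.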

    Where can you run Ray clusters?

    Ray clusters can be set up on the following:

    • AWS and GCP, via the official cluster launcher; community-maintained integrations exist for other providers, such as Azure.
    • Anyscale, a fully managed platform developed by the creators of Ray.
    • Kubernetes, via the officially supported KubeRay project.

    Prerequisites

    To follow along with my process, you’ll need a few things set up beforehand. I’ll be using AWS for my demo, as I have an existing account there; however, I expect the setup for other cloud providers and platforms to be very similar. You should have:

    • Credentials set up to run Cloud CLI commands from your chosen provider.
    • A default VPC and at least one public subnet that auto-assigns public IP addresses to launched instances.
    • An SSH key pair file (.pem) downloaded to your local system so that Ray (and you) can connect to the nodes in your cluster.
    • Sufficient service quotas to cover the requested number of nodes and vCPUs in whichever cluster you set up.

    If you want to do some local testing of your Ray code before deploying it to a cluster, you’ll also need to install the Ray library. We can do that using pip.

    $ pip install ray

    I’ll be running everything from a WSL2 Ubuntu shell on my Windows desktop.

    To verify that Ray has been installed correctly, you should be able to use its command-line interface. In a terminal window, type the following command.

    $ ray --help
    
    Usage: ray [OPTIONS] COMMAND [ARGS]...
    
    Options:
      --logging-level TEXT   The logging level threshold, choices=['debug',
                             'info', 'warning', 'error', 'critical'],
                             default='info'
      --logging-format TEXT  The logging format.
                             default="%%(asctime)s\t%%(levelname)s
                             %%(filename)s:%%(lineno)s -- %%(message)s"
      --version              Show the version and exit.
      --help                 Show this message and exit.
    
    Commands:
      attach               Create or attach to a SSH session to a Ray cluster.
      check-open-ports     Check open ports in the local Ray cluster.
      cluster-dump         Get log data from one or more nodes.
    ...
    ...
    ...

    If you don’t see this, something has gone wrong, and you should double-check the output of your install command.

    Assuming everything is OK, we’re good to go. 

    One last important point, though. Creating resources, such as compute clusters, on a cloud provider like AWS will incur costs, so it’s essential you bear this in mind. The good news is that Ray has a built-in command that will tear down any infrastructure you create, but to be safe, you should double-check that no unused and potentially costly services get left “switched on” by mistake.

    Our example Python code

    The first step is to modify our existing Ray code from Part 1 to run on a cluster. Here is the original code for your reference. Recall that we are trying to count the number of prime numbers within a specific numeric range.

    import math
    import time
    
    # -----------------------------------------
    # Change No. 1
    # -----------------------------------------
    import ray
    ray.init()
    
    def is_prime(n: int) -> bool:
        if n < 2: return False
        if n == 2: return True
        if n % 2 == 0: return False
        r = int(math.isqrt(n)) + 1
        for i in range(3, r, 2):
            if n % i == 0:
                return False
        return True
    
    # -----------------------------------------
    # Change No. 2
    # -----------------------------------------
    @ray.remote(num_cpus=1)  # pure-Python loop → 1 CPU per task
    def count_primes(a: int, b: int) -> int:
        c = 0
        for n in range(a, b):
            if is_prime(n):
                c += 1
        return c
    
    if __name__ == "__main__":
        A, B = 10_000_000, 20_000_000
        total_cpus = int(ray.cluster_resources().get("CPU", 1))
    
        # Start "chunky"; we can sweep this later
        chunks = max(4, total_cpus * 2)
        step = (B - A) // chunks
    
        print(f"nodes={len(ray.nodes())}, CPUs~{total_cpus}, chunks={chunks}")
        t0 = time.time()
        refs = []
        for i in range(chunks):
            s = A + i * step
            e = s + step if i < chunks - 1 else B
            # -----------------------------------------
            # Change No. 3
            # -----------------------------------------
            refs.append(count_primes.remote(s, e))
    
        # -----------------------------------------
        # Change No. 4
        # -----------------------------------------
        total = sum(ray.get(refs))
    
        print(f"total={total}, time={time.time() - t0:.2f}s")
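Before pointing this at a paid cluster, it's worth sanity-checking the chunking arithmetic in isolation. This standalone sketch (`make_chunks` is my own helper, extracted from the loop above) confirms that the chunks tile the range [A, B) exactly, with the last chunk absorbing the integer-division remainder.

```python
def make_chunks(A: int, B: int, chunks: int) -> list[tuple[int, int]]:
    """Split [A, B) into `chunks` contiguous spans, same logic as the script."""
    step = (B - A) // chunks
    spans = []
    for i in range(chunks):
        s = A + i * step
        e = s + step if i < chunks - 1 else B
        spans.append((s, e))
    return spans

spans = make_chunks(10_000_000, 20_000_000, 7)
assert spans[0][0] == 10_000_000           # starts exactly at A
assert spans[-1][1] == 20_000_000          # ends exactly at B, remainder absorbed
# adjacent chunks meet with no gaps or overlaps
assert all(spans[i][1] == spans[i + 1][0] for i in range(len(spans) - 1))
```

Because the half-open ranges never overlap, no number is counted twice and none is skipped, so the summed per-chunk counts equal the serial total.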

    What modifications are needed to run it on a cluster? The answer is that just one minor change is required. 

    Change
    
    ray.init() 
    
    to
    
    ray.init(address="auto")

    That’s one of the beauties of Ray. The same code runs almost unmodified on your local PC, and anywhere else you care to run it, including large, multi-server cloud clusters.

    Setting up our cluster

    On the cloud, a Ray cluster consists of a head node and one or more worker nodes. In AWS, all these nodes are simply EC2 instances. Ray clusters can be fixed-size or autoscale up and down based on the resources requested by applications running on the cluster. The head node is started first, and the worker nodes are configured with the head node’s address to form the cluster. If auto-scaling is enabled, worker nodes automatically scale up or down based on the application’s load and will scale down after a user-specified period (5 minutes by default).

    Ray uses YAML files to set up clusters. YAML is a human-readable, plain-text configuration format (a superset of JSON) widely used for system configuration.

    Here is the YAML file I’ll be using to set up my cluster. I found that the closest EC2 instance to my desktop PC, in terms of CPU core count and performance, was a c7g.8xlarge. For simplicity, I’m having the head node be the same server type as all the workers, but you can mix and match different EC2 types if desired.

    cluster_name: ray_test
    
    provider:
      type: aws
      region: eu-west-1
      availability_zone: eu-west-1a
    
    auth:
      # For Amazon Linux AMIs the SSH user is 'ec2-user'.
      # If you switch to an Ubuntu AMI, change this to 'ubuntu'.
      ssh_user: ec2-user
      ssh_private_key: ~/.ssh/ray-autoscaler_eu-west-1.pem
    
    max_workers: 10
    idle_timeout_minutes: 10
    
    head_node_type: head_node
    
    available_node_types:
      head_node:
        node_config:
          InstanceType: c7g.8xlarge
          ImageId: ami-06687e45b21b1fca9
          KeyName: ray-autoscaler_eu-west-1
    
      worker_node:
        min_workers: 5
        max_workers: 5
        node_config:
          InstanceType: c7g.8xlarge
          ImageId: ami-06687e45b21b1fca9
          KeyName: ray-autoscaler_eu-west-1
          InstanceMarketOptions:
            MarketType: spot
    
    # =========================
    # Setup commands (run on head + workers)
    # =========================
    setup_commands:
      - |
        set -euo pipefail
    
        have_cmd() { command -v "$1" >/dev/null 2>&1; }
        have_pip_py() {
          python3 -c 'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec("pip") else 1)'
        }
    
        # 1) Ensure Python 3 is present
        if ! have_cmd python3; then
          if have_cmd dnf; then
            sudo dnf install -y python3
          elif have_cmd yum; then
            sudo yum install -y python3
          elif have_cmd apt-get; then
            sudo apt-get update -y
            sudo apt-get install -y python3
          else
            echo "No supported package manager found to install python3." >&2
            exit 1
          fi
        fi
    
        # 2) Ensure pip exists
        if ! have_pip_py; then
          python3 -m ensurepip --upgrade >/dev/null 2>&1 || true
        fi
        if ! have_pip_py; then
          if have_cmd dnf; then
            sudo dnf install -y python3-pip || true
          elif have_cmd yum; then
            sudo yum install -y python3-pip || true
          elif have_cmd apt-get; then
            sudo apt-get update -y || true
            sudo apt-get install -y python3-pip || true
          fi
        fi
        if ! have_pip_py; then
          curl -fsS https://bootstrap.pypa.io/get-pip.py -o /tmp/get-pip.py
          python3 /tmp/get-pip.py
        fi
    
        # 3) Upgrade packaging tools and install Ray
        python3 -m pip install -U pip setuptools wheel
        python3 -m pip install -U "ray[default]"

    Here is a brief explanation of each critical YAML section.

    cluster_name: Assigns a name to the cluster, allowing Ray to track and manage it separately from others.
    
    provider: Specifies which cloud to use (AWS here), along with the region and availability zone for launching instances.
    
    auth: Defines how Ray connects to instances over SSH: the user name and the private key used for authentication.
    
    max_workers: Sets the maximum number of worker nodes Ray can scale up to when more compute is needed.
    
    idle_timeout_minutes: Tells Ray how long to wait before automatically terminating idle worker nodes.
    
    available_node_types: Describes the different node types (head and workers), their instance sizes, AMI images, and scaling limits.
    
    head_node_type: Identifies which of the node types acts as the cluster's controller (the head node).
    
    setup_commands: Lists shell commands that run once on each node when it's first created, typically to install software or set up the environment.

    To start the cluster creation, use this ray command from the terminal.

    $ ray up -y ray_test.yaml

    Ray will do its thing, creating all the necessary infrastructure, and after a few minutes, you should see something like this in your terminal window.

    ...
    ...
    ...
    Next steps
      To add another node to this Ray cluster, run
        ray start --address='10.0.9.248:6379'
    
      To connect to this Ray cluster:
        import ray
        ray.init()
    
      To submit a Ray job using the Ray Jobs CLI:
        RAY_ADDRESS='http://10.0.9.248:8265' ray job submit --working-dir . -- python my_script.py
    
      See https://docs.ray.io/en/latest/cluster/running-applications/job-submission/index.html
      for more information on submitting Ray jobs to the Ray cluster.
    
      To terminate the Ray runtime, run
        ray stop
    
      To view the status of the cluster, use
        ray status
    
      To monitor and debug Ray, view the dashboard at
        10.0.9.248:8265
    
      If connection to the dashboard fails, check your firewall settings and network configuration.
    Shared connection to 108.130.38.255 closed.
      New status: up-to-date
    
    Useful commands:
      To terminate the cluster:
        ray down /mnt/c/Users/thoma/ray_test.yaml
    
      To retrieve the IP address of the cluster head:
        ray get-head-ip /mnt/c/Users/thoma/ray_test.yaml
    
      To port-forward the cluster's Ray Dashboard to the local machine:
        ray dashboard /mnt/c/Users/thoma/ray_test.yaml
    
      To submit a job to the cluster, port-forward the Ray Dashboard in another terminal and run:
        ray job submit --address http://localhost: --working-dir . -- python my_script.py
    
      To connect to a terminal on the cluster head for debugging:
        ray attach /mnt/c/Users/thoma/ray_test.yaml
    
      To monitor autoscaling:
        ray exec /mnt/c/Users/thoma/ray_test.yaml 'tail -n 100 -f /tmp/ray/session_latest/logs/monitor*'

    Running a Ray job on a cluster

    At this stage, the cluster has been built, and we are ready to submit our Ray job to it. To give the cluster something more substantial to work with, I increased the range for the prime search in my code from 10,000,000–20,000,000 to 10,000,000–60,000,000. On my local desktop, Ray ran this larger job in 18 seconds.

    I waited a short time for all the cluster nodes to initialise fully, then ran the code on the cluster with this command. 

    $  ray exec ray_test.yaml 'python3 ~/ray_test.py'

    Here is my output.

    (base) tom@tpr-desktop:/mnt/c/Users/thoma$ ray exec ray_test2.yaml 'python3 ~/primes_ray.py'
    2025-11-01 13:44:22,983 INFO util.py:389 -- setting max workers for head node type to 0
    Loaded cached provider configuration
    If you experience issues with the cloud provider, try re-running the command with --no-config-cache.
    Fetched IP: 52.213.155.130
    Warning: Permanently added '52.213.155.130' (ED25519) to the list of known hosts.
    2025-11-01 13:44:26,469 INFO worker.py:1832 -- Connecting to existing Ray cluster at address: 10.0.5.86:6379...
    2025-11-01 13:44:26,477 INFO worker.py:2003 -- Connected to Ray cluster. View the dashboard at http://10.0.5.86:8265
    nodes=6, CPUs~192, chunks=384
    (autoscaler +2s) Tip: use `ray status` to view detailed cluster status. To disable these messages, set RAY_SCHEDULER_EVENTS=0.
    (autoscaler +2s) No available node types can fulfill resource requests {'CPU': 1.0}*160. Add suitable node types to this cluster to resolve this issue.
    total=2897536, time=5.71s
    Shared connection to 52.213.155.130 closed.

    As you can see, the cluster run took about 5.7 seconds, so five worker nodes ran the same job in less than a third of the time it took on my local PC. Not too shabby.
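The headline figures in that output are easy to verify with plain arithmetic. In the snippet below, the 32-vCPU count for a c7g.8xlarge is my assumption (it is not stated in the job output); everything else comes from the runs above.

```python
# Figures from the runs above; the per-instance vCPU count is assumed.
vcpus_per_node, nodes = 32, 6          # 1 head + 5 workers
total_cpus = vcpus_per_node * nodes    # 192, matching "CPUs~192" in the output

chunks = max(4, total_cpus * 2)        # the script's own chunking rule
print(chunks)                          # 384, matching "chunks=384" in the output

local_s, cluster_s = 18.0, 5.71        # desktop vs. cluster runtimes (seconds)
speedup = local_s / cluster_s
print(f"speedup ~{speedup:.2f}x")      # ~3.15x, i.e. under a third of the time
```

The gap between the measured ~3.15x and a naive 6x expectation is unsurprising for such a short job: SSH and job-submission overhead, task scheduling, and the queueing implied by the autoscaler's unfulfilled-CPU warning all eat into the headline figure.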

    When you’re finished with your cluster, please run the following Ray command to tear it down.

    $ ray down -y ray_test.yaml

    As I mentioned before, you should always double-check your account to ensure this command has worked as expected. 

    Summary

    This article, the second in a two-part series, demonstrated how to run CPU-intensive Python code on cloud-based clusters using the Ray library. By spreading the workload across all available vCPUs, Ray delivered a marked reduction in runtime.

    I described and showed how to create a cluster using a YAML file and how to utilise the Ray command-line interface to submit code for execution on the cluster.

    Using AWS as an example platform, I took Ray Python code that had been running on my local PC and ran it, almost unchanged, on a 6-node EC2 cluster, achieving a roughly 3x improvement over the single-machine runtime.

    Finally, I showed how to use the ray command-line tool to tear down the AWS cluster infrastructure Ray had created.

    If you haven’t already read my first article in this series, click on the link below to check it out.

    Please note that, other than being a sometime user of their services, I have no affiliation with Anyscale, AWS, or any other organisation mentioned in this article.
