Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Quick Start (Remote Workers)

This guide walks you through running a Torc workflow on multiple remote machines via SSH. Jobs are distributed across workers without requiring an HPC scheduler like Slurm.

For local execution, see Quick Start (Local). For HPC/Slurm execution, see Quick Start (HPC).

Prerequisites

  • SSH key-based authentication to all remote machines (no password prompts)
  • Torc installed on all machines with matching versions
  • Torc server accessible from all machines

Start the Server

Start a Torc server. By default, it binds to 0.0.0.0 so it's accessible from remote machines:

torc-server run --database torc.db --port 8080

Security Note: The server starts without authentication and is accessible from any machine that can reach this host. For networks with untrusted users, see Authentication to secure your server.

Create a Workflow

Save this as workflow.yaml:

name: distributed_hello
description: Distributed hello world workflow

jobs:
  - name: job 1
    command: echo "Hello from $(hostname)!"
  - name: job 2
    command: echo "Hello again from $(hostname)!"
  - name: job 3
    command: echo "And once more from $(hostname)!"

Create the Workflow on the Server

torc workflows create workflow.yaml

Note the workflow ID in the output.

Add Remote Workers

Add remote machines as workers. Each address uses the format [user@]hostname[:port]:

torc remote add-workers <workflow-id> user@host1 user@host2 user@host3

Or add workers from a file (one address per line, # for comments):

torc remote add-workers-from-file workers.txt <workflow-id>

Run Workers on Remote Machines

Start workers on all registered remote machines via SSH:

torc remote run <workflow-id>

This will:

  1. Check SSH connectivity to all machines
  2. Verify all machines have the same torc version
  3. Start a worker process on each machine (detached via nohup)
  4. Report which workers started successfully

Check Worker Status

Monitor which workers are still running:

torc remote status <workflow-id>

View Workflow Progress

Check job status from any machine:

torc jobs list <workflow-id>

Or use the interactive TUI:

torc tui

Collect Logs

After the workflow completes, collect logs from all workers:

torc remote collect-logs <workflow-id> --local-output-dir ./logs

This creates a tarball for each worker containing:

  • Worker logs: torc_worker_<workflow_id>.log
  • Job stdout/stderr: job_stdio/job_*.o and job_stdio/job_*.e
  • Resource utilization data (if enabled): resource_utilization/resource_metrics_*.db

Stop Workers

If you need to stop workers before the workflow completes:

torc remote stop <workflow-id>

Add --force to send SIGKILL instead of SIGTERM.

Next Steps