Quick Start (Remote Workers)
This guide walks you through running a Torc workflow on multiple remote machines via SSH. Jobs are distributed across workers without requiring an HPC scheduler like Slurm.
For local execution, see Quick Start (Local). For HPC/Slurm execution, see Quick Start (HPC).
Prerequisites
- SSH key-based authentication to all remote machines (no password prompts)
- Torc installed on all machines with matching versions
- Torc server accessible from all machines
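You can verify the first two prerequisites in one pass before starting anything. The sketch below only prints the per-host checks it would run (drop the echo to execute them); the hostnames are placeholders, and it assumes torc reports its version via --version — adjust if your build differs.

```shell
#!/usr/bin/env bash
# Preflight sketch: print the per-host checks without executing them.
# Replace the placeholder hostnames with the machines from your worker file.
hosts="worker1.example.com alice@worker2.example.com"

for h in $hosts; do
  # BatchMode=yes makes ssh fail instead of prompting for a password,
  # which surfaces missing key-based auth immediately.
  echo ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" "torc --version"
done
```

If any printed command fails when run for real, fix SSH access or the torc install on that host before continuing.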
Start the Server
Start a Torc server that's accessible from the remote machines. This typically means binding to a network interface (not just localhost):
torc-server run --database torc.db --host 0.0.0.0 --port 8080
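Once the server is up, it is worth confirming from one of the worker machines that the port is actually reachable before starting workers. This bash-specific sketch uses the /dev/tcp redirection; the host value is a placeholder for the server's real network address.

```shell
#!/usr/bin/env bash
# Reachability sketch: try to open a TCP connection to the server port.
# Replace server_host with the address workers will use (not localhost).
server_host=127.0.0.1   # placeholder
server_port=8080

if timeout 2 bash -c "exec 3<>/dev/tcp/$server_host/$server_port" 2>/dev/null; then
  echo "server reachable"
else
  echo "server not reachable"
fi
```

"server not reachable" usually means a firewall rule or that the server was bound to localhost only.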
Create a Worker File
Create a file listing the remote machines, one per line, in the format [user@]hostname[:port]:
# workers.txt
worker1.example.com
alice@worker2.example.com
admin@192.168.1.10:2222
Lines starting with # are comments. Empty lines are ignored.
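If you write your own wrapper scripts around the worker file, each entry can be split into a host part and a port with plain shell. The parsing below is illustrative, not Torc's own implementation; it reuses the example file from above and defaults the port to 22 when none is given.

```shell
#!/usr/bin/env bash
# Write the example worker file, then split each entry into host and port.
cat > workers.txt <<'EOF'
# workers.txt
worker1.example.com
alice@worker2.example.com
admin@192.168.1.10:2222
EOF

# Skip comments and blank lines, then default the port to 22 when absent.
grep -Ev '^[[:space:]]*(#|$)' workers.txt | while read -r entry; do
  case "$entry" in
    *:*) host=${entry%:*}; port=${entry##*:} ;;
    *)   host=$entry;      port=22 ;;
  esac
  echo "host=$host port=$port"
done
```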
Create a Workflow
Save this as workflow.yaml:
name: distributed_hello
description: Distributed hello world workflow
jobs:
  - name: job 1
    command: echo "Hello from $(hostname)!"
  - name: job 2
    command: echo "Hello again from $(hostname)!"
  - name: job 3
    command: echo "And once more from $(hostname)!"
Create the Workflow on the Server
torc workflows create workflow.yaml
Note the workflow ID in the output.
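If you are scripting the remaining steps, capture the ID into a variable. The shape of the create output is an assumption here (an ID as the last token of a "Created" line); check your actual output and adjust the extraction to match.

```shell
#!/usr/bin/env bash
# Stand-in for the real call; replace the assignment below with:
#   create_output=$(torc workflows create workflow.yaml)
create_output="Created workflow 42"   # assumed output shape, not verified

# Take the last whitespace-separated token as the ID.
WORKFLOW_ID=$(printf '%s\n' "$create_output" | awk '{print $NF}')
echo "workflow id: $WORKFLOW_ID"
```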
Run Workers on Remote Machines
Start workers on all remote machines. Each worker will poll for available jobs and execute them:
torc remote run --workers workers.txt <workflow-id> --poll-interval 5
This will:
- Check SSH connectivity to all machines
- Verify all machines have the same torc version
- Start a worker process on each machine (detached via nohup)
- Report which workers started successfully
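For intuition, detaching a long-running process over SSH generally follows the pattern below. This is only an illustration of the nohup idiom; the actual worker invocation that `torc remote run` issues on each host is managed for you, and `<worker-command>` is a placeholder.

```shell
#!/usr/bin/env bash
# Dry run: print the general shape of a detached remote start.
# <worker-command> stands in for the real invocation; nohup plus the
# trailing & keeps it running after the ssh session ends.
host=worker1.example.com
echo ssh "$host" "nohup <worker-command> > worker.log 2>&1 &"
```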
Check Worker Status
Monitor which workers are still running:
torc remote status <workflow-id>
View Workflow Progress
Check job status from any machine:
torc jobs list <workflow-id>
Or use the interactive TUI:
torc tui
Collect Logs
After the workflow completes, collect logs from all workers:
torc remote collect-logs <workflow-id> --local-output-dir ./logs
This creates a tarball for each worker containing:
- Worker logs: torc_worker_<workflow_id>.log
- Job stdout/stderr: job_stdio/job_*.o and job_stdio/job_*.e
- Resource utilization data (if enabled): resource_utilization/resource_metrics_*.db
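Unpacking each tarball into its own subdirectory keeps files from different workers from colliding. The sketch below fabricates a stand-in tarball so it runs anywhere; with real collected logs, delete that part and point the glob at the tarballs in ./logs.

```shell
#!/usr/bin/env bash
set -e
# Stand-in tarball so the loop has something to unpack; with real collected
# logs, remove these three commands and keep only the loop below.
mkdir -p logs staged/job_stdio
echo "hello" > staged/job_stdio/job_1.o
tar -czf logs/worker1.example.com.tar.gz -C staged .

# Unpack each worker's tarball into its own subdirectory.
for t in logs/*.tar.gz; do
  dest="${t%.tar.gz}"
  mkdir -p "$dest"
  tar -xzf "$t" -C "$dest"
done
find logs -name 'job_*.o'
```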
Stop Workers
If you need to stop workers before the workflow completes:
torc remote stop <workflow-id>
Add --force to send SIGKILL instead of SIGTERM.
Next Steps
- Remote Workers Guide - Detailed configuration and troubleshooting
- Creating Workflows - Workflow specification format
- Resource Monitoring - Track CPU/memory usage per job