Quick Start (Remote Workers)
This guide walks you through running a Torc workflow on multiple remote machines via SSH. Jobs are distributed across workers without requiring an HPC scheduler like Slurm.
For local execution, see Quick Start (Local). For HPC/Slurm execution, see Quick Start (HPC).
Prerequisites
- SSH key-based authentication to all remote machines (no password prompts)
- Torc installed on all machines with matching versions
- Torc server accessible from all machines
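The prerequisites above can be checked up front with a small script. This is only a sketch: the host names are placeholders (the `.invalid` TLD never resolves), and `BatchMode=yes` makes `ssh` fail immediately rather than prompt for a password, which is exactly the behavior the first prerequisite requires.

```shell
# Pre-flight check: passwordless SSH plus a working torc on each host.
# Hosts below are placeholders -- substitute your own machines.
hosts="alice@host1.invalid alice@host2.invalid"
fail_count=0
for h in $hosts; do
  if v=$(ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" torc --version 2>/dev/null); then
    echo "$h: $v"
  else
    echo "$h: unreachable or torc missing"
    fail_count=$((fail_count + 1))
  fi
done
echo "$fail_count host(s) failed the check"
```

Comparing the printed versions by eye is enough to catch a version mismatch; `torc remote run` performs its own connectivity and version checks later, so this is just an early warning.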
Start the Server
Start a Torc server. By default, it binds to 0.0.0.0 so it's accessible from remote machines:
torc-server run --database torc.db --port 8080
Security Note: The server starts without authentication and is accessible from any machine that can reach this host. For networks with untrusted users, see Authentication to secure your server.
Create a Workflow
Save this as workflow.yaml:
name: distributed_hello
description: Distributed hello world workflow
jobs:
  - name: job 1
    command: echo "Hello from $(hostname)!"
  - name: job 2
    command: echo "Hello again from $(hostname)!"
  - name: job 3
    command: echo "And once more from $(hostname)!"
Create the Workflow on the Server
torc workflows create workflow.yaml
Note the workflow ID in the output.
Add Remote Workers
Add remote machines as workers. Each address uses the format [user@]hostname[:port]:
torc remote add-workers <workflow-id> user@host1 user@host2 user@host3
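To make the `[user@]hostname[:port]` format concrete, here is how one such address breaks down, using POSIX parameter expansion with a made-up address:

```shell
# Illustrative breakdown of a worker address; the address is hypothetical.
addr="alice@host1.example.com:2222"

user="${addr%%@*}"       # optional user part: "alice"
hostport="${addr#*@}"    # "host1.example.com:2222"
host="${hostport%%:*}"   # "host1.example.com"
port="${hostport##*:}"   # optional SSH port: "2222"

echo "user=$user host=$host port=$port"
```

The user and port parts are optional; an address like `host1` alone is also valid per the format above.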
Or add workers from a file (one address per line, # for comments):
torc remote add-workers-from-file workers.txt <workflow-id>
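A workers file might look like this (host names are placeholders):

```
# workers.txt -- one [user@]hostname[:port] address per line
alice@host1.example.com
alice@host2.example.com
bob@host3.example.com:2222
```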
Run Workers on Remote Machines
Start workers on all registered remote machines via SSH:
torc remote run <workflow-id>
This will:
- Check SSH connectivity to all machines
- Verify all machines have the same torc version
- Start a worker process on each machine (detached via nohup)
- Report which workers started successfully
Check Worker Status
Monitor which workers are still running:
torc remote status <workflow-id>
View Workflow Progress
Check job status from any machine:
torc jobs list <workflow-id>
Or use the interactive TUI:
torc tui
Collect Logs
After the workflow completes, collect logs from all workers:
torc remote collect-logs <workflow-id> --local-output-dir ./logs
This creates a tarball for each worker containing:
- Worker logs: torc_worker_<workflow_id>.log
- Job stdout/stderr: job_stdio/job_*.o and job_stdio/job_*.e
- Resource utilization data (if enabled): resource_utilization/resource_metrics_*.db
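Once collected, the per-worker tarballs can be unpacked into separate directories so logs from different workers don't overwrite each other. The sketch below builds a throwaway demo tarball first, since the exact archive names produced by collect-logs are not assumed here:

```shell
# Sketch: unpack each per-worker tarball into its own directory.
# A demo tarball stands in for the real collect-logs output.
logdir=$(mktemp -d)
mkdir -p "$logdir/demo_worker"
echo "demo log line" > "$logdir/demo_worker/torc_worker_demo.log"
tar -czf "$logdir/demo_worker.tar.gz" -C "$logdir" demo_worker

for tarball in "$logdir"/*.tar.gz; do
  dest="${tarball%.tar.gz}_extracted"   # one directory per worker archive
  mkdir -p "$dest"
  tar -xzf "$tarball" -C "$dest"
  echo "extracted $tarball -> $dest"
done
```

With real output, point the loop at the `--local-output-dir` you passed to collect-logs (e.g. `./logs/*.tar.gz`).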
Stop Workers
If you need to stop workers before the workflow completes:
torc remote stop <workflow-id>
Add --force to send SIGKILL instead of SIGTERM.
Next Steps
- CLI Cheat Sheet - Quick reference for all common commands
- Remote Workers Guide - Detailed configuration and troubleshooting
- Creating Workflows - Workflow specification format
- Resource Monitoring - Track CPU/memory usage per job