HTTP API Design
This document describes the design principles and conventions of Torc's HTTP API.
Design Philosophy
The API follows REST conventions where appropriate, with pragmatic deviations for workflow orchestration operations that don't map cleanly to CRUD semantics.
Core principles:
- Resource-oriented: Primary entities (workflows, jobs, files) have standard CRUD endpoints
- Predictable URLs: Consistent naming and structure across all resources
- JSON everywhere: All request and response bodies use
application/json - Explicit over implicit: Required fields are marked required; optional fields have sensible defaults
Base URL and Versioning
The API is served under a versioned base path:
/torc-service/v1
Versioning strategy:
- The version in the URL path (
v1) represents the major API version - The detailed version (e.g.,
0.12.0) is in the OpenAPI spec and server responses - Breaking changes increment the major version; non-breaking changes increment minor/patch
- The version is single-sourced in
src/api_version.rsand propagates to all artifacts
URL Structure
Resource Collections
GET /resources # List all (with pagination)
POST /resources # Create new
Individual Resources
GET /resources/{id} # Get by ID
PUT /resources/{id} # Update (full replacement)
PATCH /resources/{id} # Partial update (where supported)
DELETE /resources/{id} # Delete
Nested Resources
Resources that belong to a parent use nested URLs:
GET /workflows/{id}/jobs # Jobs in workflow
GET /workflows/{id}/files # Files in workflow
GET /access_groups/{id}/members # Members in group
Action Endpoints (RPC-Style)
Operations that don't map to CRUD use verb-based paths under the resource:
POST /workflows/{id}/initialize_jobs # Build dependency graph
POST /workflows/{id}/claim_next_jobs # Atomically claim ready jobs
POST /workflows/{id}/cancel # Cancel workflow execution
POST /workflows/{id}/reset_status # Reset workflow state
POST /workflows/{id}/process_changed_job_inputs # Detect and handle input changes
POST /jobs/{id}/complete # Mark job completed
POST /jobs/{id}/manage_status_change # Transition job status
GET /tasks/{id} # Poll async task status
When to use action endpoints:
- Operations with side effects beyond simple CRUD
- Operations requiring atomicity (like
claim_next_jobs) - State machine transitions
- Batch operations
Asynchronous Actions
Some actions are long-running and can be invoked asynchronously by passing ?async=true. The server
persists a task row, returns 202 Accepted with a TaskModel, and performs the work in the
background. Currently supported on POST /workflows/{id}/initialize_jobs.
POST /workflows/{id}/initialize_jobs?async=true
→ 202 Accepted { id, workflow_id, operation, status: "queued", created_at_ms, ... }
→ 409 Conflict if an active task already exists for this (workflow, operation)
Clients then either poll GET /tasks/{id} or listen on the workflow SSE stream for a
task_completed event (the event's data.task_id identifies the task). A partial unique index
scoped to status IN ('queued', 'running') enforces at most one active task per workflow_id.
Different async operations on the same workflow would conflict on overlapping state, so they are
serialized at the workflow level rather than per-operation.
Repeated async requests of the same operation with the same parameters (e.g. two
initialize_jobs?async=true&only_uninitialized=false calls on the same workflow) are idempotent:
the server returns the existing task with 202 Accepted rather than starting a new one.
409 Conflict is returned when a task is already active and the new request can't safely be folded
into it:
- A different async operation is active (future-proofing for when more async operations exist).
- The same operation is active but with different parameters — silently returning it would mean the second caller gets different semantics than it asked for.
The 409 response body includes existing_task_id, existing_operation, and a human-readable
message explaining which case fired.
Probing without mutating
On a successful 200 response, GET /workflows/{id}/active_task returns an object of the form
{ "task": TaskModel | null }. Clients use this to detect an in-flight task before running their
own pre-steps, so they don't double-apply side effects (like bumping the workflow's run_id) on top
of someone else's task. Within that 200 response, the body distinguishes "active task exists" from
"workflow is idle" via task == null. The endpoint can still return 404 if the workflow doesn't
exist or isn't accessible to the caller, and 500 for unexpected server errors.
If the server restarts while a task is in-flight, the task is reconciled to failed on startup so
clients never see it stuck in running.
Task status progresses through queued → running → succeeded | failed.
HTTP Methods
| Method | Semantics | Idempotent | Request Body |
|---|---|---|---|
| GET | Read resource(s) | Yes | No |
| POST | Create resource or trigger action | No | Yes |
| PUT | Replace resource entirely | Yes | Yes |
| PATCH | Partial update | No | Yes |
| DELETE | Remove resource | Yes | No |
Notes:
PUTexpects the complete resource representationPATCHaccepts partial updates (only fields to change)DELETEon non-existent resources returns 404 (not 204)
Request Format
All request bodies use JSON with Content-Type: application/json.
Creating Resources
POST /workflows
{
"name": "my-workflow",
"user": "dthom",
"description": "Example workflow"
}
Bulk Operations
Some endpoints accept arrays for batch creation:
POST /bulk_jobs
{
"jobs": [
{"name": "job1", "workflow_id": 1, "command": "echo hello"},
{"name": "job2", "workflow_id": 1, "command": "echo world"}
]
}
Response Format
Success Responses
Single resource:
{
"id": 1,
"name": "my-workflow",
"user": "dthom",
"status": "ready"
}
List response (with pagination metadata):
{
"items": [...],
"offset": 0,
"count": 10,
"total_count": 42,
"max_limit": 10000,
"has_more": true
}
Error Responses
All errors use the ErrorResponse schema:
{
"error": {
"error": "NotFound",
"message": "Workflow 999 not found"
}
}
Or with additional context:
{
"error": {
"error": "ValidationError",
"message": "Invalid job status transition"
},
"errorMessage": "Cannot transition from 'completed' to 'ready'",
"code": 422
}
HTTP Status Codes
| Code | Meaning | When Used |
|---|---|---|
| 200 | OK | Successful GET, PUT, PATCH, DELETE, or POST action |
| 201 | Created | Resource created (some POST endpoints) |
| 202 | Accepted | Async action queued; response body is a TaskModel |
| 400 | Bad Request | Malformed JSON, missing required fields |
| 403 | Forbidden | User lacks permission for this resource |
| 404 | Not Found | Resource doesn't exist |
| 409 | Conflict | Async action already has an active task for this resource |
| 422 | Unprocessable Entity | Valid JSON but invalid semantics (e.g., bad status transition) |
| 500 | Internal Server Error | Unexpected server failure |
Pagination
All list endpoints support offset-based pagination:
| Parameter | Type | Default | Description |
|---|---|---|---|
offset | integer | 0 | Number of records to skip |
limit | integer | 10000 | Maximum records to return |
Constraints:
- Maximum
limit: 10,000 records (enforced server-side) - Response includes
has_moreboolean for client-side iteration - Response includes
total_countfor progress indication
Example:
GET /workflows?offset=0&limit=50
GET /workflows?offset=50&limit=50 # Next page
Filtering and Sorting
Filtering
List endpoints support query parameters for filtering:
GET /workflows?user=dthom&is_archived=false
GET /jobs?workflow_id=1&status=ready
GET /compute_nodes?workflow_id=1&is_active=true
Common filter parameters:
workflow_id: Filter by parent workflow (required for nested resources)name: Filter by name (often substring match)user: Filter by ownerstatus: Filter by status value
Sorting
GET /workflows?sort_by=created_at&reverse_sort=true
GET /jobs?sort_by=name&reverse_sort=false
| Parameter | Type | Description |
|---|---|---|
sort_by | string | Field name to sort by |
reverse_sort | boolean | If true, sort descending |
Authentication
The server supports multiple authentication modes:
HTTP Basic Auth
Authorization: Basic base64(username:password)
Credentials are validated against an htpasswd file when --htpasswd-file is specified.
Anonymous Access
When authentication is not enforced (--no-auth or no htpasswd file), requests are accepted with
the username derived from the X-Remote-User header or defaulting to "anonymous".
Authorization Model
Access control is resource-based:
- Workflow ownership: Users can access workflows they own
- Group membership: Users can access workflows shared with their groups
- System administrators: Full access to all resources
The enforce_access_control server flag controls whether authorization is checked.
Resource Organization
The API is organized into logical resource groups (OpenAPI tags):
| Tag | Resources | Description |
|---|---|---|
workflows | Workflows, workflow operations | Core workflow management |
jobs | Jobs, job status, job operations | Job execution and tracking |
files | File records | Input/output file tracking |
user_data | User data records | Key-value data dependencies |
events | Workflow events | Audit log and event stream |
compute_nodes | Compute node records | Worker node tracking |
slurm_schedulers | Slurm scheduler configs | Slurm integration |
remote_workers | Remote worker registrations | Distributed execution |
access_control | Groups, memberships, permissions | Authorization management |
workflow_actions | Scheduled actions | Automated workflow operations |
failure_handlers | Failure handler configs | Error handling rules |
ro_crate_entities | RO-Crate metadata | Research object packaging |
system | Health, version | Server status |
Thread Safety and Concurrency
Certain endpoints are designed for concurrent access from multiple workers:
claim_next_jobs
POST /workflows/{id}/claim_next_jobs?limit=5
This endpoint uses database-level write locks (BEGIN IMMEDIATE TRANSACTION) to ensure that
multiple workers calling simultaneously will not receive the same jobs. Each job is allocated to
exactly one worker.
claim_jobs_based_on_resources
Similar to claim_next_jobs but factors in resource requirements (CPU, memory, GPU) and available
capacity on the requesting worker.
Content Types
| Content-Type | Usage |
|---|---|
application/json | All request and response bodies |
text/event-stream | Server-Sent Events (dashboard real-time updates) |
Data Type Conventions
IDs
All resource IDs are 64-bit integers (int64 in OpenAPI).
Timestamps
Timestamps use Unix epoch format as float64 (seconds with fractional milliseconds).
Durations
Runtime durations use ISO 8601 format: PT30M (30 minutes), PT2H (2 hours).
Memory Sizes
Memory specifications use string format with units: "512m", "2g", "100k".
Job Status
Job status is stored and transmitted as integers (0-10):
| Value | Status |
|---|---|
| 0 | uninitialized |
| 1 | blocked |
| 2 | ready |
| 3 | pending |
| 4 | running |
| 5 | completed |
| 6 | failed |
| 7 | canceled |
| 8 | terminated |
| 9 | disabled |
| 10 | pending_failed |
API Evolution
When evolving the API:
- Additive changes (new fields, new endpoints) don't require version bumps
- Breaking changes (removed fields, changed semantics) require major version increment
- Deprecation should be communicated via documentation before removal
- The OpenAPI spec is the authoritative contract; regenerate clients after spec changes
See API Generation Architecture for the code-first workflow that maintains the API contract.