# Workflow Specification Reference
This page documents all data models used in workflow specification files. Workflow specs can be written in YAML, JSON, JSON5, or KDL formats.
## WorkflowSpec
The top-level container for a complete workflow definition.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | required | Name of the workflow |
user | string | current user | User who owns this workflow |
description | string | none | Description of the workflow |
parameters | map<string, string> | none | Shared parameters that can be used by jobs and files via use_parameters |
jobs | [JobSpec] | required | Jobs that make up this workflow |
files | [FileSpec] | none | Files associated with this workflow |
user_data | [UserDataSpec] | none | User data associated with this workflow |
resource_requirements | [ResourceRequirementsSpec] | none | Resource requirements available for this workflow |
failure_handlers | [FailureHandlerSpec] | none | Failure handlers available for this workflow |
slurm_schedulers | [SlurmSchedulerSpec] | none | Slurm schedulers available for this workflow |
slurm_defaults | SlurmDefaultsSpec | none | Default Slurm parameters to apply to all schedulers |
resource_monitor | ResourceMonitorConfig | none | Resource monitoring configuration |
actions | [WorkflowActionSpec] | none | Actions to execute based on workflow/job state transitions |
use_pending_failed | boolean | false | Use PendingFailed status for failed jobs (enables AI-assisted recovery) |
compute_node_expiration_buffer_seconds | integer | none | Shut down compute nodes this many seconds before expiration |
compute_node_wait_for_new_jobs_seconds | integer | none | Compute nodes wait for new jobs this long before exiting |
compute_node_ignore_workflow_completion | boolean | false | Compute nodes hold allocations even after workflow completes |
compute_node_wait_for_healthy_database_minutes | integer | none | Compute nodes wait this many minutes for database recovery |
jobs_sort_method | ClaimJobsSortMethod | none | Method for sorting jobs when claiming them |
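A minimal sketch of a workflow spec in YAML; the field names come from the table above, while the workflow name, job names, and commands are placeholders:

```yaml
name: example_workflow            # placeholder name
description: A minimal two-job workflow
jobs:
  - name: preprocess
    command: python preprocess.py
  - name: train
    command: python train.py
    depends_on: [preprocess]      # train runs after preprocess completes
```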
## JobSpec
Defines a single computational task within a workflow.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | required | Name of the job |
command | string | required | Command to execute for this job |
invocation_script | string | none | Optional script for job invocation |
resource_requirements | string | none | Name of a ResourceRequirementsSpec to use |
failure_handler | string | none | Name of a FailureHandlerSpec to use |
scheduler | string | none | Name of the scheduler to use for this job |
cancel_on_blocking_job_failure | boolean | false | Cancel this job if a blocking job fails |
supports_termination | boolean | false | Whether this job supports graceful termination |
depends_on | [string] | none | Job names that must complete before this job runs (exact matches) |
depends_on_regexes | [string] | none | Regex patterns for job dependencies |
input_files | [string] | none | File names this job reads (exact matches) |
input_file_regexes | [string] | none | Regex patterns for input files |
output_files | [string] | none | File names this job produces (exact matches) |
output_file_regexes | [string] | none | Regex patterns for output files |
input_user_data | [string] | none | User data names this job reads (exact matches) |
input_user_data_regexes | [string] | none | Regex patterns for input user data |
output_user_data | [string] | none | User data names this job produces (exact matches) |
output_user_data_regexes | [string] | none | Regex patterns for output user data |
parameters | map<string, string> | none | Local parameters for generating multiple jobs |
parameter_mode | string | "product" | How to combine parameters: "product" (Cartesian) or "zip" |
use_parameters | [string] | none | Workflow parameter names to use for this job |
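A sketch of a job that references a named resource-requirements entry and failure handler and declares an explicit dependency; gpu_small and retry_oom are placeholder names that would need matching ResourceRequirementsSpec and FailureHandlerSpec entries:

```yaml
jobs:
  - name: train_model
    command: python train.py
    resource_requirements: gpu_small   # name of a ResourceRequirementsSpec
    failure_handler: retry_oom         # name of a FailureHandlerSpec
    depends_on: [preprocess]           # exact job-name match
    supports_termination: true
```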
## FileSpec
Defines input/output file artifacts that establish implicit job dependencies.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | required | Name of the file (used for referencing in jobs) |
path | string | required | File system path |
parameters | map<string, string> | none | Parameters for generating multiple files |
parameter_mode | string | "product" | How to combine parameters: "product" (Cartesian) or "zip" |
use_parameters | [string] | none | Workflow parameter names to use for this file |
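A sketch of how a file entry plus the input_files/output_files fields on jobs expresses an implicit dependency; the paths and names are placeholders:

```yaml
files:
  - name: clean_data
    path: data/clean.parquet
jobs:
  - name: preprocess
    command: python preprocess.py --out data/clean.parquet
    output_files: [clean_data]    # this job produces the file
  - name: train
    command: python train.py --in data/clean.parquet
    input_files: [clean_data]     # implicit dependency on preprocess
```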
## UserDataSpec
Arbitrary JSON data that can establish dependencies between jobs.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | none | Name of the user data (used for referencing in jobs) |
data | JSON | none | The data content as a JSON value |
is_ephemeral | boolean | false | Whether the user data is ephemeral |
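A sketch of a user data entry read by a job; the name and data content are placeholders:

```yaml
user_data:
  - name: run_metadata
    data:
      campaign: spring_study
      tags: [baseline]
jobs:
  - name: analyze
    command: python analyze.py
    input_user_data: [run_metadata]   # this job reads the user data
```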
## ResourceRequirementsSpec
Defines compute resource requirements for jobs.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | required | Name of this resource configuration (referenced by jobs) |
num_cpus | integer | required | Number of CPUs required |
memory | string | required | Memory requirement (e.g., "1m", "2g", "512k") |
num_gpus | integer | 0 | Number of GPUs required |
num_nodes | integer | 1 | Number of nodes required |
runtime | string | "PT1H" | Runtime limit in ISO8601 duration format (e.g., "PT30M", "PT2H") |
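A sketch of a resource-requirements entry and a job that references it by name; the values are illustrative:

```yaml
resource_requirements:
  - name: gpu_small
    num_cpus: 8
    memory: 16g      # same format as "1m", "2g", "512k"
    num_gpus: 1
    runtime: PT2H    # ISO8601 duration
jobs:
  - name: train
    command: python train.py
    resource_requirements: gpu_small
```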
## FailureHandlerSpec
Defines error recovery strategies for jobs.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | required | Name of the failure handler (referenced by jobs) |
rules | [FailureHandlerRuleSpec] | required | Rules for handling different exit codes |
## FailureHandlerRuleSpec
A single rule within a failure handler that matches specific exit codes.
| Name | Type | Default | Description |
|---|---|---|---|
exit_codes | [integer] | [] | Exit codes that trigger this rule |
match_all_exit_codes | boolean | false | If true, matches any non-zero exit code |
recovery_script | string | none | Optional script to run before retrying |
max_retries | integer | 3 | Maximum number of retry attempts |
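A sketch of a failure handler with two rules, one for a specific exit code and a catch-all fallback; the handler name, exit code, and script path are placeholders:

```yaml
failure_handlers:
  - name: retry_oom
    rules:
      - exit_codes: [137]              # placeholder: e.g., killed for exceeding memory
        recovery_script: ./cleanup.sh  # runs before the retry
        max_retries: 2
      - match_all_exit_codes: true     # any other non-zero exit code
        max_retries: 1
```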
## SlurmSchedulerSpec
Defines a Slurm HPC job scheduler configuration.
| Name | Type | Default | Description |
|---|---|---|---|
name | string | none | Name of the scheduler (used for referencing) |
account | string | required | Slurm account |
partition | string | none | Slurm partition name |
nodes | integer | 1 | Number of nodes to allocate |
walltime | string | "01:00:00" | Wall time limit |
mem | string | none | Memory specification |
gres | string | none | Generic resources (e.g., GPUs) |
qos | string | none | Quality of service |
ntasks_per_node | integer | none | Number of tasks per node |
tmp | string | none | Temporary storage specification |
extra | string | none | Additional Slurm parameters |
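A sketch of a Slurm scheduler entry and a job pinned to it; the account, partition, and gres values are placeholders:

```yaml
slurm_schedulers:
  - name: gpu_nodes
    account: my_account
    partition: gpu
    nodes: 2
    walltime: "04:00:00"
    gres: "gpu:2"
jobs:
  - name: train
    command: python train.py
    scheduler: gpu_nodes
```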
## SlurmDefaultsSpec
Workflow-level default parameters applied to all Slurm schedulers. This is a map of parameter names to values.
Any valid `sbatch` long option can be specified (without the leading `--`), except for parameters
managed by torc: `partition`, `nodes`, `walltime`, `time`, `mem`, `gres`, `name`, `job-name`.
The `account` parameter is allowed as a workflow-level default.
Example:

```yaml
slurm_defaults:
  qos: "high"
  constraint: "cpu"
  mail-user: "user@example.com"
  mail-type: "END,FAIL"
```
## WorkflowActionSpec
Defines conditional actions triggered by workflow or job state changes.
| Name | Type | Default | Description |
|---|---|---|---|
trigger_type | string | required | When to trigger: "on_workflow_start", "on_workflow_complete", "on_jobs_ready", "on_jobs_complete" |
action_type | string | required | What to do: "run_commands", "schedule_nodes" |
jobs | [string] | none | For job triggers: exact job names to match |
job_name_regexes | [string] | none | For job triggers: regex patterns to match job names |
commands | [string] | none | For run_commands: commands to execute |
scheduler | string | none | For schedule_nodes: scheduler name |
scheduler_type | string | none | For schedule_nodes: scheduler type ("slurm", "local") |
num_allocations | integer | none | For schedule_nodes: number of node allocations |
start_one_worker_per_node | boolean | none | For schedule_nodes: start one worker per allocated node |
max_parallel_jobs | integer | none | For schedule_nodes: maximum parallel jobs |
persistent | boolean | false | Whether the action persists and can be claimed by multiple workers |
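A sketch of two actions, one that schedules nodes when the workflow starts and one that runs a command when it completes; the scheduler name and command are placeholders:

```yaml
actions:
  - trigger_type: on_workflow_start
    action_type: schedule_nodes
    scheduler: gpu_nodes          # name of a SlurmSchedulerSpec
    scheduler_type: slurm
    num_allocations: 2
  - trigger_type: on_workflow_complete
    action_type: run_commands
    commands:
      - ./notify_done.sh          # placeholder command
```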
## ResourceMonitorConfig
Configuration for resource usage monitoring.
| Name | Type | Default | Description |
|---|---|---|---|
enabled | boolean | false | Enable resource monitoring |
granularity | MonitorGranularity | "Summary" | Level of detail for metrics collection |
sample_interval_seconds | integer | 5 | Sampling interval in seconds |
generate_plots | boolean | false | Generate resource usage plots |
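A sketch of a monitoring configuration that collects time series data and generates plots:

```yaml
resource_monitor:
  enabled: true
  granularity: TimeSeries
  sample_interval_seconds: 10
  generate_plots: true
```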
## MonitorGranularity
Enum specifying the level of detail for resource monitoring.
| Value | Description |
|---|---|
Summary | Collect summary statistics only |
TimeSeries | Collect detailed time series data |
## ClaimJobsSortMethod
Enum specifying how jobs are sorted when being claimed by workers.
| Value | Description |
|---|---|
none | No sorting (default) |
gpus_runtime_memory | Sort by GPUs, then runtime, then memory |
gpus_memory_runtime | Sort by GPUs, then memory, then runtime |
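The value is set through the workflow-level jobs_sort_method field, for example:

```yaml
jobs_sort_method: gpus_runtime_memory   # sort by GPUs, then runtime, then memory when claiming
```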
## Parameter Formats
Parameters support several formats for generating multiple jobs or files:
| Format | Example | Description |
|---|---|---|
| Integer range | "1:100" | Inclusive range from 1 to 100 |
| Integer range with step | "0:100:10" | Range with step size |
| Float range | "0.0:1.0:0.1" | Float range with step |
| Integer list | "[1,5,10,100]" | Explicit list of integers |
| Float list | "[0.1,0.5,0.9]" | Explicit list of floats |
| String list | "['adam','sgd','rmsprop']" | Explicit list of strings |
Template substitution in strings:
- Basic: `{param_name}` - Replace with parameter value
- Formatted integer: `{i:03d}` - Zero-padded (001, 042, 100)
- Formatted float: `{lr:.4f}` - Precision (0.0010, 0.1000)
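A sketch combining an integer range, an explicit float list, and template substitution, assuming substitution applies to the name and command strings as described above:

```yaml
jobs:
  - name: "train_{i:03d}_lr{lr:.4f}"
    command: python train.py --seed {i} --lr {lr}
    parameters:
      i: "1:5"                   # integers 1 through 5
      lr: "[0.001,0.01,0.1]"     # explicit float list
    parameter_mode: product      # one job per (i, lr) combination
```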
See the Job Parameterization reference for more details.