Workflow Specification Reference

This page documents all data models used in workflow specification files. Workflow specs can be written in YAML, JSON, JSON5, or KDL formats.

WorkflowSpec

The top-level container for a complete workflow definition.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | required | Name of the workflow |
| user | string | current user | User who owns this workflow |
| description | string | none | Description of the workflow |
| parameters | map<string, string> | none | Shared parameters that can be used by jobs and files via use_parameters |
| jobs | [JobSpec] | required | Jobs that make up this workflow |
| files | [FileSpec] | none | Files associated with this workflow |
| user_data | [UserDataSpec] | none | User data associated with this workflow |
| resource_requirements | [ResourceRequirementsSpec] | none | Resource requirements available for this workflow |
| failure_handlers | [FailureHandlerSpec] | none | Failure handlers available for this workflow |
| slurm_schedulers | [SlurmSchedulerSpec] | none | Slurm schedulers available for this workflow |
| slurm_defaults | SlurmDefaultsSpec | none | Default Slurm parameters to apply to all schedulers |
| resource_monitor | ResourceMonitorConfig | none | Resource monitoring configuration |
| actions | [WorkflowActionSpec] | none | Actions to execute based on workflow/job state transitions |
| use_pending_failed | boolean | false | Use PendingFailed status for failed jobs (enables AI-assisted recovery) |
| compute_node_expiration_buffer_seconds | integer | none | Shut down compute nodes this many seconds before expiration |
| compute_node_wait_for_new_jobs_seconds | integer | none | Compute nodes wait for new jobs this long before exiting |
| compute_node_ignore_workflow_completion | boolean | false | Compute nodes hold allocations even after workflow completes |
| compute_node_wait_for_healthy_database_minutes | integer | none | Compute nodes wait this many minutes for database recovery |
| jobs_sort_method | ClaimJobsSortMethod | none | Method for sorting jobs when claiming them |
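
As a minimal sketch, a YAML spec built only from the fields listed above might look like the following. The workflow name, job names, and shell scripts are hypothetical, and the sketch assumes that a parameter pulled in through use_parameters can be referenced with the {param_name} template syntax described under Parameter Formats.

```yaml
# Hypothetical workflow; all names and commands are illustrative only.
name: example_workflow
description: Preprocess once, then analyze three shards
parameters:
  shard: "1:3"                  # shared parameter: inclusive integer range
jobs:
  - name: preprocess
    command: "bash preprocess.sh"
  - name: "analyze_{shard}"
    command: "bash analyze.sh {shard}"
    use_parameters: [shard]     # pull the workflow-level parameter into this job
    depends_on: [preprocess]    # exact job name
```

Only name and jobs are required; every other top-level field can be omitted and falls back to the default shown in the table.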

JobSpec

Defines a single computational task within a workflow.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | required | Name of the job |
| command | string | required | Command to execute for this job |
| invocation_script | string | none | Optional script for job invocation |
| resource_requirements | string | none | Name of a ResourceRequirementsSpec to use |
| failure_handler | string | none | Name of a FailureHandlerSpec to use |
| scheduler | string | none | Name of the scheduler to use for this job |
| cancel_on_blocking_job_failure | boolean | false | Cancel this job if a blocking job fails |
| supports_termination | boolean | false | Whether this job supports graceful termination |
| depends_on | [string] | none | Job names that must complete before this job runs (exact matches) |
| depends_on_regexes | [string] | none | Regex patterns for job dependencies |
| input_files | [string] | none | File names this job reads (exact matches) |
| input_file_regexes | [string] | none | Regex patterns for input files |
| output_files | [string] | none | File names this job produces (exact matches) |
| output_file_regexes | [string] | none | Regex patterns for output files |
| input_user_data | [string] | none | User data names this job reads (exact matches) |
| input_user_data_regexes | [string] | none | Regex patterns for input user data |
| output_user_data | [string] | none | User data names this job produces (exact matches) |
| output_user_data_regexes | [string] | none | Regex patterns for output user data |
| parameters | map<string, string> | none | Local parameters for generating multiple jobs |
| parameter_mode | string | "product" | How to combine parameters: "product" (Cartesian) or "zip" |
| use_parameters | [string] | none | Workflow parameter names to use for this job |
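
The sketch below illustrates how local parameters, parameter_mode, and regex dependencies could fit together. The job names, the Python scripts, and the resource configuration name "small" are hypothetical, and the sketch assumes that templates are expanded in both name and command.

```yaml
jobs:
  - name: "train_{lr:.4f}_{batch}"
    command: "python train.py --lr {lr:.4f} --batch-size {batch}"
    resource_requirements: small          # hypothetical ResourceRequirementsSpec name
    cancel_on_blocking_job_failure: true
    parameters:
      lr: "[0.001,0.01]"
      batch: "[32,64]"
    parameter_mode: zip                   # pairs (0.001, 32) and (0.01, 64)
  - name: summarize
    command: "python summarize.py"
    depends_on_regexes: ["^train_.*$"]    # wait for every generated train_* job
```

With parameter_mode set to "zip" the two lists are paired element by element, giving two jobs. With the default "product" the same lists would give the Cartesian product, that is, four jobs.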

FileSpec

Defines input/output file artifacts that establish implicit job dependencies.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | required | Name of the file (used for referencing in jobs) |
| path | string | required | File system path |
| parameters | map<string, string> | none | Parameters for generating multiple files |
| parameter_mode | string | "product" | How to combine parameters: "product" (Cartesian) or "zip" |
| use_parameters | [string] | none | Workflow parameter names to use for this file |
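
To illustrate the implicit dependencies mentioned above, here is a small sketch in which one job produces a file and another reads it. The file names, paths, and scripts are hypothetical. The sketch assumes that a job listing a file in input_files is ordered after the job listing the same file in output_files.

```yaml
files:
  - name: raw_data
    path: /data/raw.csv
  - name: clean_data
    path: /data/clean.csv
jobs:
  - name: clean
    command: "python clean.py /data/raw.csv /data/clean.csv"
    input_files: [raw_data]
    output_files: [clean_data]
  - name: report
    command: "python report.py /data/clean.csv"
    input_files: [clean_data]   # consumer of clean_data; no depends_on entry needed
```

Jobs refer to files by name, not by path, so a path can change in one place without touching the job definitions.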

UserDataSpec

Arbitrary JSON data that can establish dependencies between jobs.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | none | Name of the user data (used for referencing in jobs) |
| data | JSON | none | The data content as a JSON value |
| is_ephemeral | boolean | false | Whether the user data is ephemeral |
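
A possible use of user data as a dependency link between two jobs is sketched below. All names, scripts, and data values are hypothetical. The sketch assumes that an entry without a data value can be filled in by the job that lists it in output_user_data.

```yaml
user_data:
  - name: tuning_config
    data: {max_trials: 20, metric: "accuracy"}   # any JSON value
  - name: tuning_result                          # no initial data; produced by a job
    is_ephemeral: false
jobs:
  - name: tune
    command: "python tune.py"
    input_user_data: [tuning_config]
    output_user_data: [tuning_result]
  - name: train
    command: "python train.py"
    input_user_data: [tuning_result]             # ties train to the output of tune
```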

ResourceRequirementsSpec

Defines compute resource requirements for jobs.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | required | Name of this resource configuration (referenced by jobs) |
| num_cpus | integer | required | Number of CPUs required |
| memory | string | required | Memory requirement (e.g., "1m", "2g", "512k") |
| num_gpus | integer | 0 | Number of GPUs required |
| num_nodes | integer | 1 | Number of nodes required |
| runtime | string | "PT1H" | Runtime limit in ISO8601 duration format (e.g., "PT30M", "PT2H") |
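
For illustration, two resource configurations and a job that selects one of them by name could be written as follows. The configuration names, the job, and the specific sizes are hypothetical.

```yaml
resource_requirements:
  - name: small
    num_cpus: 2
    memory: "4g"
    runtime: "PT30M"      # ISO 8601 duration: 30 minutes
  - name: gpu_large
    num_cpus: 16
    memory: "64g"
    num_gpus: 2
    runtime: "PT2H"       # num_nodes is omitted and defaults to 1
jobs:
  - name: train
    command: "python train.py"
    resource_requirements: gpu_large   # refers to the configuration by name
```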

FailureHandlerSpec

Defines error recovery strategies for jobs.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | required | Name of the failure handler (referenced by jobs) |
| rules | [FailureHandlerRuleSpec] | required | Rules for handling different exit codes |

FailureHandlerRuleSpec

A single rule within a failure handler for handling specific exit codes.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| exit_codes | [integer] | [] | Exit codes that trigger this rule |
| match_all_exit_codes | boolean | false | If true, matches any non-zero exit code |
| recovery_script | string | none | Optional script to run before retrying |
| max_retries | integer | 3 | Maximum number of retry attempts |
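
A failure handler with one specific rule and one catch-all rule might be sketched like this. The handler name, exit codes, and scripts are hypothetical. The source does not state how rules are prioritized when more than one matches, so the ordering here should not be read as significant.

```yaml
failure_handlers:
  - name: retry_transient
    rules:
      - exit_codes: [10, 11]            # hypothetical codes for transient errors
        recovery_script: "bash cleanup_scratch.sh"
        max_retries: 5
      - match_all_exit_codes: true      # any other non-zero exit code
        max_retries: 1
jobs:
  - name: simulate
    command: "bash simulate.sh"
    failure_handler: retry_transient    # refers to the handler by name
```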

SlurmSchedulerSpec

Defines a Slurm HPC job scheduler configuration.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| name | string | none | Name of the scheduler (used for referencing) |
| account | string | required | Slurm account |
| partition | string | none | Slurm partition name |
| nodes | integer | 1 | Number of nodes to allocate |
| walltime | string | "01:00:00" | Wall time limit |
| mem | string | none | Memory specification |
| gres | string | none | Generic resources (e.g., GPUs) |
| qos | string | none | Quality of service |
| ntasks_per_node | integer | none | Number of tasks per node |
| tmp | string | none | Temporary storage specification |
| extra | string | none | Additional Slurm parameters |
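
As a sketch, a scheduler definition and a job that references it could look like the following. The account, partition, and scheduler name are hypothetical. The gres value uses Slurm's usual "gpu:N" notation on the assumption that the string is passed through to Slurm unchanged.

```yaml
slurm_schedulers:
  - name: gpu_queue
    account: my_project       # hypothetical Slurm account
    partition: gpu            # hypothetical partition name
    nodes: 2
    walltime: "04:00:00"      # same HH:MM:SS form as the default "01:00:00"
    gres: "gpu:2"
    ntasks_per_node: 1
jobs:
  - name: train
    command: "python train.py"
    scheduler: gpu_queue      # refers to the scheduler by name
```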

SlurmDefaultsSpec

Workflow-level default parameters applied to all Slurm schedulers. This is a map of parameter names to values.

Any valid sbatch long option can be specified (without the leading --), except for parameters managed by torc: partition, nodes, walltime, time, mem, gres, name, job-name.

The account parameter is allowed as a workflow-level default.

Example:

slurm_defaults:
  qos: "high"
  constraint: "cpu"
  mail-user: "user@example.com"
  mail-type: "END,FAIL"

WorkflowActionSpec

Defines conditional actions triggered by workflow or job state changes.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| trigger_type | string | required | When to trigger: "on_workflow_start", "on_workflow_complete", "on_jobs_ready", "on_jobs_complete" |
| action_type | string | required | What to do: "run_commands", "schedule_nodes" |
| jobs | [string] | none | For job triggers: exact job names to match |
| job_name_regexes | [string] | none | For job triggers: regex patterns to match job names |
| commands | [string] | none | For run_commands: commands to execute |
| scheduler | string | none | For schedule_nodes: scheduler name |
| scheduler_type | string | none | For schedule_nodes: scheduler type ("slurm", "local") |
| num_allocations | integer | none | For schedule_nodes: number of node allocations |
| start_one_worker_per_node | boolean | none | For schedule_nodes: start one worker per allocated node |
| max_parallel_jobs | integer | none | For schedule_nodes: maximum parallel jobs |
| persistent | boolean | false | Whether the action persists and can be claimed by multiple workers |
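
The two action types can be illustrated with a short sketch: one action requests a node allocation when the workflow starts, and another runs a command once a group of jobs completes. The scheduler name, regex, and notification script are hypothetical.

```yaml
actions:
  - trigger_type: on_workflow_start
    action_type: schedule_nodes
    scheduler: gpu_queue                  # hypothetical SlurmSchedulerSpec name
    scheduler_type: slurm
    num_allocations: 1
  - trigger_type: on_jobs_complete
    action_type: run_commands
    job_name_regexes: ["^train_.*$"]      # job trigger: match generated train_* jobs
    commands:
      - "bash notify.sh"
```

Fields that belong to the other action type (for example commands on a schedule_nodes action) are simply left out.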

ResourceMonitorConfig

Configuration for resource usage monitoring.

| Name | Type | Default | Description |
|------|------|---------|-------------|
| enabled | boolean | false | Enable resource monitoring |
| granularity | MonitorGranularity | "Summary" | Level of detail for metrics collection |
| sample_interval_seconds | integer | 5 | Sampling interval in seconds |
| generate_plots | boolean | false | Generate resource usage plots |

MonitorGranularity

Enum specifying the level of detail for resource monitoring.

| Value | Description |
|-------|-------------|
| Summary | Collect summary statistics only |
| TimeSeries | Collect detailed time series data |
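
A resource monitoring block that switches on detailed collection might be sketched as follows. The enum value is spelled exactly as in the table above; whether the spec file expects a different spelling or case is not stated here, so treat that as an assumption.

```yaml
resource_monitor:
  enabled: true
  granularity: TimeSeries        # spelled as in the MonitorGranularity table
  sample_interval_seconds: 10    # overrides the default of 5
  generate_plots: true
```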

ClaimJobsSortMethod

Enum specifying how jobs are sorted when being claimed by workers.

| Value | Description |
|-------|-------------|
| none | No sorting (default) |
| gpus_runtime_memory | Sort by GPUs, then runtime, then memory |
| gpus_memory_runtime | Sort by GPUs, then memory, then runtime |
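
The sort method is set at the top level of the workflow through jobs_sort_method. A minimal sketch, with a hypothetical workflow and job:

```yaml
name: sorted_workflow
jobs_sort_method: gpus_memory_runtime   # one of the enum values listed above
jobs:
  - name: train
    command: "python train.py"
```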

Parameter Formats

Parameters support several formats for generating multiple jobs or files:

| Format | Example | Description |
|--------|---------|-------------|
| Integer range | "1:100" | Inclusive range from 1 to 100 |
| Integer range with step | "0:100:10" | Range with step size |
| Float range | "0.0:1.0:0.1" | Float range with step |
| Integer list | "[1,5,10,100]" | Explicit list of integers |
| Float list | "[0.1,0.5,0.9]" | Explicit list of floats |
| String list | "['adam','sgd','rmsprop']" | Explicit list of strings |

Template substitution in strings:

  • Basic: {param_name} - Replace with parameter value
  • Formatted integer: {i:03d} - Zero-padded (001, 042, 100)
  • Formatted float: {lr:.4f} - Precision (0.0010, 0.1000)
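
Putting a parameter format and the template syntax together, a parameterized job could be sketched as below. The job name, script, and output file names are hypothetical, and the sketch assumes that templates are expanded in both name and command.

```yaml
jobs:
  - name: "sim_{i:03d}"
    command: "bash simulate.sh --index {i} --out result_{i:03d}.json"
    parameters:
      i: "1:3"     # inclusive integer range: 1, 2, 3
```

Under the rules above this would expand to three jobs named sim_001, sim_002, and sim_003, with {i} replaced by the unpadded value in each command.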

See the Job Parameterization reference for more details.