Intelligent Restart
When you modify input files or configuration after a workflow has run, you need a way to re-execute
only the affected jobs. Intelligent restart — implemented by torc workflows reinit — detects what
changed and resets just the jobs whose inputs are now stale, then propagates that reset through the
dependency graph.
Note: Intelligent restart is for workflows that ran correctly and need to be rerun with different inputs. For failed workflows, use
torc recover instead; it diagnoses failures, adjusts resources, and resubmits jobs.
Motivating Example
Suppose you have three input files, each consumed by a work job, that fan in to a postprocess
job. The workflow completes successfully, but you discover a bug in input_2. After fixing it, you
run:
torc workflows reinit <workflow_id>
Torc detects that input_2 was modified, resets work_2 (which reads it) and postprocess (which
depends on work_2's output), and leaves work_1 and work_3 alone:
flowchart TD
in1([input_1])
in2([input_2<br/>modified])
in3([input_3])
work1[work_1<br/>unchanged]
work2[work_2<br/>reset → ready]
work3[work_3<br/>unchanged]
post[postprocess<br/>reset → ready]
in1 --> work1 --> post
in2 --> work2 --> post
in3 --> work3 --> post
style in1 fill:#d4edda,stroke:#28a745,color:#155724
style in3 fill:#d4edda,stroke:#28a745,color:#155724
style in2 fill:#f8d7da,stroke:#dc3545,color:#721c24
style work1 fill:#d4edda,stroke:#28a745,color:#155724
style work3 fill:#d4edda,stroke:#28a745,color:#155724
style work2 fill:#fff3cd,stroke:#ffc107,color:#856404
style post fill:#fff3cd,stroke:#ffc107,color:#856404
When to Use
Use torc workflows reinit when:
- Input files changed — You modified an input file and want dependent jobs to rerun
- Configuration updated — You changed user_data parameters
- Output files missing — Output files were deleted and need regeneration
- Job definition changed — You modified a job's command or other attributes
- Iterative development — You're refining a workflow and need quick iteration
Basic Usage
# Preview what would change (recommended first step)
torc workflows reinit <workflow_id> --dry-run
# Reinitialize the workflow
torc workflows reinit <workflow_id>
# Force reinitialization even with warnings
torc workflows reinit <workflow_id> --force
How Change Detection Works
Reinitialization detects changes through three mechanisms:
1. File Modification Times
For files tracked in the workflow, Torc compares the current st_mtime (modification time) against
the stored value. If a file was modified since the last run, jobs that use it as input are marked
for re-execution.
# Modify an input file
echo "new data" > input.json
# Reinitialize detects the change
torc workflows reinit <workflow_id>
# Output: Reset 3 jobs due to changed inputs
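The modification-time check described above can be illustrated with a minimal Python sketch. It is not Torc's implementation: the function name `find_modified_files` and the `stored_mtimes` mapping are hypothetical stand-ins for whatever Torc records at the end of a run, and the sketch assumes every tracked file still exists.

```python
import os
import tempfile


def find_modified_files(stored_mtimes):
    """Return paths whose current st_mtime differs from the recorded value.

    stored_mtimes is a hypothetical mapping of path -> st_mtime captured
    when the workflow last ran. Missing files are not handled here.
    """
    changed = []
    for path, stored in stored_mtimes.items():
        if os.stat(path).st_mtime != stored:
            changed.append(path)
    return changed


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.json")
    with open(path, "w") as f:
        f.write("old data")

    # Record the modification time, as if the workflow had just completed.
    stored = {path: os.stat(path).st_mtime}
    unchanged_result = find_modified_files(stored)

    # Simulate a later edit by moving the file's mtime forward 10 seconds.
    later = stored[path] + 10
    os.utime(path, (later, later))
    changed_result = find_modified_files(stored)
```

Comparing for inequality rather than "newer than" also catches a file that was replaced by an older copy, which is one reasonable design choice for this kind of check.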
2. Job Attribute and User Data Hashing
Torc computes SHA256 hashes of critical job attributes (such as the command) and user_data input
values. If any hash differs from the stored value, the job is marked for re-execution. This detects
changes like modified commands, updated scripts, or changed configuration parameters.
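To make the hash comparison concrete, here is a small Python sketch. The canonical-JSON encoding, the function name `attribute_hash`, and the example attribute names are assumptions for illustration only; the text above does not specify how Torc serializes attributes before hashing.

```python
import hashlib
import json


def attribute_hash(attributes):
    """SHA256 over a canonical JSON encoding of a dict of attributes.

    Sorting keys makes the hash independent of dict ordering. This
    encoding is an assumption, not Torc's documented format.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hash recorded at the previous run (hypothetical job attributes).
stored = attribute_hash(
    {"command": "python work.py --n 10", "user_data": {"threshold": 0.5}}
)

# Same content in a different key order: the hash is identical.
same = attribute_hash(
    {"user_data": {"threshold": 0.5}, "command": "python work.py --n 10"}
)

# An edited command produces a different hash, so the job would be rerun.
edited = attribute_hash(
    {"command": "python work.py --n 20", "user_data": {"threshold": 0.5}}
)

needs_rerun = edited != stored
```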
3. Missing Output Files
If a job's output file no longer exists on disk, the job is marked for re-execution regardless of whether inputs changed.
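A check of this kind might look like the following Python sketch. The `job_outputs` mapping and the function name are hypothetical; they only illustrate the idea of flagging a job when any of its declared outputs is absent from disk.

```python
import os
import tempfile


def jobs_with_missing_outputs(job_outputs):
    """Return the names of jobs that have at least one missing output file.

    job_outputs is a hypothetical mapping of job name -> list of output paths.
    """
    return sorted(
        job
        for job, paths in job_outputs.items()
        if any(not os.path.exists(p) for p in paths)
    )


with tempfile.TemporaryDirectory() as tmp:
    kept = os.path.join(tmp, "work_1.out")
    deleted = os.path.join(tmp, "work_2.out")
    for p in (kept, deleted):
        open(p, "w").close()

    # Simulate a user deleting one output after the workflow finished.
    os.remove(deleted)

    missing = jobs_with_missing_outputs({"work_1": [kept], "work_2": [deleted]})
```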
The Reinitialization Process
When you run reinit, Torc performs these steps:
- Bump run_id — Increments the workflow's run counter for tracking
- Reset workflow status — Clears the previous run's completion state
- Check file modifications — Compares current st_mtime values to stored values
- Check missing outputs — Identifies jobs whose output files no longer exist
- Check user_data changes — Computes and compares input hashes
- Mark affected jobs — Sets jobs needing re-execution to uninitialized
- Re-evaluate dependencies — Runs initialize_jobs to set jobs to ready or blocked
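The final step, turning uninitialized jobs into ready or blocked ones, can be sketched in Python as follows. This is an illustration under stated assumptions, not the behavior of initialize_jobs itself: the function `evaluate_statuses`, the `depends_on` mapping, and the `"done"` status name for finished jobs are all placeholders.

```python
def evaluate_statuses(statuses, depends_on):
    """Return new statuses where each uninitialized job becomes
    'ready' if all of its upstream jobs are 'done', else 'blocked'.

    statuses: job name -> status string (hypothetical status names)
    depends_on: job name -> list of upstream job names
    """
    result = dict(statuses)
    for job, status in statuses.items():
        if status != "uninitialized":
            continue  # jobs that were not reset keep their status
        upstream = depends_on.get(job, [])
        if all(statuses[u] == "done" for u in upstream):
            result[job] = "ready"
        else:
            result[job] = "blocked"
    return result


# A chain where 'setup' was untouched and the rest were reset.
statuses = {
    "setup": "done",
    "preprocess": "uninitialized",
    "process": "uninitialized",
    "postprocess": "uninitialized",
}
depends_on = {
    "preprocess": ["setup"],
    "process": ["preprocess"],
    "postprocess": ["process"],
}
new_statuses = evaluate_statuses(statuses, depends_on)
```

In this sketch only `preprocess` can start immediately; the jobs behind it stay blocked until their upstream jobs finish.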
Dependency Propagation
When a job is marked for re-execution, all downstream jobs that depend on its outputs are also marked. This ensures the entire dependency chain is re-executed:
preprocess (input changed) → marked for rerun
↓
process (depends on preprocess output) → also marked
↓
postprocess (depends on process output) → also marked
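The propagation above amounts to a graph traversal. The Python sketch below uses a breadth-first search over a hypothetical `downstream` mapping (job name to the jobs that consume its outputs); it illustrates the idea rather than Torc's actual code.

```python
from collections import deque


def propagate_reset(initial, downstream):
    """Return the initially marked jobs plus every job reachable from
    them through downstream edges, in breadth-first order."""
    seen = set()
    queue = deque()
    for job in initial:
        if job not in seen:
            seen.add(job)
            queue.append(job)

    order = []
    while queue:
        job = queue.popleft()
        order.append(job)
        for child in downstream.get(job, []):
            if child not in seen:  # visit each job once, even in fan-in graphs
                seen.add(child)
                queue.append(child)
    return order


# The linear chain shown above.
chain = {"preprocess": ["process"], "process": ["postprocess"]}
chain_reset = propagate_reset(["preprocess"], chain)

# The fan-in workflow from the motivating example: only work_2 changed.
fan_in = {
    "work_1": ["postprocess"],
    "work_2": ["postprocess"],
    "work_3": ["postprocess"],
}
fan_in_reset = propagate_reset(["work_2"], fan_in)
```

Jobs that are not reachable from a changed job, such as work_1 and work_3 in the fan-in case, never enter the queue and are left alone.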
Dry Run Mode
Always use --dry-run first to preview changes without modifying anything:
torc workflows reinit <workflow_id> --dry-run
Example output:
Dry run: 5 jobs would be reset due to changed inputs
- preprocess
- analyze_batch_1
- analyze_batch_2
- merge_results
- generate_report
Async Reinitialization
Reinitialization runs asynchronously on the server: even though torc workflows reinit feels
synchronous, it is implemented by queuing a task and waiting for it to finish. For large workflows
the dependency graph rebuild can take seconds to minutes, which is why the work is offloaded.
By default the CLI blocks on the task using the workflow SSE stream (with polling as a fallback) and
exits non-zero if it fails. Pass --async to skip the wait and get the task handle back for
scripting:
# Kick off a reinit and resume later
task_id=$(torc -f json workflows reinit <workflow_id> --async | jq -r .task_id)
# ... do other work ...
torc tasks wait --timeout 300 "$task_id"
A few properties worth knowing:
- One active task per workflow. At most one async task is in-flight for a given workflow at a
time (different async operations would conflict on overlapping state, so they are serialized).
Calling reinit while a previous reinit is still running is idempotent: the client checks for an
active task first and returns it, so client-side pre-steps (run_id bump, status reset,
changed-file processing) don't double-apply on top of whatever that task is doing.
- Mismatched parameters return 409. If a reinit is active with one set of parameters and a
second caller asks with different parameters (e.g. a different only_uninitialized), the server
refuses with 409 Conflict rather than silently returning the running task with the wrong
semantics.
- Crash-safe. Tasks are persisted server-side. If the server restarts while a reinit is
in-flight, the task is marked failed with an explanatory error on startup, so clients polling or
waiting receive a terminal state rather than hanging.
- Timeout leaves the task running. If the CLI's wait times out (via --wait-timeout), the
server-side task keeps going. Resume with torc tasks wait <id>.
- Dry-run is synchronous. --dry-run does not create a task; it returns the preview directly.
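The first two properties (one active task per workflow, and a conflict on mismatched parameters) can be modeled with a small in-memory Python sketch. Everything here is hypothetical: `TaskRegistry`, `ConflictError`, and the method names are illustrative, and the sketch folds the client-side check and the server-side 409 into a single object for brevity.

```python
class ConflictError(Exception):
    """Stands in for an HTTP 409 Conflict response."""


class TaskRegistry:
    """Toy model of 'at most one active task per workflow'."""

    def __init__(self):
        self._active = {}  # workflow_id -> (task_id, params)
        self._next_id = 1

    def start_reinit(self, workflow_id, params):
        active = self._active.get(workflow_id)
        if active is not None:
            task_id, active_params = active
            if active_params != params:
                # Different parameters: refuse instead of returning a
                # task that would run with the wrong semantics.
                raise ConflictError(f"workflow {workflow_id} has an active reinit")
            return task_id  # same request: return the running task
        task_id = self._next_id
        self._next_id += 1
        self._active[workflow_id] = (task_id, dict(params))
        return task_id

    def finish(self, workflow_id):
        """Mark the workflow's task as terminal so a new one may start."""
        self._active.pop(workflow_id, None)


registry = TaskRegistry()
first = registry.start_reinit(42, {"only_uninitialized": False})
repeat = registry.start_reinit(42, {"only_uninitialized": False})

try:
    registry.start_reinit(42, {"only_uninitialized": True})
    conflicted = False
except ConflictError:
    conflicted = True

registry.finish(42)
after_finish = registry.start_reinit(42, {"only_uninitialized": True})
```

Returning the existing task for an identical request is what makes a repeated call idempotent, while the parameter comparison prevents a caller from silently attaching to a task that does something different from what was asked.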
See Also
- Automatic Failure Recovery — Use torc recover to rerun failed jobs (different from intelligent restart, which is for changed inputs)
- Rerun Failed Jobs — Quick how-to for retrying failures
- Dependency Resolution — How Torc tracks which jobs depend on which files