# Workflow Reinitialization
When you modify input files or configuration after a workflow has run, you need a way to re-execute only the affected jobs. Reinitialization handles this by detecting what changed and marking the appropriate jobs for re-execution.
## When to Use Reinitialization

Use `torc workflows reinitialize` when:

- Input files changed — You modified an input file and want dependent jobs to rerun
- Configuration updated — You changed `user_data` parameters
- Output files missing — Output files were deleted and need regeneration
- Job definition changed — You modified a job's command or other attributes
- Iterative development — You're refining a workflow and need quick iteration
## Basic Usage

```bash
# Preview what would change (recommended first step)
torc workflows reinitialize <workflow_id> --dry-run

# Reinitialize the workflow
torc workflows reinitialize <workflow_id>

# Force reinitialization even with warnings
torc workflows reinitialize <workflow_id> --force
```
## How Change Detection Works

Reinitialization detects changes through three mechanisms:

### 1. File Modification Times

For files tracked in the workflow, Torc compares the current `st_mtime` (modification time) against the stored value. If a file was modified since the last run, jobs that use it as input are marked for re-execution.
```bash
# Modify an input file
echo "new data" > input.json

# Reinitialize detects the change
torc workflows reinitialize <workflow_id>
# Output: Reset 3 jobs due to changed inputs
```
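The comparison itself can be sketched in a few lines of Python. The `tracked_files` mapping of path to stored `st_mtime` is a hypothetical structure for illustration, not Torc's actual schema:

```python
import os

def files_needing_rerun(tracked_files):
    """Return the tracked files modified since the last run.

    `tracked_files` maps path -> st_mtime stored at the previous run
    (illustrative shape, not Torc's internal representation).
    """
    changed = []
    for path, stored_mtime in tracked_files.items():
        current_mtime = os.stat(path).st_mtime
        if current_mtime > stored_mtime:
            changed.append(path)
    return changed
```

Jobs that declare any returned file as an input would then be marked for re-execution.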
### 2. Job Attribute and User Data Hashing

Torc computes SHA256 hashes of critical job attributes (such as the command) and `user_data` input values. If any hash differs from the stored value, the job is marked for re-execution. This detects changes like modified commands, updated scripts, or changed configuration parameters.
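A minimal sketch of the hashing idea, assuming a simplified JSON canonical form (Torc's actual attribute serialization may differ):

```python
import hashlib
import json

def job_hash(command, user_data):
    """SHA256 over the job attributes whose changes trigger re-execution.

    Serializes with sorted keys so the hash is stable across runs.
    Simplified illustration; not Torc's actual canonicalization.
    """
    payload = json.dumps(
        {"command": command, "user_data": user_data}, sort_keys=True
    )
    return hashlib.sha256(payload.encode()).hexdigest()
```

Any edit to the command or a `user_data` value yields a different digest, so the stored and recomputed hashes disagree and the job is reset.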
### 3. Missing Output Files
If a job's output file no longer exists on disk, the job is marked for re-execution regardless of whether inputs changed.
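This check amounts to an existence test over each job's declared outputs; the `jobs` mapping below is an illustrative shape, not Torc's data model:

```python
import os

def jobs_with_missing_outputs(jobs):
    """Return jobs with at least one declared output file absent on disk.

    `jobs` maps job name -> list of output paths (illustrative shape).
    """
    return [
        name for name, outputs in jobs.items()
        if any(not os.path.exists(path) for path in outputs)
    ]
```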
## The Reinitialization Process

When you run `reinitialize`, Torc performs these steps:

1. Bump `run_id` — Increments the workflow's run counter for tracking
2. Reset workflow status — Clears the previous run's completion state
3. Check file modifications — Compares current `st_mtime` values to stored values
4. Check missing outputs — Identifies jobs whose output files no longer exist
5. Check `user_data` changes — Computes and compares input hashes
6. Mark affected jobs — Sets jobs needing re-execution to `uninitialized`
7. Re-evaluate dependencies — Runs `initialize_jobs` to set jobs to `ready` or `blocked`
## Dependency Propagation

When a job is marked for re-execution, all downstream jobs that depend on its outputs are also marked. This ensures the entire dependency chain is re-executed:

```
preprocess (input changed) → marked for rerun
        ↓
process (depends on preprocess output) → also marked
        ↓
postprocess (depends on process output) → also marked
```
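The propagation is a breadth-first walk of the dependency graph. A sketch, assuming a hypothetical `downstream` adjacency map from a job to the jobs consuming its outputs:

```python
from collections import deque

def propagate(changed, downstream):
    """Mark every job downstream of a changed job for re-execution.

    `changed` is the set of directly affected jobs; `downstream` maps
    job -> jobs that consume its outputs (illustrative structure).
    """
    marked = set(changed)
    queue = deque(changed)
    while queue:
        job = queue.popleft()
        for dependent in downstream.get(job, []):
            if dependent not in marked:
                marked.add(dependent)
                queue.append(dependent)
    return marked
```

For the chain above, a change to `preprocess` marks all three jobs, while a change to `process` leaves `preprocess` untouched.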
## Dry Run Mode

Always use `--dry-run` first to preview changes without modifying anything:

```bash
torc workflows reinitialize <workflow_id> --dry-run
```

Example output:

```
Dry run: 5 jobs would be reset due to changed inputs
  - preprocess
  - analyze_batch_1
  - analyze_batch_2
  - merge_results
  - generate_report
```
## Retrying Failed Jobs

Important: Reinitialization does not automatically retry failed jobs. To retry failed jobs, use `reset-status`:

```bash
# Reset failed jobs to ready status, then reinitialize to check for other changes
torc workflows reset-status <workflow_id> --failed-only --reinitialize

# Or just reset failed jobs without reinitialization
torc workflows reset-status <workflow_id> --failed-only
```
## Comparison with Full Reset

| Scenario | Use `reinitialize` | Use `reset-status` |
|---|---|---|
| Input file changed | Yes | No |
| Job command changed | Yes | No |
| Want to rerun everything | No | Yes |
| Retry failed jobs only | No | Yes (`--failed-only`) |
| Iterative development | Yes | Depends |
| Changed workflow spec | Create new workflow | Create new workflow |