Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Workflow Graph

The WorkflowGraph module provides a directed acyclic graph (DAG) representation of workflow jobs and their dependencies. It serves as the core data structure for dependency analysis, scheduler planning, and execution visualization.

Purpose

The graph abstraction addresses several key challenges:

  • Unified Representation: Works with both workflow specifications (at creation time) and database models (at runtime), providing a consistent interface for graph algorithms
  • Dependency Analysis: Enables topological sorting, level computation, and critical path detection
  • Scheduler Planning: Groups jobs by resource requirements and dependency status for efficient scheduler generation
  • Sub-workflow Detection: Identifies connected components that can be scheduled independently

Data Structures

JobNode

Represents a single job (or parameterized job template) in the graph:

#![allow(unused)]
fn main() {
pub struct JobNode {
    pub name: String,                           // Job name (may contain {param} placeholders)
    pub resource_requirements: Option<String>,  // Resource requirements name
    pub instance_count: usize,                  // 1 for normal jobs, N for parameterized
    pub name_pattern: String,                   // Regex pattern matching all instances
    pub scheduler: Option<String>,              // Assigned scheduler
    pub command: String,                        // Command to execute
}
}

WorkflowGraph

The main graph structure with bidirectional edges for efficient traversal:

#![allow(unused)]
fn main() {
pub struct WorkflowGraph {
    nodes: HashMap<String, JobNode>,           // Jobs indexed by name
    depends_on: HashMap<String, HashSet<String>>,   // Forward edges (blockers)
    depended_by: HashMap<String, HashSet<String>>,  // Reverse edges (dependents)
    levels: Option<Vec<Vec<String>>>,          // Cached topological levels
    components: Option<Vec<WorkflowComponent>>, // Cached connected components
}
}

SchedulerGroup

Groups jobs that share scheduling characteristics:

#![allow(unused)]
fn main() {
pub struct SchedulerGroup {
    pub resource_requirements: String,    // Common RR name
    pub has_dependencies: bool,           // Whether jobs have blockers
    pub job_count: usize,                 // Total instances across jobs
    pub job_name_patterns: Vec<String>,   // Regex patterns for matching
    pub job_names: Vec<String>,           // Job names in this group
}
}

Construction Methods

From Workflow Specification

#![allow(unused)]
fn main() {
WorkflowGraph::from_spec(&spec) -> Result<Self, Error>
}

Builds the graph at workflow creation time:

  1. Creates nodes for each job specification
  2. Resolves explicit dependencies (depends_on)
  3. Resolves regex dependencies (depends_on_regexes)
  4. Computes implicit dependencies from input/output files and user data

From Database Models

#![allow(unused)]
fn main() {
WorkflowGraph::from_jobs(jobs, resource_requirements) -> Result<Self, Error>
}

Builds the graph from fetched database records (used for recovery and visualization):

  1. Creates nodes from JobModel records
  2. Resolves dependencies via depends_on_job_ids (if available)
  3. Falls back to computing dependencies from file relationships

Key Operations

Topological Levels

Groups jobs by dependency depth for parallel execution planning:

  • Level 0: Jobs with no dependencies (can start immediately)
  • Level N: Jobs whose dependencies are all in levels < N

Used for execution ordering and TUI visualization.

Connected Components

Identifies independent sub-workflows using BFS traversal:

  • Each component can be scheduled independently
  • Enables parallel execution of unrelated job pipelines
  • Useful for large workflows with multiple independent processing chains

Scheduler Groups

Groups jobs by (resource_requirements, has_dependencies) for scheduler generation:

  • Jobs without dependencies: Submitted at workflow start
  • Jobs with dependencies: Submitted on-demand when jobs become ready
  • Enables the shared generate_scheduler_plan() function used by both torc slurm generate and torc slurm regenerate

Critical Path

Finds the longest path through the graph (by instance count):

  • Identifies bottleneck jobs that limit parallelism
  • Used for estimating minimum execution time
  • Helps prioritize optimization efforts

Integration Points

Scheduler Plan Generation

The SchedulerPlan module uses WorkflowGraph::scheduler_groups() to generate Slurm schedulers:

#![allow(unused)]
fn main() {
let graph = WorkflowGraph::from_spec(&spec)?;
let groups = graph.scheduler_groups();
let plan = generate_scheduler_plan(&graph, &resource_requirements, &profile, ...);
}

Execution Plan Visualization

The execution plan display uses WorkflowGraph::from_jobs() for runtime visualization:

#![allow(unused)]
fn main() {
let graph = WorkflowGraph::from_jobs(&jobs, &resource_requirements)?;
let levels = graph.topological_levels()?;
// Render DAG visualization in TUI
}

Recovery Scenarios

The regenerate command uses the graph to determine scheduler groupings for failed workflows:

#![allow(unused)]
fn main() {
let graph = WorkflowGraph::from_jobs(&jobs, &resource_requirements)?;
let plan = generate_scheduler_plan(&graph, ...);
// Apply plan to recreate schedulers and actions
}

Design Decisions

Bidirectional Edges

The graph maintains both depends_on and depended_by maps for O(1) traversal in either direction. This is critical for:

  • Finding what becomes ready when a job completes
  • Computing connected components efficiently
  • Building subgraphs for partial analysis

Lazy Computation with Caching

Topological levels and connected components are computed on-demand and cached. This avoids unnecessary computation for simple queries while ensuring efficient repeated access.

Parameterized Job Handling

Parameterized jobs are represented as single nodes with instance_count > 1. The name_pattern field provides a regex for matching expanded instances, enabling scheduler grouping without full expansion.