HPC Profiles Reference

Complete reference for the HPC profile system and its CLI commands.

Overview

HPC profiles contain pre-configured knowledge about High-Performance Computing systems, enabling automatic Slurm scheduler generation based on job resource requirements.

CLI Commands

torc hpc list

List all available HPC profiles.

torc hpc list [OPTIONS]

Options:

| Option | Description |
|---|---|
| -f, --format <FORMAT> | Output format: table or json |

Output columns:

  • Name: Profile identifier used in commands
  • Display Name: Human-readable name
  • Partitions: Number of configured partitions
  • Detected: Whether current system matches this profile
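
For example (the exact rows depend on your configuration and system):

# List profiles in the default table format
torc hpc list

# Emit the same information as JSON for scripting
torc hpc list --format json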

torc hpc detect

Detect the current HPC system.

torc hpc detect [OPTIONS]

Options:

| Option | Description |
|---|---|
| -f, --format <FORMAT> | Output format: table or json |

Returns the detected profile name, or indicates no match.
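
For example:

# Print the profile that matches the current system, if any
torc hpc detect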


torc hpc show

Display detailed information about an HPC profile.

torc hpc show <PROFILE> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <PROFILE> | Profile name (e.g., kestrel) |

Options:

| Option | Description |
|---|---|
| -f, --format <FORMAT> | Output format: table or json |
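
For example, using the built-in kestrel profile:

# Show partitions and detection settings for the kestrel profile
torc hpc show kestrel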

torc hpc partitions

List partitions for an HPC profile.

torc hpc partitions <PROFILE> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <PROFILE> | Profile name (e.g., kestrel) |

Options:

| Option | Description |
|---|---|
| -f, --format <FORMAT> | Output format: table or json |

Output columns:

  • Name: Partition name
  • CPUs/Node: CPU cores per node
  • Mem/Node: Memory per node
  • Max Walltime: Maximum job duration
  • GPUs: GPU count and type (if applicable)
  • Shared: Whether partition supports shared jobs
  • Notes: Special requirements or features
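
For example:

# List every partition defined in the kestrel profile, as JSON
torc hpc partitions kestrel --format json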

torc hpc match

Find partitions matching resource requirements.

torc hpc match <PROFILE> [OPTIONS]

Arguments:

| Argument | Description |
|---|---|
| <PROFILE> | Profile name (e.g., kestrel) |

Options:

| Option | Description |
|---|---|
| --cpus <N> | Required CPU cores |
| --memory <SIZE> | Required memory (e.g., 64g, 512m) |
| --walltime <DURATION> | Required walltime (e.g., 2h, 4:00:00) |
| --gpus <N> | Required GPUs |
| -f, --format <FORMAT> | Output format: table or json |

Memory format: <number><unit> where unit is k, m, g, or t (case-insensitive).

Walltime formats:

  • HH:MM:SS (e.g., 04:00:00)
  • <N>h (e.g., 4h)
  • <N>m (e.g., 30m)
  • <N>s (e.g., 3600s)
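
For example, to find kestrel partitions that can run a 32-core, 64 GB, 4-hour job needing one GPU (the values are illustrative):

# Match partitions against CPU, memory, walltime, and GPU requirements
torc hpc match kestrel --cpus 32 --memory 64g --walltime 4h --gpus 1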

torc hpc generate

Generate an HPC profile configuration from the current Slurm cluster.

torc hpc generate [OPTIONS]

Options:

| Option | Description |
|---|---|
| --name <NAME> | Profile name (defaults to cluster name or hostname) |
| --display-name <NAME> | Human-readable display name |
| -o, --output <FILE> | Output file path (prints to stdout if not specified) |
| --skip-stdby | Skip standby partitions (names ending in -stdby) |

How it works:

  1. Queries sinfo to get partition names, CPUs, memory, time limits, and GRES
  2. Queries scontrol show partition for each partition to get additional details
  3. Parses GRES strings to extract GPU count and type
  4. Generates a hostname-based detection pattern from the current hostname
  5. Outputs TOML configuration ready to add to your config file

Example:

# Generate profile from current cluster
torc hpc generate

# Output:
# [client.hpc.custom_profiles.mycluster]
# display_name = "Mycluster"
# detect_hostname = ".*\\.mycluster\\.edu"
#
# [[client.hpc.custom_profiles.mycluster.partitions]]
# name = "compute"
# cpus_per_node = 64
# memory_mb = 256000
# max_walltime_secs = 172800
# ...

Fields extracted automatically:

  • Partition name, CPUs per node, memory (MB), max walltime (seconds)
  • GPU count and type from GRES (e.g., gpu:a100:4)
  • Shared node support from OverSubscribe setting

Fields that may need manual adjustment:

  • requires_explicit_request: Defaults to false; set to true for partitions that shouldn't be auto-selected
  • description: Not available from Slurm; add human-readable descriptions
  • gpu_memory_gb: Not available from Slurm; add if known
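
For example, to save a generated profile under an explicit name while skipping standby partitions (the names below are placeholders):

# Generate a named profile, drop *-stdby partitions, and write it to a file for review
torc hpc generate --name mycluster --display-name "My Cluster" --skip-stdby -o mycluster-profile.toml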

torc slurm generate

Generate Slurm schedulers for a workflow based on job resource requirements.

torc slurm generate [OPTIONS] --account <ACCOUNT> <WORKFLOW_FILE>

Arguments:

| Argument | Description |
|---|---|
| <WORKFLOW_FILE> | Path to workflow specification file (YAML, JSON, or JSON5) |

Options:

| Option | Description |
|---|---|
| --account <ACCOUNT> | Slurm account to use (required) |
| --profile <PROFILE> | HPC profile to use (auto-detected if not specified) |
| -o, --output <FILE> | Output file path (prints to stdout if not specified) |
| --no-actions | Don't add workflow actions for scheduling nodes |
| --force | Overwrite existing schedulers in the workflow |

Generated artifacts:

  1. Slurm schedulers: One for each unique resource requirement
  2. Job scheduler assignments: Each job linked to appropriate scheduler
  3. Workflow actions: on_workflow_start/schedule_nodes actions (unless --no-actions)

Scheduler naming: <resource_requirement_name>_scheduler
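
For example (account, profile, and file names are placeholders):

# Auto-detect the profile and write the augmented workflow to a new file
torc slurm generate --account myproject -o workflow_with_schedulers.yaml workflow.yaml

# Use a specific profile and replace any schedulers already defined in the workflow
torc slurm generate --account myproject --profile kestrel --force workflow.yaml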


Built-in Profiles

NREL Kestrel

Profile name: kestrel

Detection: Environment variable NREL_CLUSTER=kestrel

Partitions:

| Partition | CPUs | Memory | Max Walltime | GPUs | Notes |
|---|---|---|---|---|---|
| debug | 104 | 240 GB | 1h | - | Quick testing |
| short | 104 | 240 GB | 4h | - | Short jobs |
| standard | 104 | 240 GB | 48h | - | General workloads |
| long | 104 | 240 GB | 240h | - | Extended jobs |
| medmem | 104 | 480 GB | 48h | - | Medium memory |
| bigmem | 104 | 2048 GB | 48h | - | High memory |
| shared | 104 | 240 GB | 48h | - | Shared node access |
| hbw | 104 | 240 GB | 48h | - | High-bandwidth memory, min 10 nodes |
| nvme | 104 | 240 GB | 48h | - | NVMe local storage |
| gpu-h100 | 128 | 2 TB | 48h | 4x H100 | GPU compute |

Node specifications:

  • Standard nodes: 104 cores (2x Intel Xeon Sapphire Rapids), 240 GB RAM
  • GPU nodes: 4x NVIDIA H100 80GB HBM3, 128 cores, 2 TB RAM

Configuration

Custom Profiles

Don't see your HPC? Please request built-in support so everyone benefits. In the meantime, see the Custom HPC Profile Tutorial to create your own profile.

Define custom profiles in your Torc configuration file:

# ~/.config/torc/config.toml

[client.hpc.custom_profiles.mycluster]
display_name = "My Cluster"
description = "Description of the cluster"
detect_env_var = "CLUSTER_NAME=mycluster"
detect_hostname = ".*\\.mycluster\\.org"
default_account = "myproject"

[[client.hpc.custom_profiles.mycluster.partitions]]
name = "compute"
cpus_per_node = 64
memory_mb = 256000
max_walltime_secs = 172800
shared = false

[[client.hpc.custom_profiles.mycluster.partitions]]
name = "gpu"
cpus_per_node = 32
memory_mb = 128000
max_walltime_secs = 86400
gpus_per_node = 4
gpu_type = "A100"
shared = false
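
After adding a custom profile, you can confirm that Torc sees it using the commands documented above (assuming the detection settings match your machine):

# The custom profile should appear alongside the built-ins
torc hpc list

# Verify detection and inspect the parsed partitions
torc hpc detect
torc hpc partitions mycluster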

Profile Override

Override settings for built-in profiles:

[client.hpc.profile_overrides.kestrel]
default_account = "my_default_account"

Configuration Options

[client.hpc] Section:

| Option | Type | Description |
|---|---|---|
| profile_overrides | table | Override settings for built-in profiles |
| custom_profiles | table | Define custom HPC profiles |

Profile override options:

| Option | Type | Description |
|---|---|---|
| default_account | string | Default Slurm account for this profile |

Custom profile options:

| Option | Type | Required | Description |
|---|---|---|---|
| display_name | string | No | Human-readable name |
| description | string | No | Profile description |
| detect_env_var | string | No | Environment variable for detection (NAME=value) |
| detect_hostname | string | No | Regex pattern for hostname detection |
| default_account | string | No | Default Slurm account |
| partitions | array | Yes | List of partition configurations |

Partition options:

| Option | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Partition name |
| cpus_per_node | int | Yes | CPU cores per node |
| memory_mb | int | Yes | Memory per node in MB |
| max_walltime_secs | int | Yes | Maximum walltime in seconds |
| gpus_per_node | int | No | GPUs per node |
| gpu_type | string | No | GPU model (e.g., "H100") |
| shared | bool | No | Whether partition supports shared jobs |
| min_nodes | int | No | Minimum required nodes |
| requires_explicit_request | bool | No | Must be explicitly requested |

Resource Matching Algorithm

When generating schedulers, Torc uses this algorithm to match resource requirements to partitions:

  1. Filter by resources: partitions must satisfy:
    • CPUs >= required CPUs
    • Memory >= required memory
    • GPUs >= required GPUs (if specified)
    • Max walltime >= required runtime
  2. Exclude debug partitions: debug partitions are skipped unless no other partition matches
  3. Prefer best fit:
    • Partitions that exactly match resource needs
    • Non-shared partitions over shared
    • Shorter max walltime over longer
  4. Handle special requirements:
    • GPU jobs only match GPU partitions
    • Respect the requires_explicit_request flag
    • Honor min_nodes constraints
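
To preview how these rules play out on a particular profile, you can run torc hpc match (documented above) with the same requirements a job declares; for example, with illustrative values:

# Show which kestrel partitions survive the resource filter for these requirements
torc hpc match kestrel --cpus 104 --memory 200g --walltime 24h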

Generated Scheduler Format

Example generated Slurm scheduler:

slurm_schedulers:
  - name: medium_scheduler
    account: myproject
    nodes: 1
    mem: 64g
    walltime: 04:00:00
    gres: null
    partition: null  # Let Slurm choose based on resources

Corresponding workflow action:

actions:
  - trigger_type: on_workflow_start
    action_type: schedule_nodes
    scheduler: medium_scheduler
    scheduler_type: slurm
    num_allocations: 1

Runtime Format Parsing

Resource requirements use ISO 8601 duration format for runtime:

| Format | Example | Meaning |
|---|---|---|
| PTnH | PT4H | 4 hours |
| PTnM | PT30M | 30 minutes |
| PTnS | PT3600S | 3600 seconds |
| PTnHnM | PT2H30M | 2 hours 30 minutes |
| PnDTnH | P1DT12H | 1 day 12 hours |

Generated walltime uses HH:MM:SS format (e.g., 04:00:00).


See Also