Working with Logs
Torc provides tools for bundling and analyzing workflow logs. These are useful for:
- Sharing logs with colleagues for help debugging
- Archiving completed workflow logs for later reference
- Scanning for errors across all log files at once
Log File Overview
Torc generates several types of log files during workflow execution:
| Log Type | Path Pattern | Contents |
|---|---|---|
| Job stdout | output/job_stdio/job_wf<id>_j<job>_r<run>_a<attempt>.o | Standard output from job commands |
| Job stderr | output/job_stdio/job_wf<id>_j<job>_r<run>_a<attempt>.e | Error output, stack traces |
| Job combined | output/job_stdio/job_wf<id>_j<job>_r<run>_a<attempt>.log | Combined stdout+stderr (combined mode) |
| Job runner | output/job_runner_*.log | Torc job runner internal logs |
| Slurm stdout | output/slurm_output_wf<id>_sl<slurm_id>.o | Slurm job allocation output |
| Slurm stderr | output/slurm_output_wf<id>_sl<slurm_id>.e | Slurm-specific errors |
| Slurm env | output/slurm_env_*.log | Slurm environment variables |
| dmesg | output/dmesg_slurm_*.log | Kernel messages (on failure) |
Note: The file extensions depend on the stdio configuration. In separate mode (default), jobs produce .o and .e files. In combined mode, a single .log file is created. Modes like no_stdout, no_stderr, or none suppress some or all output files. If delete_on_success is enabled, files are removed when a job completes with exit code 0.
For detailed information about log file contents, see Debugging Workflows and Debugging Slurm Workflows.
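Because the job stdio file names encode the workflow, job, run, and attempt, they can be parsed mechanically when post-processing logs. The following is a minimal sketch, not part of Torc: the `parse_job_log_name` helper and the `JobLogName` type are hypothetical, and it assumes every field in the path pattern above is a decimal integer.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the table above:
#   job_wf<id>_j<job>_r<run>_a<attempt>.<o|e|log>
# Assumption: all four fields are decimal integers.
JOB_STDIO_RE = re.compile(
    r"^job_wf(?P<workflow>\d+)_j(?P<job>\d+)_r(?P<run>\d+)_a(?P<attempt>\d+)"
    r"\.(?P<ext>o|e|log)$"
)

# Extension-to-stream mapping taken from the "Log Type" column.
_STREAMS = {"o": "stdout", "e": "stderr", "log": "combined"}


class JobLogName(NamedTuple):
    workflow_id: int
    job_id: int
    run_id: int
    attempt: int
    stream: str  # "stdout", "stderr", or "combined"


def parse_job_log_name(filename: str) -> Optional[JobLogName]:
    """Return the parsed fields, or None if the name does not match."""
    m = JOB_STDIO_RE.match(filename)
    if m is None:
        return None
    return JobLogName(
        int(m["workflow"]),
        int(m["job"]),
        int(m["run"]),
        int(m["attempt"]),
        _STREAMS[m["ext"]],
    )


print(parse_job_log_name("job_wf123_j456_r1_a1.e"))
# JobLogName(workflow_id=123, job_id=456, run_id=1, attempt=1, stream='stderr')
```

Names that do not follow the job stdio pattern, such as job runner or Slurm logs, return None rather than raising.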
Bundling Logs
The torc logs bundle command packages all logs for a workflow into a compressed tarball:
# Bundle all logs for a workflow
torc logs bundle <workflow_id>
# Specify custom output directory (where logs are located)
torc logs bundle <workflow_id> --output-dir /path/to/output
# Save bundle to a specific directory
torc logs bundle <workflow_id> --bundle-dir /path/to/bundles
This creates a wf<id>.tar.gz file containing:
- All job stdout/stderr files (job_wf*_j*_r*.o/e)
- Job runner logs (job_runner_*.log)
- Slurm output files (slurm_output_wf*_sl*.o/e)
- Slurm environment logs (slurm_env_wf*_sl*.log)
- dmesg logs (dmesg_slurm_wf*_sl*.log)
- Bundle metadata (workflow info, collection timestamp)
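To check what a bundle contains without extracting it, the tarball can be inspected with Python's standard `tarfile` module. This is a rough sketch rather than a Torc feature: the `summarize_bundle` and `classify` helpers are hypothetical, the glob patterns are copied from the list above, and anything that matches none of them (including the bundle metadata, whose file name is not specified here) is counted as "other".

```python
import fnmatch
import tarfile
from collections import Counter
from pathlib import PurePosixPath

# Glob patterns copied from the bundle contents list above.
LOG_KINDS = [
    ("job stdout", "job_wf*_j*_r*.o"),
    ("job stderr", "job_wf*_j*_r*.e"),
    ("job runner", "job_runner_*.log"),
    ("slurm stdout", "slurm_output_wf*_sl*.o"),
    ("slurm stderr", "slurm_output_wf*_sl*.e"),
    ("slurm env", "slurm_env_wf*_sl*.log"),
    ("dmesg", "dmesg_slurm_wf*_sl*.log"),
]


def classify(member_name: str) -> str:
    """Map an archive member to a log kind using only its base name."""
    base = PurePosixPath(member_name).name
    for kind, pattern in LOG_KINDS:
        if fnmatch.fnmatchcase(base, pattern):
            return kind
    return "other"


def summarize_bundle(bundle_path: str) -> Counter:
    """Count regular files in a .tar.gz bundle, grouped by log kind."""
    with tarfile.open(bundle_path, "r:gz") as tar:
        return Counter(classify(m.name) for m in tar.getmembers() if m.isfile())
```

Calling `summarize_bundle("wf123.tar.gz")` returns a `Counter` keyed by log kind. Matching on the base name keeps the sketch independent of the directory layout inside the archive, which is not documented here.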
Example: Sharing Logs
# Bundle workflow logs
torc logs bundle 123 --bundle-dir ./bundles
# Share the bundle
ls ./bundles/
# wf123.tar.gz
# Recipient can extract and analyze
tar -xzf wf123.tar.gz
torc logs analyze wf123/
Analyzing Logs
The torc logs analyze command scans log files for known error patterns:
# Analyze a log bundle tarball
torc logs analyze wf123.tar.gz
# Analyze a log directory directly (auto-detects workflow if only one present)
torc logs analyze output/
# Analyze a directory with multiple workflows (specify which one)
torc logs analyze output/ --workflow-id 123
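The auto-detection rule (use the workflow if only one is present, otherwise require an explicit ID) can be illustrated as follows. This sketch only mirrors the behavior described above and is not Torc's implementation: `workflow_ids` and `resolve_workflow_id` are hypothetical helpers, and the `_wf<id>_` naming convention is taken from the path patterns in the Log File Overview table.

```python
import re
from typing import Iterable, Optional, Set

# Assumption: log file names embed the workflow as "_wf<digits>_",
# as in job_wf123_j456_r1_a1.e or slurm_output_wf123_sl789.e.
WF_ID_RE = re.compile(r"_wf(\d+)_")


def workflow_ids(filenames: Iterable[str]) -> Set[int]:
    """Collect every workflow ID that appears in the given file names."""
    ids = set()
    for name in filenames:
        m = WF_ID_RE.search(name)
        if m:
            ids.add(int(m.group(1)))
    return ids


def resolve_workflow_id(
    filenames: Iterable[str], requested: Optional[int] = None
) -> int:
    """Pick the workflow to analyze, or raise ValueError if it is ambiguous."""
    found = workflow_ids(filenames)
    if requested is not None:
        if requested not in found:
            raise ValueError(f"no logs found for workflow {requested}")
        return requested
    if len(found) == 1:
        return next(iter(found))
    raise ValueError(
        f"expected exactly one workflow, found {sorted(found)}; specify one"
    )

# To run against a real directory, pass in its file names, for example:
#   from pathlib import Path
#   names = [p.name for p in Path("output").rglob("*") if p.is_file()]
#   resolve_workflow_id(names)
```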
Detected Error Patterns
The analyzer scans for common failure patterns including:
Memory Errors:
- Out of memory, OOM kills
- std::bad_alloc (C++)
- MemoryError (Python)
Slurm Errors:
- Time limit exceeded
- Node failures
- Preemption
GPU/CUDA Errors:
- CUDA out of memory
- GPU memory exceeded
Crashes:
- Segmentation faults
- Bus errors
- Signal kills
Python Errors:
- Tracebacks
- Import errors
File System Errors:
- No space left on device
- Permission denied
Network Errors:
- Connection refused/timed out
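Pattern-based scanning of this kind can be approximated with a small table of regular expressions. The sketch below is illustrative only: the regexes are assumptions chosen to match typical messages in each category and are not the patterns Torc actually uses, and the `scan_lines` helper is hypothetical.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern table, one entry per category listed above.
# Order matters: the first matching category wins, so the more specific
# CUDA pattern is checked before the generic memory pattern.
PATTERNS = [
    ("GPU/CUDA", re.compile(r"CUDA out of memory", re.I)),
    ("Memory", re.compile(r"out of memory|oom-kill|std::bad_alloc|MemoryError", re.I)),
    ("Slurm", re.compile(r"DUE TO (TIME LIMIT|NODE FAILURE|PREEMPTION)")),
    ("Crash", re.compile(r"Segmentation fault|Bus error", re.I)),
    ("Python", re.compile(r"Traceback \(most recent call last\)|ImportError|ModuleNotFoundError")),
    ("File system", re.compile(r"No space left on device|Permission denied", re.I)),
    ("Network", re.compile(r"Connection (refused|timed out)", re.I)),
]


def scan_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str, str]]:
    """Yield (line_number, category, text) for each line matching a pattern."""
    for number, line in enumerate(lines, start=1):
        for category, pattern in PATTERNS:
            if pattern.search(line):
                yield number, category, line.rstrip("\n")
                break  # report each line once, under the first category


sample = [
    "starting job",
    "MemoryError: Unable to allocate 8.00 GiB",
    "slurmstepd: error: Detected 1 oom-kill event(s)",
]
for hit in scan_lines(sample):
    print(hit)
# (2, 'Memory', 'MemoryError: Unable to allocate 8.00 GiB')
# (3, 'Memory', 'slurmstepd: error: Detected 1 oom-kill event(s)')
```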
Example Output
Log Analysis Results
====================
Analyzing: output/
Files with detected errors:
output/job_stdio/job_wf123_j456_r1_a1.e
Line 42: MemoryError: Unable to allocate 8.00 GiB
Severity: critical
Type: Python Memory Error
output/slurm_output_wf123_sl789.e
Line 15: slurmstepd: error: Detected 1 oom-kill event(s)
Severity: critical
Type: Out of Memory (OOM) Kill
Summary:
Total files scanned: 24
Files with errors: 2
Error types found: MemoryError, OOM Kill
Excluding Files
Environment variable files (slurm_env_*.log) are automatically excluded from error analysis since
they contain configuration data, not error logs.
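When writing a custom scanner over the same directory, the equivalent exclusion can be expressed with a glob filter. This is a sketch under the assumption that matching on the base file name is sufficient; `files_to_scan` is a hypothetical helper, not a Torc function.

```python
import fnmatch
from pathlib import PurePath
from typing import Iterable, List

# Glob taken from the paragraph above; extend the tuple to skip more files.
EXCLUDED_GLOBS = ("slurm_env_*.log",)


def files_to_scan(paths: Iterable[str]) -> List[str]:
    """Drop paths whose base name matches any excluded glob."""
    return [
        p
        for p in paths
        if not any(fnmatch.fnmatchcase(PurePath(p).name, g) for g in EXCLUDED_GLOBS)
    ]


print(files_to_scan([
    "output/slurm_env_wf123_sl789.log",
    "output/slurm_output_wf123_sl789.e",
]))
# ['output/slurm_output_wf123_sl789.e']
```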
Workflow: Bundle and Share
A common pattern when asking for help:
# 1. Bundle the workflow logs
torc logs bundle <workflow_id>
# 2. Analyze locally first to understand the issue
torc logs analyze wf<id>.tar.gz
# 3. Share the bundle with your colleague/support
# They can extract and analyze:
tar -xzf wf<id>.tar.gz
torc logs analyze wf<id>/
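The first two steps can also be scripted. The sketch below builds the same commands shown above and runs them with `subprocess`. It assumes `torc` is on the PATH and that the bundle is written as wf<id>.tar.gz inside the bundle directory. The helper names are hypothetical, and because the exit code of `torc logs analyze` when errors are found is not documented here, that code is returned to the caller rather than treated as a failure.

```python
import subprocess
from pathlib import Path
from typing import List


def bundle_command(workflow_id: int, bundle_dir: str) -> List[str]:
    """argv for: torc logs bundle <workflow_id> --bundle-dir <bundle_dir>"""
    return ["torc", "logs", "bundle", str(workflow_id), "--bundle-dir", bundle_dir]


def analyze_command(workflow_id: int, bundle_dir: str) -> List[str]:
    """argv for: torc logs analyze <bundle_dir>/wf<id>.tar.gz"""
    bundle = Path(bundle_dir) / f"wf{workflow_id}.tar.gz"
    return ["torc", "logs", "analyze", str(bundle)]


def bundle_and_analyze(workflow_id: int, bundle_dir: str = ".") -> int:
    """Bundle the logs, then analyze the bundle; return the analyzer's exit code."""
    # check=True raises CalledProcessError if bundling exits with a nonzero code.
    subprocess.run(bundle_command(workflow_id, bundle_dir), check=True)
    # Assumption: the meaning of the analyzer's exit code is unknown, so it
    # is passed back to the caller instead of being checked here.
    return subprocess.run(analyze_command(workflow_id, bundle_dir)).returncode
```

For example, `bundle_and_analyze(123, "./bundles")` would run the two commands from the sharing example in sequence.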
Related Commands
- torc reports results: Generate JSON report with all log file paths
- torc results list: View summary table of job return codes
- torc slurm parse-logs: Parse Slurm logs for error patterns (Slurm-specific)
- torc slurm sacct: Collect Slurm accounting data
See Also
- Debugging Workflows — General debugging workflow and log file details
- Debugging Slurm Workflows — Slurm-specific debugging tools