# How to Select Compute Nodes

Selecting the right compute nodes is critical for Spark cluster performance, especially for jobs
that perform large shuffles (joins, aggregations, sorting).

## Storage Speed Requirements

Spark shuffle operations write intermediate data to local storage. If this storage is slow,
shuffles become a bottleneck. The minimum requirement is approximately **500 MB/s** sequential
write speed. Drives that can deliver **2 GB/s** are ideal.

## Storage Options (Best to Worst)

| Storage Type | Speed | Capacity | Notes |
|--------------|-------|----------|-------|
| NVMe SSD | 2-7 GB/s | 1-10 TB | Best choice for shuffle-heavy workloads |
| RAM disk (`/dev/shm`) | 10+ GB/s | Limited by RAM | Excellent speed but limited capacity |
| HPC shared filesystem (Lustre, GPFS) | 500 MB/s - 2 GB/s | Unlimited | Shared bandwidth; slower than local SSD |
| Spinning disk (HDD) | 100-200 MB/s | Large | Too slow for shuffle-heavy workloads |

## Recommendations by Workload

### Shuffle-Heavy Workloads (Joins, Group-Bys)
- Prefer nodes with local NVMe storage
- Configure sparkctl to use the NVMe path for shuffle storage
- Allocate more nodes if shuffle data exceeds local SSD capacity

### Read-Heavy Workloads (Scans, Filters)
- Storage speed is less critical
- Focus on memory capacity and network bandwidth
- Shared filesystem storage is often acceptable

### Small Data / Prototyping
- RAM disk (`/dev/shm`) works well
- Many HPC compute nodes have `/dev/shm` available (typically half of RAM)
- Fastest option but limited by memory

## Configuring Shuffle Storage Location

Specify local storage for shuffle writes (for systems where sparkctl knows how to identify
local storage such as NLR's Kestrel cluster):

```console
$ sparkctl configure --local-storage
```

Or for RAM disk (or other custom location accessible on each compute node):

```console
$ sparkctl configure --spark-scratch /dev/shm
```

## Capacity Planning

If your shuffle data exceeds local storage capacity, you have two options:

1. **Add more nodes**: Distributes shuffle data across more local SSDs
2. **Use shared filesystem**: Configure `--spark-scratch` to point to Lustre/GPFS (slower but unlimited capacity)

## Checking Node Storage

On most HPCs, you can check available local storage during an interactive session:

```console
$ df -h /dev/nvme0
$ df -h /dev/shm
```

Consult your HPC documentation for node-specific storage configurations.

## Network Considerations

Shuffle operations also transfer data between nodes. Ensure your nodes have:
- High-bandwidth, low-latency interconnect
- Nodes in the same rack or network segment if possible