Tutorials

These tutorials guide you through running Spark jobs on HPC clusters using sparkctl. Each tutorial covers a different workflow; choose the one that best fits your needs.

Which Tutorial Should I Use?

| Tutorial | Best For | Client Install | Interface |
|---|---|---|---|
| Python Library | Programmatic control, automation scripts | sparkctl | Python |
| Interactive Development | Exploratory analysis, debugging | sparkctl | Python REPL |
| Ibis | Ibis users, portable dataframe code | sparkctl + ibis-framework[pyspark] | Python |
| Spark Connect CLI | Lightweight client, remote connectivity | sparkctl | CLI |
| spark-submit / pyspark | Traditional Spark users, production jobs | sparkctl[pyspark] (full) | CLI |

Decision Guide

Start here if you’re new to sparkctl: the Python Library tutorial, which uses sparkctl’s managed cluster to handle the Spark lifecycle automatically.

Choose by use case: