# Tutorials
These tutorials guide you through running Spark jobs on HPC clusters using sparkctl. Each tutorial covers a different workflow; choose the one that best fits your needs.
## Which Tutorial Should I Use?
| Tutorial | Best For | Client Install | Interface |
|---|---|---|---|
| Python Library | Programmatic control, automation scripts | | Python |
| Interactive Development | Exploratory analysis, debugging | | Python REPL |
| Ibis | Ibis users, portable dataframe code | | Python |
| Spark Connect CLI | Lightweight client, remote connectivity | | CLI |
| spark-submit / pyspark | Traditional Spark users, production jobs | | CLI |
## Decision Guide
If you're new to sparkctl, start with Python Library: it uses sparkctl's managed cluster to handle the Spark lifecycle automatically.
Choose by use case:
- "I want to control the cluster from Python code" → Python Library
- "I want to explore data interactively" → Interactive Development
- "I use Ibis and want portable dataframe code" → Ibis
- "I want a minimal client installation" → Spark Connect CLI
- "I want to submit batch jobs" → spark-submit / pyspark
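The decision guide above can also be expressed as a small lookup, which may be handy in scripts that scaffold a workflow for users. This is only an illustrative sketch of the mapping in this page; the dictionary and function names are not part of sparkctl's API.

```python
# Illustrative condensation of the decision guide above.
# Not part of sparkctl; names here are hypothetical.
TUTORIAL_BY_USE_CASE = {
    "control the cluster from Python code": "Python Library",
    "explore data interactively": "Interactive Development",
    "portable dataframe code with Ibis": "Ibis",
    "minimal client installation": "Spark Connect CLI",
    "submit batch jobs": "spark-submit / pyspark",
}


def recommend_tutorial(use_case: str) -> str:
    """Return the tutorial for a use case, defaulting to the
    recommended starting point for new sparkctl users."""
    return TUTORIAL_BY_USE_CASE.get(use_case, "Python Library")


print(recommend_tutorial("submit batch jobs"))
```

Unknown use cases fall back to Python Library, mirroring the "start here" recommendation above.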