Installation
Create a virtual environment with Python 3.11 or later. These examples create a virtual environment in your home directory.
If you are running on an HPC, you may need to run module load python first.
This uses the venv module in the standard library. You may prefer conda or mamba.
$ python -m venv ~/python-envs/sparkctl
uv creates the environment quickly and can install the requested Python version for you.
$ uv venv --python 3.11 ~/python-envs/sparkctl
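As a quick check of the Python 3.11 requirement stated above, the following minimal sketch reports whether the interpreter that runs it is new enough. The helper name meets_minimum is hypothetical and is not part of sparkctl.

```python
import sys

# Minimum version stated in this guide.
MINIMUM = (3, 11)


def meets_minimum(version_info=None, minimum=MINIMUM):
    """Return True if a (major, minor, ...) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the major and minor components.
    return tuple(version_info[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    status = "OK" if meets_minimum() else "too old"
    print(f"Python {current}: {status}")
```

Running it with the same python you intend to pass to venv shows whether you need to load a newer Python first.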
Activate the virtual environment.
$ source ~/python-envs/sparkctl/bin/activate
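To confirm that activation took effect, you can ask the interpreter where it is running from. This is a minimal sketch that relies only on the standard library; the function name in_virtual_environment is hypothetical. It detects venv-style environments (including those created by uv venv), and it would likely not report a conda environment as active.

```python
import sys


def in_virtual_environment():
    """Return True when the running interpreter belongs to a venv-style environment."""
    # Inside a virtual environment, sys.prefix points at the environment,
    # while sys.base_prefix points at the Python installation it was created from.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"prefix: {sys.prefix}")
    print(f"virtual environment active: {in_virtual_environment()}")
```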
Whenever you are done using sparkctl, you can deactivate the environment by running deactivate.
Install the Python package sparkctl.
If you will be using Spark Connect to run Spark jobs, the base installation is sufficient.
Note
This does not include spark-submit or pyspark.
$ pip install sparkctl
$ uv pip install sparkctl
If you will be running Spark jobs with spark-submit or pyspark, you will need to install the full pyspark package:
$ pip install "sparkctl[pyspark]"
$ uv pip install "sparkctl[pyspark]"
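If you are unsure which of the two installations you ended up with, a short check of what is importable can help. This is a sketch under the assumption that the packages are importable as sparkctl and pyspark; the helper is_installed is hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name):
    """Return True if a top-level module can be found in the current environment."""
    # find_spec returns None when a top-level module cannot be located.
    return find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumed import names; adjust if your installation differs.
    for name in ("sparkctl", "pyspark"):
        state = "installed" if is_installed(name) else "not installed"
        print(f"{name}: {state}")
```

Run it with the virtual environment activated so that it inspects the environment you installed into.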
Tip
If you only need the
sparkctlcommand-line tool (and not the Python API), you can install it as a standalone, isolated tool with uv. This does not require creating or activating a virtual environment:$ uv tool install sparkctl
Optionally, install from the main branch (or substitute another branch or tag).
$ pip install git+https://github.com/NatLabRockies/sparkctl.git@main
$ uv pip install git+https://github.com/NatLabRockies/sparkctl.git@main
Create a one-time sparkctl default configuration file. The parameters will vary based on your environment. If no one has deployed the required dependencies in your environment, please refer to Deploy sparkctl in an HPC environment.
$ sparkctl default-config \
    /datasets/images/apache_spark/spark-4.1.1-bin-hadoop3 \
    /datasets/images/apache_spark/jdk-21.0.7 \
    --compute-environment slurm
Wrote sparkctl settings to /Users/dthom/.sparkctl.toml
Refer to sparkctl default-config --help for additional options.
The paths to the Spark binaries will likely not change often. This file also seeds the default values for your sparkctl configure commands, so you may want to edit those settings manually.