How to expose Spark metrics in Prometheus format

sparkctl can configure Spark’s built-in PrometheusServlet so that Spark’s internal metrics (JVM, scheduler, shuffle, executor, and task metrics) are exposed in Prometheus format. This complements resource monitoring, which captures host-level CPU, memory, disk, and network utilization.

The servlet reuses the existing web UI ports, so no additional ports are opened.

Enable Prometheus metrics

$ sparkctl configure --prometheus

This writes a metrics.properties file into the cluster’s conf directory, where Spark loads it automatically, and sets spark.ui.prometheus.enabled to true. That property enables the executor metrics endpoint on the driver’s web UI.
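To confirm that the sink was registered, a small check along these lines may help. It is only a sketch: it assumes the generated file registers Spark's org.apache.spark.metrics.sink.PrometheusServlet class in the same way Spark's own metrics.properties.template does, and the exact contents sparkctl writes may differ. The function name has_prometheus_sink is hypothetical.

```shell
# Hypothetical helper: succeed if a metrics.properties file registers
# Spark's PrometheusServlet sink on an uncommented line.
has_prometheus_sink() {
  # $1: path to metrics.properties (the location is an assumption;
  #     point it at your cluster's conf directory)
  grep -Eq '^[^#]*sink\.[A-Za-z0-9_]+\.class[[:space:]]*=[[:space:]]*org\.apache\.spark\.metrics\.sink\.PrometheusServlet' "$1"
}
```

Called as has_prometheus_sink conf/metrics.properties, it returns a zero exit status when such a line is present, so it can guard a later step in a job script.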

Scrape endpoints

Component             Endpoint
Master                http://<master>:8080/metrics/master/prometheus
Worker                http://<worker>:8081/metrics/prometheus
Driver / application  http://<driver>:4040/metrics/executors/prometheus
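The three endpoints map onto a Prometheus scrape configuration roughly as sketched below. The helper name spark_scrape_config and the job names are hypothetical, and the host names are placeholders passed in by the caller; only the ports and paths come from the endpoints listed above.

```shell
# Hypothetical helper: print a Prometheus scrape configuration covering the
# master, worker, and driver endpoints. Job names are illustrative only.
spark_scrape_config() {
  # $1: master host, $2: worker host, $3: driver host
  cat <<EOF
scrape_configs:
  - job_name: spark-master
    metrics_path: /metrics/master/prometheus
    static_configs:
      - targets: ['$1:8080']
  - job_name: spark-worker
    metrics_path: /metrics/prometheus
    static_configs:
      - targets: ['$2:8081']
  - job_name: spark-executors
    metrics_path: /metrics/executors/prometheus
    static_configs:
      - targets: ['$3:4040']
EOF
}
```

Redirecting the output to a file, for example spark_scrape_config node01 node02 node01 > prometheus.yml, gives a starting point that can be extended with one target per worker.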

Point a Prometheus scraper at these endpoints, or fetch them directly with curl for a quick look:

$ curl http://localhost:4040/metrics/executors/prometheus
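The raw output includes comment lines and every metric the component reports, so a filter can make a quick look easier. The following is a minimal sketch; prom_filter is a hypothetical helper, and the metrics_executor_ prefix used in the note below is an assumption about how Spark names executor metrics on this endpoint.

```shell
# Hypothetical helper: read Prometheus text-format output on stdin and keep
# only sample lines whose metric name starts with the given prefix.
prom_filter() {
  # $1: metric name prefix, matched literally at the start of the line
  # Lines starting with '#' (HELP/TYPE comments) are dropped.
  awk -v p="$1" '!/^#/ && index($0, p) == 1'
}
```

A typical use would be curl -s http://localhost:4040/metrics/executors/prometheus | prom_filter metrics_executor_, which prints only the matching samples with their labels and values.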

Tip

Combine this with the reverse proxy to reach the worker and application endpoints through the master node on an HPC cluster.

Write metrics to CSV files

The Prometheus sink is pull-based: it only exposes metrics over HTTP and keeps nothing on disk, so the data is gone once the cluster shuts down. To keep a durable record, enable the CSV sink, which periodically writes one CSV file per metric:

$ sparkctl configure --metrics-csv

The files are written to <base>/metrics-csv (alongside stats-output) and survive cluster teardown, which suits ephemeral allocations better than relying on a live Prometheus scraper. Change the sampling interval, in seconds, with --metrics-csv-period:

$ sparkctl configure --metrics-csv --metrics-csv-period 30
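Once a run has finished, the most recent sample of each metric can be pulled out of the CSV directory with a loop like the one below. This is a sketch under assumptions: it expects one file per metric named <metric>.csv, with a header line followed by one row per sampling interval (the layout produced by the Dropwizard CSV reporter that Spark's CSV sink builds on). The helper name csv_latest is hypothetical.

```shell
# Hypothetical helper: print "<metric name> <last data row>" for every CSV
# file in a directory. Files that contain only a header line are skipped.
csv_latest() {
  # $1: directory holding the CSV sink output, e.g. <base>/metrics-csv
  for f in "$1"/*.csv; do
    [ -e "$f" ] || continue                      # no CSV files present
    name=$(basename "$f" .csv)                   # metric name from file name
    last=$(awk 'NR > 1 { row = $0 } END { print row }' "$f")
    [ -n "$last" ] || continue                   # header only, no samples yet
    printf '%s %s\n' "$name" "$last"
  done
}
```

Each output line carries the metric name followed by the raw last row, so the timestamp and value columns stay exactly as the sink wrote them.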

The two sinks are independent and can be combined; they share a single metrics.properties:

$ sparkctl configure --prometheus --metrics-csv