How to expose Spark metrics in Prometheus format
sparkctl can configure Spark’s built-in PrometheusServlet so that Spark’s internal metrics
(JVM, scheduler, shuffle, executor, and task metrics) are exposed in Prometheus format. This
complements resource monitoring, which captures host-level CPU, memory,
disk, and network utilization.
The servlet reuses the existing web UI ports, so no additional ports are opened.
Enable Prometheus metrics
$ sparkctl configure --prometheus
This writes a metrics.properties file into the cluster’s conf directory (Spark loads it
automatically) and sets spark.ui.prometheus.enabled to true.
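To confirm that the property was written, a small helper can read a key back from the configuration. This is a minimal sketch. It assumes that sparkctl records spark.ui.prometheus.enabled in a spark-defaults.conf file in the same conf directory; the text above only says the property is set, not where. It follows the spark-defaults.conf convention of one key and one whitespace-separated value per line. The function name spark_conf_get and the CONF_DIR variable are hypothetical.

```shell
# Hypothetical helper (assumes the property lives in spark-defaults.conf):
# print the value recorded for KEY in a spark-defaults.conf-style file, where
# each line is "key value" separated by whitespace and '#' starts a comment.
# Only the first token of the value is printed, which is enough for boolean
# flags. Returns 1 (and prints nothing) if the key is absent.
spark_conf_get() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                  # skip comment lines
    $1 == k    { val = $2; found = 1 }   # the last occurrence wins
    END        { if (!found) exit 1; print val }
  ' "$file"
}

# Example (CONF_DIR is a hypothetical variable for the cluster's conf directory):
#   spark_conf_get "$CONF_DIR/spark-defaults.conf" spark.ui.prometheus.enabled
```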
Scrape endpoints
| Component | Endpoint |
|---|---|
| Master | |
| Worker | |
| Driver / application | |
Point a Prometheus scraper at these endpoints, or fetch them directly with curl for a quick look:
$ curl http://localhost:4040/metrics/executors/prometheus
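A scraper needs one scrape job per endpoint. For a quick look with curl, dropping the `# HELP` and `# TYPE` comment lines makes the output easier to scan. The sketch below assumes Prometheus's standard scrape_configs layout (job_name, metrics_path, static_configs) and its text exposition format. It uses only the executor endpoint shown above. The helper names prom_scrape_job and prom_filter, the job name and the metric prefix in the example are all hypothetical. Check the raw curl output for the actual metric names.

```shell
# Hypothetical helper: emit one Prometheus scrape_configs list entry.
# Arguments: job name, host:port target, metrics path.
prom_scrape_job() {
  local job="$1" target="$2" path="$3"
  cat <<EOF
  - job_name: '${job}'
    metrics_path: '${path}'
    static_configs:
      - targets: ['${target}']
EOF
}

# Hypothetical helper: read Prometheus text-format metrics on stdin, skip the
# "# HELP" / "# TYPE" comment lines, and keep only samples whose line starts
# with the given metric-name prefix.
prom_filter() {
  local prefix="$1"
  awk -v p="$prefix" '
    /^#/ { next }                 # comment and metadata lines
    index($0, p) == 1 { print }   # sample lines matching the prefix
  '
}

# Examples (job name and metric prefix are illustrative assumptions):
#   prom_scrape_job spark-executors localhost:4040 /metrics/executors/prometheus
#   curl -s http://localhost:4040/metrics/executors/prometheus | prom_filter metrics_executor_
```

The generated entry belongs under the `scrape_configs:` key of a Prometheus configuration file.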
Tip: Combine this with the reverse proxy to reach the worker and application endpoints through the master node on an HPC cluster.
Write metrics to CSV files
The Prometheus sink is pull-based: it only exposes metrics over HTTP and keeps nothing on disk, so the data is gone once the cluster shuts down. To keep a durable record, enable the CSV sink, which periodically writes one CSV file per metric:
$ sparkctl configure --metrics-csv
This writes the metrics to <base>/metrics-csv (alongside stats-output). Because the files
survive cluster teardown, they fit the ephemeral-allocation model better than a live
Prometheus scraper does. Change the sampling interval with --metrics-csv-period (seconds):
$ sparkctl configure --metrics-csv --metrics-csv-period 30
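The CSV sink writes one file per metric, so a quick way to inspect a finished run is to print the newest row of each file. This is a minimal sketch. It assumes each file starts with a header row followed by one sample per line, which is the layout Spark's CsvSink inherits from the Dropwizard Metrics CsvReporter. latest_metrics is a hypothetical helper name, and BASE stands in for <base>.

```shell
# Hypothetical helper: for each *.csv file in DIR, print the metric name
# (file name without .csv) and the file's last data row, tab-separated.
# Assumes a header row followed by one sample per line; files that contain
# only the header so far are skipped.
latest_metrics() {
  local dir="$1" f last
  for f in "$dir"/*.csv; do
    [ -e "$f" ] || continue                  # the glob matched nothing
    last="$(tail -n +2 "$f" | tail -n 1)"    # newest row, ignoring the header
    if [ -n "$last" ]; then
      printf '%s\t%s\n' "$(basename "$f" .csv)" "$last"
    fi
  done
}

# Example (BASE is a hypothetical variable standing in for <base>):
#   latest_metrics "$BASE/metrics-csv"
```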
The two sinks are independent and can be combined; they share a single metrics.properties file:
$ sparkctl configure --prometheus --metrics-csv
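To confirm which sinks ended up in the shared file, a helper can list every sink class it declares. Spark's metrics configuration uses keys of the form [instance].sink.[name].class=[class]. This sketch relies only on that convention and makes no assumption about the sink names sparkctl chooses. The helper name list_metric_sinks and the CONF_DIR variable are hypothetical. The expected class names in the example comment assume sparkctl uses Spark's stock sink classes.

```shell
# Hypothetical helper: list the sink classes declared in a metrics.properties
# file as "key -> class" lines. Matches keys of the form
# [instance].sink.[name].class and skips comment lines. Values containing '='
# are not handled, since sink class names never contain one.
list_metric_sinks() {
  local file="$1"
  awk -F'=' '
    /^[ \t]*#/ { next }                          # skip comments
    $1 ~ /\.sink\.[^.]+\.class[ \t]*$/ {
      key = $1; sub(/[ \t]+$/, "", key)          # trim trailing blanks
      val = $2; sub(/^[ \t]+/, "", val)          # trim leading blanks
      print key " -> " val
    }
  ' "$file"
}

# Example (CONF_DIR is a hypothetical variable for the cluster's conf directory):
#   list_metric_sinks "$CONF_DIR/metrics.properties"
# With both sinks enabled, the output should name
# org.apache.spark.metrics.sink.PrometheusServlet and
# org.apache.spark.metrics.sink.CsvSink (assuming stock Spark sink classes).
```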