Apache Spark Graphite metrics
Send metrics from Apache Spark to Graphite
Writing down a quick note on how to enable Graphite metrics for Apache Spark, set up so that it continues automatically after a restart.
It does not rely on spark-submit, which most examples seem to do.
Configuration file
This is the complete configuration you need in your metrics.properties. The file can be named anything, but following the convention of standard Java programs it uses the .properties name for this ini-style config file.
*.sink.Graphite.class=org.apache.spark.metrics.sink.GraphiteSink
*.sink.Graphite.host=metrics.internal.network
*.sink.Graphite.port=2003
*.sink.Graphite.prefix=services.spark.hostname
*.sink.Graphite.period=10
*.sink.Graphite.unit=seconds
*.sink.Graphite.protocol=tcp
Note: This configuration file is case sensitive: “Graphite” is not the same as “graphite”.
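For context, the sink ends up writing plain text lines over TCP to port 2003 in Graphite's plaintext protocol: one metric per line as "path value timestamp". A minimal Python sketch of what such a line looks like (the metric name jvm.heap.used and the value are just illustrations, not actual Spark output):

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol:
    "<path> <value> <timestamp>\n"."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {timestamp}\n"

# The kind of line the sink sends every `period` seconds:
line = graphite_line("services.spark.hostname.master.jvm.heap.used",
                     123456789, 1700000000)
print(line, end="")

# Sending it yourself is a plain TCP write to the configured host/port:
# with socket.create_connection(("metrics.internal.network", 2003)) as s:
#     s.sendall(line.encode())
```

This is also a handy way to test that your Graphite host actually accepts metrics on port 2003 before blaming the Spark config.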
Starting up
There is a file named spark-env.sh that is sourced, if available, when Spark starts.
Here you want to add this line:
SPARK_DAEMON_JAVA_OPTS='-Dspark.metrics.conf=/opt/etc/spark/metrics.properties'
My file has some more properties, so it looks like this:
SPARK_DAEMON_JAVA_OPTS='-Dspark.metrics.conf=/opt/etc/spark/metrics.properties -Dspark.deploy.zookeeper.url=zookeeper01:2181,zookeeper02:2181,zookeeper03:2181 -Dspark.deploy.recoveryMode=ZOOKEEPER'
The extra options are there because this cluster talks to a ZooKeeper ensemble for leader election.
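Each -D flag in SPARK_DAEMON_JAVA_OPTS becomes a Java system property, which Spark picks up as a configuration key. A small sketch that splits the line above the way the JVM effectively does, just to show how the flags map to properties:

```python
import shlex

# The SPARK_DAEMON_JAVA_OPTS value from spark-env.sh above.
opts = ("-Dspark.metrics.conf=/opt/etc/spark/metrics.properties "
        "-Dspark.deploy.zookeeper.url=zookeeper01:2181,zookeeper02:2181,zookeeper03:2181 "
        "-Dspark.deploy.recoveryMode=ZOOKEEPER")

# Parse each "-Dkey=value" token into a property dict.
props = {}
for tok in shlex.split(opts):
    if tok.startswith("-D") and "=" in tok:
        key, _, value = tok[2:].partition("=")
        props[key] = value

print(props["spark.metrics.conf"])
# -> /opt/etc/spark/metrics.properties
```

If metrics do not show up after a restart, checking that spark.metrics.conf points at a file that actually exists is the first thing to verify.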
Graphite prefix
I use a standard prefix scheme in my Graphite, which for hosts is:
hostname.metric.submetric
And for services:
services.service.metric
This separates hosts and services nicely.
Under the prefix you supply in *.sink.Graphite.prefix, metrics will appear in a sub-path named “master” or “worker”, depending on the Spark role you are starting.
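Putting the pieces together, the full Graphite path is the configured prefix, then the role, then the metric itself. A tiny sketch of how the parts combine (jvm.heap.used is again only an illustrative metric name; the actual set depends on your Spark version):

```python
# Parts of a final Graphite path as described above.
prefix = "services.spark.hostname"   # *.sink.Graphite.prefix
role = "master"                      # or "worker", per daemon role
metric = "jvm.heap.used"             # illustrative metric name

path = f"{prefix}.{role}.{metric}"
print(path)
# -> services.spark.hostname.master.jvm.heap.used
```

With the services.service.metric convention, all Spark daemons then land neatly under services.spark in the Graphite tree.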