Memory settings for Spark standalone cluster

A Spark standalone cluster is one that runs on Spark's built-in cluster manager rather than on Mesos or YARN.
There are several memory-related settings, and this article explains each of them with real examples.
First, we need to understand the Spark components involved.

1. Master and Worker Daemon processes

Master and Worker are the components of Spark's standalone cluster manager; they manage the available resources in the cluster.
For example, on the Master node we can find the following Java process, which is the Master daemon:
-Xms2g -Xmx2g org.apache.spark.deploy.master.Master
Example of the Worker daemon process on a Worker node:
-Xms1g -Xmx1g org.apache.spark.deploy.worker.Worker spark://<IP of master>:7077
The above JVM heap settings are controlled by SPARK_DAEMON_MEMORY in spark-env.sh.
In this example, "n1a" is the Master node's host name and "n4a" is the Worker node's host name.
[root@n1a conf]# grep -i SPARK_DAEMON_MEMORY spark-env.sh
export SPARK_DAEMON_MEMORY=2g
[root@n4a conf]# grep -i SPARK_DAEMON_MEMORY spark-env.sh
export SPARK_DAEMON_MEMORY=1g
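
To confirm the heap a running daemon actually received, you can check its JVM arguments (a quick sanity check; the output below is abbreviated for illustration):
[root@n1a conf]# ps -ef | grep org.apache.spark.deploy.master.Master | grep -v grep
... -Xms2g -Xmx2g ... org.apache.spark.deploy.master.Master ...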

2. Driver and Executor

When Spark applications run, new processes are created, which means each application runs as an independent set of processes.
The application is coordinated by the SparkContext object in your main program, the Driver.
The Driver asks the Master for resources, the Master allocates Workers to the application, and the Workers start Executors, which are the processes that run computations and store data for your application.

The Driver memory is controlled by SPARK_DRIVER_MEMORY in spark-env.sh, or spark.driver.memory in spark-defaults.conf, or by passing --driver-memory to the application.
The Executor memory is controlled by SPARK_EXECUTOR_MEMORY in spark-env.sh, or spark.executor.memory in spark-defaults.conf, or by passing --executor-memory to the application.
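
The same limits can also be set cluster-wide in spark-defaults.conf instead of on the command line (the values below are only placeholders):
spark.driver.memory    456m
spark.executor.memory  123m
Note that a --driver-memory or --executor-memory flag passed on the command line overrides the values in spark-defaults.conf.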

For example, if I run a spark-shell with the following parameters:
spark-shell --executor-memory 123m --driver-memory 456m
On the Master node, you can see that the following Driver process has been created for this application:
-Xms456m -Xmx456m org.apache.spark.deploy.SparkSubmit
On the Worker node, the following Executor process is created:
-Xms123M -Xmx123M org.apache.spark.executor.CoarseGrainedExecutorBackend

3. Total memory for all Spark applications per server

As discussed above, each application's Executor memory usage is controlled by SPARK_EXECUTOR_MEMORY (or its equivalents).
The total memory available to all applications on a given server is controlled by SPARK_WORKER_MEMORY in spark-env.sh.
[root@n1a conf]# grep SPARK_WORKER_MEMORY spark-env.sh
# - SPARK_WORKER_MEMORY, to set how much total memory workers have to give executors (e.g. 1000m, 2g)
export SPARK_WORKER_MEMORY=3g
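
Putting it together, a minimal spark-env.sh sketch for one Worker node might look like this (the values are illustrative only, not recommendations):
export SPARK_DAEMON_MEMORY=1g      # heap for the Master/Worker daemon process itself
export SPARK_WORKER_MEMORY=3g      # total memory this Worker can give to executors
export SPARK_EXECUTOR_MEMORY=1g    # default per-executor heap, overridable per application
With these values, a single Worker could run at most three 1g executors at the same time.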

4. Troubleshooting

Jobs may fail with the error message "Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory". This usually means the cluster cannot satisfy the requested resources.
For example, asking for a 4g executor fails here because SPARK_WORKER_MEMORY is set to 3g in the example above.
[root@n1a conf]#  spark-shell --executor-memory 4g
scala> val textFile = sc.textFile("passwd")          
textFile: org.apache.spark.rdd.RDD[String] = passwd MappedRDD[1] at textFile at <console>:12

scala> textFile.count()           
14/11/25 17:49:33 WARN LoadSnappy: Snappy native library is available
14/11/25 17:49:48 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/25 17:50:03 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
14/11/25 17:50:18 WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
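
To let the job run, either increase SPARK_WORKER_MEMORY (and restart the Workers) or request an executor that fits within it, for example:
[root@n1a conf]# spark-shell --executor-memory 2g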
