Apache Spark 1.2.0 on Docker
http://blog.sequenceiq.com/blog/2015/01/09/spark-1-2-0-docker/

In this post we'd like to help you get started with the latest Spark release, 1.2.0, in minutes, using Docker. Though we released and pushed the container to the official Docker repository between the holidays, we still owed you this post. Here are the details …
Docker and Spark are two technologies that are very hyped these days. At SequenceIQ we use both quite a lot, so we put together a Docker container and are sharing it with the community.
The container’s code is available in our GitHub repository.
Pull the image from the Docker Repository

We suggest always pulling the container from the official Docker repository, as it is maintained and supported by us.
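A minimal sketch of the pull, assuming the image is published as `sequenceiq/spark` with a `1.2.0` tag (check the Docker Hub page for the exact repository name and tag):

```shell
# Pull the pre-built Spark 1.2.0 image from the official Docker repository
docker pull sequenceiq/spark:1.2.0
```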
Building the image

Alternatively, you can always build your own container based on our Dockerfile.
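If you prefer building locally, the standard Docker workflow applies; the repository URL and image tag below are assumptions, so adjust them to match our GitHub repository:

```shell
# Clone the repository containing the Dockerfile (assumed location)
git clone https://github.com/sequenceiq/docker-spark.git
cd docker-spark

# Build the image locally with the same tag as the published one
docker build -t sequenceiq/spark:1.2.0 .
```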
Running the image

Once you have pulled or built the container, you are ready to start with Spark.
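A sketch of starting an interactive container; the `sandbox` hostname and the bootstrap script path are assumptions based on how our Hadoop-based images usually start up:

```shell
# Start the container with an interactive bash shell;
# the bootstrap script (assumed path) brings up HDFS and YARN before dropping into bash
docker run -i -t -h sandbox sequenceiq/spark:1.2.0 /etc/bootstrap.sh -bash
```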
Testing

To check whether everything is OK, you can run one of the stock examples that ship with Spark. Check our previous blog posts and examples about Spark here and here.
There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
Estimating Pi (yarn-cluster mode):
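A sketch of submitting the stock SparkPi example in yarn-cluster mode; the exact examples jar name depends on the Spark and Hadoop versions inside the image, so treat it as an assumption:

```shell
# Submit the SparkPi example; the driver runs inside the YARN application master
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-cluster \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  $SPARK_HOME/lib/spark-examples-1.2.0-hadoop2.4.0.jar
```

Because the driver does not run in the client process in this mode, look for the estimated value of Pi in the YARN application logs rather than on the console.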
Estimating Pi (yarn-client mode):
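In yarn-client mode the driver runs in the client process, so the result is printed directly to your console; again, the jar path is an assumption:

```shell
# Same example in yarn-client mode; the driver output, including the Pi estimate,
# appears on the client's console
spark-submit --class org.apache.spark.examples.SparkPi \
  --master yarn-client \
  --driver-memory 1g \
  --executor-memory 1g \
  --executor-cores 1 \
  $SPARK_HOME/lib/spark-examples-1.2.0-hadoop2.4.0.jar
```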