Learn Spark on Zeppelin in a Docker Container
Apache Zeppelin is a web-based notebook that enables data-driven,
interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more.
This article shows you how to use Spark on Zeppelin in a Docker container without any manual setup.
Step 1. Download Spark
Zeppelin 0.10.0 supports many versions of Spark (from 1.6 to 3.1); you can download whichever version you'd like from the Apache Spark site.
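For example, downloading and extracting a Spark distribution from the Apache archive might look like this (a minimal sketch; `3.1.2` with the Hadoop 3.2 build is just one example version, pick whichever release you prefer):

```shell
# Example version; any Spark release from 1.6 to 3.1 works with Zeppelin 0.10.0
SPARK_VERSION=3.1.2
HADOOP_VERSION=3.2
TARBALL="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"

# Fetch and unpack the prebuilt distribution from the Apache archive
wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
tar -xzf "${TARBALL}"

# Remember the extracted location for the docker run command in Step 2
spark_location="$(pwd)/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
```

The `spark_location` variable set at the end is the path you will mount into the container in the next step.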
Step 2. Start Zeppelin in Docker
Run the following command to start Zeppelin in a Docker container:
```shell
docker run -u $(id -u) -p 8080:8080 -p 4040:4040 -p 6789:6789 --rm \
  -v ${spark_location}:/opt/spark -e SPARK_HOME=/opt/spark \
  -e ZEPPELIN_LOCAL_IP=0.0.0.0 --name zeppelin apache/zeppelin:0.10.0
```
Just replace ${spark_location} with the path to the Spark distribution you downloaded in Step 1.
Then open http://localhost:8080, and you will see the following page. You can also run the above command on another machine and access Zeppelin remotely via a URL like http://my-zeppelin-server:8080.
Zeppelin ships with some built-in tutorial notes. Open the Spark Tutorial folder, and you will see 8 Spark tutorial notes.
You can run most of them directly without any configuration. Here’s a short video that demonstrates how to use Spark in the Zeppelin Docker container; in it, I show how to run the following tutorial notes:
- 2. Spark Basic Features
- 3. Spark SQL (PySpark)
- 5. SparkR Basics
- 6. SparkR Shiny App
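To give a flavor of what these notes contain, a Zeppelin paragraph in a PySpark-style note might look like the sketch below (made-up data; the `spark` session and the `z` ZeppelinContext are provided automatically by Zeppelin's Spark interpreter):

```
%spark.pyspark
# 'spark' is the SparkSession injected by Zeppelin's Spark interpreter
df = spark.createDataFrame([(1, "spark"), (2, "zeppelin")], ["id", "name"])
df.createOrReplaceTempView("demo")

# 'z' is the ZeppelinContext; z.show renders the DataFrame as an interactive table
z.show(df)
```

The `%spark.pyspark` directive on the first line tells Zeppelin which interpreter to use for the paragraph; registering the temp view lets later `%sql` paragraphs query the same data.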
Besides that, Zeppelin’s Spark interpreter offers many more features; for details, check out Zeppelin’s official documentation at https://zeppelin.apache.org/docs/0.10.0/interpreter/spark.html, and join our community at http://zeppelin.apache.org/community.html.