Learn Spark on Zeppelin in a Docker Container

Jeff Zhang
2 min read · Oct 25, 2021

--

Apache Zeppelin is a web-based notebook that enables data-driven,
interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more.

This article shows you how to run Spark on Zeppelin in a Docker container without any manual setup.

Step 1. Download Spark

Zeppelin 0.10.0 supports many versions of Spark (from 1.6 to 3.1); you can download whichever version you'd like to use from the Apache Spark site.
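For example, the download and extraction can be scripted like this (a sketch; the version numbers below are just one choice from the supported range, so adjust them to the release you want):

```shell
# Download and extract a Spark release from the Apache archive.
# 3.1.2 / Hadoop 3.2 are example values; any Spark 1.6-3.1 build works
# with Zeppelin 0.10.0.
SPARK_VERSION=3.1.2
HADOOP_VERSION=3.2
SPARK_DIST="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"

curl -LO "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${SPARK_DIST}.tgz"
tar -xzf "${SPARK_DIST}.tgz"

# The extracted directory is the Spark location used in step 2.
echo "Spark extracted to $(pwd)/${SPARK_DIST}"
```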

Step 2. Start Zeppelin in Docker

Run the following command to start Zeppelin in a Docker container:

docker run -u $(id -u) -p 8080:8080 -p 4040:4040 -p 6789:6789 --rm \
  -v ${spark_location}:/opt/spark \
  -e SPARK_HOME=/opt/spark \
  -e ZEPPELIN_LOCAL_IP=0.0.0.0 \
  --name zeppelin apache/zeppelin:0.10.0

Just replace ${spark_location} with the path of the Spark distribution you downloaded in step 1. The -v flag mounts that directory into the container at /opt/spark, and SPARK_HOME points Zeppelin at it; port 8080 serves the Zeppelin web UI and port 4040 serves the Spark UI.

Then open http://localhost:8080 and you will see the Zeppelin home page. You can also run the above command on another machine and access Zeppelin remotely via a URL like http://my-zeppelin-server:8080
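If you are scripting the setup, a small curl loop can wait for the web UI to come up before you open the browser (a sketch; it assumes the 8080 port mapping from the command above):

```shell
# Poll the Zeppelin web UI until it responds, up to 5 attempts.
ZEPPELIN_URL="http://localhost:8080"
for attempt in 1 2 3 4 5; do
  if curl -fsS -o /dev/null "${ZEPPELIN_URL}"; then
    echo "Zeppelin is up at ${ZEPPELIN_URL}"
    break
  fi
  sleep 2
done
```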

Zeppelin ships with some built-in tutorial notes. Open the Spark Tutorial folder, and you will see 8 Spark tutorial notes.

You can run most of them directly without any setup. Here's a short video that demonstrates how to play with Spark in the Zeppelin Docker container; in it, I show how to run the following tutorial notes:

  • 2. Spark Basic Features
  • 3. Spark SQL (PySpark)
  • 5. SparkR Basics
  • 6. SparkR Shiny App

Besides that, there are many more features in Zeppelin's Spark interpreter; for details, check out Zeppelin's official documentation at https://zeppelin.apache.org/docs/0.10.0/interpreter/spark.html, and join our community at http://zeppelin.apache.org/community.html
