Learn Spark on Zeppelin in a Docker Container
Apache Zeppelin is a web-based notebook that enables data-driven,
interactive data analytics and collaborative documents with SQL, Scala, Python, R, and more.
This article shows you how to use Spark on Zeppelin in a Docker container without any manual setup.
Step 1. Download Spark
Zeppelin 0.10.0 supports many versions of Spark (from 1.6 to 3.1); you can download whichever version you'd like from the Apache Spark site.
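For example, downloading and extracting a Spark distribution from the Apache archive might look like this (a minimal sketch; `3.1.2` with the Hadoop 3.2 build is just one example version, pick whichever release you prefer):

```shell
# Example version; any Spark release from 1.6 to 3.1 works with Zeppelin 0.10.0
SPARK_VERSION=3.1.2
HADOOP_VERSION=3.2
TARBALL="spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}.tgz"

# Fetch and unpack the prebuilt distribution from the Apache archive
wget "https://archive.apache.org/dist/spark/spark-${SPARK_VERSION}/${TARBALL}"
tar -xzf "${TARBALL}"

# Remember the extracted location for the docker run command in Step 2
spark_location="$(pwd)/spark-${SPARK_VERSION}-bin-hadoop${HADOOP_VERSION}"
```

The `spark_location` variable set at the end is the path you will mount into the container in the next step.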
Step 2. Start Zeppelin in Docker
Run the following command to start Zeppelin in a Docker container:
```shell
docker run -u $(id -u) -p 8080:8080 -p 4040:4040 -p 6789:6789 --rm \
  -v ${spark_location}:/opt/spark -e SPARK_HOME=/opt/spark \
  -e ZEPPELIN_LOCAL_IP=0.0.0.0 --name zeppelin apache/zeppelin:0.10.0
```
Just replace ${spark_location} with the path to the Spark distribution you downloaded in Step 1.
Then open http://localhost:8080, and you will see the following page. You can also run the above command on another machine and access Zeppelin remotely via a URL like http://my-zeppelin-server:8080.
Zeppelin ships with some built-in tutorial notes. Open the Spark Tutorial folder, and you will see 8 Spark tutorial notes.
You can run most of them directly without any configuration. Here’s a short video that demonstrates how to use Spark in the Zeppelin Docker container; in it, I show how to run the following tutorial notes:
- 2. Spark Basic Features
- 3. Spark SQL (PySpark)
- 5. SparkR Basics
- 6. SparkR Shiny App
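To give a flavor of what these notes contain, a Zeppelin paragraph in a PySpark-style note might look like the sketch below (made-up data; the `spark` session and the `z` ZeppelinContext are provided automatically by Zeppelin's Spark interpreter):

```
%spark.pyspark
# 'spark' is the SparkSession injected by Zeppelin's Spark interpreter
df = spark.createDataFrame([(1, "spark"), (2, "zeppelin")], ["id", "name"])
df.createOrReplaceTempView("demo")

# 'z' is the ZeppelinContext; z.show renders the DataFrame as an interactive table
z.show(df)
```

The `%spark.pyspark` directive on the first line tells Zeppelin which interpreter to use for the paragraph; registering the temp view lets later `%sql` paragraphs query the same data.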
Besides that, Zeppelin’s Spark interpreter offers many more features; for details, check out Zeppelin’s official documentation at https://zeppelin.apache.org/docs/0.10.0/interpreter/spark.html, and join our community at http://zeppelin.apache.org/community.html.