Zeppelin 0.8.0 New Features

  • Yarn Cluster mode for Spark Interpreter
  • IPython Interpreter
  • Interpreter Lifecycle Manager
  • Hadoop NotebookRepo
  • Hadoop Config Storage
  • Interpreter Recovery
  • Generic ConfInterpreter
  • New SparkInterpreter Implementation

Yarn Cluster Mode for Spark Interpreter

Before 0.8.0, Zeppelin only supported yarn-client mode for the Spark interpreter, which means the Spark driver runs on the same host as the Zeppelin server. This puts high memory pressure on the Zeppelin server host, especially when you run the Spark interpreter in isolated mode. In 0.8.0, yarn-cluster mode is supported, so the driver runs inside the YARN cluster instead of on the Zeppelin server host.
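To try yarn-cluster mode, you would typically edit the Spark interpreter setting and change the master property; the value below is a sketch and assumes a YARN cluster reachable via HADOOP_CONF_DIR:

```
master    yarn-cluster
```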

IPython Interpreter

The IPython interpreter is new in 0.8.0 and is intended to replace Zeppelin's old Python interpreter. It provides a user experience comparable to Jupyter Notebook. You can refer to this link for more details.
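As a quick illustration, a paragraph run under the IPython interpreter is ordinary Python (in a note you would prefix it with %python.ipython; that prefix is an assumption based on Zeppelin's standard interpreter-group naming):

```python
# Plain Python, as you would write it in a %python.ipython paragraph
import math

values = [round(math.sqrt(n), 2) for n in range(1, 5)]
print(values)  # [1.0, 1.41, 1.73, 2.0]
```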

Interpreter Lifecycle Manager

Prior to 0.8.0, users had to restart an interpreter from the interpreter setting page or a note page to kill the interpreter process. This wastes resources when a user leaves work but leaves the interpreter alive. In 0.8.0, Zeppelin introduces an interpreter lifecycle manager, which manages the lifecycle of interpreters, in particular when to terminate them. For now there is only one implementation, TimeoutLifecycleManager, which terminates an interpreter after it has been idle for a configurable threshold, one hour by default.

<property>
<name>zeppelin.interpreter.lifecyclemanager.class</name>
<value>org.apache.zeppelin.interpreter.lifecycle.TimeoutLifecycleManager</value>
<description>LifecycleManager class for managing the lifecycle of interpreters, by default interpreter will
be closed after timeout</description>
</property>

<property>
<name>zeppelin.interpreter.lifecyclemanager.timeout.checkinterval</name>
<value>60000</value>
<description>milliseconds of the interval to checking whether interpreter is time out</description>
</property>

<property>
<name>zeppelin.interpreter.lifecyclemanager.timeout.threshold</name>
<value>3600000</value>
<description>milliseconds of the interpreter timeout threshold, by default it is 1 hour</description>
</property>

Hadoop NotebookRepo

In 0.8.0, Zeppelin adds HDFS as another NotebookRepo option. Storing notes on HDFS gives you more reliability thanks to HDFS replication, and better security thanks to Hadoop security.

  1. Add org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo to zeppelin.notebook.storage in zeppelin-site.xml:

<property>
<name>zeppelin.notebook.storage</name>
<value>org.apache.zeppelin.notebook.repo.FileSystemNotebookRepo</value>
<description>hadoop compatible file system notebook persistence layer implementation</description>
</property>
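Zeppelin also needs to pick up your Hadoop client configuration so FileSystemNotebookRepo can reach HDFS; a minimal sketch for zeppelin-env.sh, assuming the common /etc/hadoop/conf location:

```shell
# Point Zeppelin at the Hadoop client configuration (path is an assumption;
# use wherever core-site.xml and hdfs-site.xml live on your machines)
export HADOOP_CONF_DIR=/etc/hadoop/conf
echo "$HADOOP_CONF_DIR"
```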

Hadoop Config Storage

Prior to 0.8.0, Zeppelin stored lots of configuration in local files:

  • interpreter.json (all the interpreter setting info)
  • notebook-authorization.json (all the note authorization info)
  • credential.json (the credential info)

In 0.8.0 you can store this configuration on HDFS instead:

  • Set zeppelin.config.storage.class to org.apache.zeppelin.storage.FileSystemConfigStorage.
  • Set zeppelin.config.fs.dir to an HDFS path.
  • Also specify HADOOP_CONF_DIR in zeppelin-env.sh so that Zeppelin can find the right Hadoop configuration files.
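Expressed in the same zeppelin-site.xml style as the other snippets in this post (the HDFS path and descriptions are illustrative):

```xml
<property>
<name>zeppelin.config.storage.class</name>
<value>org.apache.zeppelin.storage.FileSystemConfigStorage</value>
<description>configuration storage implementation</description>
</property>

<property>
<name>zeppelin.config.fs.dir</name>
<value>/zeppelin/conf</value>
<description>illustrative HDFS directory for configuration files</description>
</property>
```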

Interpreter Recovery

Previously, all interpreter processes were shut down when the Zeppelin server terminated. This causes inconvenience during upgrades or maintenance: it would be nicer to keep the running interpreter processes alive and reconnect to them when the Zeppelin server restarts. Interpreter recovery in 0.8.0 enables exactly that; it is also a prerequisite for Zeppelin HA when failing over to a standby Zeppelin server.

<property>
<name>zeppelin.recovery.storage.class</name>
<value>org.apache.zeppelin.interpreter.recovery.FileSystemRecoveryStorage</value>
<description>RecoveryStorage implementation</description>
</property>

<property>
<name>zeppelin.recovery.dir</name>
<value>recovery</value>
<description>Location where recovery metadata is stored</description>
</property>

Generic ConfInterpreter

Zeppelin’s interpreter settings are shared by all users and notes. If you want different settings, you have to create a new interpreter, e.g. spark_1 with configuration spark.jars=jar1 and spark_2 with configuration spark.jars=jar2, so that spark_1 and spark_2 can use different dependencies for different notes or users. But this approach is neither convenient nor manageable: as more users use Zeppelin, the number of interpreters explodes. In 0.8.0, the generic ConfInterpreter lets a note customize interpreter properties inline, via a conf paragraph that runs before the interpreter process starts.
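A sketch of how this looks in a note, assuming the Spark interpreter group (the conf paragraph must run before the first Spark paragraph, since properties only take effect at interpreter launch):

```
%spark.conf
spark.jars jar1
```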

New Spark Interpreter

A new Spark interpreter is added in 0.8.0; it refactors most of the Spark interpreter code to make it more robust and easier to maintain and extend. You can set zeppelin.spark.useNew to true to enable it. If you hit weird errors in the old Spark interpreter, give the new one a try.
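This is an interpreter property, set on the Spark interpreter in the interpreter setting page rather than in zeppelin-site.xml; a hedged sketch (the property name zeppelin.spark.useNew is taken from the 0.8.0 interpreter documentation, so double-check it against your distribution):

```
zeppelin.spark.useNew    true
```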

Main Page Improvements

- The home page became customizable.

Note Improvements

- Paragraphs in a note now run sequentially. In previous releases, paragraphs for all interpreters started running simultaneously.

Code Editor Improvements

- TAB key can be used for auto-complete.

Result Display Improvements

- Angular UI Grid (http://ui-grid.info/) is now used to display tabulated results. It is powerful, fast, and more functional than the previous table viewer.

Helium

Helium is a plugin system that can extend Zeppelin in many ways. You can add custom visualizations or applications to a note.

JDBC Interpreter

Asynchronous metadata queries in the JDBC interpreter: your queries are now pushed to the database immediately, so you no longer have to wait for the metadata query to finish. You can set the lifetime of the autocomplete metadata with the parameter “default.completer.ttlInSeconds”.
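For example, in the JDBC interpreter setting page you could cache autocomplete metadata for two minutes (the value below is illustrative):

```
default.completer.ttlInSeconds    120
```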

SAP BusinessObjects Interpreter

A new interpreter was added: %sap. It can connect to your SAP BusinessObjects platform and create queries over universes. You can use prompts and very complex conditions in the “where” clause, and autocomplete helps you write the query.

Jeff Zhang

Apache Member, Open source veteran, Big Data, Data Science,