Flink Development Platform: Apache Zeppelin (Part-5) Recovering

Jeff Zhang
2 min read · Jul 30, 2020

In the last few posts, I talked about how to develop Flink jobs (via Table API, SQL, and UDFs) in Zeppelin, and I also mentioned how to keep a Flink job running even when the Zeppelin server is down. But there is still one pitfall: if Zeppelin is restarted, the Flink job link at the top right of the paragraph goes missing, and the realtime dashboard stops working, even though the Flink job is still running. This is because the connection between the Zeppelin server and the Flink job is broken. Fortunately, in the latest Zeppelin 0.9 (preview2), we fixed this via recovery.

How to enable recovery

It is pretty easy: there are 2 things you need to do, and the second is only needed if you store recovery metadata in HDFS.

Step 1. Configure 2 properties in zeppelin-site.xml

<property>
  <name>zeppelin.recovery.storage.class</name>
  <value>org.apache.zeppelin.interpreter.recovery.LocalRecoveryStorage</value>
  <description>RecoveryStorage implementation based on java native local filesystem</description>
</property>
<property>
  <name>zeppelin.recovery.dir</name>
  <value>recovery</value>
  <description>Location where recovery metadata is stored</description>
</property>

By default, zeppelin.recovery.storage.class is org.apache.zeppelin.interpreter.recovery.NullRecoveryStorage, which means recovery is disabled. Setting it to org.apache.zeppelin.interpreter.recovery.LocalRecoveryStorage makes Zeppelin store interpreter process metadata in local files. You can also set it to org.apache.zeppelin.interpreter.recovery.FileSystemRecoveryStorage, which stores the recovery metadata in HDFS. zeppelin.recovery.dir configures the folder where the interpreter process metadata is stored.
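To double-check which storage class and directory are actually configured, a small helper like the following can read a property back from the file. This is only a sketch: the function name site_property is hypothetical, and it assumes that <name> and <value> sit on consecutive lines, as in the snippet above.

```shell
# Print the <value> of a named property from a zeppelin-site.xml style file.
# Hypothetical helper; assumes <value> is on the line right after <name>.
site_property() {
  # $1 = path to zeppelin-site.xml, $2 = property name
  grep -F -A1 "<name>$2</name>" "$1" \
    | sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}
```

For example, site_property conf/zeppelin-site.xml zeppelin.recovery.storage.class should print the class you set in Step 1, and print nothing if the property is absent (in which case the default applies).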

Step 2. Enable hadoop support (optional)

If you are using FileSystemRecoveryStorage, you need to enable Hadoop. Zeppelin 0.9 doesn’t ship with Hadoop jars; instead, you need to include them yourself via the following 2 steps:

  • Set USE_HADOOP to true in zeppelin-env.sh:
export USE_HADOOP=true
  • Make sure the hadoop command is on your PATH, because internally Zeppelin runs the command hadoop classpath to get all the Hadoop jars and puts them on the classpath of the Zeppelin server.
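Before starting Zeppelin, it can be worth confirming that hadoop classpath really works in the shell that launches the server. The helper below is a minimal sketch; the function name check_hadoop_classpath and its optional argument are hypothetical and not part of Zeppelin.

```shell
# Check that "<cmd> classpath" works, then list the entries one per line.
# Hypothetical helper; the command name defaults to "hadoop".
check_hadoop_classpath() {
  cmd="${1:-hadoop}"
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd is not on PATH" >&2
    return 1
  fi
  cp_out="$("$cmd" classpath 2>/dev/null)" || return 1
  if [ -z "$cp_out" ]; then
    echo "$cmd classpath printed nothing" >&2
    return 1
  fi
  # Split the colon-separated classpath for easier inspection.
  printf '%s\n' "$cp_out" | tr ':' '\n'
}
```

Running check_hadoop_classpath with no argument in the same environment that starts Zeppelin should list the Hadoop entries; a non-zero exit status suggests that the PATH needs fixing first.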

Verify Recovering

Once you have made the above configuration, restart Zeppelin for it to take effect. Now your interpreter process will keep running even when you stop the Zeppelin server, and after you restart Zeppelin, you will see that the running paragraphs are recovered. For example, when I use the Flink interpreter, the Flink job is recovered after I restart the Zeppelin server.
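If you use LocalRecoveryStorage, one rough way to check that recovery is active is to look for files in the recovery directory while an interpreter is running. The helper below is a sketch under assumptions: the function name has_recovery_metadata is hypothetical, and it checks only that some file exists, without relying on any particular file name or format.

```shell
# Succeed if the given directory exists and contains at least one file.
# Hypothetical helper; it does not inspect the metadata itself.
has_recovery_metadata() {
  # $1 = recovery directory to inspect
  [ -d "$1" ] || return 1
  [ -n "$(find "$1" -type f | head -n 1)" ]
}
```

For example, after running a Flink paragraph you could call has_recovery_metadata "$ZEPPELIN_HOME/recovery" (assuming the relative zeppelin.recovery.dir value resolves under your Zeppelin home, which you should confirm for your setup), then restart the server with bin/zeppelin-daemon.sh restart and check that the paragraph is still shown as running.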

Summary

The Zeppelin community is still working to improve and evolve the whole user experience of Flink on Zeppelin. You can join the Zeppelin Slack to discuss with the community: http://zeppelin.apache.org/community.html#slack-channel

Besides this, I have also made a series of videos to show you how to do that; you can check them out on YouTube.

Jeff Zhang

Apache Member, Open source veteran, Big Data, Data Science