Flink Development Platform: Apache Zeppelin — (Part 1). Get Started

Mar 30, 2020

Introduction

In Apache Zeppelin 0.9, we redesigned the Flink interpreter to support the latest version of Flink. Only Flink 1.10+ is supported in Zeppelin; older versions of Flink will not work. I will write a series of posts about how to use Flink on Zeppelin. This is Part 1, which covers how to set up Flink on Zeppelin and run a basic wordcount program in 3 different execution modes.

Prerequisites

Download Zeppelin 0.9.0-SNAPSHOT from this Google Drive link (at the time of writing, 0.9.0 has not been released yet, but it is close to release)
Download Flink 1.10 for Scala 2.11 (only Scala 2.11 is supported; Scala 2.12 is not yet supported in Zeppelin)
Copy flink-python_2.11-1.10.0.jar from Flink's opt folder to its lib folder (it is used by PyFlink, which is supported in Zeppelin)
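The copy step can also be scripted. This is a minimal sketch, assuming you pass in the directory where Flink 1.10 is unpacked; the function name copy_pyflink_jar and the path in the usage comment are hypothetical.

```shell
# Copy the PyFlink jar from Flink's opt/ folder into its lib/ folder.
# $1 is the Flink installation directory (an assumption: adjust to your setup).
copy_pyflink_jar() {
  flink_home="$1"
  jar="flink-python_2.11-1.10.0.jar"
  if [ ! -f "$flink_home/opt/$jar" ]; then
    echo "cannot find $flink_home/opt/$jar" >&2
    return 1
  fi
  cp "$flink_home/opt/$jar" "$flink_home/lib/"
}

# Usage (hypothetical path):
#   copy_pyflink_jar /opt/flink-1.10.0
```

The function fails with a message instead of copying silently when the jar is missing, which usually means the path points to a different Flink or Scala version.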

Start Zeppelin

After you download Zeppelin, you can run the following command to start it:

bin/zeppelin-daemon.sh start

By default, the Zeppelin server listens at localhost:8080, so you can open your browser and go to http://localhost:8080, where you will see the Zeppelin home page.

Because Zeppelin listens on 127.0.0.1 by default, you cannot access it remotely. To access it remotely, make the following changes in zeppelin-site.xml:

Copy conf/zeppelin-site.xml.template to conf/zeppelin-site.xml
Change the property zeppelin.server.addr to 0.0.0.0
Restart Zeppelin by running the command: bin/zeppelin-daemon.sh restart

You can also change zeppelin.server.port to another port if 8080 is used by another process.
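The edit to zeppelin-site.xml can be done by hand or with a small helper. The sketch below is one possible approach, not something that ships with Zeppelin: the function set_site_property is hypothetical, and it assumes each property in the file has its `<value>` on a single line right after the matching `<name>` line, as in the template.

```shell
# Rewrite the <value> that follows a given <name> in zeppelin-site.xml.
# Assumes <name>...</name> and <value>...</value> each sit on their own line.
# Values containing "&" or backslashes are not handled by this sketch.
set_site_property() {
  file="$1"; name="$2"; value="$3"
  tmp="$(mktemp)"
  awk -v name="$name" -v value="$value" '
    index($0, "<name>" name "</name>") { hit = 1; print; next }
    hit && /<value>/ {
      sub(/<value>.*<\/value>/, "<value>" value "</value>")
      hit = 0
    }
    { print }
  ' "$file" > "$tmp" && cat "$tmp" > "$file" && rm -f "$tmp"
}

# Usage, run from the Zeppelin home directory:
#   cp conf/zeppelin-site.xml.template conf/zeppelin-site.xml
#   set_site_property conf/zeppelin-site.xml zeppelin.server.addr 0.0.0.0
#   set_site_property conf/zeppelin-site.xml zeppelin.server.port 8090
#   bin/zeppelin-daemon.sh restart
```

The port 8090 in the usage comment is only an example value. Binding to 0.0.0.0 exposes Zeppelin on every network interface, so do this only on a network you trust.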

Configure Flink Interpreter

The next step is to go to the interpreter setting page and configure the Flink interpreter. There are many properties that you can set for the Flink interpreter, but for now we only want to run wordcount in local mode, so the only property you need to set is FLINK_HOME. Set it to the location where Flink 1.10 is installed.

Here’s the screenshot of my setting.

Running WordCount

After you set up everything above, you can open the tutorial note Flink Basic, which is included in Zeppelin, and run wordcount to verify that the Flink interpreter works properly on Zeppelin. Here are 2 screenshots of running wordcount in batch and streaming mode.

Besides seeing the Flink job output in the Zeppelin frontend, you will also find a job link at the top right that points to the job's page in the Flink web UI.

These 2 examples are written in Scala. In Zeppelin you don’t need to create the entry points of a Flink program (ExecutionEnvironment, StreamExecutionEnvironment, BatchTableEnvironment, StreamTableEnvironment); Zeppelin creates them for you (you can use benv and senv directly in these 2 examples). Internally, the Flink interpreter creates a Scala shell and defines these entry point variables for you.

Supported Interpreters

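Apache Flink is supported in Zeppelin through the flink interpreter group, which consists of five interpreters: %flink (Scala), %flink.pyflink and %flink.ipyflink (Python), %flink.ssql (streaming SQL) and %flink.bsql (batch SQL).

That means you can write Flink programs in Zeppelin in 3 main languages: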

Scala
Python (PyFlink)
SQL (Both batch & stream)

Configuration

The Flink interpreter can be configured with the properties listed below. You can also set other Flink properties that are not listed in the table. For all available Flink properties, refer to this link.

Execution Mode

Flink in Zeppelin supports 3 execution modes (flink.execution.mode):

Local
Remote
Yarn

Run Flink in Local Mode

Running Flink in local mode starts a MiniCluster in the local JVM. By default, the local MiniCluster uses port 8081, so make sure this port is available on your machine; otherwise, set rest.port to another port. (Note that a standalone Flink cluster also uses 8081 for its job manager, so if you happen to have a single-node standalone cluster on your machine, local mode will fail because of the port conflict.)
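Before starting local mode, you can check whether something is already listening on port 8081. This is a rough sketch, assuming bash (it relies on bash's /dev/tcp pseudo-device, which other shells do not provide); the function name port_in_use is hypothetical, and it only probes 127.0.0.1.

```shell
# Return 0 if a connection to the given local TCP port succeeds,
# i.e. something is already listening there. Requires bash.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

if port_in_use 8081; then
  echo "port 8081 is taken: set rest.port to another port"
else
  echo "port 8081 looks free"
fi
```

If the port is taken, it is likely a standalone Flink job manager or another service; either stop it or pick a different rest.port in the interpreter setting.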

You can also set local.number-taskmanager and flink.tm.slot to customize the number of TaskManagers (TMs) and the number of slots per TM. By default, local mode has only 4 TMs with 1 slot each, which may not be enough in some cases.

Run Flink in Remote Mode

Running Flink in remote mode connects to an existing Flink cluster (either a standalone cluster or a YARN session cluster).

Besides setting flink.execution.mode to remote, you also need to set flink.execution.remote.host and flink.execution.remote.port (the job manager's REST port) so that they point to the Flink job manager.
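Before pointing Zeppelin at a remote cluster, it can help to confirm that the job manager's REST endpoint is reachable from the Zeppelin host. The sketch below assumes curl is installed and queries /overview, the cluster overview endpoint of Flink's REST API; the function names and the host and port in the usage comment are hypothetical.

```shell
# Build the URL of the job manager's cluster overview endpoint.
jobmanager_overview_url() {
  echo "http://$1:$2/overview"
}

# Return 0 if the job manager answers on its REST port within 5 seconds.
check_jobmanager() {
  curl --silent --fail --max-time 5 "$(jobmanager_overview_url "$1" "$2")" >/dev/null
}

# Usage (hypothetical host and port; use the values you set for
# flink.execution.remote.host and flink.execution.remote.port):
#   check_jobmanager flink-master.example.com 8081 && echo "job manager reachable"
```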

Run Flink in Yarn Mode

In order to run Flink in Yarn mode, you need to make the following settings:

Set flink.execution.mode to yarn
Set HADOOP_CONF_DIR in flink interpreter setting.
Make sure the hadoop command is on your PATH, because internally Flink calls the command hadoop classpath and loads all the Hadoop-related jars into the Flink interpreter process.
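The last two requirements can be sanity-checked from a shell on the Zeppelin host. This is a minimal sketch with a hypothetical function name, check_yarn_prereqs; it inspects the current shell's environment, which is an assumption, since the interpreter process reads HADOOP_CONF_DIR from the Flink interpreter setting and may see different values.

```shell
# Check that HADOOP_CONF_DIR points to a directory and that the
# hadoop command can be found on PATH. Returns 1 if either check fails.
check_yarn_prereqs() {
  ok=0
  if [ -z "${HADOOP_CONF_DIR:-}" ] || [ ! -d "$HADOOP_CONF_DIR" ]; then
    echo "HADOOP_CONF_DIR is not set or is not a directory" >&2
    ok=1
  fi
  if ! command -v hadoop >/dev/null 2>&1; then
    echo "hadoop command not found on PATH" >&2
    ok=1
  fi
  return "$ok"
}

# Usage:
#   check_yarn_prereqs && hadoop classpath
```

Running hadoop classpath by hand, as in the usage comment, prints the same classpath that Flink loads internally, so an error there is a good hint about why yarn mode fails to start.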

StreamExecutionEnvironment, ExecutionEnvironment, StreamTableEnvironment, BatchTableEnvironment

Zeppelin creates 6 variables as entry points for Flink Scala (%flink):

senv (StreamExecutionEnvironment)
benv (ExecutionEnvironment)
stenv (StreamTableEnvironment for blink planner)
btenv (BatchTableEnvironment for blink planner)
stenv_2 (StreamTableEnvironment for flink planner)
btenv_2 (BatchTableEnvironment for flink planner)

It also creates 6 variables as entry points for PyFlink (%flink.pyflink or %flink.ipyflink):

s_env (StreamExecutionEnvironment)
b_env (ExecutionEnvironment)
st_env (StreamTableEnvironment for blink planner)
bt_env (BatchTableEnvironment for blink planner)
st_env_2 (StreamTableEnvironment for flink planner)
bt_env_2 (BatchTableEnvironment for flink planner)

Blink/Flink Planner

Flink’s Table API supports 2 planners: flink and blink.

If you want to use the DataSet API and convert a DataSet to a Flink table, use the flink planner (btenv_2 and stenv_2).
In all other cases, we recommend the blink planner. This is also the planner used by the Flink batch and streaming SQL interpreters (%flink.bsql and %flink.ssql).

Check this link for the details about these 2 planners.

Summary

That covers how to set up Flink on Zeppelin and run a basic wordcount program. I have also made a series of videos that show how to do this; you can find them via this YouTube link.

The Zeppelin community is still working to improve and evolve the whole user experience of Flink on Zeppelin. You can join the Zeppelin Slack to discuss with the community: http://zeppelin.apache.org/community.html#slack-channel