NiFi Terminology Basics

This post is going to cover the basic terminology that you’ll need to know for working with NiFi.

Canvas

The Canvas is the free grid space on which you create your Flow. It is accessed through the NiFi Web UI. You build Flows by dragging Processors onto the Canvas.

Controller Service

A Controller Service is an encapsulation of functionality that is consumed or used by a Flow or Processor and does not operate independently. The same Controller Service can be shared by multiple Processors/Flows. An example is the AvroRecordSetWriter, which encapsulates the functionality for writing Records with a given Avro schema, but does not do anything on its own: it must be consumed by a supporting Record Processor.

Flow

A Flow is the combination of Processors, Controller Services, Funnels, Ports, etc. that are connected through their relationships to move and/or process some data. A Flow is built and viewed through the NiFi Web UI on the Canvas.

FlowFile

A FlowFile is the unit of ‘data’ in NiFi. It includes both the content of the data and its associated attributes (metadata). FlowFiles are created and consumed by Processors, which can read and/or modify either the content or the attributes.

FlowFile Attribute

A FlowFile Attribute is a single piece of metadata attached to a FlowFile. These Attributes are stored in memory and are intended for smaller pieces of information that describe the data, rather than the data itself.

FlowFile Content

FlowFile Content is the underlying data that the Flow is operating on. It can be any kind of data, such as textual JSON data or binary data like images and videos. FlowFile Content can contain very large pieces of data that are not appropriate to keep in FlowFile Attributes. FlowFile Content is kept on disk.
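
The split between Attributes and Content described in the FlowFile entries above can be pictured with a small Python model. This is only a conceptual sketch, not NiFi's Java API: the FlowFile class and the with_attribute helper are hypothetical names chosen for illustration, and real NiFi keeps Content on disk rather than in an in-memory bytes object.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Conceptual stand-in for a NiFi FlowFile (hypothetical, not the NiFi API)."""

    # The underlying data: any bytes, e.g. JSON text or an image.
    content: bytes = b""
    # Small key/value metadata describing the content.
    attributes: dict = field(default_factory=dict)

    def with_attribute(self, key: str, value: str) -> "FlowFile":
        # Return a copy with one extra attribute; the content is untouched.
        return FlowFile(self.content, {**self.attributes, key: value})


flow_file = FlowFile(content=b'{"id": 1}', attributes={"filename": "record.json"})
tagged = flow_file.with_attribute("mime.type", "application/json")
```

Here tagged carries the same Content as flow_file plus one extra Attribute, which mirrors how a Processor can change metadata without touching the data itself.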

Input Port

An Input Port is used to provide an input relationship to a Process Group, allowing for Processors outside of the Process Group to send FlowFiles to Processors inside the Process Group. Input Ports can be local or remote. A local Input Port can only be referenced by Process Groups that run in the same NiFi cluster as the Input Port and can attach directly in the Flow. A remote Input Port allows for Remote Process Groups to connect to the Input Port over a network, without needing to be directly connected inside the Flow or running in the same NiFi cluster.

NiFi Registry

NiFi Registry is a companion project of NiFi that provides Git-like version control for NiFi Flows. It runs as a separate service to NiFi, providing its own Web UI for management. NiFi connects to a NiFi Registry service through a Registry Client configured through the NiFi Web UI.

Output Port

An Output Port is used to provide an output relationship from a Process Group, allowing Processors inside the Process Group to send FlowFiles to Processors outside of the Process Group.

Parameter

A Parameter is a Process Group-level variable that is statically configured inside a Parameter Context. Parameters can optionally contain sensitive values. Parameters are referenced by name inside a Processor’s configuration, using the #{name} syntax. A Parameter’s value is evaluated at the time the referencing Processor is started, and the Processor(s) must be stopped before the Parameter’s value can be changed.
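
To make the #{name} syntax concrete, here is a minimal Python sketch of how such references could be substituted into a property value. It is not NiFi's implementation: the resolve_parameters function, the parameter names and the JDBC URL are all assumptions made for illustration, and the sketch ignores details such as escaping and sensitive values.

```python
import re

# Matches a reference such as #{db.host} and captures the parameter name.
PARAM_REF = re.compile(r"#\{([^}]+)\}")


def resolve_parameters(property_value: str, parameters: dict) -> str:
    """Replace every #{name} reference with its value from `parameters`."""

    def lookup(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"Parameter '{name}' is not defined in the context")
        return parameters[name]

    return PARAM_REF.sub(lookup, property_value)


# Hypothetical parameters and property value, for illustration only.
context = {"db.host": "localhost", "db.port": "5432"}
resolved = resolve_parameters("jdbc:postgresql://#{db.host}:#{db.port}/app", context)
```

With the values above, resolved is the property string with both references replaced, and a reference to an undefined name raises an error instead of silently passing through.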

Parameter Context

A Parameter Context is a container for one or more Parameters. A Parameter Context can be attached to one or more Process Groups, but a Process Group may only have one Parameter Context. A Processor can only reference Parameters in the Parameter Context that is attached to its Process Group. This is the preferred method for static variables in NiFi, with the Variable Registry being considered deprecated.
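
The attachment rules above (one Parameter Context per Process Group, but many Process Groups per Parameter Context) can be sketched as follows. The class and method names are hypothetical and only model the relationship; they do not correspond to NiFi's own classes.

```python
class ParameterContext:
    """Hypothetical container for named Parameters."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = dict(parameters)


class ProcessGroup:
    """Hypothetical Process Group that holds at most one Parameter Context."""

    def __init__(self, name):
        self.name = name
        self.parameter_context = None

    def attach(self, context):
        # Attaching a new context replaces the previous one: only one is allowed.
        self.parameter_context = context

    def lookup(self, parameter_name):
        # Only Parameters in the attached context are visible to this group.
        if self.parameter_context is None:
            raise LookupError(f"{self.name} has no Parameter Context attached")
        return self.parameter_context.parameters[parameter_name]


shared = ParameterContext("Shared Settings", {"env": "dev"})
ingest = ProcessGroup("Ingest")
export = ProcessGroup("Export")

# The same context can be attached to several Process Groups.
ingest.attach(shared)
export.attach(shared)
```

Both groups now resolve env through the same context object, while a group with nothing attached cannot resolve any Parameter at all.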

Process Group

A Process Group is a nested container for Flows. Process Groups can be thought of as similar to Directories or Folders in a File System. Each Process Group gets its own Canvas. The top level Process Group is known as the Root Process Group, which is the first Canvas presented to the user when accessing the NiFi Web UI. Process Groups can be used to organise Flows into smaller areas of focus.
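
The folder analogy can be made concrete with a small tree structure. This is a conceptual sketch only: ProcessGroupNode and its path method are hypothetical, and the group names are illustrative.

```python
class ProcessGroupNode:
    """Hypothetical tree node standing in for a Process Group."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        # Walk up to the root and join the names, like a directory path.
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


# Illustrative names; the first node plays the role of the Root Process Group.
root = ProcessGroupNode("NiFi Flow")
ingest = ProcessGroupNode("Ingest", parent=root)
kafka = ProcessGroupNode("Kafka", parent=ingest)
```

Calling kafka.path() walks from the nested group back up to the root, in the same way a file path names every enclosing directory.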

Processor

A Processor is the core functional component in NiFi. Each Processor provides a specific piece of functionality, typically operating on either the Content or Attributes of a FlowFile. Processors can optionally accept input FlowFiles. Processors always produce an output that can be consumed by other Processors. A Processor can have one or more output Relationships that can be connected to other Processors.
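
As a rough illustration of output Relationships, the sketch below models a Processor as a function that inspects a FlowFile and names the Relationship it should be routed to. Everything here is hypothetical: route_on_size, the "small" and "large" Relationship names, and the dictionary representation of a FlowFile are assumptions for the example, not NiFi's Processor API.

```python
def route_on_size(flow_file, limit=10):
    """Return the name of the Relationship this FlowFile is routed to."""
    return "large" if len(flow_file["content"]) > limit else "small"


# Each Relationship feeds a queue that a downstream Processor would consume.
connections = {"small": [], "large": []}

flow_files = [
    {"attributes": {"filename": "a.txt"}, "content": b"tiny"},
    {"attributes": {"filename": "b.txt"}, "content": b"a much longer payload"},
]

for flow_file in flow_files:
    connections[route_on_size(flow_file)].append(flow_file)

small_names = [f["attributes"]["filename"] for f in connections["small"]]
large_names = [f["attributes"]["filename"] for f in connections["large"]]
```

Each FlowFile ends up in exactly one queue, which is the role a Relationship plays when it is connected to another Processor on the Canvas.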

Remote Process Group

A Remote Process Group is similar in concept to a Process Group; it is a logical container around a Flow. However, a Remote Process Group does not have to be running on the local NiFi cluster. It can be used to pass FlowFiles to a Flow that is running on a different, external NiFi cluster over a network. The Remote Process Group must have a Remote Input Port that allows external input.

Root Canvas

The Root Canvas is the first Canvas that is presented to the user when accessing the NiFi Web UI. It is the Canvas of the Root Process Group.

Variable Registry

The Variable Registry is a container for one or more static variables that are set outside of a Flow. They cannot be modified by Processors, and their values are evaluated at the time a Processor is started. The Variable Registry is considered deprecated and Parameters should be used instead.