NiFi
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Table-Level Lineage | ✅ | Supported. See docs for limitations | 
Concept Mapping
| Source Concept | DataHub Concept | Notes | 
|---|---|---|
| "Nifi" | Data Platform | |
| Nifi flow | Data Flow | |
| Nifi Ingress / Egress Processor | Data Job | |
| Nifi Remote Port | Data Job | |
| Nifi Port with remote connections | Dataset | |
| Nifi Process Group | Container | Subtype Process Group | 
Caveats
- This plugin extracts lineage between external datasets and ingress/egress processors by analyzing provenance events. Check your NiFi configuration for the maximum retention period of provenance events, and make sure ingestion runs frequently enough to read those events before they are purged.
- Only a limited set of ingress/egress processors is supported:
  - S3: ListS3, FetchS3Object, PutS3Object
  - SFTP: ListSFTP, FetchSFTP, GetSFTP, PutSFTP
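The retention caveat above comes down to simple arithmetic. The sketch below is a hypothetical helper, not part of the connector. It assumes you have looked up the retention period yourself (for example from the `nifi.provenance.repository.max.storage.time` property in your NiFi configuration) and that you know how often ingestion is scheduled.

```python
from datetime import timedelta


def provenance_gap(retention: timedelta, run_interval: timedelta) -> timedelta:
    """Return the span of events that would be purged unread between runs."""
    # Events older than `retention` are dropped by NiFi, so any part of the
    # run interval that exceeds the retention period is never seen.
    return max(run_interval - retention, timedelta(0))


def runs_often_enough(retention: timedelta, run_interval: timedelta) -> bool:
    """True when every provenance event survives until the next run."""
    return provenance_gap(retention, run_interval) == timedelta(0)
```

For example, with a 24-hour retention period and a weekly schedule, `provenance_gap` reports six days of events that ingestion would never read, so the schedule should be tightened or the retention period extended.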
CLI based Ingestion
Install the Plugin
The nifi source works out of the box with acryl-datahub.
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
source:
  type: "nifi"
  config:
    # Coordinates
    site_url: "https://localhost:8443/nifi/"
    # Credentials
    auth: SINGLE_USER
    username: admin
    password: password
sink:
  # sink configs
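The same recipe can also be expressed as a Python dictionary and run programmatically. This is a minimal sketch: `build_recipe` and `run_recipe` are hypothetical helper names, and the `datahub-rest` sink with its `server` URL is an assumption standing in for the sink config that the recipe above leaves open. Running it requires the `acryl-datahub` package.

```python
def build_recipe(site_url: str, username: str, password: str) -> dict:
    """Mirror the starter recipe above as a plain dictionary."""
    return {
        "source": {
            "type": "nifi",
            "config": {
                "site_url": site_url,
                "auth": "SINGLE_USER",
                "username": username,
                "password": password,
            },
        },
        # Assumption: a DataHub REST sink on localhost; replace with your sink.
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},
        },
    }


def run_recipe(recipe: dict) -> None:
    """Run the recipe through the DataHub ingestion pipeline."""
    # Imported lazily so this module loads even where acryl-datahub is absent.
    from datahub.ingestion.run.pipeline import Pipeline

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()
```

Calling `run_recipe(build_recipe("https://localhost:8443/nifi/", "admin", "password"))` would then start an ingestion run against the local NiFi instance.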
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
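As a reading aid for the dotted notation, the sketch below expands dotted field names into the nested structure a recipe would contain. `expand_dotted` is a hypothetical helper written for this page; the connector itself expects the nested form directly.

```python
def expand_dotted(flat: dict) -> dict:
    """Turn {'a.b': 1} into {'a': {'b': 1}}."""
    nested: dict = {}
    for dotted_key, value in flat.items():
        # Everything before the last dot names a parent mapping.
        *parents, leaf = dotted_key.split(".")
        node = nested
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested
```

For instance, `process_group_pattern.allow` and `process_group_pattern.ignoreCase` both end up as keys under a single `process_group_pattern` mapping.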
| Field | Description | 
|---|---|
| site_url ✅ string | URL for Nifi, ending with /nifi/. e.g. https://mynifi.domain/nifi/ | 
| auth Enum | NiFi authentication. Must be one of: NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS, BASIC_AUTH. Default: NO_AUTH | 
| ca_file One of boolean, string | Path to PEM file containing certs for the root CA(s) for NiFi. Set to False to disable SSL verification. | 
| client_cert_file string | Path to PEM file containing the public certificates for the user/client identity, must be set for auth = "CLIENT_CERT" | 
| client_key_file string | Path to PEM file containing the client’s secret key | 
| client_key_password string | The password to decrypt the client_key_file | 
| emit_process_group_as_container boolean | Whether to emit Nifi process groups as container entities. Default: False | 
| incremental_lineage boolean | When enabled, emits incremental/patch lineage for Nifi processors. When disabled, re-states lineage on each run. Default: True | 
| password string | Nifi password, must be set for auth = "SINGLE_USER" | 
| provenance_days integer | Time window, in days, of provenance events to analyze for external datasets. Default: 7 | 
| site_name string | Site name to identify this site with, useful when using input and output ports receiving remote connections Default: default | 
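| site_url_to_site_name map(str,string) | Lookup to find site_name for a site_url ending with /nifi/. Required if using remote process groups in the NiFi flow. Default: {} | 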
| username string | Nifi username, must be set for auth = "SINGLE_USER" | 
| env string | The environment that all assets produced by this connector belong to Default: PROD | 
| process_group_pattern AllowDenyPattern | regex patterns for filtering process groups Default: {'allow': ['.*'], 'deny': [], 'ignoreCase': True} | 
| process_group_pattern.ignoreCase boolean | Whether to ignore case sensitivity during pattern matching. Default: True | 
| process_group_pattern.allow array | List of regex patterns to include in ingestion Default: ['.*'] | 
| process_group_pattern.allow.string string | |
| process_group_pattern.deny array | List of regex patterns to exclude from ingestion. Default: [] | 
| process_group_pattern.deny.string string | |
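To illustrate how `process_group_pattern` filters process groups, here is a minimal sketch of allow/deny matching. It is an approximation written for this page, not the connector's own code. It assumes a name is kept when it matches at least one `allow` pattern and no `deny` pattern, with patterns anchored at the start of the name (`re.match`).

```python
import re


def is_allowed(name, allow=(".*",), deny=(), ignore_case=True):
    """Return True if `name` passes the allow/deny regex filters."""
    flags = re.IGNORECASE if ignore_case else 0
    # A single deny match excludes the name, whatever `allow` says.
    if any(re.match(pattern, name, flags) for pattern in deny):
        return False
    return any(re.match(pattern, name, flags) for pattern in allow)
```

With the defaults, every process group is allowed. Adding `deny=["scratch.*"]` would exclude a group named `Scratch` because case is ignored by default.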
The JSONSchema for this configuration is inlined below.
{
  "title": "NifiSourceConfig",
  "description": "Any source that produces dataset urns in a single environment should inherit this class",
  "type": "object",
  "properties": {
    "env": {
      "title": "Env",
      "description": "The environment that all assets produced by this connector belong to",
      "default": "PROD",
      "type": "string"
    },
    "site_url": {
      "title": "Site Url",
      "description": "URL for Nifi, ending with /nifi/. e.g. https://mynifi.domain/nifi/",
      "type": "string"
    },
    "auth": {
      "description": "Nifi authentication. must be one of : NO_AUTH, SINGLE_USER, CLIENT_CERT, KERBEROS",
      "default": "NO_AUTH",
      "allOf": [
        {
          "$ref": "#/definitions/NifiAuthType"
        }
      ]
    },
    "provenance_days": {
      "title": "Provenance Days",
      "description": "time window to analyze provenance events for external datasets",
      "default": 7,
      "type": "integer"
    },
    "process_group_pattern": {
      "title": "Process Group Pattern",
      "description": "regex patterns for filtering process groups",
      "default": {
        "allow": [
          ".*"
        ],
        "deny": [],
        "ignoreCase": true
      },
      "allOf": [
        {
          "$ref": "#/definitions/AllowDenyPattern"
        }
      ]
    },
    "site_name": {
      "title": "Site Name",
      "description": "Site name to identify this site with, useful when using input and output ports receiving remote connections",
      "default": "default",
      "type": "string"
    },
    "site_url_to_site_name": {
      "title": "Site Url To Site Name",
      "description": "Lookup to find site_name for site_url ending with /nifi/, required if using remote process groups in nifi flow",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "type": "string"
      }
    },
    "username": {
      "title": "Username",
      "description": "Nifi username, must be set for auth = \"SINGLE_USER\"",
      "type": "string"
    },
    "password": {
      "title": "Password",
      "description": "Nifi password, must be set for auth = \"SINGLE_USER\"",
      "type": "string"
    },
    "client_cert_file": {
      "title": "Client Cert File",
      "description": "Path to PEM file containing the public certificates for the user/client identity, must be set for auth = \"CLIENT_CERT\"",
      "type": "string"
    },
    "client_key_file": {
      "title": "Client Key File",
      "description": "Path to PEM file containing the client\u2019s secret key",
      "type": "string"
    },
    "client_key_password": {
      "title": "Client Key Password",
      "description": "The password to decrypt the client_key_file",
      "type": "string"
    },
    "ca_file": {
      "title": "Ca File",
      "description": "Path to PEM file containing certs for the root CA(s) for the NiFi.Set to False to disable SSL verification.",
      "anyOf": [
        {
          "type": "boolean"
        },
        {
          "type": "string"
        }
      ]
    },
    "emit_process_group_as_container": {
      "title": "Emit Process Group As Container",
      "description": "Whether to emit Nifi process groups as container entities.",
      "default": false,
      "type": "boolean"
    },
    "incremental_lineage": {
      "title": "Incremental Lineage",
      "description": "When enabled, emits incremental/patch lineage for Nifi processors. When disabled, re-states lineage on each run.",
      "default": true,
      "type": "boolean"
    }
  },
  "required": [
    "site_url"
  ],
  "additionalProperties": false,
  "definitions": {
    "NifiAuthType": {
      "title": "NifiAuthType",
      "description": "An enumeration.",
      "enum": [
        "NO_AUTH",
        "SINGLE_USER",
        "CLIENT_CERT",
        "KERBEROS",
        "BASIC_AUTH"
      ]
    },
    "AllowDenyPattern": {
      "title": "AllowDenyPattern",
      "description": "A class to store allow deny regexes",
      "type": "object",
      "properties": {
        "allow": {
          "title": "Allow",
          "description": "List of regex patterns to include in ingestion",
          "default": [
            ".*"
          ],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "deny": {
          "title": "Deny",
          "description": "List of regex patterns to exclude from ingestion.",
          "default": [],
          "type": "array",
          "items": {
            "type": "string"
          }
        },
        "ignoreCase": {
          "title": "Ignorecase",
          "description": "Whether to ignore case sensitivity during pattern matching.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
Authentication
This connector supports the following authentication mechanisms:
Single User Authentication (auth: SINGLE_USER)
The connector passes the username and password, as used on the NiFi login page, to the /access/token REST endpoint. This mode also works when a Kerberos login identity provider is set up for NiFi.
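The token exchange can be sketched with the standard library. This is an illustration, not the connector's implementation. It assumes the REST API is served under `/nifi-api/` next to the `/nifi/` UI path and that the endpoint accepts form-encoded `username` and `password` fields. `build_token_request` is a hypothetical helper that only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(site_url: str, username: str, password: str) -> Request:
    """Build (but do not send) a POST to NiFi's /access/token endpoint."""
    if not site_url.endswith("/nifi/"):
        raise ValueError("site_url must end with /nifi/")
    # Assumption: the REST API base is the UI URL with /nifi/ -> /nifi-api/.
    api_base = site_url[: -len("nifi/")] + "nifi-api/"
    body = urlencode({"username": username, "password": password}).encode("utf-8")
    return Request(
        api_base + "access/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Sending the request with `urllib.request.urlopen` should return the access token in the response body, which is then typically presented on later calls in an `Authorization: Bearer <token>` header.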
Client Certificates Authentication (auth: CLIENT_CERT)
The connector uses client_cert_file (required), client_key_file (optional), and client_key_password (optional) for mutual TLS authentication.
Kerberos Authentication via SPNEGO (auth: KERBEROS)
If NiFi has been configured to use Kerberos SPNEGO, the connector passes the user's Kerberos ticket to NiFi over the /access/kerberos REST endpoint. The ticket is assumed to be already present on the machine on which ingestion runs. This is usually done by installing krb5-user and then running kinit for the user.
sudo apt install krb5-user
kinit user@REALM
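Before starting ingestion, it can help to confirm that a ticket is actually present. The sketch below is a hypothetical pre-flight check, not part of the connector. It assumes the MIT Kerberos `klist` tool (installed with krb5-user) is on the PATH and relies on `klist -s`, which runs silently and reports the state of the credentials cache through its exit status.

```python
import shutil
import subprocess


def has_kerberos_ticket() -> bool:
    """Return True if `klist -s` reports a readable, unexpired ticket cache."""
    if shutil.which("klist") is None:
        return False  # Kerberos client tools are not installed
    result = subprocess.run(
        ["klist", "-s"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    # Exit status 0 means the cache could be read and has not expired.
    return result.returncode == 0
```

If this returns `False`, running `kinit` as shown above before the ingestion job is the usual remedy.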
Basic Authentication (auth: BASIC_AUTH)
The connector uses HTTPBasicAuth with the configured username and password.
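For reference, HTTP Basic authentication sends the credentials in an `Authorization` header. The sketch below shows the header that results for plain ASCII credentials. `basic_auth_header` is a hypothetical helper for illustration; the connector delegates this step to HTTPBasicAuth.

```python
import base64


def basic_auth_header(username: str, password: str) -> dict:
    """Build the Authorization header used by HTTP Basic authentication."""
    # The scheme base64-encodes "username:password"; it does not encrypt it.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}
```

Because the credentials are only encoded, not encrypted, this mode is best used over HTTPS.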
No Authentication (auth: NO_AUTH)
This is useful for testing purposes.
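The per-mode requirements described in this section can be summarized as a small pre-flight check. This is a sketch for illustration, not the connector's validation logic. `missing_auth_fields` is a hypothetical helper, and treating `username` and `password` as required for BASIC_AUTH is an assumption based on the description above.

```python
# Fields each auth mode needs, as described in this section.
REQUIRED_FIELDS = {
    "NO_AUTH": [],
    "SINGLE_USER": ["username", "password"],
    "CLIENT_CERT": ["client_cert_file"],
    "KERBEROS": [],
    # Assumption: Basic auth needs the same credentials as SINGLE_USER.
    "BASIC_AUTH": ["username", "password"],
}


def missing_auth_fields(config: dict) -> list:
    """Return the config fields that the chosen auth mode still needs."""
    auth = config.get("auth", "NO_AUTH")  # NO_AUTH is the documented default
    if auth not in REQUIRED_FIELDS:
        raise ValueError(f"unknown auth mode: {auth!r}")
    return [field for field in REQUIRED_FIELDS[auth] if not config.get(field)]
```

An empty list means the `config` block carries everything the selected mode requires.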
Access Policies
This connector requires the following access policies to be set in NiFi for the ingestion user.
Global Access Policies
| Policy | Privilege | Resource | Action | 
|---|---|---|---|
| view the UI | Allows users to view the UI | /flow | R | 
| query provenance | Allows users to submit a Provenance Search and request Event Lineage | /provenance | R | 
Component-level Access Policies (must be set on the root process group)
| Policy | Privilege | Resource | Action | 
|---|---|---|---|
| view the component | Allows users to view component configuration details | /<component-type>/<component-UUID> | R | 
| view the data | Allows users to view metadata and content for this component in flowfile queues in outbound connections and through provenance events | /data/<component-type>/<component-UUID> | R | 
| view provenance | Allows users to view provenance events generated by this component | /provenance-data/<component-type>/<component-UUID> | R | 
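The two tables above can be combined into a single checklist for a given component. The sketch below is a hypothetical helper that fills in the `<component-type>/<component-UUID>` placeholders. Using `process-groups` as the component type for the root process group is an assumption, so check the resource paths shown in your NiFi policy settings.

```python
def required_policies(component_type: str, component_id: str) -> list:
    """List the (resource, action) pairs the ingestion user needs."""
    component = f"{component_type}/{component_id}"
    return [
        # Global access policies
        ("/flow", "R"),
        ("/provenance", "R"),
        # Component-level access policies
        (f"/{component}", "R"),
        (f"/data/{component}", "R"),
        (f"/provenance-data/{component}", "R"),
    ]
```

For example, `required_policies("process-groups", "<root-UUID>")` yields five read-only resources to grant to the ingestion user, where `<root-UUID>` stands for the ID of your root process group.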
Code Coordinates
- Class Name: datahub.ingestion.source.nifi.NifiSource
- Browse on GitHub
Questions
If you've got any questions on configuring ingestion for NiFi, feel free to ping us on our Slack.