Okta
Important Capabilities
| Capability | Status | Notes | 
|---|---|---|
| Descriptions | ✅ | Optionally enabled via configuration | 
| Detect Deleted Entities | ✅ | Optionally enabled via stateful_ingestion | 
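Deleted-entity detection relies on stateful ingestion. The `stateful_ingestion.remove_stale_metadata` option in the configuration reference below describes it: entities present in the last successful run but missing from the current run are soft-deleted. The following is a minimal Python sketch of that comparison. It assumes the saved state is just the set of entity urns emitted by each run. `stale_urns` and the example urns are hypothetical, and the sketch is not DataHub's implementation.

```python
from typing import Iterable, List


def stale_urns(previous_run: Iterable[str], current_run: Iterable[str]) -> List[str]:
    """Return urns seen in the previous successful run but missing from the current run.

    Hypothetical helper: real stateful ingestion keeps its checkpoint state in
    DataHub, which is not modelled here.
    """
    current = set(current_run)
    # These are the entities a stale-metadata pass would soft-delete, sorted for stable output.
    return sorted(set(previous_run) - current)


last_run = [
    "urn:li:corpuser:alice@example.com",
    "urn:li:corpuser:bob@example.com",
    "urn:li:corpGroup:Engineering",
]
this_run = ["urn:li:corpuser:alice@example.com", "urn:li:corpGroup:Engineering"]
print(stale_urns(last_run, this_run))  # ['urn:li:corpuser:bob@example.com']
```

In a recipe, this behaviour is controlled through the `stateful_ingestion` options. The description of `stateful_ingestion.enabled` ties its default to having a `pipeline_name` set, so that successive runs can be compared.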
This plugin extracts the following from your Okta instance:
- Users
- Groups
- Group Membership
Note that users ingested by this connector will not be able to log in to DataHub unless Okta OIDC SSO is enabled. You can, however, ingest these users before they log in for the first time. This is useful if you want to add them to groups or assign them roles ahead of time.
For instructions on how to configure Okta OIDC SSO, please read the DataHub documentation on OIDC SSO with Okta.
Extracting DataHub Users
Usernames
Usernames serve as unique identifiers for users on DataHub. This connector reads the Okta User Profile attribute named by `okta_profile_to_username_attr`. It then parses the DataHub username from that value using the regular expression in `okta_profile_to_username_regex`. Common choices for the attribute are `login` and `email`. With the defaults listed in the configuration reference below (`email` and `(.*)`), the full email address is used as the username.
If this is not how you wish to map to DataHub usernames, you can provide a custom mapping through these two configuration options. For example, the following configuration explicitly maps the full email address to the username, and therefore to the user's urn:
```yaml
okta_profile_to_username_attr: "email"
okta_profile_to_username_regex: ".*"
```
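The following minimal Python sketch makes the attribute-plus-regex mapping concrete. It assumes the regex is applied with a search and that the whole match becomes the username, which is consistent with both `(.*)` and `.*` yielding the full value. `map_username` is a hypothetical helper, not the connector's API. The `([^@]+)` pattern is just one illustrative way to keep only the text before the "@".

```python
import re
from typing import Dict


def map_username(profile: Dict[str, str], attr: str = "email", regex: str = "(.*)") -> str:
    """Derive a DataHub username from an Okta User Profile (illustrative sketch).

    Assumption: the regex is applied with re.search and the whole match is kept.
    """
    raw_value = profile.get(attr)
    if raw_value is None:
        raise ValueError(f"Okta profile has no attribute {attr!r}")
    match = re.search(regex, raw_value)
    if match is None:
        raise ValueError(f"Pattern {regex!r} did not match {raw_value!r}")
    return match.group(0)


profile = {"login": "jdoe@example.com", "email": "jane.doe@example.com"}
print(map_username(profile))                                # jane.doe@example.com
print(map_username(profile, attr="login", regex="([^@]+)"))  # jdoe
```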
Profiles
This connector also extracts basic user profile information from Okta. The following fields of the Okta User Profile are extracted
and mapped to the DataHub CorpUserInfo aspect:
- display name
- first name
- last name
- title
- department
- country code
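The sketch below shows the field mapping as a plain dictionary transform. The Okta-side names (`displayName`, `firstName`, `lastName`, `title`, `department`, `countryCode`) are Okta's standard profile attributes. The DataHub-side keys are assumed to mirror the CorpUserInfo aspect. `to_corp_user_info` is hypothetical and only illustrates the idea.

```python
from typing import Dict, Optional

# Okta User Profile attribute -> assumed CorpUserInfo field name (illustrative).
OKTA_TO_CORP_USER_INFO = {
    "displayName": "displayName",
    "firstName": "firstName",
    "lastName": "lastName",
    "title": "title",
    "department": "departmentName",
    "countryCode": "countryCode",
}


def to_corp_user_info(profile: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Copy the supported profile fields, skipping ones that are missing or empty."""
    info: Dict[str, str] = {}
    for okta_attr, datahub_field in OKTA_TO_CORP_USER_INFO.items():
        value = profile.get(okta_attr)
        if value:
            info[datahub_field] = value
    return info


okta_profile = {"firstName": "Jane", "lastName": "Doe", "department": "Data", "mobilePhone": "555-0100"}
print(to_corp_user_info(okta_profile))
```

Attributes outside the list, such as `mobilePhone` here, are simply ignored in this sketch.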
Extracting DataHub Groups
Group Names
Group names serve as unique identifiers for groups on DataHub. This connector extracts group names using the "name" attribute of an Okta Group Profile. By default, a URL-encoded version of the full group name is used as the unique identifier (CorpGroupKey) and the raw "name" attribute is mapped as the display name that will appear in DataHub's UI.
If this is not how you wish to map to DataHub group names, you can provide a custom mapping using the configuration options detailed below, namely okta_profile_to_group_name_attr
and okta_profile_to_group_name_regex.
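The following minimal Python sketch shows the default group mapping. The raw `name` goes to the display name, and a percent-encoded copy becomes the unique key in the group urn. It assumes `urllib.parse.quote`-style encoding, the `urn:li:corpGroup:` prefix, and whole-match regex semantics. `map_group` is hypothetical.

```python
import re
import urllib.parse
from typing import Dict, Tuple


def map_group(profile: Dict[str, str], attr: str = "name", regex: str = "(.*)") -> Tuple[str, str]:
    """Return (group urn, display name) for an Okta Group Profile (illustrative sketch)."""
    raw_name = profile[attr]
    match = re.search(regex, raw_name)
    if match is None:
        raise ValueError(f"Pattern {regex!r} did not match {raw_name!r}")
    # Percent-encode the group name so characters such as spaces are safe in the urn.
    group_key = urllib.parse.quote(match.group(0))
    # The raw "name" attribute is kept as the human-readable display name.
    return f"urn:li:corpGroup:{group_key}", profile["name"]


print(map_group({"name": "Data Engineering", "description": "Pipelines team"}))
# ('urn:li:corpGroup:Data%20Engineering', 'Data Engineering')
```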
Profiles
This connector also extracts basic group information from Okta. The following fields of the Okta Group Profile are extracted and mapped to the
DataHub CorpGroupInfo aspect:
- name
- description
Extracting Group Membership
This connector additionally extracts the edges between Users and Groups that are stored in Okta. It maps them to the GroupMembership aspect
associated with DataHub users (CorpUsers).
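Okta stores membership as group-to-member edges, while the GroupMembership aspect lives on each user. This means the edges have to be inverted. The minimal Python sketch below assumes the aspect holds a list of group urns per user. `memberships_by_user` and the urns are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


def memberships_by_user(group_members: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert group -> members edges into user -> groups (one membership list per user)."""
    groups_of_user: DefaultDict[str, Set[str]] = defaultdict(set)
    for group_urn, user_urns in group_members.items():
        for user_urn in user_urns:
            groups_of_user[user_urn].add(group_urn)
    # Sorted for stable output; each list would back one user's GroupMembership aspect.
    return {user: sorted(groups) for user, groups in groups_of_user.items()}


edges = {
    "urn:li:corpGroup:Engineering": ["urn:li:corpuser:alice", "urn:li:corpuser:bob"],
    "urn:li:corpGroup:Analytics": ["urn:li:corpuser:alice"],
}
print(memberships_by_user(edges))
```

Users who belong to no group never appear in the result. That is the population `skip_users_without_a_group: True` would leave out.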
Filtering and Searching
You can also choose to ingest only a subset of users or groups into DataHub by setting a filter or search expression. For
users, set either okta_users_filter or okta_users_search (only one can be set at a time). For groups, set
either okta_groups_filter or okta_groups_search (again, only one at a time). These are Okta API filter and search expressions, not regular expressions. See below for full configuration
options.
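Because these are Okta API expressions, they end up as query parameters on Okta's list endpoints rather than being matched locally. The minimal Python sketch below assumes the configured string is forwarded unchanged as the `filter` or `search` parameter of `/api/v1/users` or `/api/v1/groups`, with `page_size` sent as `limit`. `build_list_url` is hypothetical. `status eq "ACTIVE"` is an example Okta filter expression.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def build_list_url(domain: str, resource: str, page_size: int = 100,
                   filter_expr: Optional[str] = None,
                   search_expr: Optional[str] = None) -> str:
    """Build an Okta list URL for "users" or "groups" (illustrative sketch)."""
    if filter_expr is not None and search_expr is not None:
        # Mirrors the "only one of filter and search can be set" rule.
        raise ValueError("Set only one of the filter and search expressions")
    params: Dict[str, str] = {"limit": str(page_size)}
    if filter_expr is not None:
        params["filter"] = filter_expr
    if search_expr is not None:
        params["search"] = search_expr
    return f"https://{domain}/api/v1/{resource}?{urlencode(params)}"


print(build_list_url("dev-33231928.okta.com", "users", filter_expr='status eq "ACTIVE"'))
# https://dev-33231928.okta.com/api/v1/users?limit=100&filter=status+eq+%22ACTIVE%22
```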
CLI-based Ingestion
Install the Plugin
The okta source works out of the box with acryl-datahub.
Starter Recipe
Check out the following recipe to get started with ingestion! See below for full configuration options.
For general pointers on writing and running a recipe, see our main recipe guide.
```yaml
source:
  type: okta
  config:
    # Coordinates
    okta_domain: "dev-35531955.okta.com"
    # Credentials
    okta_api_token: "11be4R_M2MzDqXawbTHfKGpKee0kuEOfX1RCQSRx99"
sink:
  # sink configs
```
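If you prefer to run the starter recipe from Python rather than the CLI, a minimal sketch follows. It makes the following assumptions:
- `acryl-datahub` is installed.
- The sink is a `datahub-rest` sink at `http://localhost:8080`.
- The token is read from a hypothetical `OKTA_API_TOKEN` environment variable.

`build_okta_recipe` and `run_recipe` are illustrative helpers. `Pipeline.create`, `run` and `raise_from_status` come from DataHub's ingestion package.

```python
import os
from typing import Any, Dict


def build_okta_recipe(okta_domain: str, okta_api_token: str, server: str) -> Dict[str, Any]:
    """Return the starter recipe as a dict; the sink section is an assumed local REST sink."""
    return {
        "source": {
            "type": "okta",
            "config": {"okta_domain": okta_domain, "okta_api_token": okta_api_token},
        },
        "sink": {"type": "datahub-rest", "config": {"server": server}},
    }


def run_recipe(recipe: Dict[str, Any]) -> None:
    """Run the recipe with DataHub's Python Pipeline API (requires acryl-datahub)."""
    from datahub.ingestion.run.pipeline import Pipeline  # imported lazily

    pipeline = Pipeline.create(recipe)
    pipeline.run()
    pipeline.raise_from_status()


recipe = build_okta_recipe(
    okta_domain="dev-35531955.okta.com",
    # Hypothetical environment variable; keeps the token out of source control.
    okta_api_token=os.environ.get("OKTA_API_TOKEN", "<your-token>"),
    server="http://localhost:8080",
)
# run_recipe(recipe)  # uncomment to ingest into the DataHub instance at `server`
```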
Config Details
Note that a . is used to denote nested fields in the YAML recipe.
| Field | Description | 
|---|---|
| okta_api_token ✅ string | An API token generated for the DataHub application inside your Okta Developer Console. e.g. 00be4R_M2MzDqXawbWgfKGpKee0kuEOfX1RCQSRx00 | 
| okta_domain ✅ string | The location of your Okta Domain, without a protocol. Can be found in Okta Developer console. e.g. dev-33231928.okta.com | 
| delay_seconds One of number, integer | Number of seconds to wait between calls to Okta's REST APIs, to stay within Okta's rate limits. Default: 0.01 (10 ms) | 
| include_deprovisioned_users boolean | Whether to ingest users in the DEPROVISIONED state from Okta. Default: False | 
| include_suspended_users boolean | Whether to ingest users in the SUSPENDED state from Okta. Default: False | 
| ingest_group_membership boolean | Whether group membership should be ingested into DataHub. ingest_groups must be True if this is True. Default: True | 
| ingest_groups boolean | Whether groups should be ingested into DataHub. Default: True | 
| ingest_users boolean | Whether users should be ingested into DataHub. Default: True | 
| mask_group_id boolean | Default: True | 
| mask_user_id boolean | Default: True | 
| okta_groups_filter string | Okta filter expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#filters) for more info. | 
| okta_groups_search string | Okta search expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#list-groups-with-search) for more info. | 
| okta_profile_to_group_name_attr string | Which Okta Group Profile attribute to use as input to DataHub group name mapping. Default: name | 
| okta_profile_to_group_name_regex string | A regex used to parse the DataHub group name from the attribute specified in `okta_profile_to_group_name_attr`. Default: (.*) | 
| okta_profile_to_username_attr string | Which Okta User Profile attribute to use as input to DataHub username mapping. Common values used are - login, email. Default: email | 
| okta_profile_to_username_regex string | A regex used to parse the DataHub username from the attribute specified in `okta_profile_to_username_attr`. Default: (.*) | 
| okta_users_filter string | Okta filter expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-a-filter) for more info. | 
| okta_users_search string | Okta search expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-search) for more info. | 
| page_size integer | The number of entities requested from Okta's REST APIs in one request. Default: 100 | 
| skip_users_without_a_group boolean | Whether to only ingest users that are members of groups. If this is set to False, all users will be ingested regardless of group membership. Default: False | 
| stateful_ingestion StatefulStaleMetadataRemovalConfig | Okta Stateful Ingestion Config. | 
| stateful_ingestion.enabled boolean | Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False. Default: False | 
| stateful_ingestion.remove_stale_metadata boolean | Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled. Default: True | 
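The `page_size` and `delay_seconds` options control how the connector pages through Okta's REST APIs. The minimal Python sketch below shows such a loop. It assumes cursor-style pagination, where each response indicates whether another page exists; Okta's list APIs return a link to the next page. `iterate_entities` and the fake fetcher are hypothetical and never call Okta. The sleep function is injectable so the pauses can be observed.

```python
import time
from typing import Callable, Iterator, List, Optional, Tuple

# A page fetcher takes (cursor, limit) and returns (items, next_cursor or None).
PageFetcher = Callable[[Optional[str], int], Tuple[List[dict], Optional[str]]]


def iterate_entities(fetch_page: PageFetcher, page_size: int = 100,
                     delay_seconds: float = 0.01,
                     sleep: Callable[[float], None] = time.sleep) -> Iterator[dict]:
    """Yield entities page by page, pausing delay_seconds between successive requests."""
    cursor: Optional[str] = None
    first_request = True
    while True:
        if not first_request:
            sleep(delay_seconds)  # throttle to stay under Okta's rate limits
        first_request = False
        items, cursor = fetch_page(cursor, page_size)
        yield from items
        if cursor is None:
            break


USERS = [{"id": str(i)} for i in range(250)]


def fake_fetch(cursor: Optional[str], limit: int) -> Tuple[List[dict], Optional[str]]:
    """Serve USERS in slices of `limit`, using the next offset as the cursor."""
    start = int(cursor or 0)
    end = start + limit
    return USERS[start:end], (str(end) if end < len(USERS) else None)


sleeps: List[float] = []
users = list(iterate_entities(fake_fetch, page_size=100, delay_seconds=0.01, sleep=sleeps.append))
print(len(users), len(sleeps))  # 250 2
```

With 250 entities and a page size of 100, the loop makes three requests and pauses twice.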
The JSON Schema for this configuration is inlined below.
```json
{
  "title": "OktaConfig",
  "description": "Base configuration class for stateful ingestion for source configs to inherit from.",
  "type": "object",
  "properties": {
    "stateful_ingestion": {
      "title": "Stateful Ingestion",
      "description": "Okta Stateful Ingestion Config.",
      "allOf": [
        {
          "$ref": "#/definitions/StatefulStaleMetadataRemovalConfig"
        }
      ]
    },
    "okta_domain": {
      "title": "Okta Domain",
      "description": "The location of your Okta Domain, without a protocol. Can be found in Okta Developer console. e.g. dev-33231928.okta.com",
      "type": "string"
    },
    "okta_api_token": {
      "title": "Okta Api Token",
      "description": "An API token generated for the DataHub application inside your Okta Developer Console. e.g. 00be4R_M2MzDqXawbWgfKGpKee0kuEOfX1RCQSRx00",
      "type": "string"
    },
    "ingest_users": {
      "title": "Ingest Users",
      "description": "Whether users should be ingested into DataHub.",
      "default": true,
      "type": "boolean"
    },
    "ingest_groups": {
      "title": "Ingest Groups",
      "description": "Whether groups should be ingested into DataHub.",
      "default": true,
      "type": "boolean"
    },
    "ingest_group_membership": {
      "title": "Ingest Group Membership",
      "description": "Whether group membership should be ingested into DataHub. ingest_groups must be True if this is True.",
      "default": true,
      "type": "boolean"
    },
    "okta_profile_to_username_attr": {
      "title": "Okta Profile To Username Attr",
      "description": "Which Okta User Profile attribute to use as input to DataHub username mapping. Common values used are - login, email.",
      "default": "email",
      "type": "string"
    },
    "okta_profile_to_username_regex": {
      "title": "Okta Profile To Username Regex",
      "description": "A regex used to parse the DataHub username from the attribute specified in `okta_profile_to_username_attr`.",
      "default": "(.*)",
      "type": "string"
    },
    "okta_profile_to_group_name_attr": {
      "title": "Okta Profile To Group Name Attr",
      "description": "Which Okta Group Profile attribute to use as input to DataHub group name mapping.",
      "default": "name",
      "type": "string"
    },
    "okta_profile_to_group_name_regex": {
      "title": "Okta Profile To Group Name Regex",
      "description": "A regex used to parse the DataHub group name from the attribute specified in `okta_profile_to_group_name_attr`.",
      "default": "(.*)",
      "type": "string"
    },
    "include_deprovisioned_users": {
      "title": "Include Deprovisioned Users",
      "description": "Whether to ingest users in the DEPROVISIONED state from Okta.",
      "default": false,
      "type": "boolean"
    },
    "include_suspended_users": {
      "title": "Include Suspended Users",
      "description": "Whether to ingest users in the SUSPENDED state from Okta.",
      "default": false,
      "type": "boolean"
    },
    "page_size": {
      "title": "Page Size",
      "description": "The number of entities requested from Okta's REST APIs in one request.",
      "default": 100,
      "type": "integer"
    },
    "delay_seconds": {
      "title": "Delay Seconds",
      "description": "Number of seconds to wait between calls to Okta's REST APIs. (Okta rate limits). Defaults to 10ms.",
      "default": 0.01,
      "anyOf": [
        {
          "type": "number"
        },
        {
          "type": "integer"
        }
      ]
    },
    "okta_users_filter": {
      "title": "Okta Users Filter",
      "description": "Okta filter expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-a-filter) for more info.",
      "type": "string"
    },
    "okta_users_search": {
      "title": "Okta Users Search",
      "description": "Okta search expression (not regex) for ingesting users. Only one of `okta_users_filter` and `okta_users_search` can be set. See (https://developer.okta.com/docs/reference/api/users/#list-users-with-search) for more info.",
      "type": "string"
    },
    "okta_groups_filter": {
      "title": "Okta Groups Filter",
      "description": "Okta filter expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#filters) for more info.",
      "type": "string"
    },
    "okta_groups_search": {
      "title": "Okta Groups Search",
      "description": "Okta search expression (not regex) for ingesting groups. Only one of `okta_groups_filter` and `okta_groups_search` can be set. See (https://developer.okta.com/docs/reference/api/groups/#list-groups-with-search) for more info.",
      "type": "string"
    },
    "skip_users_without_a_group": {
      "title": "Skip Users Without A Group",
      "description": "Whether to only ingest users that are members of groups. If this is set to False, all users will be ingested regardless of group membership.",
      "default": false,
      "type": "boolean"
    },
    "mask_group_id": {
      "title": "Mask Group Id",
      "default": true,
      "type": "boolean"
    },
    "mask_user_id": {
      "title": "Mask User Id",
      "default": true,
      "type": "boolean"
    }
  },
  "required": [
    "okta_domain",
    "okta_api_token"
  ],
  "additionalProperties": false,
  "definitions": {
    "DynamicTypedStateProviderConfig": {
      "title": "DynamicTypedStateProviderConfig",
      "type": "object",
      "properties": {
        "type": {
          "title": "Type",
          "description": "The type of the state provider to use. For DataHub use `datahub`",
          "type": "string"
        },
        "config": {
          "title": "Config",
          "description": "The configuration required for initializing the state provider. Default: The datahub_api config if set at pipeline level. Otherwise, the default DatahubClientConfig. See the defaults (https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/src/datahub/ingestion/graph/client.py#L19).",
          "default": {},
          "type": "object"
        }
      },
      "required": [
        "type"
      ],
      "additionalProperties": false
    },
    "StatefulStaleMetadataRemovalConfig": {
      "title": "StatefulStaleMetadataRemovalConfig",
      "description": "Base specialized config for Stateful Ingestion with stale metadata removal capability.",
      "type": "object",
      "properties": {
        "enabled": {
          "title": "Enabled",
          "description": "Whether or not to enable stateful ingest. Default: True if a pipeline_name is set and either a datahub-rest sink or `datahub_api` is specified, otherwise False",
          "default": false,
          "type": "boolean"
        },
        "remove_stale_metadata": {
          "title": "Remove Stale Metadata",
          "description": "Soft-deletes the entities present in the last successful run but missing in the current run with stateful_ingestion enabled.",
          "default": true,
          "type": "boolean"
        }
      },
      "additionalProperties": false
    }
  }
}
```
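Some of the schema's rules can be checked before running ingestion. The hand-rolled Python sketch below enforces the following:
- the `required` fields are present;
- `additionalProperties: false` holds, so there are no unknown top-level keys;
- the `ingest_group_membership` rule, which appears only in a description, is satisfied.

It also fills in the top-level defaults. `resolve_okta_config` is hypothetical; DataHub validates recipes with its own config classes.

```python
from typing import Any, Dict, List

REQUIRED = {"okta_domain", "okta_api_token"}
# Top-level properties that carry a "default" in the schema above.
DEFAULTS: Dict[str, Any] = {
    "ingest_users": True,
    "ingest_groups": True,
    "ingest_group_membership": True,
    "okta_profile_to_username_attr": "email",
    "okta_profile_to_username_regex": "(.*)",
    "okta_profile_to_group_name_attr": "name",
    "okta_profile_to_group_name_regex": "(.*)",
    "include_deprovisioned_users": False,
    "include_suspended_users": False,
    "page_size": 100,
    "delay_seconds": 0.01,
    "skip_users_without_a_group": False,
    "mask_group_id": True,
    "mask_user_id": True,
}
# Optional top-level properties without a default.
OPTIONAL = {"stateful_ingestion", "okta_users_filter", "okta_users_search",
            "okta_groups_filter", "okta_groups_search"}
KNOWN = REQUIRED | OPTIONAL | set(DEFAULTS)


def resolve_okta_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Apply a few of the schema's rules by hand and fill in defaults (illustrative only)."""
    errors: List[str] = []
    missing = REQUIRED - set(config)
    if missing:
        errors.append(f"missing required fields: {sorted(missing)}")
    unknown = set(config) - KNOWN  # "additionalProperties": false
    if unknown:
        errors.append(f"unknown fields: {sorted(unknown)}")
    resolved = {**DEFAULTS, **config}
    # Stated in the ingest_group_membership description, not expressed in the schema itself.
    if resolved["ingest_group_membership"] and not resolved["ingest_groups"]:
        errors.append("ingest_group_membership requires ingest_groups to be True")
    if errors:
        raise ValueError("; ".join(errors))
    return resolved


cfg = resolve_okta_config({"okta_domain": "dev-33231928.okta.com", "okta_api_token": "<token>"})
print(cfg["page_size"], cfg["okta_profile_to_username_attr"])  # 100 email
```

In this sketch, turning off `ingest_groups` also requires setting `ingest_group_membership: false` explicitly, since the membership default is `true`.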
As a prerequisite, you should create a DataHub Application within the Okta Developer Console with full permissions to read your organization's Users and Groups.
Compatibility
Validated against Okta API Versions:
- 2021.07.2

Validated against load:
- User Count: 1000
- Group Count: 100
- Group Membership Edges: 1000 (1 per User)
- Run Time (Wall Clock): 2min 7sec
Code Coordinates
- Class Name: datahub.ingestion.source.identity.okta.OktaSource
Questions
If you've got any questions on configuring ingestion for Okta, feel free to ping us on our Slack.