
Entity Source Definitions

S2V defines data sources for Data Vault objects using specific YAML structures. These structures vary by the Data Vault object type. This document details the common patterns:

  • entity_source (singular): For objects with a single source definition. Depending on the object type, that single source is keyed either by its location tuple or by a URN.
  • entity_sources (plural): For objects that can be fed by one or more source definitions (presented as a list of URN-keyed maps).

Common Properties within an Entity Source Definition

Regardless of whether it is a single entity_source or an item within an entity_sources list, each source definition contains a set of common properties that describe how to access and process the data from that particular source.

| Property | Type | Description |
|---|---|---|
| entity_source | Tuple | Specifies the physical location of this source's data. Format: (SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE) or (SOURCE_SCHEMA, SOURCE_TABLE). This property is present in all source definition structures, but its placement varies: in tuple-keyed definitions it is the map key, while in URN-keyed definitions it is a property nested under the URN. |
| source_filter | String | An SQL WHERE clause condition used to filter data from this source. Example: "STATUS = 'ACTIVE'" |
| use_source_cdc_flag | Boolean | Indicates whether this source provides Change Data Capture (CDC) information (e.g., insert/update/delete flags). |
| source_system_configuration_urn | String | A URN linking to a predefined source system configuration in your source_system_settings.yaml file. Example: 'urn:s2v:source_setting:erp_config' |
| business_key_mapping | List of Maps | Defines how the parent entity's business key(s) are populated from this source. See Business Key & Lookup Mappings. |
| source_business_key | String | Qualifies business key values with the source system they originate from, so that keys from distinct source systems can be kept apart in multi-master scenarios. Required only when business_key_mapping is used. |
| lookup_mapping | Map | (Alternative to business_key_mapping for some entities) Defines how business keys are resolved via a lookup. See Business Key & Lookup Mappings. |
| connected_hub_relations | List of Maps | (For Links and Non-Historized Links) Defines how the business keys of connected Hubs are sourced, using business_key_mapping or lookup_mapping. See Business Key & Lookup Mappings. |
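The entity_source tuple is written as a string with either two or three comma-separated parts. The following is a minimal Python sketch of how a tool could split that string into its components and combine it with source_filter. SourceLocation, parse_source_location and select_statement are hypothetical helpers, not part of S2V. The sketch also assumes that an empty source_filter means "no filter".

```python
from typing import NamedTuple, Optional


class SourceLocation(NamedTuple):
    """Parts of an entity_source tuple (hypothetical helper type)."""
    database: Optional[str]  # None when the two-part (SCHEMA, TABLE) form is used
    schema: str
    table: str


def parse_source_location(value: str) -> SourceLocation:
    """Parse '(DB, SCHEMA, TABLE)' or '(SCHEMA, TABLE)' into its parts."""
    inner = value.strip()
    # Tolerate the string with or without surrounding parentheses.
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    parts = [part.strip() for part in inner.split(",")]
    if any(not part for part in parts):
        raise ValueError(f"empty component in source location {value!r}")
    if len(parts) == 3:
        return SourceLocation(parts[0], parts[1], parts[2])
    if len(parts) == 2:
        return SourceLocation(None, parts[0], parts[1])
    raise ValueError(f"expected 2 or 3 components, got {value!r}")


def select_statement(location: SourceLocation, source_filter: str = "") -> str:
    """Build an illustrative SELECT; an empty source_filter adds no WHERE clause (assumption)."""
    name = ".".join(p for p in (location.database, location.schema, location.table) if p)
    sql = f"SELECT * FROM {name}"
    if source_filter.strip():
        sql += f" WHERE {source_filter}"
    return sql


print(select_statement(parse_source_location("(LEGACY_DB, dbo, ITEMS)"), "STATUS = 'ACTIVE'"))
# SELECT * FROM LEGACY_DB.dbo.ITEMS WHERE STATUS = 'ACTIVE'
```

In the two-part form, the database is left as None rather than guessed. Presumably a real tool would fill it from the connection or the source system configuration.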

Single Source Objects (entity_source)

This top-level property is used when a Data Vault object is fed by a single conceptual source. The way this single source is defined can vary by the Data Vault object type.

1. Tuple-keyed entity_source

This structure is used when the single source is identified directly by its location tuple (database, schema, table), which serves as the map key.

Used by:

  • Hub Satellites (hubsat)
  • References (reference)

Structure:

# Example for a Hub Satellite
entity_source:
  (<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>): # Key: (DB, Schema, Table)
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    business_key_mapping:
      - <TARGET_BK_COLUMN_NAME>:
          - '<SOURCE_COLUMN>'
    source_business_key: ''

Key Components:

  • Source Location Tuple (map key): The physical location of the source data.
    • Format: (SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE) or (SOURCE_SCHEMA, SOURCE_TABLE).
  • Properties: Configuration for this source, including:
    • source_filter
    • use_source_cdc_flag
    • source_system_configuration_urn
    • business_key_mapping or lookup_mapping
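After YAML parsing, a tuple-keyed entity_source is a map with exactly one key. That key is the location tuple as a plain string, and its value holds the remaining properties. The following is a minimal Python sketch, under that assumption, that flattens the structure into one record and turns business_key_mapping (a list of single-key maps) into an ordinary dict. The function names are hypothetical. The data mirrors the SAT_CUSTOMER example in this section.

```python
from typing import Any, Dict, List


def tuple_keyed_source(entity_source: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a tuple-keyed entity_source into a single record.

    The map's only key is the source location tuple (a plain string after
    YAML parsing); its value holds the remaining properties.
    """
    if len(entity_source) != 1:
        raise ValueError("a tuple-keyed entity_source must have exactly one key")
    location, props = next(iter(entity_source.items()))
    return {"location": location, **props}


def bk_columns(business_key_mapping: List[Dict[str, List[str]]]) -> Dict[str, List[str]]:
    """Turn [{TARGET_BK: [SOURCE_COL, ...]}, ...] into {TARGET_BK: [SOURCE_COL, ...]}."""
    result: Dict[str, List[str]] = {}
    for item in business_key_mapping:
        for target, sources in item.items():
            result[target] = list(sources)
    return result


# SAT_CUSTOMER's entity_source, written as the dicts/lists a YAML parser returns.
sat_customer = {
    "(SAP_SE, KNA1)": {
        "source_filter": "",
        "use_source_cdc_flag": False,
        "source_system_configuration_urn": "urn:s2v:source_setting:SAP",
        "business_key_mapping": [{"CUSTOMER_ID": ["KUNNR"]}],
        "source_business_key": "",
    }
}

record = tuple_keyed_source(sat_customer)
print(record["location"], bk_columns(record["business_key_mapping"]))
# (SAP_SE, KNA1) {'CUSTOMER_ID': ['KUNNR']}
```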

Example (Hub Satellite):

name: 'SAT_CUSTOMER'
entity_type: 'hubsat'
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (SAP_SE, KNA1): # Tuple key identifying the source
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    business_key_mapping:
      - CUSTOMER_ID: # Hub's business key name
          - 'KUNNR' # Source column for the hub's business key
    source_business_key: '' # for multi-master scenarios

2. URN-keyed entity_source

This structure is used when a Data Vault object has a single source definition, but this definition is encapsulated within a map where the key is a URN. This resembles a single item from the entity_sources list (see below).

Used by:

  • Non-Historized Links (non_historized_link)

Structure:

# Example for a Non-Historized Link
entity_source:
  urn:s2v:link_source:<YOUR_SOURCE_NAME>: # Key: URN for the single source
    entity_source: '(<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>)' # Source location tuple
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    connected_hub_relations:
      # ... hub relations and their business key mappings ...
      - <HUB_ALIAS>: # Connected hub alias
          business_key_mapping:
            - <TARGET_BK_COLUMN_NAME>:
                - '<SOURCE_COLUMN>'
# ... other properties like non_historized_columns, historized_columns

Key Components:

  • URN Key: A URN that identifies this source definition (e.g., urn:s2v:link_source:my_event_source).
  • Properties:
    • entity_source (the source location tuple, given here as a property)
    • source_filter
    • use_source_cdc_flag
    • source_system_configuration_urn
    • connected_hub_relations
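The URN-keyed form carries the same information as the tuple-keyed one, but the map key and the location trade places. The following is a minimal Python sketch of flattening it into a record with an explicit urn field. urn_keyed_source is a hypothetical name. The check that keys start with urn:s2v: is an assumption based only on the examples in this document. The data mirrors the EVENT example, with hub relations shortened.

```python
from typing import Any, Dict


def urn_keyed_source(entity_source: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten a URN-keyed entity_source into a single record.

    Unlike the tuple-keyed form, the map key is the source URN and the
    location tuple sits in the nested entity_source property.
    """
    if len(entity_source) != 1:
        raise ValueError("a URN-keyed entity_source must have exactly one key")
    urn, props = next(iter(entity_source.items()))
    # Assumption: every source URN uses the urn:s2v: namespace seen in the examples.
    if not urn.startswith("urn:s2v:"):
        raise ValueError(f"expected a urn:s2v: key, got {urn!r}")
    record: Dict[str, Any] = {"urn": urn, "location": props["entity_source"]}
    record.update({key: value for key, value in props.items() if key != "entity_source"})
    return record


# The EVENT link's entity_source as parsed YAML (hub relations shortened).
event_source = {
    "urn:s2v:link_source:src_1": {
        "entity_source": "(SOURCE_DATA, CLICK_EVENTS)",
        "source_filter": "",
        "use_source_cdc_flag": True,
        "source_system_configuration_urn": "urn:s2v:source_setting:yaml_interface",
        "connected_hub_relations": [{"HUB_EVENT": {}}, {"HUB_USER": {}}],
    }
}

record = urn_keyed_source(event_source)
print(record["urn"], record["location"])
# urn:s2v:link_source:src_1 (SOURCE_DATA, CLICK_EVENTS)
```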

Example (Non-Historized Link):

name: 'EVENT'
entity_type: 'non_historized_link'
enable_refresh: true

connected_hubs:
  - HUB_USER: 'HUB_USER'
  - HUB_EVENT: 'HUB_EVENT'

entity_source:
  urn:s2v:link_source:src_1: # URN key
    entity_source: '(SOURCE_DATA, CLICK_EVENTS)' # Source location tuple
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:yaml_interface'
    connected_hub_relations:
      - HUB_EVENT:
          business_key_mapping:
            - EVENT_BK:
                - 'EVENT_ID'
          source_business_key: ''
      - HUB_USER:
          business_key_mapping:
            - USER_BK:
                - 'USER_ID'
          source_business_key: ''
# non_historized_columns, historized_columns, etc. would follow at the same level as entity_source
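In the EVENT example, every alias used under connected_hub_relations also appears in connected_hubs. The following is a minimal Python sketch of a consistency check for that pattern. It assumes, without confirmation from this page, that each relation alias must be declared in connected_hubs. The function names are hypothetical. The data is taken from the EVENT example.

```python
from typing import Any, Dict, List, Set


def declared_aliases(connected_hubs: List[Dict[str, str]]) -> Set[str]:
    """Collect the hub aliases (map keys) listed under connected_hubs."""
    return {alias for item in connected_hubs for alias in item}


def undeclared_relations(
    connected_hubs: List[Dict[str, str]],
    connected_hub_relations: List[Dict[str, Any]],
) -> List[str]:
    """Return aliases used in connected_hub_relations but missing from connected_hubs."""
    declared = declared_aliases(connected_hubs)
    return [
        alias
        for relation in connected_hub_relations
        for alias in relation
        if alias not in declared
    ]


# Values taken from the EVENT example.
connected_hubs = [{"HUB_USER": "HUB_USER"}, {"HUB_EVENT": "HUB_EVENT"}]
connected_hub_relations = [
    {"HUB_EVENT": {"business_key_mapping": [{"EVENT_BK": ["EVENT_ID"]}], "source_business_key": ""}},
    {"HUB_USER": {"business_key_mapping": [{"USER_BK": ["USER_ID"]}], "source_business_key": ""}},
]

print(undeclared_relations(connected_hubs, connected_hub_relations))  # []
```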

Multi Source Objects (entity_sources)

This structure is used for Data Vault objects that can be fed by one or more source definitions. Each source definition is an item in a list, and each item is a map keyed by a unique URN.

Used by:

  • Hubs (hub)
  • Regular Links (link)

Structure (for each item in the entity_sources list):

# Example for one source definition within a Hub or Link's entity_sources list
- urn:s2v:<object_type>_source:<YOUR_SOURCE_NAME>: # Key: URN for this specific source
    entity_source: '(<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>)' # Source location tuple
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    # Object-specific properties follow, e.g.:
    # For Hubs:
    business_key_mapping:
      - <TARGET_BK_COLUMN_NAME>:
          - '<SOURCE_COLUMN>'
    source_business_key: '' # for multi-master scenarios
    # For Links:
    connected_hub_relations:
      - <HUB_ALIAS>:
          business_key_mapping:
            - <TARGET_BK_COLUMN_NAME>:
                - '<SOURCE_COLUMN>'
          source_business_key: ''

Key Components (for each source definition in the list):

  • Source URN Identifier (Map Key): A unique URN that identifies this specific source definition. The URN typically includes a prefix indicating the Data Vault object type (e.g., urn:s2v:hub_source:my_erp_source, urn:s2v:link_source:sap_orders).
  • entity_source (Tuple)
  • source_filter
  • use_source_cdc_flag
  • source_system_configuration_urn
  • Object-Specific Properties:
    • For Hubs:
      • business_key_mapping
      • source_business_key
    • For Links:
      • connected_hub_relations

Example (Hub with multiple sources):

name: 'HUB_PRODUCT'
entity_type: 'hub'
enable_refresh: true
concatenate_business_keys: false
requires_source_business_key: yes
target_business_key_columns:
  - 'PRODUCT_BK'
entity_sources:
  - urn:s2v:hub_source:erp_system:
      entity_source: '(ERP_DB, PRODUCTS_SCHEMA, PRODUCT_MASTER)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:erp_config'
      use_source_cdc_flag: false
      business_key_mapping:
        - PRODUCT_BK:
            - 'SKU'
      source_business_key: 'ERP_DB'
  - urn:s2v:hub_source:legacy_system:
      entity_source: '(LEGACY_DB, dbo, ITEMS)'
      source_filter: "STATUS = 'ACTIVE'"
      source_system_configuration_urn: 'urn:s2v:source_setting:legacy_config'
      use_source_cdc_flag: true
      business_key_mapping:
        - PRODUCT_BK:
            - 'ITEM_CODE'
      source_business_key: 'LEGACY_SYSTEM_IDENTIFIER'
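The two sources above set different source_business_key values, which is what addresses the multi-master case. The following Python sketch illustrates the idea with made-up key values. If both systems deliver a product with key 100, the bare business key cannot tell the two apart, whereas pairing each key with its source_business_key keeps them distinct. The tuple pairing is an assumption chosen for illustration. This page does not describe how S2V actually combines the two values (for example, before hashing).

```python
from typing import Dict, List, Set, Tuple

# Hypothetical business key values per source, grouped by source_business_key.
keys_by_source: Dict[str, List[str]] = {
    "ERP_DB": ["100", "200"],             # SKU values from the ERP source
    "LEGACY_SYSTEM_IDENTIFIER": ["100"],  # ITEM_CODE values from the legacy source
}


def plain_keys(rows: Dict[str, List[str]]) -> Set[str]:
    """Business keys alone: equal values from different systems collapse into one."""
    return {value for values in rows.values() for value in values}


def qualified_keys(rows: Dict[str, List[str]]) -> Set[Tuple[str, str]]:
    """Pair each business key with its source_business_key so equal values stay distinct."""
    return {(source, value) for source, values in rows.items() for value in values}


print(len(plain_keys(keys_by_source)), len(qualified_keys(keys_by_source)))  # 2 3
```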
Tip: Because a satellite often serves as the source of its hub or link, you can include the satellite's name in the hub's or link's source URN. This makes the association between the satellite and its hub or link explicit and makes navigating the model easier.

Example (Regular Link):

# ... (other link properties like name, entity_type, enable_refresh ...)
entity_sources:
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: "ORDER_TYPE = 'ZOR'"
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      use_source_cdc_flag: false
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
  # Potentially another source for the same Link could be added here:
  # - urn:s2v:link_source:CRM_ORDERS:
  #     ...
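For a link source, connected_hub_relations determines which source columns feed each connected hub's business key. The following is a minimal Python sketch that derives that mapping from the SAP_SD source above, together with the full list of source columns the link source must provide. The function names are hypothetical. For simplicity, relations that use lookup_mapping instead of business_key_mapping are returned with an empty column map.

```python
from typing import Any, Dict, List


def hub_key_columns(connected_hub_relations: List[Dict[str, Any]]) -> Dict[str, Dict[str, List[str]]]:
    """Map each connected hub alias to {target BK column: [source columns]}.

    Relations that use lookup_mapping instead of business_key_mapping are
    returned with an empty column map in this sketch.
    """
    result: Dict[str, Dict[str, List[str]]] = {}
    for relation in connected_hub_relations:
        for hub_alias, props in relation.items():
            columns: Dict[str, List[str]] = {}
            for mapping in props.get("business_key_mapping", []):
                for target, sources in mapping.items():
                    columns[target] = list(sources)
            result[hub_alias] = columns
    return result


def required_source_columns(connected_hub_relations: List[Dict[str, Any]]) -> List[str]:
    """List every source column the link source must provide, in first-seen order."""
    ordered: List[str] = []
    for columns in hub_key_columns(connected_hub_relations).values():
        for sources in columns.values():
            for column in sources:
                if column not in ordered:
                    ordered.append(column)
    return ordered


# connected_hub_relations of the SAP_SD source, as parsed YAML.
sap_sd_relations = [
    {"HUB_CUSTOMER": {"business_key_mapping": [{"CUSTOMER_BK": ["KUNNR"]}], "source_business_key": ""}},
    {"HUB_MATERIAL": {"business_key_mapping": [{"MATERIAL_BK": ["MATNR"]}], "source_business_key": ""}},
]

print(hub_key_columns(sap_sd_relations))
# {'HUB_CUSTOMER': {'CUSTOMER_BK': ['KUNNR']}, 'HUB_MATERIAL': {'MATERIAL_BK': ['MATNR']}}
print(required_source_columns(sap_sd_relations))  # ['KUNNR', 'MATNR']
```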

Important Note on Link Satellites:

Link Satellites (linksat) do not have their own entity_source or entity_sources property. They inherit their source context directly from their parent Link.
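Because a link satellite has no source properties of its own, a tool has to follow the reference to the parent link to find the satellite's sources. The following is a minimal Python sketch of that lookup. It rests on several assumptions not confirmed on this page:

  • The satellite names its parent in connected_entity, as the hub satellite example does.
  • The parent is a regular link with an entity_sources list.
  • The model is a dict from object names to parsed definitions.

LNK_SALES_ORDER and LSAT_SALES_ORDER are hypothetical object names.

```python
from typing import Any, Dict, List


def linksat_sources(linksat: Dict[str, Any], model: Dict[str, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return the entity_sources a link satellite inherits from its parent link.

    Assumptions: the satellite names its parent in connected_entity (as the
    hub satellite example does), and model maps object names to parsed YAML.
    """
    if "entity_source" in linksat or "entity_sources" in linksat:
        raise ValueError(f"{linksat['name']}: link satellites must not define their own sources")
    parent = model[linksat["connected_entity"]]
    if parent.get("entity_type") != "link":
        raise ValueError(f"{linksat['name']}: parent {linksat['connected_entity']} is not a link")
    return parent["entity_sources"]


# Hypothetical object names; the link source is the SAP_SD example from above.
model = {
    "LNK_SALES_ORDER": {
        "entity_type": "link",
        "entity_sources": [
            {"urn:s2v:link_source:SAP_SD": {"entity_source": "(SOURCE_SAP, SD_SALES_ORDERS)"}}
        ],
    }
}
linksat = {"name": "LSAT_SALES_ORDER", "entity_type": "linksat", "connected_entity": "LNK_SALES_ORDER"}

for source in linksat_sources(linksat, model):
    print(list(source))  # ['urn:s2v:link_source:SAP_SD']
```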