Entity Source Definitions
S2V defines data sources for Data Vault objects using specific YAML structures, which vary by Data Vault object type. This document details the common patterns:
- entity_source (singular): For objects with a single source definition. The way the source is keyed varies by object type.
- entity_sources (plural): For objects that can be fed by one or more source definitions (presented as a list of URN-keyed maps).
Common Properties within an Entity Source Definition
Regardless of whether it is a single entity_source or an item within an entity_sources list, each source definition contains a set of common properties that describe how to access and process the data from that particular source.
Property | Type | Description |
---|---|---|
entity_source (Tuple) | Tuple | Specifies the physical location of this source's data. Format: (SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE) or (SOURCE_SCHEMA, SOURCE_TABLE). This property is present in all source definition structures, but its placement varies: it is the map key in tuple-keyed structures and a nested property in URN-keyed ones. |
source_filter | String | An SQL WHERE clause condition to filter data from this source. Example: "STATUS = 'ACTIVE'" |
use_source_cdc_flag | Boolean | Indicates whether this source provides Change Data Capture (CDC) information (e.g., insert/update/delete flags). |
source_system_configuration_urn | String | A URN linking to a predefined source system configuration in your source_system_settings.yaml file. Example: 'urn:s2v:source_setting:erp_config' |
business_key_mapping | List of Maps | Defines how the parent entity's business key(s) are populated from this source. See Business Key & Lookup Mappings. |
source_business_key | String | Identifies the source system that business key values originate from, addressing multi-master scenarios. Required only in combination with business_key_mapping. |
lookup_mapping | Map | (Alternative to business_key_mapping for some entities) Defines how business keys are resolved via a lookup. See Business Key & Lookup Mappings. |
connected_hub_relations | List of Maps | (For Links/NHLs) Defines how the business keys of connected Hubs are sourced, using business_key_mapping or lookup_mapping. See Business Key & Lookup Mappings. |
Single Source Objects (entity_source)
This top-level property is used when a Data Vault object is fed by a single conceptual source. The way this single source is defined can vary by the Data Vault object type.
1. Tuple-keyed entity_source
This structure is used when the single source is directly identified by its database, schema, and table tuple as the main key.
Used by:
- Hub Satellites (hubsat)
- References (reference)
Structure:
# Example for a Hub Satellite
entity_source:
  (<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>): # Key: (DB, Schema, Table)
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    business_key_mapping:
      - <TARGET_BK_COLUMN_NAME>:
          - '<SOURCE_COLUMN>'
    source_business_key: ''
Key Components:
- Source Location Tuple: The physical source of the data.
  - Format: (SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE) or (SOURCE_SCHEMA, SOURCE_TABLE).
- Properties: Configuration for this source, including:
  - source_filter
  - use_source_cdc_flag
  - source_system_configuration_urn
  - business_key_mapping or lookup_mapping
Example (Hub Satellite):
name: 'SAT_CUSTOMER'
entity_type: 'hubsat'
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (SAP_SE, KNA1): # Tuple key identifying the source
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    business_key_mapping:
      - CUSTOMER_ID: # Hub's business key name
          - 'KUNNR' # Source column for the hub's business key
    source_business_key: '' # for multi-master scenarios
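As a variation on the example above, the following sketch uses the three-part tuple form together with a non-empty source_filter. The database name SAP_DB and the filter condition are assumptions made purely for illustration; they are not taken from an actual S2V model.

```yaml
name: 'SAT_CUSTOMER'
entity_type: 'hubsat'
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (SAP_DB, SAP_SE, KNA1): # three-part key: (database, schema, table); SAP_DB is hypothetical
    source_filter: "LAND1 = 'DE'" # hypothetical WHERE condition restricting the loaded rows
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    business_key_mapping:
      - CUSTOMER_ID:
          - 'KUNNR'
    source_business_key: ''
```

Only the tuple key and the filter differ from the two-part example; the remaining properties are unchanged.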
2. URN-keyed entity_source
This structure is used when a Data Vault object has a single source definition, but this definition is encapsulated within a map where the key is a URN. This resembles a single item from the entity_sources list (see below).
Used by:
- Non-Historized Links (non_historized_link)
Structure:
# Example for a Non-Historized Link
entity_source:
  urn:s2v:link_source:<YOUR_SOURCE_NAME>: # Key: URN for the single source
    entity_source: '(<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>)' # tuple for source location
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    connected_hub_relations:
      # ... hub relations and their business key mappings ...
      - <HUB_ALIAS>: # Connected hub alias
          business_key_mapping:
            - <TARGET_BK_COLUMN_NAME>:
                - '<SOURCE_COLUMN>'
# ... other properties like non_historized_columns, historized_columns
Key Components:
- URN Key: A URN identifying the single source (e.g., urn:s2v:link_source:my_event_source).
- Properties:
  - entity_source
  - source_filter
  - use_source_cdc_flag
  - source_system_configuration_urn
  - connected_hub_relations
Example (Non-Historized Link):
name: 'EVENT'
entity_type: 'non_historized_link'
enable_refresh: true
connected_hubs:
  - HUB_USER: 'HUB_USER'
  - HUB_EVENT: 'HUB_EVENT'
entity_source:
  urn:s2v:link_source:src_1: # URN key
    entity_source: '(SOURCE_DATA, CLICK_EVENTS)' # source location tuple
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:yaml_interface'
    connected_hub_relations:
      - HUB_EVENT:
          business_key_mapping:
            - EVENT_BK:
                - 'EVENT_ID'
          source_business_key: ''
      - HUB_USER:
          business_key_mapping:
            - USER_BK:
                - 'USER_ID'
          source_business_key: ''
# non_historized_columns, historized_columns etc. would follow at the same level as entity_source
Multi Source Objects (entity_sources)
This structure is used for Data Vault objects that can be fed by one or more source definitions. Each source definition is an item in a list, and each item is a map keyed by a unique URN.
Used by:
- Hubs (hub)
- Regular Links (link)
Structure (for each item in the entity_sources list):
# Example for one source definition within a Hub or Link's entity_sources list
- urn:s2v:<object_type>_source:<YOUR_SOURCE_NAME>: # Key: URN for this specific source
    entity_source: '(<SOURCE_DATABASE>, <SOURCE_SCHEMA>, <SOURCE_TABLE>)' # tuple for source location
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:<YOUR_SETTING_NAME>'
    # Object-specific properties follow, e.g.:
    # For Hubs:
    business_key_mapping:
      - <TARGET_BK_COLUMN_NAME>:
          - '<SOURCE_COLUMN>'
    source_business_key: '' # for multi-master scenarios
    # For Links:
    connected_hub_relations:
      - <HUB_ALIAS>:
          business_key_mapping:
            - <TARGET_BK_COLUMN_NAME>:
                - '<SOURCE_COLUMN>'
          source_business_key: ''
Key Components (for each source definition in the list):
- Source URN Identifier (Map Key): A unique URN that identifies this specific source definition. The URN typically includes a prefix indicating the Data Vault object type (e.g., urn:s2v:hub_source:my_erp_source, urn:s2v:link_source:sap_orders).
- entity_source (Tuple)
- source_filter
- use_source_cdc_flag
- source_system_configuration_urn
- Object-Specific Properties:
  - For Hubs:
    - business_key_mapping
    - source_business_key
  - For Links:
    - connected_hub_relations
Example (Hub with multiple sources):
name: 'HUB_PRODUCT'
entity_type: 'hub'
enable_refresh: true
concatenate_business_keys: false
requires_source_business_key: yes
target_business_key_columns:
  - 'PRODUCT_BK'
entity_sources:
  - urn:s2v:hub_source:erp_system:
      entity_source: '(ERP_DB, PRODUCTS_SCHEMA, PRODUCT_MASTER)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:erp_config'
      use_source_cdc_flag: false
      business_key_mapping:
        - PRODUCT_BK:
            - 'SKU'
      source_business_key: 'ERP_DB'
  - urn:s2v:hub_source:legacy_system:
      entity_source: '(LEGACY_DB, dbo, ITEMS)'
      source_filter: "STATUS = 'ACTIVE'"
      source_system_configuration_urn: 'urn:s2v:source_setting:legacy_config'
      use_source_cdc_flag: true
      business_key_mapping:
        - PRODUCT_BK:
            - 'ITEM_CODE'
      source_business_key: 'LEGACY_SYSTEM_IDENTIFIER'
You can include the satellite name in the source URN to directly associate it with its corresponding hub or link, since the satellite often serves as their source. This makes navigating the model easier.
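A minimal sketch of this naming convention, assuming a satellite called SAT_PRODUCT_ERP is attached to HUB_PRODUCT and reads from the same table (the satellite name and the resulting URN are hypothetical):

```yaml
entity_sources:
  - urn:s2v:hub_source:SAT_PRODUCT_ERP: # hypothetical URN embedding the satellite name
      entity_source: '(ERP_DB, PRODUCTS_SCHEMA, PRODUCT_MASTER)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:erp_config'
      use_source_cdc_flag: false
      business_key_mapping:
        - PRODUCT_BK:
            - 'SKU'
      source_business_key: 'ERP_DB'
```

With this convention, reading the URN alone tells you which satellite shares the source with the hub.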
Example (Regular Link):
# ... (other link properties like name, entity_type, enable_refresh ...)
entity_sources:
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: "ORDER_TYPE = 'ZOR'"
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      use_source_cdc_flag: false
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
  # Potentially another source for the same Link could be added here
  # - 'urn:s2v:link_source:CRM_ORDERS':
  # ...
Important Note on Link Satellites:
Link Satellites (linksat) do not have their own entity_source or entity_sources property. They inherit their source context directly from their parent Link.
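For illustration, a Link Satellite definition might look like the sketch below. The entity name, the parent Link name, and the use of connected_entity to reference the parent (by analogy with the Hub Satellite example) are assumptions rather than confirmed S2V syntax. The only point is that no entity_source or entity_sources block appears.

```yaml
name: 'LINKSAT_SALES_ORDER' # hypothetical satellite name
entity_type: 'linksat'
connected_entity: 'LINK_SALES_ORDER' # assumed reference to the parent Link, mirroring hubsat
# No entity_source / entity_sources here:
# the source context is inherited from the parent Link's entity_sources.
```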