Regular Link Properties
Regular links in Data Vault capture the relationships between hubs, which represent core business entities. A link typically connects two or more hubs and can derive information from one or more source systems. Let's explore the structure of a link and its fundamental components with practical examples.
Link Properties
Property Name | Type | Description |
---|---|---|
name | String | The unique name for the Link object. This name will be used for the generated table and in references by other objects. Example: 'SALES_ORDER' |
entity_type | String | Must be set to 'link' to define this object as a Link. |
enable_refresh | Boolean | If true, the link receives new data. Set to false if the source contains static (historical) data that no longer changes. See Shared Properties. |
connected_hubs | List of Key-Value pairs | Defines the list of Hubs that this Link connects, specifying the relationship between them. Each item maps an alias to a Hub name. |
entity_sources | List of Maps | Defines the list of sources that feed data into this Link. Each item in the list represents a distinct source, following the structure outlined in Entity Sources Definition. |
Link Specific Properties
connected_hubs
- Type: List of Key-Value pairs
- Description: This property defines the hubs a link connects, specified as a list of key-value pairs.
- Key: Represents the hub's alias, or the specific name of the relationship. This is necessary because a link can connect to the same hub multiple times, requiring unique aliases to distinguish each connection. The hub alias must be unique within the connected_hubs list.
- Value: Represents the actual name of the hub, which must refer to an existing hub object.
Example: Link connecting to each hub exactly once
connected_hubs:
- HUB_CUSTOMER: 'HUB_CUSTOMER' # <HUB ALIAS>: <HUB NAME>
- HUB_MATERIAL: 'HUB_MATERIAL'
Example: Link connecting to one hub multiple times
connected_hubs:
- HUB_CUSTOMER_BILL_TO: 'HUB_CUSTOMER'
- HUB_CUSTOMER_DELIVER_TO: 'HUB_CUSTOMER'
- HUB_MATERIAL: 'HUB_MATERIAL'
In this scenario, unique aliases (HUB_CUSTOMER_BILL_TO, HUB_CUSTOMER_DELIVER_TO) distinguish the two customer relationships, even though both point to the HUB_CUSTOMER hub.
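The alias rules above (each alias is unique within connected_hubs, and each value names an existing hub) can be checked with a few lines of Python. This is only an illustrative sketch, not part of the product: the function name validate_connected_hubs and the known_hubs set are hypothetical, and the definition is assumed to be already loaded into a list of single-entry dictionaries.

```python
def validate_connected_hubs(connected_hubs, known_hubs):
    """Return a list of problems found in a connected_hubs definition."""
    problems = []
    seen_aliases = set()
    for item in connected_hubs:
        # Each list item is expected to hold exactly one alias -> hub name pair.
        if len(item) != 1:
            problems.append(f"expected exactly one alias per item, got {sorted(item)}")
            continue
        (alias, hub_name), = item.items()
        if alias in seen_aliases:
            problems.append(f"duplicate alias: {alias}")
        seen_aliases.add(alias)
        if hub_name not in known_hubs:
            problems.append(f"unknown hub: {hub_name}")
    return problems


# The "one hub multiple times" example from above, as Python data.
connected_hubs = [
    {"HUB_CUSTOMER_BILL_TO": "HUB_CUSTOMER"},
    {"HUB_CUSTOMER_DELIVER_TO": "HUB_CUSTOMER"},
    {"HUB_MATERIAL": "HUB_MATERIAL"},
]
known_hubs = {"HUB_CUSTOMER", "HUB_MATERIAL"}  # hypothetical set of defined hubs
problems = validate_connected_hubs(connected_hubs, known_hubs)
```

Here problems is empty, because the two customer connections use different aliases and both hub names exist.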
A single link cannot connect multiple times to the same source table. If a link's source contains several relationships, consider extending the connected_hubs definition. For more details, please refer to the FAQ: "How do I model link sources?"
Simple Example
This example illustrates a SALES_ORDER link, which establishes a relationship between the HUB_CUSTOMER and HUB_MATERIAL hubs. The link's data originates from a single source, urn:s2v:link_source:src_1.
# 1. Defines name, entity type and refresh mode
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
# 2. Defines the hubs that this link connects
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
# 3. Defines the list of sources that feed the link
entity_sources:
  - urn:s2v:link_source:src_1:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: ''
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
Make sure that each business key name defined in business_key_mapping matches the business key name used in the hub it connects to (HUB_CUSTOMER has the business key CUSTOMER_BK and HUB_MATERIAL has the business key MATERIAL_BK).
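The name-matching requirement can be expressed as a small consistency check. The sketch below is illustrative only: check_business_key_names and the hub_business_keys dictionary are hypothetical names, and the definitions are assumed to be already parsed into plain Python structures that mirror the example above.

```python
def check_business_key_names(connected_hubs, hub_business_keys, relations):
    """Compare mapped business key names with those defined on each hub."""
    # Resolve alias -> hub name from the connected_hubs list.
    alias_to_hub = {
        alias: hub for item in connected_hubs for alias, hub in item.items()
    }
    mismatches = []
    for relation in relations:
        for alias, body in relation.items():
            hub_name = alias_to_hub[alias]
            # Each mapping entry is a single-key dict: business key -> source columns.
            mapped = [name for entry in body["business_key_mapping"] for name in entry]
            expected = hub_business_keys[hub_name]
            if sorted(mapped) != sorted(expected):
                mismatches.append((alias, mapped, expected))
    return mismatches


connected_hubs = [{"HUB_CUSTOMER": "HUB_CUSTOMER"}, {"HUB_MATERIAL": "HUB_MATERIAL"}]
# Business keys as stated in the text; the dictionary shape is an assumption.
hub_business_keys = {"HUB_CUSTOMER": ["CUSTOMER_BK"], "HUB_MATERIAL": ["MATERIAL_BK"]}
relations = [
    {"HUB_CUSTOMER": {"business_key_mapping": [{"CUSTOMER_BK": ["KUNNR"]}]}},
    {"HUB_MATERIAL": {"business_key_mapping": [{"MATERIAL_BK": ["MATNR"]}]}},
]
mismatches = check_business_key_names(connected_hubs, hub_business_keys, relations)
```

With the names from the simple example, mismatches is empty. A misspelled key such as CUST_BK would be reported together with the expected name.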
Comprehensive Examples
Multi-Source Link Example
This example demonstrates a link that combines data from two distinct source systems: SAP_SD and SFDC_SALES. Both sources contribute to the same SALES_ORDER link, connecting HUB_CUSTOMER and HUB_MATERIAL.
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
entity_sources:
  # Source 1: SAP Sales Data
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: "ORDER_TYPE = 'ZOR'" # Example filter for specific order types
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
  # Source 2: Salesforce Sales Data
  - urn:s2v:link_source:SFDC_SALES:
      entity_source: '(SOURCE_SFDC, OPPORTUNITY_LINE_ITEMS)'
      source_filter: "IS_WON = TRUE" # Example filter for won opportunities
      use_source_cdc_flag: false
      source_system_configuration_urn: 'urn:s2v:source_setting:SFDC'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'ACCOUNT_ID'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'PRODUCT_CODE'
            source_business_key: ''
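Conceptually, each source maps its own column names onto the same business keys, so rows from both systems end up in the same link shape. The following Python sketch illustrates that idea only; it does not describe how the tool loads data. SOURCE_MAPPINGS, to_link_keys and the sample row values are hypothetical.

```python
# Business key -> source column, per source, taken from the example above.
SOURCE_MAPPINGS = {
    "urn:s2v:link_source:SAP_SD": {
        "CUSTOMER_BK": "KUNNR",
        "MATERIAL_BK": "MATNR",
    },
    "urn:s2v:link_source:SFDC_SALES": {
        "CUSTOMER_BK": "ACCOUNT_ID",
        "MATERIAL_BK": "PRODUCT_CODE",
    },
}


def to_link_keys(source_urn, row):
    """Project a source row onto the link's business keys."""
    mapping = SOURCE_MAPPINGS[source_urn]
    return {business_key: row[column] for business_key, column in mapping.items()}


# Hypothetical sample rows; the values are invented for illustration.
sap_row = {"KUNNR": "C-100", "MATNR": "M-7", "ORDER_TYPE": "ZOR"}
sfdc_row = {"ACCOUNT_ID": "C-100", "PRODUCT_CODE": "M-7", "IS_WON": True}

sap_keys = to_link_keys("urn:s2v:link_source:SAP_SD", sap_row)
sfdc_keys = to_link_keys("urn:s2v:link_source:SFDC_SALES", sfdc_row)
```

Both calls yield the same pair of business keys (CUSTOMER_BK and MATERIAL_BK), which is why the two sources can feed a single SALES_ORDER link.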
Lookup Mapping Example
This example demonstrates a scenario where the SALES_ORDER link needs to resolve the business key for HUB_MATERIAL via a lookup. Instead of having a direct source column for MATERIAL_BK in its own source (SD_SALES_ORDERS), it uses MATERIAL_CODE to join with the source of HUB_MATERIAL (identified by hub_source_urn: 'urn:s2v:hub_source:SAP_SE') on MATERIAL_ID to obtain the correct business key. This is a common pattern when the link's immediate source data contains foreign keys or codes that need to be resolved against a master data source (the Hub's source in this case).
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
entity_sources:
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: ''
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            lookup_mapping:
              hub_source_urn: 'urn:s2v:hub_source:SAP_SE'
              entity_source_columns:
                - 'MATERIAL_CODE'
              hub_source_columns:
                - 'MATERIAL_ID'
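The lookup described above behaves like a join between the link's source and the hub's source. The Python sketch below illustrates that join under stated assumptions: resolve_by_lookup is a hypothetical helper, the hub source is assumed to expose its business key in a column called MATNR, the sample rows are invented, and returning None for unmatched rows is a choice made for this sketch rather than documented tool behaviour.

```python
def resolve_by_lookup(link_rows, hub_rows, entity_source_columns,
                      hub_source_columns, hub_bk_column):
    """Resolve a hub business key for each link source row via a lookup join."""
    # Index the hub source rows by their join columns.
    index = {
        tuple(row[col] for col in hub_source_columns): row[hub_bk_column]
        for row in hub_rows
    }
    resolved = []
    for row in link_rows:
        key = tuple(row[col] for col in entity_source_columns)
        resolved.append(index.get(key))  # None when no hub source row matches
    return resolved


# Hypothetical sample data for the hub source and the link source.
hub_rows = [
    {"MATERIAL_ID": 1, "MATNR": "M-7"},
    {"MATERIAL_ID": 2, "MATNR": "M-9"},
]
link_rows = [
    {"KUNNR": "C-100", "MATERIAL_CODE": 2},
    {"KUNNR": "C-200", "MATERIAL_CODE": 5},
]

material_keys = resolve_by_lookup(
    link_rows,
    hub_rows,
    entity_source_columns=["MATERIAL_CODE"],
    hub_source_columns=["MATERIAL_ID"],
    hub_bk_column="MATNR",  # assumed name of the business key column
)
```

The first link row resolves to M-9 because its MATERIAL_CODE equals a MATERIAL_ID in the hub source; the second has no match in this sample data.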