
Source Configuration

These settings define parameters for source systems integrated into your Data Vault. While you can create distinct configurations for each source, it's often practical to use a single set of settings for multiple similar systems (e.g., several SAP instances) or for a standardized staging area. The core of the source configuration is an array of these settings stored in source_system_settings.yaml. Each item in this array is an object defined by a unique URN, which acts as a key to identify and associate the settings with the relevant source object(s) in your data vault model.
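
The "array of objects keyed by URN" structure can be illustrated with a small Python sketch. It assumes the YAML file has already been parsed into a list of single-key dictionaries; the helper name `index_by_urn` is hypothetical, and the rule that `<NAME>` contains no colons or whitespace is an assumption made only for this illustration, not a documented constraint.

```python
import re

# Assumption: <NAME> contains no colons or whitespace.
URN_PATTERN = re.compile(r"^urn:s2v:source_setting:[^:\s]+$")


def index_by_urn(items):
    """Return {urn: settings}, rejecting malformed or duplicate URN keys.

    `items` is the parsed array: a list of dicts with exactly one URN key each.
    """
    index = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected exactly one URN key per item, got {list(item)}")
        (urn, settings), = item.items()
        if not URN_PATTERN.match(urn):
            raise ValueError(f"malformed URN: {urn!r}")
        if urn in index:
            raise ValueError(f"duplicate URN: {urn!r}")
        index[urn] = settings
    return index


# Hypothetical parsed content, reduced to one property per source.
parsed = [
    {"urn:s2v:source_setting:SAP": {"trim_whitespaces": True}},
    {"urn:s2v:source_setting:SALESFORCE": {"trim_whitespaces": True}},
]
settings_by_urn = index_by_urn(parsed)
```

Indexing by URN in this way mirrors how a source object in the model would refer to its settings by key.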

Properties

The table below lists each property available in the source_system_settings.yaml file, with its expected data type and a description of its purpose.

| Property Name | Expected Type | Description |
| --- | --- | --- |
| (URN key itself) | string | A unique identifier for the source system configuration. Must follow the pattern: urn:s2v:source_setting:<NAME>. |
| hashkey_escape_char | string | The character used to escape the hashkey_delimiter when the delimiter character itself appears within a source value. This prevents misinterpretation during hash key concatenation. |
| empty_value_is_null | boolean | Specifies whether empty string values from the source are treated as NULL values in the Data Vault. Set to true to normalize empty strings to NULL. |
| trim_whitespaces | boolean | Specifies whether leading and trailing whitespace is trimmed from source column values before processing. Set to true to ensure data consistency. |
| load_timestamp_column_name | string | The name of the column in the source system that represents the load timestamp. |
| cdc_flag_column_name | string | The name of the column in the source system that acts as a CDC flag (e.g., indicating insert, update, delete). |
| cdc_value_mapping | object | An object that maps the CDC operation types (insert, update, delete) to the corresponding values found in the source column named by cdc_flag_column_name. |
| cdc_value_mapping.insert | string | The value in the source's CDC flag column that indicates an insert operation. |
| cdc_value_mapping.update | string | The value in the source's CDC flag column that indicates an update operation. Can also be "[delete_value]+[insert_value]" for delete+insert logic. |
| cdc_value_mapping.delete | string | The value in the source's CDC flag column that indicates a delete operation. |
| cdc_window_threshold | integer | The time window, in seconds, that must pass before a record is considered a true delete. This is useful for handling soft deletes or asynchronous CDC processes where an update might be quickly followed by a delete. |
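
To make the interplay of trim_whitespaces, empty_value_is_null and hashkey_escape_char more concrete, here is a minimal Python sketch of how a hash key input string might be assembled. Several details are assumptions and not taken from this page: the function names, the `;` delimiter standing in for hashkey_delimiter, the order of trimming before the empty-string check, the rendering of NULL as an empty string, and the doubling of the escape character itself. The actual tool may behave differently.

```python
def normalize(value, settings):
    """Apply trim_whitespaces, then empty_value_is_null (this order is an assumption)."""
    if value is None:
        return None
    if settings.get("trim_whitespaces", False):
        value = value.strip()
    if settings.get("empty_value_is_null", False) and value == "":
        return None
    return value


def escape(value, delimiter, escape_char):
    """Escape the escape character first (assumption), then the delimiter."""
    value = value.replace(escape_char, escape_char * 2)
    return value.replace(delimiter, escape_char + delimiter)


def hashkey_input(values, settings, delimiter=";"):
    """Concatenate business key parts into the string that would later be hashed."""
    parts = []
    for raw in values:
        value = normalize(raw, settings)
        # Assumption: NULL parts contribute an empty string.
        parts.append("" if value is None else escape(value, delimiter, settings["hashkey_escape_char"]))
    return delimiter.join(parts)


settings = {
    "hashkey_escape_char": "\\",  # a single backslash
    "empty_value_is_null": True,
    "trim_whitespaces": True,
}

# Without escaping, both inputs would concatenate to "A;B;C" and collide.
a = hashkey_input(["A;B", "C"], settings)
b = hashkey_input(["A", "B;C"], settings)
```

The two inputs produce different strings only because the delimiter inside a value is escaped, which is the misinterpretation the escape character is meant to prevent.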

Examples

Below is an example of a source_system_settings.yaml file with common configurations. You can adapt these settings to match your specific Data Vault environment and requirements.

source_system_settings:
  # Source configuration
  - urn:s2v:source_setting:SAP:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'LOAD_DATE'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'I'
        update: 'D+I'
        delete: 'D'
      cdc_window_threshold: 1
  # Source configuration
  - urn:s2v:source_setting:SALESFORCE:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'CDC_LOAD_TIMESTAMP'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'INSERT'
        update: 'INSERT'
        delete: 'DELETE'
      cdc_window_threshold: 1
  # Additional source configurations can be added here...
  # ...
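
One possible reading of the delete+insert mapping combined with cdc_window_threshold is sketched below in Python, using the SAP values from the example. This is an interpretation and not the tool's documented algorithm: the function `classify`, the event tuples, and the rule that a delete followed by an insert within the window (inclusive) counts as an update are all assumptions made for illustration.

```python
from datetime import datetime, timedelta


def classify(events, mapping, window_seconds):
    """Translate raw CDC flags into operations for ONE business key.

    `events` is a list of (timestamp, flag) tuples sorted by timestamp.
    Assumption: with 'D+I' style updates, a delete followed by an insert
    within `window_seconds` is an update; any other delete is a true delete.
    """
    window = timedelta(seconds=window_seconds)
    delete_insert = mapping["update"] == f'{mapping["delete"]}+{mapping["insert"]}'
    result = []
    i = 0
    while i < len(events):
        ts, flag = events[i]
        nxt = events[i + 1] if i + 1 < len(events) else None
        if (delete_insert and flag == mapping["delete"] and nxt is not None
                and nxt[1] == mapping["insert"] and nxt[0] - ts <= window):
            # Delete and insert form a pair: report a single update.
            result.append((nxt[0], "update"))
            i += 2
            continue
        if flag == mapping["delete"]:
            result.append((ts, "delete"))
        elif flag == mapping["insert"]:
            result.append((ts, "insert"))
        elif flag == mapping["update"]:
            result.append((ts, "update"))
        else:
            raise ValueError(f"unknown CDC flag: {flag!r}")
        i += 1
    return result


sap_mapping = {"insert": "I", "update": "D+I", "delete": "D"}
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = [
    (t0, "I"),
    (t0 + timedelta(seconds=60), "D"),
    (t0 + timedelta(seconds=60, milliseconds=500), "I"),  # within the 1 s window
    (t0 + timedelta(seconds=120), "D"),                   # no insert follows
]
operations = [op for _, op in classify(events, sap_mapping, window_seconds=1)]
```

Under these assumptions the delete at 60 s is merged with the insert that follows half a second later, while the final delete, which has no insert after it, remains a true delete.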