Version: 1.2.14

Source Configuration

These settings define parameters for source systems integrated into your Data Vault. While you can create distinct configurations for each source, it's often practical to use a single set of settings for multiple similar systems (e.g., several SAP instances) or for a standardized staging area. The core of the source configuration is an array of these settings stored in source_system_settings.yaml. Each item in this array is an object defined by a unique URN, which acts as a key to identify and associate the settings with the relevant source object(s) in your data vault model.
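
To make the "array of URN-keyed objects" layout concrete, here is a minimal Python sketch of how the parsed file could be indexed by URN. It is an illustration, not the tool's implementation: the function name `index_settings`, the allowed characters for `<NAME>`, and the rejection of duplicate URNs are all assumptions. The parsed YAML is represented as plain Python data so the sketch needs no YAML library.

```python
import re

# URN pattern as documented: urn:s2v:source_setting:<NAME>.
# Which characters <NAME> may contain is an assumption of this sketch.
URN_PATTERN = re.compile(r"^urn:s2v:source_setting:[A-Za-z0-9_]+$")


def index_settings(source_system_settings):
    """Turn the array of single-key objects into a dict keyed by URN (hypothetical helper)."""
    index = {}
    for item in source_system_settings:
        if len(item) != 1:
            raise ValueError("each array item should hold exactly one URN key")
        urn, settings = next(iter(item.items()))
        if not URN_PATTERN.match(urn):
            raise ValueError(f"unexpected URN format: {urn!r}")
        if urn in index:
            # Treating duplicates as an error is an assumption.
            raise ValueError(f"duplicate URN: {urn!r}")
        index[urn] = settings
    return index


# Stand-in for the parsed contents of source_system_settings.yaml.
parsed = [
    {"urn:s2v:source_setting:SAP": {"trim_whitespaces": True}},
    {"urn:s2v:source_setting:SALESFORCE": {"trim_whitespaces": True}},
]
settings_by_urn = index_settings(parsed)
```

With such an index, several source objects can point at the same URN and share one set of settings, which matches the "single configuration for several similar systems" use case described above.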

Properties

The table below lists each property available in the source_system_settings.yaml file, with its expected data type and a description of its purpose.

| Property Name | Expected Type | Description |
| --- | --- | --- |
| (URN Key itself) | string | A unique identifier for the source system configuration. Must follow the pattern: urn:s2v:source_setting:<NAME>. |
| hashkey_escape_char | string | The character used to escape the hashkey_delimiter when the delimiter character itself appears within a source value. This prevents misinterpretation during hash key concatenation. |
| empty_value_is_null | boolean | Specifies whether empty string values from the source should be treated as NULL values in the Data Vault. Set to true to normalize empty strings to NULL. |
| trim_whitespaces | boolean | Specifies whether leading and trailing whitespace should be trimmed from source column values before processing. Set to true to ensure data consistency. |
| load_timestamp_column_name | string | The name of the column in the source system that represents the load timestamp. |
| cdc_flag_column_name | string | The name of the column in the source system that acts as a CDC flag (e.g., indicating insert, update, delete). |
| cdc_value_mapping | object | An object that defines the mapping between CDC operation types (insert, update, delete) and their corresponding values found in the source system's cdc_flag_column_name. |
| cdc_value_mapping.insert | string | The value in the source's CDC flag column that indicates an insert operation. |
| cdc_value_mapping.update | string | The value in the source's CDC flag column that indicates an update operation. Can also be "[delete_value]+[insert_value]" for delete+insert logic. |
| cdc_value_mapping.delete | string | The value in the source's CDC flag column that indicates a delete operation. |
| cdc_window_threshold | integer | A time window in seconds that needs to pass before a record is considered a true delete. This is useful for handling soft deletes or asynchronous CDC processes where an update might be quickly followed by a delete. |
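
The effect of trim_whitespaces, empty_value_is_null, and hashkey_escape_char can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, trimming is assumed to happen before the empty-string check, the escape character is assumed to escape itself as well as the delimiter, and `;` is used as a stand-in for hashkey_delimiter, which is configured elsewhere.

```python
def normalize_value(value, trim_whitespaces, empty_value_is_null):
    """Apply trim_whitespaces, then empty_value_is_null (this order is an assumption)."""
    if value is None:
        return None
    if trim_whitespaces:
        value = value.strip()
    if empty_value_is_null and value == "":
        return None
    return value


def escape_for_hashkey(value, delimiter, escape_char):
    """Escape the escape character first (assumption), then the delimiter."""
    value = value.replace(escape_char, escape_char + escape_char)
    return value.replace(delimiter, escape_char + delimiter)


def hashkey_input(values, delimiter, escape_char):
    """Concatenate escaped business key parts into the string that would be hashed."""
    return delimiter.join(
        escape_for_hashkey(v, delimiter, escape_char) for v in values
    )


# ';' stands in for hashkey_delimiter; '\\' is a single backslash in Python.
left = hashkey_input(["A;B", "C"], ";", "\\")
right = hashkey_input(["A", "B;C"], ";", "\\")
```

Without escaping, both key combinations above would concatenate to the same string `A;B;C` and therefore hash to the same key. With escaping, `left` and `right` differ, which is the misinterpretation the property is meant to prevent.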

Examples

Below is an example of a source_system_settings.yaml file with common configurations. You can adapt these settings to match your specific Data Vault environment and requirements.

source_system_settings:
  # Source configuration
  - urn:s2v:source_setting:SAP:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'LOAD_DATE'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'I'
        update: 'D+I'
        delete: 'D'
      cdc_window_threshold: 1
  # Source configuration
  - urn:s2v:source_setting:SALESFORCE:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'CDC_LOAD_TIMESTAMP'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'INSERT'
        update: 'INSERT'
        delete: 'DELETE'
      cdc_window_threshold: 1
  # Additional source configurations can be added here...
  # ...
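
As a rough illustration of how the CDC settings in this example might be interpreted, the Python sketch below resolves raw flag values against cdc_value_mapping and applies cdc_window_threshold. It reflects one possible reading of the property descriptions, not the tool's actual logic. All function names are hypothetical, the tie-break for shared flag values is an assumption, and so is the rule that a delete only counts as a true delete when no insert for the same key arrives within the window.

```python
SAP_MAPPING = {"insert": "I", "update": "D+I", "delete": "D"}
SALESFORCE_MAPPING = {"insert": "INSERT", "update": "INSERT", "delete": "DELETE"}


def resolve_operation(flag, mapping):
    """Map a raw CDC flag value to 'insert', 'update', or 'delete'.

    If several operations share one value (as with 'INSERT' in the
    SALESFORCE entry), the first match wins; that tie-break is an assumption.
    """
    for operation in ("insert", "update", "delete"):
        if mapping[operation] == flag:
            return operation
    raise ValueError(f"unmapped CDC flag value: {flag!r}")


def uses_delete_insert_updates(mapping):
    """True if update is configured as "[delete_value]+[insert_value]"."""
    return mapping["update"] == f"{mapping['delete']}+{mapping['insert']}"


def is_true_delete(delete_ts, insert_timestamps, threshold_seconds):
    """Assumed rule: a delete is 'true' only if no insert for the same key
    follows within threshold_seconds. Timestamps are plain seconds."""
    return not any(
        0 <= ts - delete_ts <= threshold_seconds for ts in insert_timestamps
    )
```

Under these assumptions, the SAP entry treats a `D` followed by an `I` within one second as an update rather than a delete, while the SALESFORCE entry does not use delete+insert logic, because its update value is simply `INSERT`.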