Version: 1.3.0

Source Configuration

These settings define parameters for the source systems integrated into your Data Vault. You can create a distinct configuration for each source, but it is often practical to share one set of settings across multiple similar systems (e.g., several SAP instances) or a standardized staging area. The source configuration is an array of these settings stored in source_system_settings.yaml. Each item in the array is an object keyed by a unique URN, which identifies the settings and associates them with the relevant source object(s) in your Data Vault model.
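As a minimal sketch of how the URN keys can be used, the Python code below turns the parsed array into a lookup from URN to settings, so that several source objects can reference the same entry. It assumes the YAML has already been parsed into a list of single-key dictionaries (for example with a YAML library, not shown). The function name index_source_settings, the source object names, and the exact characters allowed in <NAME> are illustrative assumptions, not part of the tool.

```python
import re

# Assumption: <NAME> consists of letters, digits and underscores,
# as in FULL, INCREMENTAL and INCREMENTAL_DELETE.
URN_PATTERN = re.compile(r"urn:s2v:source_setting:[A-Za-z0-9_]+")


def index_source_settings(items: list) -> dict:
    """Build a URN -> settings lookup from the parsed source_system_settings array.

    Each array item is expected to be a mapping with exactly one key (the URN).
    """
    index = {}
    for item in items:
        if len(item) != 1:
            raise ValueError(f"expected exactly one URN key per item, got {list(item)}")
        urn, settings = next(iter(item.items()))
        if not URN_PATTERN.fullmatch(urn):
            raise ValueError(f"invalid source setting URN: {urn!r}")
        if urn in index:
            raise ValueError(f"duplicate source setting URN: {urn!r}")
        index[urn] = settings
    return index


settings_by_urn = index_source_settings([
    {"urn:s2v:source_setting:FULL": {"load_type": "full_history"}},
])

# Hypothetical source objects from two SAP instances sharing one configuration.
source_objects = {
    "SAP_A.MARA": "urn:s2v:source_setting:FULL",
    "SAP_B.MARA": "urn:s2v:source_setting:FULL",
}
```

Keeping the URN as the only key of each item makes duplicates easy to detect, and any number of source objects can resolve to the same settings object.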

Properties

The table below lists each property available in source_system_settings.yaml, with its expected data type and a description of its purpose.

| Property name | Expected type | Description |
|---|---|---|
| (URN key itself) | string | A unique identifier for the source system configuration. Must follow the pattern urn:s2v:source_setting:<NAME>. |
| hashkey_escape_char | string | The character used to escape the hashkey_delimiter when the delimiter character itself appears within a source value. This prevents misinterpretation during hash key concatenation. |
| empty_value_is_null | boolean | Specifies whether empty string values from the source are treated as NULL values in the Data Vault. Set to true to normalize empty strings to NULL. |
| trim_whitespaces | boolean | Specifies whether leading and trailing whitespace is trimmed from source column values before processing. Set to true to ensure data consistency. |
| load_timestamp_column_name | string | The name of the source column that holds the load timestamp. |
| load_type | string | The type of incoming data, one of full_history or incremental. The value is taken into account during code generation. |
| cdc_flag_column_name | string | Only if load_type: incremental. The name of the source column that acts as the CDC flag (e.g., indicating insert, update, or delete). |
| cdc_value_mapping | object | Only if load_type: incremental. Maps the CDC operation types (insert, update, delete) to the corresponding values found in the source's cdc_flag_column_name. |
| cdc_value_mapping.insert | string | Only if load_type: incremental. The value in the source's CDC flag column that indicates an insert operation. |
| cdc_value_mapping.update | string | Only if load_type: incremental. The value in the source's CDC flag column that indicates an update operation. Can also be "[delete_value]+[insert_value]" (e.g., D+I) when an update is delivered as a delete record followed by an insert record. |
| cdc_value_mapping.delete | string | Only if load_type: incremental. The value in the source's CDC flag column that indicates a delete operation. |
| cdc_value_mapping.cdc_window_threshold | integer | Only if load_type: incremental and updates are written as [delete_value]+[insert_value] in two records. The length of the time window, in the unit given by cdc_window_timeunit, that must pass before a delete record is treated as a true delete rather than the first half of an update. This is useful for soft deletes or asynchronous CDC processes, where the delete and insert records that make up one update may arrive in quick succession. |
| cdc_value_mapping.cdc_window_timeunit | string | Only if load_type: incremental and updates are written as [delete_value]+[insert_value] in two records. The time unit for cdc_window_threshold (e.g., millisecond). |
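The preprocessing properties interact when a hash key is built. The Python sketch below shows one plausible order of operations for a single value: trim, then map empty strings to NULL, then escape. The function names, the "|" default standing in for hashkey_delimiter (which is configured outside this file), doubling the escape character itself, and rendering NULL as an empty string in the concatenation are all assumptions for illustration; the generated code may behave differently.

```python
from typing import Optional


def prepare_value(raw: Optional[str], settings: dict, delimiter: str = "|") -> Optional[str]:
    """Apply trim_whitespaces, empty_value_is_null and hashkey_escape_char to one value."""
    if raw is None:
        return None
    value = raw.strip() if settings.get("trim_whitespaces", False) else raw
    if value == "" and settings.get("empty_value_is_null", False):
        return None
    escape = settings.get("hashkey_escape_char")
    if escape:
        # Assumption: the escape character itself is doubled first, so an escape
        # character already present in the data cannot be mistaken for one added here.
        value = value.replace(escape, escape * 2)
        # Escape every delimiter that occurs inside the value.
        value = value.replace(delimiter, escape + delimiter)
    return value


def hashkey_input(values: list, settings: dict, delimiter: str = "|") -> str:
    """Concatenate prepared values with the delimiter (NULL rendered as '' here, an assumption)."""
    parts = [prepare_value(v, settings, delimiter) for v in values]
    return delimiter.join("" if p is None else p for p in parts)
```

For example, with trim_whitespaces and empty_value_is_null set to true and a single backslash as the escape character, the values "1" and "  a|b " produce the hash input 1|a\|b, which can no longer be confused with the three values 1, a and b.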

Examples

Below is an example of a source_system_settings.yaml file with common configurations. You can adapt these settings to match your specific Data Vault environment and requirements.

source_system_settings:
  # Source configuration for tables that are always loaded in full mode
  - urn:s2v:source_setting:FULL:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: PSA_LOAD_DATE
      load_type: full_history
  # Source configuration for true incremental sources
  - urn:s2v:source_setting:INCREMENTAL:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: PSA_LOAD_DATE
      cdc_flag_column_name: PSA_CDC_FLAG
      load_type: incremental
      cdc_value_mapping:
        insert: I
        update: U
        delete: D
  # Source configuration for incremental sources with a soft-delete mechanism
  - urn:s2v:source_setting:INCREMENTAL_DELETE:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: PSA_LOAD_DATE
      cdc_flag_column_name: PSA_CDC_FLAG
      load_type: incremental
      cdc_value_mapping:
        insert: I
        update: D+I
        delete: D
        cdc_window_threshold: 1
        cdc_window_timeunit: millisecond
  # Additional source configurations can be added here...
  # ...
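To illustrate what the INCREMENTAL_DELETE settings express, the following Python sketch collapses a delete record and an insert record for the same key into a single update when the insert arrives within cdc_window_threshold; otherwise the delete is kept as a true delete. The record shape (business key, load timestamp, CDC flag), the unit names other than millisecond, and treating a gap exactly equal to the threshold as a true delete are assumptions. This is not the code the generator emits.

```python
from datetime import datetime, timedelta

# Assumed unit names; only "millisecond" appears in the documented example.
_UNIT_TO_TIMEDELTA_ARG = {
    "microsecond": "microseconds",
    "millisecond": "milliseconds",
    "second": "seconds",
    "minute": "minutes",
}


def classify_cdc(records, mapping):
    """Resolve CDC flags into operations, merging delete+insert pairs into updates.

    records: iterable of (business_key, load_ts, cdc_flag) with load_ts a datetime.
    mapping: a cdc_value_mapping whose update value has the form "<delete>+<insert>".
    Returns a list of (business_key, load_ts, operation) tuples.
    """
    delete_value, insert_value = mapping["update"].split("+")
    unit = _UNIT_TO_TIMEDELTA_ARG[mapping["cdc_window_timeunit"]]
    window = timedelta(**{unit: mapping["cdc_window_threshold"]})
    # Order by key and time; at equal timestamps put the delete before the insert.
    ordered = sorted(records, key=lambda r: (r[0], r[1], r[2] != delete_value))
    result = []
    i = 0
    while i < len(ordered):
        key, ts, flag = ordered[i]
        nxt = ordered[i + 1] if i + 1 < len(ordered) else None
        if (flag == delete_value and nxt is not None and nxt[0] == key
                and nxt[2] == insert_value and nxt[1] - ts < window):
            # The insert followed within the window: both records form one update.
            result.append((key, nxt[1], "update"))
            i += 2
            continue
        if flag == mapping["delete"]:
            result.append((key, ts, "delete"))  # no matching insert: a true delete
        elif flag == mapping["insert"]:
            result.append((key, ts, "insert"))
        else:
            raise ValueError(f"unknown CDC flag {flag!r}")
        i += 1
    return result


mapping = {"insert": "I", "update": "D+I", "delete": "D",
           "cdc_window_threshold": 1, "cdc_window_timeunit": "millisecond"}
t0 = datetime(2024, 1, 1, 12, 0, 0)
events = classify_cdc([
    ("C", t0 + timedelta(seconds=5), "I"),  # arrives 5 s after C's delete
    ("B", t0, "I"),
    ("A", t0, "I"),
    ("B", t0, "D"),                         # same timestamp as B's insert
    ("C", t0, "D"),
], mapping)
```

Here B's delete and insert share a timestamp and become one update, while C's insert arrives outside the 1 ms window, so C yields a true delete followed by a new insert.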