Source Configuration
These settings define how the source systems integrated into your Data Vault are handled. You can create a distinct configuration for each source, but it is often practical to share a single set of settings across several similar systems (e.g., multiple SAP instances) or a standardized staging area. The source configuration is an array of settings stored in source_system_settings.yaml. Each item in the array is an object keyed by a unique URN, which identifies the settings and associates them with the relevant source object(s) in your data vault model.
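As a rough illustration of the URN-keyed structure, the sketch below assumes the YAML file has already been parsed into a Python list of single-key dictionaries, which is the shape a YAML parser would typically produce for such an array. The helper `index_settings` and the exact regular expression for `<NAME>` are hypothetical and not part of the tool.

```python
import re

# Assumption: <NAME> is any non-empty run of non-whitespace characters.
URN_PATTERN = re.compile(r"^urn:s2v:source_setting:(?P<name>\S+)$")

def index_settings(items):
    """Build a dict from URN to settings, rejecting malformed or duplicate URNs."""
    index = {}
    for item in items:
        if len(item) != 1:
            raise ValueError("each array item must have exactly one URN key")
        (urn, settings), = item.items()
        if not URN_PATTERN.match(urn):
            raise ValueError(f"invalid URN: {urn!r}")
        if urn in index:
            raise ValueError(f"duplicate URN: {urn!r}")
        index[urn] = settings
    return index

# Hand-written stand-in for the parsed contents of source_system_settings.yaml.
parsed = [
    {"urn:s2v:source_setting:SAP": {"trim_whitespaces": True}},
    {"urn:s2v:source_setting:SALESFORCE": {"trim_whitespaces": True}},
]
settings_by_urn = index_settings(parsed)
print(settings_by_urn["urn:s2v:source_setting:SAP"])
```

Indexing by URN up front makes it cheap to resolve the settings for any source object that references one of these keys.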
Properties
The table below lists each property available in the source_system_settings.yaml file, along with its expected data type and a description of its purpose.
Property Name | Expected Type | Description |
---|---|---|
(URN key itself) | string | A unique identifier for the source system configuration. Must follow the pattern urn:s2v:source_setting:<NAME>. |
hashkey_escape_char | string | The character used to escape the hashkey_delimiter if the delimiter character itself appears within a source value. This prevents misinterpretation during hash key concatenation. |
empty_value_is_null | boolean | Specifies whether empty string values from the source should be treated as NULL values in the Data Vault. Set to true to normalize empty strings to NULL. |
trim_whitespaces | boolean | Specifies whether leading and trailing whitespace should be trimmed from source column values before processing. Set to true to ensure data consistency. |
load_timestamp_column_name | string | The name of the column in the source system that represents the load timestamp. |
cdc_flag_column_name | string | The name of the column in the source system that acts as a CDC flag (e.g., indicating insert, update, delete). |
cdc_value_mapping | object | An object that maps the CDC operation types (insert, update, delete) to the corresponding values found in the source system's cdc_flag_column_name column. |
cdc_value_mapping.insert | string | The value in the source's CDC flag column that indicates an insert operation. |
cdc_value_mapping.update | string | The value in the source's CDC flag column that indicates an update operation. Can also be "[delete_value]+[insert_value]" (for example, "D+I") when the source represents an update as a delete followed by an insert. |
cdc_value_mapping.delete | string | The value in the source's CDC flag column that indicates a delete operation. |
cdc_window_threshold | integer | The time window, in seconds, that must pass before a deleted record is considered a true delete. This is useful for handling soft deletes or asynchronous CDC processes where an update might be quickly followed by a delete. |
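To make the value-handling properties more concrete, here is a minimal sketch of how trim_whitespaces, empty_value_is_null, and hashkey_escape_char could interact when building the input string for a hash key. The function names, the order of the steps, the `|` delimiter, and the empty-string placeholder for NULL are all assumptions made for illustration; the tool's actual implementation may differ.

```python
def normalize_value(value, trim_whitespaces, empty_value_is_null):
    """Trim first, then map empty strings to None (the order is an assumption)."""
    if value is None:
        return None
    if trim_whitespaces:
        value = value.strip()
    if empty_value_is_null and value == "":
        return None
    return value

def escape_for_hashkey(value, delimiter, escape_char):
    """Escape the escape character itself first, then the delimiter."""
    value = value.replace(escape_char, escape_char + escape_char)
    return value.replace(delimiter, escape_char + delimiter)

def hashkey_input(values, delimiter, escape_char, null_token=""):
    """Join escaped values; null_token is a hypothetical placeholder for NULL."""
    parts = [
        null_token if v is None else escape_for_hashkey(v, delimiter, escape_char)
        for v in values
    ]
    return delimiter.join(parts)

# Without escaping, both inputs would concatenate to the same string "A|B|C".
print(hashkey_input(["A|B", "C"], "|", "\\"))  # A\|B|C
print(hashkey_input(["A", "B|C"], "|", "\\"))  # A|B\|C
```

Escaping the escape character before the delimiter keeps the concatenated string unambiguous, so two different business keys cannot collapse into the same hash input.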
Examples
Below is an example of a source_system_settings.yaml file with common configurations. You can adapt these settings to match your Data Vault environment and requirements.
source_system_settings:
  # Source configuration
  - urn:s2v:source_setting:SAP:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'LOAD_DATE'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'I'
        update: 'D+I'
        delete: 'D'
      cdc_window_threshold: 1
  # Source configuration
  - urn:s2v:source_setting:SALESFORCE:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'CDC_LOAD_TIMESTAMP'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'INSERT'
        update: 'INSERT'
        delete: 'DELETE'
      cdc_window_threshold: 1
  # Additional source configurations can be added here...
  # ...
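The SAP entry above combines a "D+I" update mapping with a cdc_window_threshold of 1 second. One possible reading of that combination, shown here as a sketch rather than the tool's actual algorithm, is that a delete followed by an insert for the same key within the window counts as an update, while a delete with no such insert is a true delete. The `classify` function, the event tuple layout, and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def classify(events, mapping, window_seconds):
    """Classify CDC rows. events is a list of (key, flag, timestamp) sorted by timestamp."""
    update_value = mapping["update"]
    uses_pair = "+" in update_value  # e.g. 'D+I': update = delete followed by insert
    window = timedelta(seconds=window_seconds)
    consumed = set()  # indexes of inserts already paired with a delete
    result = []
    for i, (key, flag, ts) in enumerate(events):
        if i in consumed:
            continue
        if flag == mapping["delete"]:
            partner = None
            if uses_pair:
                # Look ahead for an insert on the same key inside the window.
                for j in range(i + 1, len(events)):
                    key2, flag2, ts2 = events[j]
                    if ts2 - ts > window:
                        break
                    if j not in consumed and key2 == key and flag2 == mapping["insert"]:
                        partner = j
                        break
            if partner is None:
                result.append((key, "delete", ts))
            else:
                consumed.add(partner)
                result.append((key, "update", events[partner][2]))
        elif not uses_pair and flag == update_value and update_value != mapping["insert"]:
            result.append((key, "update", ts))
        elif flag == mapping["insert"]:
            # If update and insert share one flag value (as in the SALESFORCE entry),
            # this sketch cannot tell them apart and reports an insert.
            result.append((key, "insert", ts))
        else:
            raise ValueError(f"unknown CDC flag: {flag!r}")
    return result

sap_mapping = {"insert": "I", "update": "D+I", "delete": "D"}
t0 = datetime(2024, 1, 1)
events = [
    ("K1", "I", t0),
    ("K1", "D", t0 + timedelta(seconds=10)),
    ("K1", "I", t0 + timedelta(seconds=10, milliseconds=500)),  # within 1 s: update
    ("K2", "D", t0 + timedelta(seconds=20)),
    ("K2", "I", t0 + timedelta(seconds=25)),  # outside 1 s: delete, then insert
]
operations = [op for _, op, _ in classify(events, sap_mapping, window_seconds=1)]
print(operations)  # ['insert', 'update', 'delete', 'insert']
```

The look-ahead is quadratic in the worst case, which is acceptable for a sketch; a production implementation would more likely express the same pairing as a windowed query in the database.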