# https://s2v.reeeliance.com/ llms-full.txt

## Stream2Vault Overview

### [Concepts](https://s2v.reeeliance.com/docs/concepts/overview)

Understand the key ideas behind Stream2Vault to use it more effectively.

### [Getting Started](https://s2v.reeeliance.com/docs/getting-started/overview)

Begin your adventure: Start using Stream2Vault quickly with a simple setup and easy-to-follow steps.

### [Tutorials](https://s2v.reeeliance.com/docs/tutorials/overview)

Follow our step-by-step guide to learn how to use Stream2Vault with confidence.

### [S2V Reference](https://s2v.reeeliance.com/docs/detailed-guides/overview)

Find detailed information about Stream2Vault’s features, configuration options, and supported formats.

### [Deployment](https://s2v.reeeliance.com/docs/standard-deployment/overview)

Learn how to deploy Stream2Vault in your environment. Step-by-step instructions help you go from install to ready.

### [FAQ](https://s2v.reeeliance.com/docs/faq/overview)

Got questions? Find quick answers to the most common ones about using and troubleshooting Stream2Vault.
## React Testing Page

Testing pages with React.

## Standalone Page Creation

You don't need React to write simple standalone pages.

## Data Vault Overview

Data Vault is a hybrid data modeling approach that combines the best of 3rd Normal Form (3NF) and Star Schema. It's designed to provide long-term historical storage of data coming from multiple operational systems and to be resilient to changes in the source systems. Data Vault is particularly well-suited for enterprise data warehouses (EDW) and data integration projects. The methodology focuses on separating structural information (keys and relationships) from descriptive attributes, which makes the model highly adaptable and scalable.

## Core Components

The Data Vault model is built around three primary types of entities:

1. **Hubs (Hub Tables):**
   - Represent core business concepts or entities (e.g., Customer, Product, Order).
   - Contain a unique business key (or a hash of the business key) that identifies the entity across the enterprise.
   - Store minimal metadata, such as load date and record source.
   - Hubs are designed to be stable and rarely change.
2. **Links (Link Tables):**
   - Represent relationships or transactions between Hubs.
   - Contain the hash keys of the Hubs they connect.
   - Can also store metadata like load date and record source.
   - Links capture the associations between business concepts. For example, a Link table might connect a `Customer` Hub and an `Order` Hub to represent a customer placing an order.
3. **Satellites (Satellite Tables):**
   - Store descriptive attributes (contextual information) about Hubs or Links.
   - Are connected to a single parent Hub or Link table.
   - Track historical changes to attributes over time (Type 2 slowly changing dimensions are a common pattern).
   - Each Satellite typically groups attributes by their source system or rate of change.
   - This separation of attributes allows for flexibility; new attributes or source systems can be added by creating new Satellites without altering existing structures.

## Key Principles & Benefits

- **Auditability:** Data Vault stores raw, unaltered data from source systems, providing a clear audit trail. Load dates and record sources are tracked for every piece of data.
- **Scalability:** The model is designed to scale out. Adding new data sources or attributes often involves adding new Satellites or Links without major refactoring of the existing model.
- **Flexibility & Adaptability:** Changes in source systems (e.g., new attributes, modified relationships) can be incorporated with minimal impact on the existing Data Vault structure.
- **Parallel Loading:** The decoupled nature of Hubs, Links, and Satellites allows for high degrees of parallel data loading, improving ETL/ELT performance.
- **Integration:** Provides a robust framework for integrating data from disparate source systems into a unified view.
- **Historical Tracking:** Satellites are designed to capture historical changes to data, enabling point-in-time reporting and analysis.
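To make the three entity types more concrete, here is a minimal Python sketch of one Hub row, one Link row, and one Satellite row. It is an illustration only, not the output S2V generates: the column names (`hk_customer`, `hash_diff`, and so on), the use of SHA-256, the `||` delimiter, and the trim/upper-case normalization are all assumptions chosen for the example.

```python
import hashlib
from datetime import datetime, timezone


def hash_key(*business_key_parts: str) -> str:
    """Return a hex digest for a (possibly composite) business key.

    Assumption: parts are trimmed, upper-cased, and joined with '||'
    before hashing with SHA-256.
    """
    normalized = "||".join(part.strip().upper() for part in business_key_parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def hash_diff(attributes: dict) -> str:
    """Return a digest over descriptive attributes, independent of dict order."""
    payload = "||".join(f"{name}={attributes[name]}" for name in sorted(attributes))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


LOAD_DATE = datetime(2024, 1, 1, tzinfo=timezone.utc)
RECORD_SOURCE = "CRM"  # hypothetical source system name

# Hub: one row per business key, plus minimal metadata.
hub_customer = {
    "hk_customer": hash_key("C-1001"),
    "customer_id": "C-1001",
    "load_date": LOAD_DATE,
    "record_source": RECORD_SOURCE,
}
hub_order = {
    "hk_order": hash_key("O-42"),
    "order_id": "O-42",
    "load_date": LOAD_DATE,
    "record_source": RECORD_SOURCE,
}

# Link: carries the hash keys of the Hubs it connects.
link_customer_order = {
    "hk_customer_order": hash_key("C-1001", "O-42"),
    "hk_customer": hub_customer["hk_customer"],
    "hk_order": hub_order["hk_order"],
    "load_date": LOAD_DATE,
    "record_source": RECORD_SOURCE,
}

# Satellite: descriptive attributes of one parent, versioned by load date.
address = {"city": "Hamburg", "street": "Hafenstrasse 1"}
sat_customer_address = {
    "hk_customer": hub_customer["hk_customer"],
    "load_date": LOAD_DATE,
    "record_source": RECORD_SOURCE,
    "hash_diff": hash_diff(address),
    **address,
}
```

A new Satellite row would be written only when `hash_diff` for the incoming attributes differs from the latest stored one, which is one common way to track history without comparing every column.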
## Data Vault 2.0

Data Vault 2.0 is an evolution of the original methodology, emphasizing aspects like:

- Hash keys for all primary keys and relationships (improves join performance and integration).
- Separation of concerns (modeling, methodology, architecture).
- Emphasis on automation in the build and deployment process.

Interested in learning more? Check out the [Data Vault Documentation](https://www.datavault4dbt.com/documentation/).

## Data Vault Limitations

While Stream2Vault (S2V) aims to provide comprehensive support for Data Vault 2.0 principles, there are certain limitations regarding the types of Data Vault objects and specific modeling patterns it currently supports.

## Unsupported Advanced Data Vault Objects

Stream2Vault focuses on the core, standard Data Vault entities. Some more advanced or specialized Data Vault object types are not natively supported by S2V at this time.

- **Multi-Active Satellites:** S2V does not currently have built-in support for generating Multi-Active Satellites. These satellites are used to store multiple active descriptive records for the same business key from the same source, often differentiated by a "multi-active key" attribute.
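To illustrate what "multiple active descriptive records for the same business key" means in practice, the following sketch checks whether a set of staged rows contains more than one row per key and load timestamp. The data, the column names (`customer_id`, `phone_type`, `load_date`), and the helper function are hypothetical and are not part of S2V.

```python
from collections import Counter


def keys_with_multiple_active_rows(rows, key_columns, load_column="load_date"):
    """Return key tuples that have more than one row for the same load timestamp."""
    counts = Counter(
        (tuple(row[col] for col in key_columns), row[load_column]) for row in rows
    )
    return sorted({key for (key, _), n in counts.items() if n > 1})


# Hypothetical staged rows: customer C-1 has two phone numbers active at once.
phone_rows = [
    {"customer_id": "C-1", "load_date": "2024-01-01", "phone_type": "mobile", "phone": "555-0100"},
    {"customer_id": "C-1", "load_date": "2024-01-01", "phone_type": "work", "phone": "555-0101"},
    {"customer_id": "C-2", "load_date": "2024-01-01", "phone_type": "mobile", "phone": "555-0102"},
]

# Keyed on the business key alone, C-1 is multi-active.
print(keys_with_multiple_active_rows(phone_rows, ["customer_id"]))

# Once the differentiating attribute is part of the grain, every row is unique.
print(keys_with_multiple_active_rows(phone_rows, ["customer_id", "phone_type"]))
```

The first call reports `C-1`, the second reports nothing: the differentiating attribute (here `phone_type`) plays the role of the multi-active key.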
## Modeling Pattern Limitations

These are not strict prohibitions but rather current constraints in how S2V interprets or generates certain patterns:

- **Single Use of a Source in a Link:** A specific source table (defined by its database, schema, and table name) can be referenced only once within the `entity_sources` section of a single Link definition. If you need to derive different parts of the same Link from the same physical source table based on different criteria, you might need to model this by adding a new relation.

It's important to be aware of these limitations when designing your Data Vault model with Stream2Vault. Check the FAQ section for suggested design strategies to address these limitations.

## Snowflake Dynamic Table Integration

### Input Table Characteristics (Especially for Snowflake Dynamic Table Integration):

- **Physical Tables:** When S2V is used in conjunction with features like Snowflake Dynamic Tables for downstream processing (e.g., building a Standardized Zone from a Raw Zone), the input tables S2V reads from (e.g., Raw Zone tables) must be actual Snowflake tables.
  External tables or views may not be suitable, as Dynamic Tables rely on the change tracking capabilities of base tables.
- **Change Tracking:** For optimal performance and to enable incremental processing by downstream Dynamic Tables, change tracking should be enabled on these source Snowflake tables.
- **Full History:** These source tables should ideally contain a full history of the data. This is important for potential rebuilds of downstream layers (like a Standardized Zone or the Data Vault itself) during deployments or recovery scenarios.

## Stream2Vault Overview

Welcome to Stream2Vault (S2V)! This section provides a foundational understanding of what S2V is, its core concepts, architecture, and the key benefits it offers for Data Vault 2.0 implementation and automation.
- [**Data Vault**](https://s2v.reeeliance.com/docs/concepts/data-vault): Get a high-level introduction to the Data Vault methodology.
- [**What's S2V**](https://s2v.reeeliance.com/docs/concepts/overview/introduction): Understand S2V in simple terms.
- [**Core Concepts**](https://s2v.reeeliance.com/docs/concepts/overview/basic-components/core-components): Learn about the fundamental components used in S2V.
- [**Limitations**](https://s2v.reeeliance.com/docs/concepts/limitations/data-vault): Learn about the basic limitations of S2V.

## Stream2Vault Architecture Overview

## Overview

The following diagram illustrates the architecture of the Stream2Vault (S2V) platform, showcasing its integration with enterprise services and its interaction with users. Here’s a breakdown of the key components and their relationships:

![Stream2Vault Architecture Diagram](https://s2v.reeeliance.com/assets/images/s2v-arch-ae50cc219c82309ffd15d1ee78050df4.jpg)

### Stream2Vault Service

This is the core backend service of the Stream2Vault platform, responsible for validating user-defined Data Vault models and generating the SQL scripts and deployment artifacts for Data Vault automation.

- It communicates directly with the S2V Client for validating and generating outputs.
- It integrates with the Authentication Service to validate user credentials and permissions.
### Organizational Enterprise Services

#### Authentication Service

Handles user authentication for both the S2V Client and the S2V Service. It ensures secure access to enterprise resources, such as Git repositories and databases.

#### Git Service

Manages code generated by the S2V Service, enabling version control and collaboration. It acts as a bridge between the generated code and the organization's CI/CD pipelines for automated deployment.

#### Database Service

Receives and stores the deployed Data Vault structures. It is the target system for Data Vault objects generated and validated by the Stream2Vault platform. Deployments are managed through the CI/CD pipeline, ensuring automated, seamless integration into the database environment.

### Stream2Vault Client

The S2V Client is the command-line interface used by data modelers, data engineers, and other stakeholders. It enables data modeling, where users design or modify the metadata and Data Vault objects. It interacts with the S2V Service to validate and generate configurations for deployment, and it authenticates with the Authentication Service to ensure secure operations within the organizational environment.
## Workflow

The typical workflow involving Stream2Vault includes the following stages:

### Authentication

The user authenticates via the S2V Client, which in turn interacts with the enterprise Authentication Service. The S2V Service also verifies its connection with the Authentication Service.

### Validation and Generation

The user defines their Data Vault model using YAML files and provides configuration files (e.g., `data_vault_settings.yaml`). These inputs are passed via the S2V Client to the S2V Service, which validates the model and generates deployable SQL scripts and other artifacts (like Makefiles).

### Code Management

The generated SQL scripts and deployment artifacts are typically committed to a Git repository, enabling version control and integration into enterprise CI/CD pipelines.

### Deployment

The CI/CD pipeline consumes the artifacts from Git (e.g., using the generated Makefile) to handle the deployment of the Data Vault structures to the target Database Service, ensuring efficient and automated deployment.

## Summary

This architecture ensures seamless collaboration between the Stream2Vault platform and organizational enterprise services.
It integrates secure authentication, robust code management, and automated deployments, providing a streamlined and model-driven experience for data modelers and technical teams.

## Change Data Capture Overview

Change Data Capture (CDC) is the process of identifying and capturing changes made to data in a source database and then delivering those changes in real-time or near real-time to a downstream system, like a data warehouse or, in this case, your Data Vault staging area. Instead of extracting a full copy of a data source every time, CDC provides only the data that has changed (inserts, updates, deletes) since the last extraction.
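As a rough illustration of "only the data that has changed", the sketch below derives inserts, updates, and deletes by comparing two snapshots of a table keyed by primary key. Snapshot comparison is just one simple way to obtain deltas (log-based CDC tools work differently), and the `I`/`U`/`D` flags and field names are assumptions for this example, not the flags S2V expects.

```python
def capture_changes(previous: dict, current: dict) -> list:
    """Compare two snapshots (key -> row) and return a list of change records."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append({"op": "I", "key": key, "row": row})  # new record
        elif previous[key] != row:
            changes.append({"op": "U", "key": key, "row": row})  # changed record
    for key in previous:
        if key not in current:
            changes.append({"op": "D", "key": key, "row": None})  # removed record
    return changes


# Hypothetical snapshots of an orders table at two extraction times.
yesterday = {
    1: {"status": "new"},
    2: {"status": "new"},
    3: {"status": "shipped"},
}
today = {
    1: {"status": "new"},      # unchanged, produces no change record
    2: {"status": "shipped"},  # updated
    4: {"status": "new"},      # inserted
}                              # key 3 is gone, so it is reported as deleted

for change in capture_changes(yesterday, today):
    print(change)
```

Only three change records are produced for four source rows; the unchanged row never leaves the source system, which is the point of CDC.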
**Why is CDC important for S2V and Data Vault?**

For sources where changes (updates, deletes) need to be tracked historically in your Data Vault (especially in Satellites), a clear and reliable CDC mechanism in your source system or ETL pipeline is essential. S2V relies on the information provided by your CDC process, typically in the form of specific flags in the staged data, to correctly apply these changes to the Data Vault structures. Without effective CDC, you might only capture new records or miss critical updates and deletions, leading to an incomplete or inaccurate historical view.

**Key Benefits of Using CDC:**

- **Reduced Data Processing:** Instead of processing the entire dataset from a source system during each load cycle, CDC allows you to process only the records that have changed. This significantly reduces the volume of data transferred, staged, and processed, leading to faster ETL jobs and lower computational costs.
- **Improved Efficiency and Performance:** By focusing on deltas, CDC minimizes the load on both source systems (during extraction) and target systems (like your data warehouse during loading).
- **Near Real-Time Data Integration:** CDC enables more frequent updates to your Data Vault, providing fresher data for analytics and reporting.
- **Cost Savings:** In cloud data warehouse environments like Snowflake, processing less data translates directly to lower compute costs.
- **Synergy with Modern Data Platform Features:** CDC is highly complementary to features like Snowflake's Dynamic Tables. Dynamic Tables can automatically and incrementally refresh based on changes in underlying source tables. When these source tables are populated via a CDC process (e.g., into a staging layer), the Dynamic Tables can efficiently propagate only the new changes into the Data Vault or downstream marts, further optimizing the data pipeline.
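Because the staged change flags are applied record by record, the order in which they are applied determines the resulting state. The following sketch sorts staged change records by a load timestamp and a sequence number before applying them. The field names (`load_ts`, `seq`, `op`) and the flag values are assumptions for illustration; this is not S2V's loading logic.

```python
def apply_changes(state: dict, changes: list) -> dict:
    """Apply flagged change records to `state` in capture order."""
    # Sort by load timestamp first, then by sequence number as a tie-breaker.
    ordered = sorted(changes, key=lambda c: (c["load_ts"], c["seq"]))
    for change in ordered:
        if change["op"] == "D":
            state.pop(change["key"], None)      # delete
        else:
            state[change["key"]] = change["row"]  # insert or update
    return state


# Hypothetical staged records that arrived out of order within one load window.
staged = [
    {"key": 7, "op": "U", "load_ts": "2024-01-01T10:00:00", "seq": 2, "row": {"status": "shipped"}},
    {"key": 7, "op": "I", "load_ts": "2024-01-01T10:00:00", "seq": 1, "row": {"status": "new"}},
    {"key": 8, "op": "D", "load_ts": "2024-01-01T10:00:00", "seq": 4, "row": None},
    {"key": 8, "op": "I", "load_ts": "2024-01-01T10:00:00", "seq": 3, "row": {"status": "new"}},
]

print(apply_changes({}, staged))
```

Without the sort, key 7 would end up with the stale `new` status and key 8 would survive its own deletion. With the timestamp alone, all four records would tie, which is why the sequence number is part of the sort key.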
By implementing a robust CDC strategy, you ensure that your Data Vault remains an accurate, timely, and cost-effective representation of your business operations.

**Warning: Critical Requirement for CDC Integrity**

- **Deterministic Ordering:** Maintaining the exact sequence of operations as they occurred in the source system is not just important, it is a **must for reliable CDC and subsequent Data Vault loading**. If multiple changes occur for the same record within the same load window (e.g., an insert followed by an update), processing them out of order will lead to data corruption and an incorrect historical view.
- **How to Achieve Determinism:**
  - **Source-Provided Load Timestamp:** The ideal scenario is when the source system's CDC mechanism inherently provides a guaranteed sequence via a consistently increasing timestamp with sufficient granularity.
  - **Load Timestamp & Sequence Column:** A common and robust pattern is to use the load timestamp in conjunction with an additional sequence number (e.g., a transaction log sequence number or a dedicated sequence column). This combination ensures that records are processed in the precise order they were captured, which is fundamental for maintaining data integrity in CDC patterns.

## S2V Client Overview

**S2V Client (Command Line Interface - CLI):** This is the primary way users interact with S2V. It's a Python-based application that provides commands like `s2v login`, `s2v validate`, `s2v generate`, and `s2v version`.

**YAML-based Model Definition Layer:** S2V relies heavily on YAML files for users to define their Data Vault models. This includes:

- **Configuration Files:** Such as `data_vault_settings.yaml` (for global DV settings) and `source_system_settings.yaml` (for defining source-system-specific configurations).
- **Data Vault Object Files:** Individual YAML files defining Hubs, Links, Satellites, and other Data Vault entities. This declarative approach allows users to specify what their Data Vault should look like.
- **Source Metadata Input (`information_schema.csv`):** This CSV file provides S2V with the necessary metadata about the source system tables and columns (like `TABLE_CATALOG`, `TABLE_SCHEMA`, `TABLE_NAME`, `COLUMN_NAME`, `DATA_TYPE`). It's essential for validating the model definitions against the actual source structures.

**S2V Service:** This is the core backend engine of the Stream2Vault platform. When you use the S2V Client to validate your model or generate code, the client sends your YAML definitions and configurations to the S2V Service. The S2V Service then:

- Performs validation of your Data Vault model against rules and source metadata.
- Generates the SQL scripts (DDL, DML) and other deployment artifacts (like Makefiles) needed to create and populate your Data Vault.
- Integrates with your organization's authentication services to ensure secure operations.

Essentially, it's the central processing unit that translates your designs into a deployable Data Vault.

## Stream2Vault Overview

Stream2Vault (S2V) is a tool designed to make building and managing your Data Vault much easier and faster.

Think about how you design your Data Vault: you identify your business concepts (Hubs), the relationships between them (Links), and the descriptive details for each (Satellites). Traditionally, turning these designs into actual database tables and the logic to load them can involve a lot of manual coding and can be prone to errors.

**S2V streamlines and accelerates this entire lifecycle.** Here’s how it works in simple terms:

1. **You Define Your Model with Speed:** Instead of writing complex SQL code from scratch, you describe your Hubs, Links, and Satellites in simple, human-readable configuration files (using **YAML**). This format makes it incredibly fast to define new objects, add attributes, or change existing structures. You tell S2V things like:
   - "This is a Customer Hub, and its business key is `CUSTOMER_ID` from my source system."
   - "This Link connects Customers to Orders."
   - "This Satellite stores the customer's address details and should track history."
2. **S2V Generates and Deploys Rapidly:** You then use the S2V command-line tool. It takes your YAML model definitions, validates them against Data Vault rules, and then generates all the necessary SQL code.
   - **Deployment is fast**, and S2V supports **partial deployments**, meaning you can update or add only specific parts of your Data Vault without redeploying everything. This is a huge time-saver, especially in large or evolving environments.
   - When used with technologies like **Snowflake Dynamic Tables**, the operational overhead is significantly reduced. You simply define your desired data freshness, and the underlying platform handles the refresh logic, with no complex orchestration jobs or schedulers to manage for these objects.

**Why is this helpful for someone familiar with Data Vault?**

- **Unmatched Development Speed:** YAML definitions allow you to create, add, or change Data Vault objects much faster than manual SQL coding.
- **Consistency:** Ensures that your Data Vault objects are built using standardized patterns and best practices.
- **Reduced Errors:** The structured YAML approach and built-in validations minimize the chance of human error.
- **Focus on Modeling:** Allows data modelers and data engineers to focus more on the business logic and the design of the Data Vault, rather than the intricacies of SQL implementation.
- **Agile Adaptability:** When your source systems or business requirements change, updating your YAML definitions is straightforward. S2V can then quickly regenerate and deploy only the affected parts of your Data Vault.
- **Simplified Operations:** Especially with features like Snowflake Dynamic Tables, S2V helps you move away from managing intricate loading jobs and orchestration, letting you focus on data freshness and model integrity.

In essence, Stream2Vault empowers you to build, modify, and deploy your Data Vault with exceptional speed and simplicity. It translates your Data Vault designs into optimized, deployable database structures, significantly reducing manual effort and operational complexity.

## Stream2Vault Security Overview

Stream2Vault (S2V) prioritizes security through a multi-layered approach. The S2V client, installed on user workstations, communicates with the cloud-based S2V service (hosted in GCP Europe West1 Region) via APIs. S2V is designed with a strong emphasis on security: it leverages robust authentication mechanisms, ensures data privacy by not storing client or business data, and employs secure cloud hosting practices with comprehensive logging and access controls.

**Authentication and Authorization:**

- S2V employs an OAuth 2.0 mechanism, integrating with the client organization's Identity Provider (IdP), such as Microsoft Entra.
- Users authenticate via their organization's IdP, supporting existing policies like Multi-Factor Authentication (MFA). Basic authentication is not supported.
- The S2V service validates tokens issued by the IdP, ensuring only authorized users from the organization's domain can access the platform.
- The S2V service itself must be authorized by the client's system administrator to validate these tokens.
**Data Handling and Privacy:**

- **No PII or Business Data Stored:** Stream2Vault does not collect or store any Personally Identifiable Information (PII) or business data.
- **In-Memory Processing:** Configuration files sent to the service for validation are processed entirely in-memory and are not retained by the service.
- **No Database Credential Exposure:** Database credentials are not exposed to the S2V service.
- **No Access to Client Infrastructure:** The S2V service has no direct access to any client infrastructure.

**Hosting and Infrastructure Security:**

- **GCP Cloud Run:** The S2V service is hosted on Google Cloud Run, a fully managed serverless platform.
- **Managed by reeeliance IM GmbH:** The GCP tenant hosting S2V is owned and managed by reeeliance IM GmbH. The S2V Product Team within reeeliance is responsible for its management, including provisioning, monitoring, and securing resources, following the principle of least privilege.
- **Containerization:** The application runs as a containerized service, with images stored in Google Artifact Registry.
- **IAM and Service Accounts:** Access is controlled via Google IAM, and service accounts are used for secure execution.
- **CI/CD Pipelines:** All deployments follow CI/CD pipelines for security and consistency.
- **HTTPS Communication:** All communication between the client and the S2V service is secured using HTTPS.

**Logging and Monitoring:**

- **Service-Level Logging:** API requests to the S2V service are logged.
- **Google Cloud Logging:** Application logs (API requests, processing events), security logs (authentication, access control), and audit logs (infrastructure changes) are collected via Google Cloud Logging.
- **Client-Side Authentication Logs:** Authentication attempts are also recorded in the client's Azure AD Sign-in and Audit Logs.
- **Restricted Log Access:** Access to logs is restricted to authorized administrators and security teams via Google IAM and the client's Azure AD policies.
  S2V service logs are retained indefinitely.

**Regulatory Alignment:**

- S2V aligns with best practices for SOC 2, ISO 27001, and GDPR by enforcing federated authentication, restricting API access with OAuth 2.0, logging access attempts, and using IAM for role-based access control.

## Stream2Vault FAQ

## Security

**1. Who owns the GCP tenant for Stream2Vault and who is responsible for its management?**

The GCP tenant hosting the Stream2Vault (S2V) application is owned and managed by reeeliance IM GmbH. The S2V Product Team within reeeliance is responsible for its management, including provisioning, monitoring, and securing resources. Access control follows the principle of least privilege, ensuring only authorized personnel can perform administrative actions.

**2. How is the Stream2Vault application hosted in GCP?**

Stream2Vault is hosted on Google Cloud Platform (GCP) using Google Cloud Run, a fully managed, serverless container execution platform. It runs as a containerized service with access controlled via Google IAM. The application image is stored in Google Artifact Registry, and service accounts are used for secure execution. All deployments follow CI/CD pipelines to ensure security and consistency.

**3. How does communication happen between the client environment and the Stream2Vault hosting environment?**

Communication between the client environment and the Stream2Vault service happens via the S2V client, which runs on users' machines. The client connects to the Stream2Vault server, which is deployed on Google Cloud Run. All communication is secured using HTTPS and authenticated via OAuth 2.0 using Azure AD. Users authenticate with their Azure AD credentials, and API access is authorized using OAuth tokens.
IAM policies further enforce access control. No direct network connectivity between the client environment and the hosting environment is required; users initiate connections via the client, which interacts with the service securely over the internet.

**4. Where and how are logs stored? Who has access to these logs?**

Application logs are collected via Google Cloud Logging. This includes:

- Application logs (API requests, processing events)
- Security logs (authentication, access control events)
- Audit logs (changes to infrastructure, configurations)

In addition, authentication logs should also be recorded in the client's Azure AD Sign-in Logs and Audit Logs. Access to logs is restricted to authorized administrators and security teams as per Google IAM and Azure AD policies. Stream2Vault logs are retained indefinitely.

**5. Are there any regulatory compliance considerations for Stream2Vault?**

Stream2Vault does not collect or store any PII, business data, or customer information.

- No personally identifiable information (PII) is collected by the service or the client.
- No business data is collected or stored. SQL code generated or configuration files sent to the service for validation are processed entirely in-memory and are not retained.
- No database credentials are exposed to the S2V service. The S2V service has no access to any client infrastructure.

In terms of regulatory compliance, Stream2Vault aligns with SOC 2, ISO 27001, and GDPR best practices by:

- Enforcing federated authentication via Azure AD
- Restricting API access using OAuth 2.0 tokens
- Logging authentication and access attempts for auditability
- Preventing unauthorized access using IAM role-based policies

## Modelling

**How should multi-active satellites be modeled?**

Modeling multi-active satellites requires careful consideration of the specific use case. Here are a few approaches:

1. **Verify Necessity**: First, confirm if a multi-active satellite is genuinely required. Often, what appears to be a multi-active scenario can be addressed by a standard [Satellite](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/satellite/hub-satellite) if the driving key correctly represents the grain.
2. **Revisit Model Design**: Review your overall Data Vault model. Sometimes, adjustments to existing [Hubs](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/hub/hub), [Links](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/link), or the introduction of new ones can resolve the need for a multi-active satellite.
3. **Introduce Technical Hubs**: In some cases, creating a technical [Hub](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/hub/hub) to represent the unique combination of keys that define the multi-active relationship can be a solution. This new Hub would then be parent to the Satellite.

The best approach depends on the specific scenario. We recommend consulting Data Vault best practices or seeking expert advice for complex situations.

**How can a Link be modeled if it uses the same source table multiple times for different relationships?**

When a [Link](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/link) needs to connect to the same source table to represent different roles or relationships (e.g., an `Employee` table for both an employee and their manager), you should not list the same source multiple times under `entity_sources` in the Link's YAML. Instead, treat each role as a distinct connection:

1. In the Link's `connected_hubs` section, reference the same Hub multiple times but use different **aliases** for each instance. For example
2. In the `entity_sources` section, map the source columns to the respective aliased Hub's business keys.
If this aliasing approach doesn't fit, it might indicate a need to revisit the Link's design or the surrounding Hub structures.

## Snowflake Limitations FAQ

## Snowflake Setup Guide

## Stream2Vault Prerequisites

Stream2Vault is a client application, available as a Python package, designed to streamline the generation of Data Vault objects. Before you install and use the Stream2Vault (S2V) client, please ensure your system meets the following prerequisites:

## System Requirements

1. **Python:**
   - **Version:** 3.12 or higher.
   - S2V leverages features available in recent Python versions.
2. **Python Packaging System:**
   - You'll need a Python packaging tool to install S2V and its dependencies.
   - **Recommended:** [uv](https://astral.sh/) - A fast Python package installer and resolver, written in Rust. It can be a significantly faster alternative to pip.
   - **Alternative:** [pip](https://pip.pypa.io/en/stable/installation/) - The standard package installer for Python. It usually comes bundled with Python installations.
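The requirements above can be checked with a short script before installing. This is a minimal sketch and not part of S2V: the function names are hypothetical, and it only checks that Python is 3.12 or higher and that `uv` or `pip` can be found on `PATH`.

```python
import shutil
import sys

MIN_PYTHON = (3, 12)  # minimum version stated in the prerequisites


def meets_python_requirement(version=None):
    """Return True if the given (or running) Python version is 3.12 or higher."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def find_packaging_tool():
    """Return the first packaging tool found on PATH ('uv' preferred), or None."""
    for tool in ("uv", "pip", "pip3"):
        if shutil.which(tool):
            return tool
    return None


if __name__ == "__main__":
    print("Python OK:", meets_python_requirement())
    print("Packaging tool:", find_packaging_tool() or "none found")
```

Checking for `uv` first mirrors the recommendation above; either tool is sufficient.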
## Initialize Stream2Vault Project

This guide will help you set up and run your first Stream2Vault (S2V) project.

**Before you start, please ensure:**

- You have the Stream2Vault client installed (see Installation).
- You have met all Prerequisites.
- You have logged in via `s2v login -c `

## Step 1: Get Your Project Files

The easiest way to start is by using a pre-configured sample project. Alternatively, you can create the minimal files manually.

### Option A: Use Sample Data (Recommended for Beginners)

1. **Download Sample Project:**
   - Download the **Default Project** from the [Sample Data](https://s2v.reeeliance.com/docs/tutorials/sample-data) page.
2. **Extract the Project:**
   - Extract the contents of the downloaded ZIP file into a new folder on your computer. Let's call this folder `my_first_s2v_project/`.
   - Inside `my_first_s2v_project/`, you should find a subfolder, often named `dv_model/` or similar, containing all the necessary YAML files and configuration. For this guide, we'll assume it's named `dv_model/`.

### Option B: Create a Minimal Project Manually

If you prefer to start from scratch:

1.
**Create Project Folders:** - Create a main project folder, for example, `my_first_s2v_project/`. - Inside `my_first_s2v_project/`, create a subfolder named `dv_model/`. This will be your input model directory. 2. **Create Essential Configuration Files** inside `dv_model/`: - **`data_vault_settings.yaml`:** Defines global settings for your Data Vault generation. See [Data Vault Settings](https://s2v.reeeliance.com/docs/tutorials/configuration-files/data-vault-configuration). - **`source_system_settings.yaml`:** Defines settings for your source systems. See [Source System Settings](https://s2v.reeeliance.com/docs/tutorials/configuration-files/source-configuration). - **`information_schema.csv`:** Contains metadata about your source tables (columns, data types). S2V uses this to validate your model definitions against actual source structures. If you are not using sample data, you might need to manually create or obtain the information schema. See [Information Schema](https://s2v.reeeliance.com/docs/tutorials/configuration-files/information-schema). 3. **Create a Simple Data Vault Object File:** - You can create your own Hub definition file (e.g., `hub_customer.yaml`) in the `dv_model/` directory using the template below. - Alternatively, follow the [Build a Hub Tutorial](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-hub) for a guided example. - Ensure the source table and columns you reference here exist in your `information_schema.csv`. 
```
# dv_model/hub.yaml
name: ''
entity_type: 'hub'
enable_refresh: true
concatenate_business_keys: false
requires_bussines_key: false
target_business_key_columns:
  - ''
entity_sources:
  - urn:s2v:hub_source:src_customers:
      entity_source: '(DATABASE_NAME, SCHEMA_NAME, TABLE_NAME)'
      source_system_configuration_urn: 'urn:s2v:source_setting:'
      business_key_mapping:
        - :
            - ','
```

**Tip:** The include/exclude option is especially handy for partial deployments, as it generates the same code structure but only includes the selected objects. You don't need a specific deployment procedure; just run `make deploy_all`. See [Deployment with Makefile](https://s2v.reeeliance.com/docs/standard-deployment/make).

**Info:** For more details about the output of the `generate` command, refer to [Understanding the Generated Code](https://s2v.reeeliance.com/docs/detailed-guides/generated-files).

## Data Vault Validation

The `validate` command checks your data vault model for structural or setup problems before you create the final deployment code. It makes sure that your input files are correctly put together and match the required settings.

Validation happens in multiple steps, starting with the format of your YAML files. The format validator looks for common YAML errors, like a missing colon after a key. It also verifies that your data vault objects (like hubs) match their expected structure, ensuring, for example, that every hub has a defined name.

After the format validation passes, the tool checks the content of your input model. This includes checks such as verifying that objects don't contain duplicate entity sources, and many others.
The final step ensures that all source columns used throughout your model are listed in your information schema, so that deployment can succeed.

These validation phases run in a specific order: the tool will not proceed to content validation if it finds errors in the file format.

### Options

`-i, --input DIRECTORY` - (Required) This is the folder where your data vault model files are located.

`-u, --url URL` - URL of the Stream2Vault server to connect to. Default: [https://s2v.reeeliance.com](https://s2v.reeeliance.com/)

```
s2v validate -i path/to/my-input-folder/
```

This basic command is sufficient when your input folder follows the default structure, with all configuration files in the root folder. If that's not the case, S2V lets you specify the exact location of individual files using the following options:

`--information-schema-path FILE` - The location of the information schema file used during the validation process.

`--data-vault-settings-path FILE` - The location of the file containing specific settings for your data vault.

`--source-system-settings-path FILE` - The location of the file with settings specific to your source systems.

```
s2v validate -i path/to/my-input-folder \
  --data-vault-settings-path settings/dv_settings.yaml \
  --source-system-settings-path settings/source_settings.yaml \
  --information-schema-path settings/inf_schema_files/dev_schema.csv
```

**Tip:** Consider defining default `validate` commands in your Makefile to simplify your work!

By default, the `validate` command checks everything in your model within the specified folder. If you only want to validate certain parts of your model, you can use the `--include-objects` and `--exclude-objects` options. These options expect a comma-separated list of the object names that you want to include or exclude.

`--include-objects TEXT` - Comma-separated list of specific object names to validate.

`--exclude-objects TEXT` - Comma-separated list of object names to exclude from validation.

```
s2v validate -i path/to/my-input-folder \
  --include-objects HUB_CUSTOMER,SAT_CUSTOMER
```

To see all the available options for this command, type `s2v validate --help`.

## Data Vault Visualization

The `visualize` command allows you to see and explore your data vault model as a network graph. The command runs a local dashboard that you can access in your web browser at [http://127.0.0.1:8050/](http://127.0.0.1:8050/).

Please make sure you provide a valid model: this command doesn't automatically validate your model, and some features might not work correctly if there are issues.

To see all the available options for this command, type `s2v visualize --help`.

### Options

`-i, --input DIRECTORY` - (Required) This is the folder where your data vault model files are located.
```
s2v visualize -i path/to/my-input-folder/
```

**Tip:** You can open the dashboard with Cmd + left-click on the URL shown in the CLI.

## Data Vault 2.0 Deployment

Stream2Vault (S2V) generates a structured set of SQL scripts and Makefiles designed to facilitate the deployment and management of your Data Vault 2.0 objects in Snowflake. This document explains the organization of these generated files.

## Output Directory Structure

All generated code is placed within a main output directory (e.g., `GENERATED_CODE/`). This directory contains:

- An `INIT_DYNAMIC_TABLE/` (and/or `INIT_TABLE/`) folder for initialization scripts.
- A `DYNAMIC_TABLE/` (and/or `TABLE/`) folder containing the SQL definitions for your Data Vault entities.
- A root `Makefile` to orchestrate the deployment.
- Several auxiliary CSV and JSON files providing metadata and lineage information.
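To get a feel for this layout, the generated tree can be inspected with a few lines of Python. This is a sketch under stated assumptions: `summarize_output` is a hypothetical helper (not an S2V command), and it assumes entity SQL files sit at `<table type>/<entity type>/<entity name>/<layer>/<file>.sql` below the output directory, as in the Simplified Directory Tree Example.

```python
from collections import Counter
from pathlib import Path


def summarize_output(output_dir):
    """Count generated .sql files per (entity type, layer) pair.

    Assumes the layout <table type>/<entity type>/<entity name>/<layer>/<file>.sql
    under the output directory; .sql files at other depths are ignored.
    """
    root = Path(output_dir)
    counts = Counter()
    for sql_file in root.rglob("*.sql"):
        parts = sql_file.relative_to(root).parts
        if len(parts) == 5:  # table type, entity type, entity name, layer, file
            _, entity_type, _, layer, _ = parts
            counts[(entity_type, layer)] += 1
    return counts
```

Calling `summarize_output("GENERATED_CODE")` returns a `Counter` keyed by pairs such as `("HUB", "PREP_LAYER")`. Initialization scripts are skipped because they sit at a shallower depth.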
### Simplified Directory Tree Example

Here's a simplified view of the typical directory structure, focusing on a Hub and its Satellite to illustrate the pattern:

```
GENERATED_CODE/
├── DYNAMIC_TABLE/
│   ├── HUB/
│   │   ├── HUB_MATERIAL/
│   │   │   ├── MAIN_LAYER/
│   │   │   │   └── HUB_MATERIAL.sql
│   │   │   ├── PREP_LAYER/
│   │   │   │   ├── HUB_MATERIAL[urn_s2v_hub_source_src_1].sql
│   │   │   │   └── HUB_MATERIAL[urn_s2v_hub_source_src_2].sql
│   │   │   ├── REFRESH_LAYER/
│   │   │   │   └── HUB_MATERIAL.sql
│   │   │   └── STREAMS_LAYER/
│   │   │       └── HUB_MATERIAL.sql
│   ├── HUB_SAT/
│   │   ├── SAT_CDC_FLAG/
│   │   │   ├── MAIN_LAYER/
│   │   │   │   └── SAT_CDC_FLAG.sql
│   │   │   ├── PREP_LAYER/
│   │   │   │   └── SAT_CDC_FLAG[(SOURCE_DATA,CUSTOMER)].sql
│   │   │   ├── REFRESH_LAYER/
│   │   │   │   └── SAT_CDC_FLAG.sql
│   │   │   └── STREAMS_LAYER/
│   │   │       └── SAT_CDC_FLAG.sql
│   ├── LINK_SAT/                    # For Link Satellites
│   ├── LOOKUP/                      # For Lookup Tables
│   ├── NON_HISTORIZED_LINK/         # For Non-Historized Links
│   ├── REFERENCE/                   # For Reference
│   ├── REGULAR_LINK/                # For Regular Links
│   └── STATUS_TRACKING_SATELLITE/   # For Status Tracking Satellites
├── INIT_DYNAMIC_TABLE/
│   ├── CREATE_SCHEMAS.sql
│   ├── CREATE_TAGS.sql
│   └── GHOST_RECORD.sql
├── Makefile
├── dependencies.json
├── entity_relationships.csv
├── lineage.csv
└── source_system_urn_check.csv
```

## Core Components

### 1\. `Makefiles`

- A `Makefile` is generated in the root of the output directory, and an individual `Makefile` is generated in each subdirectory.
- For more details on how this Makefile is used, refer to the Deployment via Makefile documentation.

### 2\. `INIT_/` Folder

This folder contains essential one-time setup scripts:

- **`CREATE_SCHEMAS.sql`**: SQL to create the necessary database schemas.
- **`CREATE_TAGS.sql`**: SQL to create Snowflake tags used for versioning or classification of Data Vault objects.
- **`GHOST_RECORD.sql`**: SQL to create the `GHOST_RECORD` table. This table provides standardized unknown/missing key records, crucial for maintaining referential integrity in Data Vault models.

### 3\. `DYNAMIC_TABLE/` or `TABLE/` Folder

- This top-level folder distinguishes between code generated for Snowflake **Dynamic Tables** and regular Snowflake **Tables**. The choice depends on your S2V configuration and Data Vault implementation strategy.

Within this folder, the structure is hierarchical:

1. **Entity Type Folders (e.g., `HUB`, `HUBSAT`):**
   - Objects are grouped by their Data Vault entity type. Common entity type folders you will see include:
     - `HUB`: For Hub objects.
     - `REGULAR_LINK`: For Regular Link objects.
     - `HUBSAT`: For Hub Satellite objects.
     - `LINKSAT`: For Link Satellite objects.
     - `NON_HISTORIZED_LINK`: For Non-Historized Link objects.
     - `REFERENCE`: For Reference tables.
     - `STATUS_TRACKING_SATELLITE`: Automatically generated for each Link to track the status of relationship instances.
     - `LOOKUP_TABLE`: Generated when `lookup_mapping` is used in an entity's definition.
2. **Entity Name Folders (e.g., `HUB_MATERIAL`, `SAT_MATERIAL_DETAILS`):**
   - Each specific Data Vault object defined in your YAML metadata will have its own folder under its entity type.
3. **Layer Folders (e.g., `PREP_LAYER`, `MAIN_LAYER`):**
   - These folders organize the SQL scripts based on their role in the data processing and deployment pipeline, particularly for Dynamic Table implementations:
     - **`PREP_LAYER` (Preparation Layer):**
       - Contains SQL scripts to create "preparation" Dynamic Tables, one for each source feeding a main Data Vault object.
     - **`MAIN_LAYER` (Main/Core Layer):**
       - Contains the SQL script for the primary Data Vault Dynamic Table (Hub, Link, Satellite).
     - **`STREAMS_LAYER` (for CDC on Dynamic Tables):**
       - Contains SQL to create a Snowflake Stream on the main Data Vault Dynamic Table to capture changes.
     - **`REFRESH_LAYER` (Initializing Dynamic Tables):**
       - Contains SQL to trigger a refresh of the main Data Vault Dynamic Table.

**Info:** Auto-generated objects:

- **Status Tracking Satellites (STS):** An STS is automatically generated for every Link object. Its purpose is to track the status and history of the relationships managed by the Link. The STS column `SATELLITE_PAYLOAD` includes the values of the source columns that participate in the relationships, together with additional fields.
- **Lookup Tables:** These are technical objects generated when a `lookup_mapping` is defined for an entity. They facilitate lookup operations during data processing but are not considered part of the main consumable Data Vault layer.

### 4\. SQL File Content Characteristics

The generated SQL files share common characteristics:

- **Variables:** SQL files use placeholder variables (e.g., `&{TARGET_DATABASE}`, `&{TARGET_SCHEMA_DYNAMIC_TABLE}`, `&{TARGET_LAG}`). These are replaced at deployment time by SnowSQL, using values from your `.ini` configuration file.
- **`CREATE OR REPLACE DYNAMIC TABLE ...` / `CREATE OR REPLACE TABLE ...`:** The core DDL statements.
  - For Dynamic Tables, this includes definitions for `TARGET_LAG`, `WAREHOUSE`, `REFRESH_MODE`, and `INITIALIZE`.
- **`AS SELECT ...`:** The query that defines the logic for populating the table.
  - **Prep Layer:** Selects from source tables and applies transformations.
  - **Main Layer:** Typically unions data from corresponding prep layer DTs.

### 5\. Auxiliary Metadata Files

Several other files are generated in the root of the output directory to provide insights into your model:

- **`dependencies.json` (Optional):**
  - A JSON representation of dependencies between Data Vault objects, useful for understanding model structure or custom deployment orchestration.
- **`entity_relationships.csv` (Optional):**
  - A CSV file detailing parent-child relationships within the Data Vault model (e.g., which Satellites belong to which Hubs/Links).
- **`lineage.csv` (Optional):**
  - A CSV file providing data lineage information, tracing data from source columns to their target columns in the Data Vault.
- **`source_system_urn_check.csv` (Optional):**
  - A CSV file that helps in verifying the consistency and usage of source system URNs across the model.

## Data Vault Model Errors

When defining your Data Vault model and settings using YAML files, S2V performs validation checks. Issues are reported as errors or warnings:

- **Errors (ERROR):** Critical issues preventing artifact generation. **Must be fixed.**
- **Warnings (WARNING):** Potential problems. **Review and address recommended.**

This document categorizes common messages.

## 1\. File and Basic YAML Processing

Checks for file presence and fundamental YAML syntax.

- **Missing Configuration Files**:
  - `ERROR: Missing information_schema.csv / data_vault_settings.yaml / source_system_settings.yaml`
    - **Fix:** Ensure these files are in your input model's root.
- **YAML Syntax & Structure**:
  - `ERROR: YAML contains duplicated keys`
    - **Fix:** Remove or rename duplicate keys in the YAML file.
  - `ERROR: YAML parsing error (e.g., YAMLError, ParserError)`
    - **Fix:** Check YAML syntax (indentation, colons) near the indicated line. See [YAML Errors](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/pyyaml).
  - `ERROR: Empty or invalid YAML content`
    - **Fix:** Ensure the YAML file has valid content and formatting.
- `WARNING: Ignored file: ..., only '.yaml' files can be processed` - **Fix (if needed):** If the file should be processed, use a `.yaml` extension. - `WARNING: YAML doesn't contain entity_type property and not a configuration file` - **Fix:** Add the `entity_type` property (e.g., `hub`, `link`). - `WARNING: Ignored file: ... Unknown entity_type in entity_type property` - **Fix:** Correct the `entity_type` value. ## 2\. Data Vault Model Definition [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/errors/index.html\#2-data-vault-model-definition "Direct link to 2. Data Vault Model Definition") Checks related to the overall structure and consistency of your Data Vault objects. - **Object Naming & Uniqueness**: - `ERROR: DV model contains duplicated object [ObjectName] in files [FileA.yaml, FileB.yaml]` - **Fix:** Ensure object names are unique across all YAML files. Rename or consolidate. - `WARNING: The names of some objects in DV model differ only by case: [ObjectA (FileA.yaml), objecta (FileB.yaml)]` - **Fix:** Use consistent casing or make names distinctly different to avoid confusion. - **Partial Deployment**: - `ERROR: Entities [EntityName] do not exist. Make sure to provide existing objects for partial deployment.` - **Fix:** If using `--include-objects` or `--exclude-objects` with `generate` or `validate`, ensure specified entity names exist in your model. ## 3\. Data Vault Object YAML Structure (Schema Validation) [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/errors/index.html\#3-data-vault-object-yaml-structure-schema-validation "Direct link to 3. Data Vault Object YAML Structure (Schema Validation)") This validation uses predefined schemas to check if your YAML files for Data Vault objects (Hubs, Links, Satellites, References) and global settings ( `data_vault_settings.yaml`, `source_system_settings.yaml`) adhere to the expected structure, properties, data types, and patterns. 
- **Common Schema Validation Message Format**: - `ERROR: YAML property: path -> to -> property - Message: [Detailed error message from schema validator]` - The `path -> to -> property` points to the exact location of the issue in your YAML file. - **Examples of Schema Validation Messages**: - `Message: 'property_name' is a required property` - **Meaning:** A mandatory field is missing. - **Fix:** Add the required property. Refer to the specific model reference page (e.g., Hub, Link) for required fields. - `Message: Additional properties are not allowed ('extra_prop' was unexpected)` - **Meaning:** You've included a property that isn't defined for that object type or section. - **Fix:** Remove the unexpected property. - `Message: 'value' is not of type 'integer'/'string'/'boolean'` - **Meaning:** A property has the wrong data type. - **Fix:** Change the value to the correct data type. - `Message: 'value' does not match 'pattern'` - **Meaning:** A value doesn't follow a required format (e.g., URNs like `urn:s2v:hub_source:NAME`, or source table definitions like `(schema,table)`). - **Fix:** Adjust the value to match the expected pattern. ## 4\. Data Vault Model Logic & Rules [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/errors/index.html\#4-data-vault-model-logic--rules "Direct link to 4. Data Vault Model Logic & Rules") These checks validate the semantic correctness and adherence to Data Vault principles within your model definitions. - **Hubs**: - `ERROR: Hub: [HubName] - Business key mapping for source [SourceURN] must not be empty.` - **Fix:** Ensure `business_key_mapping` is defined for the specified entity source. - `ERROR: Hub: [HubName] - Source business key for source [SourceURN] must not be empty because requires_source_business_key is true.` - **Fix:** If `requires_source_business_key: true` on the Hub, provide a `source_business_key` value in the entity source. 
- `ERROR: Hub: [HubName] - Source business key for source [SourceURN] must be empty because requires_source_business_key is false.` - **Fix:** If `requires_source_business_key: false` on the Hub, ensure `source_business_key` is empty or omitted in the entity source. - **Links**: - `ERROR: Link: [LinkName] - Connected hub [HubAlias -> HubName] does not exist.` - **Fix:** Ensure all Hubs listed in `connected_hubs` are defined in your model. - `ERROR: Link: [LinkName] - Connected hub relations for source [SourceURN] must not be empty.` - **Fix:** Define `connected_hub_relations` for the entity source. - `ERROR: Link: [LinkName] - Relation for connected hub [HubAlias] in source [SourceURN] is not defined.` - **Fix:** Ensure each Hub alias in `connected_hubs` has a corresponding entry in `connected_hub_relations` for that source. - **Satellites**: - `ERROR: Satellite: [SatName] - Connected entity [Hub/Link Name] does not exist.` - **Fix:** Ensure the `connected_entity` (Hub or Link) is defined in your model. - `ERROR: Link Satellite: [SatName] - Connected entity source reference [SourceURN] for connected link [LinkName] does not exist in the link's entity sources.` - **Fix:** If `connected_entity_source_ref` is used, ensure the URN matches an existing source URN in the parent Link. - `ERROR: Satellite: [SatName] - Business key mapping for source [SourceTuple] must not be empty.` - **Fix:** Define `business_key_mapping` in the Satellite's `entity_source`. - **General Entity Source Issues**: - `ERROR: Object: [ObjectName] - Entity source [SourceURN/Tuple] - Source system configuration URN [ConfigURN] does not exist in source_system_settings.yaml.` - **Fix:** Ensure the `source_system_configuration_urn` points to a valid entry in `source_system_settings.yaml`. - `ERROR: Object: [ObjectName] - Entity source [SourceURN/Tuple] - Business key mapping for target key [TargetBK] must not be empty.` - **Fix:** Ensure all target business keys have source columns mapped to them. 
- `ERROR: Object: [ObjectName] - Entity source [SourceURN/Tuple] - Lookup mapping for Hub source URN [HubSourceURN] does not exist in the Hub's entity sources.` - **Fix:** If using `lookup_mapping`, ensure the `hub_source_urn` points to a valid source URN in the referenced Hub. ## 5\. Information Schema and Source Column Validation [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/errors/index.html\#5-information-schema-and-source-column-validation "Direct link to 5. Information Schema and Source Column Validation") Checks related to the consistency between your model definitions and the `information_schema.csv`. - `ERROR: Column: [ColumnName] from table: [Schema.Table] is not defined in information_schema.csv.` - **Fix:** Add the missing source column `ColumnName` for table `Schema.Table` to your `information_schema.csv`. - `ERROR: Column: [ColumnName] from table: [Schema.Table] defined in information_schema.csv is not used in any DV object.` - **Fix:** This is often a warning but can be an error. Remove the unused column from `information_schema.csv` or ensure it's correctly mapped in a DV object. - `ERROR: Data type for column [ColumnName] from table [Schema.Table] is not defined in information_schema.csv.` - **Fix:** Provide the data type for the specified column in `information_schema.csv`. ## 6\. CLI Command Errors [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/code-errors/errors/index.html\#6-cli-command-errors "Direct link to 6. CLI Command Errors") Errors specifically related to running S2V CLI commands ( `validate`, `generate`, `visualize`). - **`s2v validate` / `s2v generate` / `s2v visualize`**: - `ERROR: Invalid value for "-i" / "--input": Path "[Path]" does not exist.` - **Fix:** Ensure the input path provided with the `-i` option is correct and accessible. 
  - `ERROR: Invalid value for "--data-vault-settings-path": Path "[Path]" does not exist.` (and similar for other specific path arguments)
    - **Fix:** If overriding default file locations, ensure the specified path to the configuration file is correct.
  - `ERROR: Authentication failed. Please check your credentials or token.`
    - **Fix:** Ensure your S2V login credentials/token are valid and correctly configured. Run `s2v login`.
  - `ERROR: Connection to Stream2Vault server failed. Please check your network connection and the server URL.`
    - **Fix:** Verify network connectivity and that the S2V server URL (if overridden) is correct.

* * *

This guide should help you decipher most validation messages. Always pay close attention to the file name, object name, and specific property mentioned in the error or warning to quickly locate and fix the issue in your YAML files.

## YAML Errors Guide

When working with YAML files in Python, you might run into a few common issues. These errors usually mean that your YAML file doesn't follow the proper structure or syntax rules. Understanding these errors helps you quickly pinpoint and fix problems in your configuration or data files.

## 1. Syntax Errors

This is one of the most frequent errors. It means PyYAML got confused while trying to read the basic building blocks (like individual characters or words) of your YAML. Think of it like a spell checker finding a typo or a grammar error.

**What it looks like:**

```text
YAMLError: while scanning for the next token
found character '\t' that cannot start any token.
in "config.yaml", line X, column Y
```

**Common Causes:**

- **Incorrect Indentation:** YAML relies heavily on spaces for defining structure. Mixing tabs and spaces, or inconsistent indentation (e.g., some lines indented with 2 spaces, others with 4), will cause this error.
- **Missing Colons (`:`):** In YAML, key-value pairs are separated by a colon (e.g., `key: value`). Forgetting a colon or placing it incorrectly is a common mistake.
- **Invalid Characters:** Using characters that don't belong in a specific part of the YAML structure.
- **Dashes (`-`) out of place:** Dashes typically signify list items. If a dash appears where a key-value pair is expected, it can lead to a `YAMLError`.

**How to fix it:**

- **Check Indentation:** The most common culprit! Ensure you are _only_ using spaces for indentation and that indentation levels are consistent. Many text editors have features to show whitespace characters.
- **Verify Syntax:** Carefully review the lines mentioned in the error message (line X, column Y) for any missing colons, misplaced characters, or incorrect use of list dashes.
- **Use a YAML Linter/Validator:** Online tools or command-line linters (like `yamllint`) can quickly identify syntax errors before you even run your Python script.

## 2. Structure Errors

This error occurs when PyYAML understands the basic elements but struggles to interpret the overall structure of your YAML document. It's like a grammar checker understanding individual words but not how they form a coherent sentence or paragraph.

**What it looks like:**

```text
YAMLError: while parsing a block mapping
expected <block end>, but found '<token>'
in "config.yaml", line X, column Y
```

**Common Causes:**

- **Improper Nesting:** Your YAML might have a mix-up in how blocks (mappings or sequences) are nested. For example, a list item (`-`) might be indented under a key where a value is expected, or vice versa.
- **Inconsistent Block Styles:** While YAML supports different ways to write lists and mappings (block vs. flow style), mixing them incorrectly, especially with indentation, can lead to parsing issues.
- **Unexpected End of File:** The YAML might abruptly end while a block is still expected to be open.

**How to fix it:**

- **Review Block Structure:** Pay close attention to how your dictionaries (key-value pairs) and lists (items starting with `-`) are structured and indented.
- **Ensure Proper Termination:** Make sure all blocks are properly closed or have the expected subsequent elements.
- **Check for Missing Parent-Child Relationships:** Sometimes, a line might be indented as if it's a child of a parent, but the parent itself is missing or malformed.

## Data Vault 2.0 Hub Reference

In Data Vault 2.0, a **Hub** represents a core business concept or entity. It contains a distinct list of unique business keys from across the enterprise, acting as an integration point. Hubs are designed to be stable and resilient to changes in source systems. The primary purpose of a Hub is to store these business keys, along with metadata such as load timestamps and record sources, but not the descriptive attributes of the business entity itself (those are stored in Satellites).

Below are the properties you can use to define a Hub.

### Hub Properties Overview

| Property | Type | Description |
| --- | --- | --- |
| `name` | String | The unique name for the Hub object. This name will be used for the generated table and in references by other objects (e.g., Links, Satellites). Example: `'HUB_MATERIAL'` |
| `entity_type` | String | Must be set to `'hub'` to define this object as a Hub. |
| `concatenate_business_keys` | Boolean | If `true` and multiple source columns are mapped to a single business key, their values will be concatenated before hashing to form the Hub's hash key. |
| `requires_source_business_key` | Boolean | If `true`, each source definition within `entity_sources` _must_ provide a non-empty `source_business_key` value. |
| `enable_refresh` | Boolean | If `true`, the Hub receives new data. Set to `false` if the source contains unchanging (historical) data. See Shared Properties. |
| `target_business_key_columns` | List of Strings | A list defining the names of the column(s) that will store the business key(s) in the generated Hub table. Example: `['MATERIAL_ID']`. |
| `entity_sources` | List of Maps | Defines the list of sources that feed data into this Hub. Each item in the list represents a distinct source, following the structure outlined in [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources). |

### Hub-Specific Property Details

#### `concatenate_business_keys`

- **Type:** `Boolean`
- **Description:** This property determines how the Hub's hash key is generated when multiple source columns are mapped to the Hub's business keys.
  - If `true`: The values from the multiple source columns (as defined in `business_key_mapping` for a given `entity_source`) are concatenated together (using a delimiter defined in `data_vault_settings.yaml`) before the hash function is applied. This is typically used when a combination of several source attributes forms a single conceptual business key. The `target_business_key_columns` must list a single column name for this concatenated key.
  - If `false`: Each business key defined in `target_business_key_columns` is treated independently. If multiple target business keys are defined and mapped from source columns, each will result in a separate hash key calculation if not handled by other logic. This is common for Hubs with a single, clearly defined business key from the source or when business keys are sourced independently and are not part of a composite key from a single source instance.

#### `requires_source_business_key`

- **Type:** `Boolean`
- **Description:** This property enforces whether the `source_business_key` field must be populated within each definition in the `entity_sources` list.
  - If `true`: Each source entry in `entity_sources` _must_ provide a non-empty string value for its `source_business_key` field. This is crucial for multi-master Hubs where the same business key value (e.g., customer ID '123') might originate from different source systems (e.g., 'CRM' and 'Billing'). The `source_business_key` helps distinguish these instances, ensuring the uniqueness of the Hub's hash key when combined with the actual business key.
  - If `false`: The `source_business_key` field within each `entity_source` definition must be empty or omitted. This is suitable when the business keys are globally unique or when the Hub is not designed to handle multi-master scenarios where the same key value can appear in different source systems with different meanings or contexts.

#### `target_business_key_columns`

- **Type:** `List of Strings`
- **Description:** This property defines the names of the column(s) that will store the business key(s) in the generated Hub table.
  - If `concatenate_business_keys` is `true` and multiple source columns contribute to a single conceptual key, this list must contain the single target column name for that concatenated key.
  - If `concatenate_business_keys` is `false` and you list multiple column names here, it implies the Hub represents a composite business key where each part is stored separately. Each of these columns would then need to be mapped from the source(s) in the `business_key_mapping`.

### Simple Example

This example defines a simple `HUB_PRODUCT` with a single business key `PRODUCT_SKU` sourced from one table.

```yaml
entity_type: 'hub'
name: 'HUB_PRODUCT'
concatenate_business_keys: false
requires_source_business_key: false  # Not a multi-master hub
enable_refresh: true
target_business_key_columns:
  - 'PRODUCT_SKU'
entity_sources:
  - urn:s2v:hub_source:inventory_products:
      entity_source: '(INVENTORY_DB, dbo, Products)'
      source_filter: "IsActive = 1"
      source_system_configuration_urn: 'urn:s2v:source_setting:inventory_config'
      business_key_mapping:
        - PRODUCT_SKU:
            - 'SKU'
      source_business_key: ''  # Empty as requires_source_business_key is false
```

### Comprehensive Hub Example

This example defines `HUB_MATERIAL`, which integrates material identifiers from three different source contexts or systems, all mapping to a single target business key `MATERIAL_ID`.
```yaml
entity_type: 'hub'
name: 'HUB_MATERIAL'
concatenate_business_keys: false
requires_source_business_key: true  # Assuming MATERIAL_ID is NOT globally unique across these sources
enable_refresh: true
target_business_key_columns:
  - 'MATERIAL_ID'
entity_sources:
  - urn:s2v:hub_source:src_erp_materials:
      entity_source: '(ERP_DATA, dbo, MaterialMaster)'
      source_filter: "TYPE = 'RAW'"
      source_system_configuration_urn: 'urn:s2v:source_setting:erp_system_config'
      business_key_mapping:
        - MATERIAL_ID:
            - 'MaterialInternalID'
      source_business_key: 'ERP_DATA'
  - urn:s2v:hub_source:src_legacy_parts:
      entity_source: '(LEGACY_SYSTEM, parts_data, PartTable)'
      source_filter: ''  # No filter for this source
      source_system_configuration_urn: 'urn:s2v:source_setting:legacy_system_config'
      business_key_mapping:
        - MATERIAL_ID:
            - 'PartGlobalIdentifier'
      source_business_key: 'LEGACY_SYSTEM'
  - urn:s2v:hub_source:src_webshop_catalog:
      entity_source: '(WEBSHOP_DB, catalog, ProductItems)'
      source_filter: "IsPublished = true"
      source_system_configuration_urn: 'urn:s2v:source_setting:webshop_config'
      business_key_mapping:
        - MATERIAL_ID:
            - 'WebshopProductID'
      source_business_key: 'WEBSHOP_DB'
```

## Data Vault Links

Links in Data Vault are specialized many-to-many relationship tables that connect two or more hubs, representing a unique business relationship or event. They primarily store the business keys of the connected entities and a load date timestamp, enabling the historical tracking of when relationships began and evolved without storing descriptive attributes.

S2V supports two types of links:

- **Regular Links** capture relationships between entities. They are designed to show a distinct list of all relationships that have ever existed between entities. See [Link Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/link_attributes).
- **Non-Historized Links** (often called transactional links) are used for high-volume, immutable event or transactional data where every occurrence of a relationship, even if the same entities are involved, is unique and important. They typically include additional business keys to define a unique grain and do not usually track changes to their descriptive attributes, as the data is considered static once recorded. See [Non-Historized Link Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/nhl_attributes).

## Data Vault Link Attributes

Regular links in Data Vault capture the relationships between hubs, which represent core business entities. A link typically connects two or more hubs and can derive information from one or more source systems. Let's explore the structure of a link and its fundamental components with practical examples.

### Link Properties

| Property Name | Type | Description |
| --- | --- | --- |
| `name` | `String` | The unique name for the Link object. This name will be used for the generated table and in references by other objects. Example: `'SALES_ORDER'` |
| `entity_type` | `String` | Must be set to `'link'` to define this object as a Link. |
| `enable_refresh` | `Boolean` | If `true`, the Link receives new data. Set to `false` if the source contains unchanging (historical) data. See Shared Properties. |
| `connected_hubs` | `List of Key-Value pairs` | Defines the list of Hubs that this Link connects, specifying the relationship between them. Each item maps an alias to a Hub name. |
| `entity_sources` | `List of Maps` | Defines the list of sources that feed data into this Link. Each item in the list represents a distinct source, following the structure outlined in [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources). |

### Link Specific Properties

#### `connected_hubs`

- **Type:** `List of Key-Value pairs`
- **Description:** This property defines the hubs a link connects, specified as a list of key-value pairs.
  - **Key**: Represents the hub's alias, or the specific name of the relationship. This is necessary because a link can connect to the same hub multiple times, requiring unique aliases to distinguish each connection. The hub alias must be unique within the `connected_hubs` list.
  - **Value**: Represents the actual name of the hub, which must refer to an existing hub object.
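The alias uniqueness rule described above can be illustrated with a short Python sketch. It assumes the `connected_hubs` list has already been parsed from YAML into a list of single-key dictionaries. The helper name `find_duplicate_aliases` is hypothetical and not part of S2V; the checks performed by `s2v validate` may differ.

```python
from collections import Counter


def find_duplicate_aliases(connected_hubs):
    """Return aliases that occur more than once in a parsed connected_hubs list.

    Hypothetical helper for illustration only; not part of the S2V CLI.
    Each item is expected to be a single-key mapping of alias -> hub name.
    """
    aliases = [alias for item in connected_hubs for alias in item]
    counts = Counter(aliases)
    return sorted(alias for alias, n in counts.items() if n > 1)


# The same hub may be referenced twice as long as each alias is unique.
valid = [
    {"HUB_CUSTOMER_BILL_TO": "HUB_CUSTOMER"},
    {"HUB_CUSTOMER_DELIVER_TO": "HUB_CUSTOMER"},
    {"HUB_MATERIAL": "HUB_MATERIAL"},
]

# Reusing an alias violates the uniqueness rule.
invalid = [
    {"HUB_CUSTOMER": "HUB_CUSTOMER"},
    {"HUB_CUSTOMER": "HUB_CUSTOMER"},
]

print(find_duplicate_aliases(valid))    # []
print(find_duplicate_aliases(invalid))  # ['HUB_CUSTOMER']
```

Representing each entry as a single-key mapping mirrors the YAML list form, where repeating an alias in separate list items is syntactically valid YAML and therefore has to be caught by a model-level check rather than by the YAML parser.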
**Example: Link connecting to each hub exactly once**

```yaml
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'  # <hub alias>: <hub name>
  - HUB_MATERIAL: 'HUB_MATERIAL'
```

**Example: Link connecting to one hub multiple times**

```yaml
connected_hubs:
  - HUB_CUSTOMER_BILL_TO: 'HUB_CUSTOMER'
  - HUB_CUSTOMER_DELIVER_TO: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
```

In this scenario, unique aliases (`HUB_CUSTOMER_BILL_TO`, `HUB_CUSTOMER_DELIVER_TO`) are used to distinguish different types of customer relationships, even though both point to the `HUB_CUSTOMER` hub.

**Warning:** A single link cannot connect multiple times to the same source table. If a link's source contains various relationships, consider extending the `connected_hubs` definition. For more details, please refer to the [FAQ: "How do I model link sources?"](https://s2v.reeeliance.com/docs/faq/overview)

### Simple Example

This example illustrates a `SALES_ORDER` link, which establishes a relationship between the `HUB_CUSTOMER` and `HUB_MATERIAL` hubs. The link's data originates from a single source, `urn:s2v:link_source:src_1`.

`basic_regular_link.yaml`

```yaml
# 1. Defines name, entity type and refresh mode
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
# 2. Defines the hubs that this link connects
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
# 3. Defines the list of sources that feed the link
entity_sources:
  - urn:s2v:link_source:src_1:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: ''
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
```

### Comprehensive Examples

**Multi-Source Link Example**

This example demonstrates a link that combines data from two distinct source systems: `SAP_SD` and `SFDC_SALES`. Both sources contribute to the same `SALES_ORDER` link, connecting `HUB_CUSTOMER` and `HUB_MATERIAL`.

```yaml
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
entity_sources:
  # Source 1: SAP Sales Data
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: "ORDER_TYPE = 'ZOR'"  # Example filter for specific order types
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
  # Source 2: Salesforce Sales Data
  - urn:s2v:link_source:SFDC_SALES:
      entity_source: '(SOURCE_SFDC, OPPORTUNITY_LINE_ITEMS)'
      source_filter: "IS_WON = TRUE"  # Example filter for won opportunities
      use_source_cdc_flag: false
      source_system_configuration_urn: 'urn:s2v:source_setting:SFDC'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'ACCOUNT_ID'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'PRODUCT_CODE'
            source_business_key: ''
```

**Lookup Mapping Example**

This example demonstrates a scenario where the `SALES_ORDER` link needs to resolve the business key for `HUB_MATERIAL` via a lookup. Instead of having a direct source column for `MATERIAL_BK` in its own source (`SD_SALES_ORDERS`), it uses `MATERIAL_CODE` to join with the source of `HUB_MATERIAL` (identified by `hub_source_urn: 'urn:s2v:hub_source:SAP_SE'`) on `MATERIAL_ID` to obtain the correct business key. This is a common pattern when the link's immediate source data contains foreign keys or codes that need to be resolved against a master data source (the Hub's source in this case).

```yaml
name: 'SALES_ORDER'
entity_type: 'link'
enable_refresh: true
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
entity_sources:
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: ''
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            lookup_mapping:
              hub_source_urn: 'urn:s2v:hub_source:SAP_SE'
              entity_source_columns:
                - 'MATERIAL_CODE'
              hub_source_columns:
                - 'MATERIAL_ID'
```

## Non-Historized Links Overview

Non-Historized Links (NHLs) are a variation of standard Links in Data Vault. They are used to model relationships or events where the history of the relationship itself is not tracked in the same way as descriptive data in Satellites. Instead, NHLs typically store the current state of a relationship or event attributes directly within the Link table.

### Non-Historized Link Properties Overview

| Property Name | Type | Description |
| --- | --- | --- |
| `name` | `String` | The unique name for the Non-Historized Link object. This name will be used for the generated table. Example: `'USER_SESSION_EVENT'` |
| `entity_type` | `String` | Must be set to `'non_historized_link'` to define this object as a Non-Historized Link. |
| `enable_refresh` | `Boolean` | If `true`, the Non-Historized Link receives new data. Set to `false` if the source contains unchanging (historical) data. See Shared Properties. |
| `connected_hubs` | `List of Key-Value pairs` | Defines the list of Hubs that this NHL connects, specifying the relationship between them. Each item maps an alias to a Hub name. See Connected Hubs in Regular Links. |
| `entity_source` | `Map` | Defines the single source that feeds data into this NHL. See [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources). |
| `non_historized_columns` | `List of Strings` | A list of column names that will be included directly in the NHL table to store non-historized attributes of the relationship or event. |
| `historized_columns` | `List of Strings` | A list of column names that, while part of the NHL's definition, are typically related to load metadata or event timestamps rather than descriptive attributes. |

### Non-Historized Link Specific Property Details

#### `non_historized_columns`

- **Type:** `List of Strings`
- **Description:** This property lists the names of the columns that will be created directly in the Non-Historized Link table to store attributes that describe the event or relationship. These columns **are NOT included** in the hash difference calculation.

#### `historized_columns`

- **Type:** `List of Strings`
- **Description:** This property lists the names of the columns that will be created directly in the Non-Historized Link table to store attributes that describe the event or relationship. These columns **are included** in the hash difference calculation.

#### `connected_hubs`

- **Type:** `List of Key-Value pairs`
- **Description:** This property defines the hubs a non-historized link connects, specified as a list of key-value pairs.
  - **Key**: Represents the hub's alias, or the specific name of the relationship. This is necessary because a link can connect to the same hub multiple times, requiring unique aliases to distinguish each connection. The hub alias must be unique within the `connected_hubs` list.
  - **Value**: Represents the actual name of the hub, which must refer to an existing hub definition.

**Example: Link connecting to each hub exactly once**

```yaml
connected_hubs:
  - HUB_CUSTOMER: 'HUB_CUSTOMER'  # <hub alias>: <hub name>
  - HUB_MATERIAL: 'HUB_MATERIAL'
```

**Example: Link connecting to one hub multiple times**

```yaml
connected_hubs:
  - HUB_CUSTOMER_BILL_TO: 'HUB_CUSTOMER'
  - HUB_CUSTOMER_DELIVER_TO: 'HUB_CUSTOMER'
  - HUB_MATERIAL: 'HUB_MATERIAL'
```

In this scenario, unique aliases (`HUB_CUSTOMER_BILL_TO`, `HUB_CUSTOMER_DELIVER_TO`) are used to distinguish different types of customer relationships, even though both point to the `HUB_CUSTOMER` hub.

### Example

This example defines a Non-Historized Link (NHL) named `EVENT` that connects `HUB_EVENT` and `HUB_USER`. It captures event details from a source table `CLICK_EVENTS`, including non-historized attributes like `EVENT_ID` and `USER_ID`, and historized columns for metadata and event timestamps.
`basic_non_historized_link.yaml`

```yaml
name: 'EVENT'
entity_type: 'non_historized_link'
enable_refresh: true
connected_hubs:
  - HUB_EVENT: 'HUB_EVENT'
  - HUB_USER: 'HUB_USER'
entity_source:
  urn:s2v:link_source:src_1:
    entity_source: '(SOURCE_DATA, CLICK_EVENTS)'
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:yaml_interface'
    connected_hub_relations:
      - HUB_EVENT:
          business_key_mapping:
            - EVENT_BK:
                - 'EVENT_ID'
          source_business_key: ''
      - HUB_USER:
          business_key_mapping:
            - USER_BK:
                - 'USER_ID'
          source_business_key: ''
non_historized_columns:
  - 'EVENT_ID'
  - 'USER_ID'
historized_columns:
  - 'CDC_TIMESTAMP'
  - 'EVENT_TIMESTAMP'
  - 'DESTINATION'
```

## Data Vault Model Reference

The Model Reference section provides detailed specifications for defining your Data Vault models using Stream2Vault's YAML-based configuration. Each page outlines the structure, properties, and usage for different Data Vault object types and related configurations.
### Key Considerations for S2V YAML Models:

Before diving into specific Data Vault objects like Hubs or Links, it's highly recommended to familiarize yourself with the common YAML structures and properties that are shared across multiple entity types. These are detailed in the [YAML Properties](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/shared_properties) section, covering:

- [Shared Properties](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/shared_properties): Common attributes like `name`, `entity_type`, and `enable_refresh`.
- [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources): The standard way to define data sources.
- [Business Key & Lookup Mappings](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/bk_mapping): How source columns are mapped to target business keys or used for lookups.

Understanding these foundational elements first will make the definitions for individual Data Vault objects much clearer.

**General Principles:**

- **Structured Templates:** Each Data Vault object type (e.g., Hub, Link, Satellite) has a specific YAML template. These templates are designed to be intuitive, generally guiding you from high-level object definition (like its name and type) down to specific source mappings (how data is acquired for the object).
- **Defined Properties:** Only properties specified in the documentation for each object type are permissible within its YAML definition. Stream2Vault validates your model files against these defined structures to ensure correctness.
- **Building on Shared Concepts:** The definitions for Hubs, Links, Satellites, and References will refer back to and utilize the shared properties and structures mentioned above.

Use this Model Reference section as your primary resource for understanding how to construct valid and effective S2V models. It will help you understand the expected YAML structure for:

- [Hubs](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/hub/hub)
- [Links](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/link)
- [Satellites](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/satellite/hub-satellite)
- [References](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/reference/reference)

## Reference Tables Overview

Reference tables store static or slowly changing reference data. Examples include lists of country codes, currency codes, or product categories. These tables are often useful for joining other tables and ensuring consistency in data values.

### Reference Properties Overview

| Property | Type | Description |
| --- | --- | --- |
| `name` | `String` | The unique name for the Reference object. This name will be used for the generated table. Example: `'REF_COUNTRY_CODES'` |
| `entity_type` | `String` | Must be set to `'reference'` to define this object as a Reference table. |
| `enable_refresh` | `Boolean` | If `true`, the Reference receives new data. Set to `false` if the source contains unchanging (historical) data. See Shared Properties. |
| `concatenate_business_keys` | `Boolean` | If `true` and multiple source columns are mapped to a single business key, their values will be concatenated before hashing to form the Reference table's primary key. |
| `target_business_key_columns` | `List of Strings` | A list defining the names of the column(s) that will store the business key(s) in the generated Reference table. Example: `['COUNTRY_CODE_KEY']` |
| `ordering_columns` | `List of Strings` | Specifies columns from the source used to order records when selecting the most recent or relevant row for a given business key, especially if duplicates exist. |
| `skip_hashdiff_comparison` | `Boolean` | Default: `false`. If `true`, S2V will not generate or compare a hashdiff for the descriptive attributes, typically used if the table is always fully refreshed. |
| `entity_source` | `Map` | Defines the single source that feeds data into this Reference table. The key is the source tuple, and the value contains source details. See [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources). |
| `historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Reference table and are considered part of its descriptive, potentially versioned, content. |
| `non_historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Reference table but are not considered for historization or hashdiff comparison. |
| `enable_history` | `Boolean` | TBD |

### Reference-Specific Property Details

#### `concatenate_business_keys`

- **Type:** `Boolean`
- **Description:** This property determines how the Reference's hash key is generated when multiple source columns are mapped to the Reference's business keys.
  - If `true`: The values from the multiple source columns (as defined in `business_key_mapping` for a given `entity_source`) are concatenated together (using a delimiter defined in `data_vault_settings.yaml`) before the hash function is applied. This is typically used when a combination of several source attributes forms a single conceptual business key. The `target_business_key_columns` must list a single column name for this concatenated key.
  - If `false`: Each business key defined in `target_business_key_columns` is treated independently. If multiple target business keys are defined and mapped from source columns, each will result in a separate hash key calculation if not handled by other logic. This is common for References with a single, clearly defined business key from the source or when business keys are sourced independently and are not part of a composite key from a single source instance.

#### `target_business_key_columns`

- **Type:** `List of Strings`
- **Description:** This property defines the names of the column(s) that will store the business key(s) in the generated Reference table.
- If `concatenate_business_keys` is `true` and multiple source columns contribute to a single conceptual key, this list must contain the single target column name for that concatenated key. - If `concatenate_business_keys` is `false` and you list multiple column names here, it implies the Reference represents a composite business key where each part is stored separately. Each of these columns would then need to be mapped from the source(s) in the `business_key_mapping`. #### `ordering_columns` [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/reference/reference/index.html\#ordering_columns "Direct link to ordering_columns") - **Type:** `List of Strings` - **Description:** This property is used to specify one or more columns from the source table that determine the order of records. When multiple records from the source might map to the same business key, `ordering_columns` (often a timestamp or sequence number) are used to pick the most relevant record for loading. #### `skip_hashdiff_comparison` [​](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/reference/reference/index.html\#skip_hashdiff_comparison "Direct link to skip_hashdiff_comparison") - **Type:** `Boolean` - **Description:** - If `false` (default): S2V will compute a hashdiff for the columns listed in `historized_columns`. This is useful if you want to track changes to the reference data values over time, similar to a Satellite, even if the Reference table itself is refreshed. - If `true`: S2V will not compute or use a hashdiff. This is suitable for Reference tables that are always fully truncated and reloaded from the source, or where changes to descriptive attributes are not tracked. 
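To make the effect of `skip_hashdiff_comparison` more concrete, here is a minimal Python sketch of hashdiff-based change detection. It is not S2V's implementation: the helper names `hashdiff` and `needs_new_version`, the `||` delimiter, the MD5 hash, and the sample rows are all assumptions chosen for illustration.

```python
import hashlib


def hashdiff(record, historized_columns, delimiter="||"):
    """Hash the historized column values of one record (illustrative only)."""
    # Assumption: values are stringified and joined in the configured column order.
    payload = delimiter.join(str(record.get(col, "")) for col in historized_columns)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()


def needs_new_version(current, incoming, historized_columns,
                      skip_hashdiff_comparison=False):
    """Decide whether an incoming record should be loaded."""
    if skip_hashdiff_comparison:
        # No hashdiff is computed; the record is loaded without comparison.
        return True
    if current is None:
        # New business key: there is nothing to compare against.
        return True
    return hashdiff(current, historized_columns) != hashdiff(incoming, historized_columns)


cols = ["SALESPERSON_EMAIL", "COUNTRY"]
old = {"SALESPERSON_EMAIL": "a@example.com", "COUNTRY": "DE", "CDC_FLAG": "U"}
same = {"SALESPERSON_EMAIL": "a@example.com", "COUNTRY": "DE", "CDC_FLAG": "I"}
changed = {"SALESPERSON_EMAIL": "b@example.com", "COUNTRY": "DE", "CDC_FLAG": "U"}

print(needs_new_version(old, same, cols))     # False: only a non-historized column differs
print(needs_new_version(old, changed, cols))  # True: a historized column changed
print(needs_new_version(old, same, cols, skip_hashdiff_comparison=True))  # True
```

Only the columns passed as `historized_columns` feed the hash, which is why a change in `CDC_FLAG` alone does not count as a new version in this sketch.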
#### `historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of the columns from the source table that contain the descriptive attributes of the entity. These are the columns whose values are tracked for changes over time. Changes in these columns (when `skip_hashdiff_comparison` is `false`) will result in a new record (version) in the Reference.

#### `non_historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of columns from the source table that should be included in the Reference table but are not considered for hashdiff calculation (i.e., changes to these columns won't trigger a new version). These might include metadata, operational flags, or attributes that are descriptive but not part of the core historical tracking.

#### `enable_history`

- **Type:** `Boolean`
- **Description:** This property controls whether the Reference table stores all historical versions of a record or only the latest version.
  - If `true`: The Reference table retains all historical records for each business key. When a change is detected (based on `historized_columns`, and only if `skip_hashdiff_comparison` is `false`), a new record is inserted and previous versions are kept. This allows for tracking the full history of changes to the reference data.
  - If `false` (default): The Reference table stores only the most current version of each record. When a change is detected, the existing record for that business key is updated (or a new one inserted if it's a new key), effectively overwriting the previous state.

### Example

This example defines a `REF_SALESPERSON` Reference table. It concatenates `SALESPERSON_ID` and `SALESPERSON_NAME` from the source to form its business key `SALESPERSON_KEY`. It specifies `CDC_FLAG` for ordering and lists `SALESPERSON_DESCRIPTION`, `COUNTRY`, and `SALESPERSON_EMAIL` as historized columns, so changes to any of them are tracked because `skip_hashdiff_comparison` is `false`.

```yaml
entity_type: 'reference'
name: 'REF_SALESPERSON'
ordering_columns: ['CDC_FLAG']
skip_hashdiff_comparison: false
concatenate_business_keys: true
enable_refresh: true
target_business_key_columns:
  - 'SALESPERSON_KEY'
entity_source:
  (SOURCE_DATA, SALESPERSON):
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: urn:s2v:source_setting:yaml_interface
    business_key_mapping:
      - SALESPERSON_KEY:
          - 'SALESPERSON_ID'
          - 'SALESPERSON_NAME'
historized_columns:
  - 'SALESPERSON_DESCRIPTION'
  - 'COUNTRY'
  - 'SALESPERSON_EMAIL'
non_historized_columns:
```

## Hub Satellite Reference

In Data Vault 2.0, a
**Satellite** stores descriptive attributes for a core business concept (Hub) or a relationship (Link). Satellites capture historical changes to these attributes over time, providing a full audit trail. Each Satellite is attached to a single Hub or Link.

### Hub Satellite Properties Overview

| Property | Type | Description |
| --- | --- | --- |
| `name` | `String` | The unique name for the Satellite object. This name will be used for the generated table. Example: `'SAT_CUSTOMER_DETAILS'` |
| `entity_type` | `String` | Must be set to `'hubsat'` for a Satellite attached to a Hub. |
| `enable_refresh` | `Boolean` | If `true`, the Satellite receives new data. Set to `false` if the source contains static (historical) data. See Shared Properties. |
| `connected_entity` | `String` | The name of the Hub this Satellite is attached to. Example: `'HUB_CUSTOMER'`. |
| `ordering_columns` | `List of Strings` | Specifies columns from the source used to order records when selecting the most recent or relevant row for a given business key, especially if duplicates exist. |
| `skip_hashdiff_comparison` | `Boolean` | Default: `false`. If `true`, S2V will not generate or compare a hashdiff for the descriptive attributes, meaning all incoming records are treated as new versions. |
| `entity_source` | `Map` | Defines the single source that feeds data into this Satellite. The key is the source tuple, and the value contains source details. See [Entity Sources Definition](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/entity_sources). |
| `historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Satellite and are considered part of its descriptive, versioned content. |
| `non_historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Satellite but are not considered for historization or hashdiff comparison. |

### Hub Satellite-Specific Property Details

#### `connected_entity`

- **Type:** `String`
- **Description:** Specifies the name of the parent Hub to which this Satellite is attached. The Satellite will inherit the business key structure from this parent entity.

#### `ordering_columns`

- **Type:** `List of Strings`
- **Description:** Specifies one or more columns from the source table that determine the order of records. When multiple records from the source might map to the same business key, `ordering_columns` (often a timestamp or sequence number) are used to pick the most relevant record for loading.

#### `skip_hashdiff_comparison`

- **Type:** `Boolean`
- **Description:**
  - If `false` (default): S2V will compute a hashdiff for the columns listed in `historized_columns`. A new record is inserted into the Satellite only if the hashdiff changes or if the business key is new.
  - If `true`: S2V will not compute or use a hashdiff. Every incoming record from the source (after filtering and business key mapping) will result in a new entry in the Satellite, effectively creating a new version regardless of whether the descriptive attributes have changed. This can be useful for logging all source system changes or when the source guarantees unique records per load.

#### `historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of the columns from the source table that contain the descriptive attributes of the entity. These are the columns whose values are tracked for changes over time. Changes in these columns (when `skip_hashdiff_comparison` is `false`) will result in a new record (version) in the Satellite.

#### `non_historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of columns from the source table that should be included in the Satellite table but are not considered for hashdiff calculation (i.e., changes to these columns won't trigger a new version). These might include metadata, operational flags, or attributes that are descriptive but not part of the core historical tracking.

### Simple Example

This example defines a basic Hub Satellite `SAT_CUSTOMER_DETAILS` attached to `HUB_CUSTOMER`. It sources customer descriptive attributes like `CDC_FLAG` (historized) and `CUSTOMER_NAME` (non-historized) from a single source table `CUSTOMER`.

```yaml
name: 'SAT_CUSTOMER_DETAILS'
entity_type: 'hubsat'
ordering_columns:
skip_hashdiff_comparison: false
enable_refresh: true
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (SOURCE_DATA, CUSTOMER):
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP_SE'
    business_key_mapping:
      - CUSTOMER_ID:
          - 'CUSTOMER_ID'
        source_business_key: ''
historized_columns:
  - 'CDC_FLAG'
non_historized_columns:
  - 'CUSTOMER_NAME'
```

### Comprehensive Example

This example showcases a Hub Satellite `SATELLITE_ONLY_CUSTOMER_DETAILS` that is also attached to `HUB_CUSTOMER`. It demonstrates using a `lookup_mapping` to resolve the business key. The satellite sources descriptive attributes from a local ERP table `CUSTOMER_DATA`, looking up the `CUSTOMER_PK` from a master data source defined in `HUB_CUSTOMER`'s `entity_sources` (referenced by `hub_source_urn`).

```yaml
name: 'SATELLITE_ONLY_CUSTOMER_DETAILS'
entity_type: 'hubsat'
ordering_columns:
skip_hashdiff_comparison: false
enable_refresh: true
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (LOCAL_ERP, CUSTOMER_DATA):
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:GLOBAL_ERP'
    lookup_mapping:
      hub_source_urn: urn:s2v:hub_source:master_data_customer
      entity_source_join_columns:
        - CUSTOMER_FK
      hub_source_join_columns:
        - CUSTOMER_PK
historized_columns:
  - 'CDC_FLAG'
non_historized_columns:
  - 'CUSTOMER_NAME'
```

## Link Satellite Overview

A **Link Satellite** functions similarly to a Hub Satellite but is attached to a Link instead of a Hub. It stores descriptive attributes related to the relationship or transaction captured by the Link, tracking changes to these attributes over time.

### Link Satellite Properties Overview

Link Satellites share most properties with Hub Satellites.
The key differences are the `entity_type` value and the potential use of `connected_entity_source_ref`.

| Property | Type | Description |
| --- | --- | --- |
| `name` | `String` | The unique name for the Link Satellite object. Example: `'LSAT_ORDER_ITEM_DETAILS'` |
| `entity_type` | `String` | Must be set to `'linksat'` for a Satellite attached to a Link. |
| `enable_refresh` | `Boolean` | Default: `true`. Indicates whether the Satellite's data can be refreshed. See Shared Properties. |
| `connected_entity` | `String` | The name of the Link this Satellite is attached to. Example: `'LINK_ORDER_ITEM'`. |
| `connected_entity_source_ref` | `String` | A URN referencing a specific `entity_source` definition within the connected Link. Example: `'urn:s2v:link_source:src_erp_orders'` |
| `ordering_columns` | `List of Strings` | Specifies columns from the source used to order records when selecting the most recent or relevant row for a given business key, especially if duplicates exist. |
| `skip_hashdiff_comparison` | `Boolean` | Default: `false`. If `true`, S2V will not generate or compare a hashdiff for the descriptive attributes. |
| `historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Satellite and are considered part of its descriptive, versioned content. |
| `non_historized_columns` | `List of Strings` | A list of column names from the source that should be included in the Satellite but are not considered for historization or hashdiff comparison. |

### Link Satellite-Specific Property Details

Refer to the Satellite documentation for details on common properties like `ordering_columns`, `skip_hashdiff_comparison`, `historized_columns`, `non_historized_columns`, and the structure of `entity_source`.

#### `connected_entity`

- **Type:** `String`
- **Description:** Specifies the name of the parent Link to which this Satellite is attached. The Satellite will inherit the business key structure from this parent entity.

#### `connected_entity_source_ref`

- **Type:** `String`
- **Description:** This property is **only used for Link Satellites (`entity_type: 'linksat'`)**. It is required if the connected Link is fed by multiple `entity_sources`. The URN provided here must match one of the URN keys defined in the `entity_sources` list of the parent Link. This tells S2V which specific source context within the Link this Satellite's attributes are derived from.

#### `ordering_columns`

- **Type:** `List of Strings`
- **Description:** Specifies one or more columns from the source table that determine the order of records. When multiple records from the source might map to the same business key, `ordering_columns` (often a timestamp or sequence number) are used to pick the most relevant record for loading.

#### `skip_hashdiff_comparison`

- **Type:** `Boolean`
- **Description:**
  - If `false` (default): S2V will compute a hashdiff for the columns listed in `historized_columns`. A new record is inserted into the Satellite only if the hashdiff changes or if the business key is new.
  - If `true`: S2V will not compute or use a hashdiff. Every incoming record from the source (after filtering and business key mapping) will result in a new entry in the Satellite, effectively creating a new version regardless of whether the descriptive attributes have changed. This can be useful for logging all source system changes or when the source guarantees unique records per load.

#### `historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of the columns from the source table that contain the descriptive attributes of the entity. These are the columns whose values are tracked for changes over time. Changes in these columns (when `skip_hashdiff_comparison` is `false`) will result in a new record (version) in the Satellite.

#### `non_historized_columns`

- **Type:** `List of Strings`
- **Description:** Lists the names of columns from the source table that should be included in the Satellite table but are not considered for hashdiff calculation (i.e., changes to these columns won't trigger a new version). These might include metadata, operational flags, or attributes that are descriptive but not part of the core historical tracking.

### Example

This example defines a Link Satellite `LSAT_ORDER_ITEM_DETAILS` attached to `LINK_ORDER_ITEM`. It sources descriptive attributes like `QUANTITY`, `UNIT_PRICE`, `DISCOUNT_PERCENTAGE`, and `LINE_STATUS` from the `OrderLineItems` table, using `LINE_MODIFICATION_DATE` for ordering. It also includes `INTERNAL_PROCESSING_FLAG` as a non-historized attribute.

```yaml
name: 'LSAT_ORDER_ITEM_DETAILS'
entity_type: 'linksat' # Attaching to a Link
enable_refresh: true
connected_entity: 'LINK_ORDER_ITEM' # Name of the parent Link
connected_entity_source_ref: 'urn:s2v:link_source:erp_order_lines' # A URN referencing a specific `entity_source` definition within the connected Link
ordering_columns: ['LINE_MODIFICATION_DATE']
skip_hashdiff_comparison: false
historized_columns:
  - 'QUANTITY'
  - 'UNIT_PRICE'
  - 'DISCOUNT_PERCENTAGE'
  - 'LINE_STATUS'
non_historized_columns:
  - 'INTERNAL_PROCESSING_FLAG'
```

## Business Key Mapping Guide

This document explains how business keys are defined in various Data Vault objects. Business keys are the unique identifiers for your core business concepts and are crucial for the integrity and functionality of your Data Vault.
## Business Key Mapping Types

There are two primary methods for defining business key mappings, depending on whether the business key columns are directly available in your source data:

1. `business_key_mapping`: Used when the source columns that form the business key are **directly available** in the source system feeding the Data Vault object.
2. `lookup_mapping`: Used when the business key source columns are **not directly available** in the source. In this scenario, a join (or "lookup") operation is required with other related source data (typically from a hub's source) to obtain the necessary business key values.

### Usage by Object Type

The applicability of these mapping types varies across Data Vault object types:

- **Strictly** `business_key_mapping`: Used exclusively in **Hubs** and **Reference** tables. For these objects, business keys are always derived directly from their primary source.
- **Both Mapping Types** (`business_key_mapping` and `lookup_mapping`): Applicable in **Hub Satellites** (if the satellite's source doesn't directly feed the associated hub, a so-called "satellite only"), **Links**, and **Non-Historized Links**. These objects might need to look up business keys from related hubs if the keys are not directly present in their own source.
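The rules above can be summarised as a small lookup table. The following Python sketch is only an illustration: `ALLOWED_MAPPINGS` and `check_mapping` are hypothetical names, the dictionary keys are descriptive labels rather than real `entity_type` values, and S2V's own validation may work differently.

```python
# Mapping types each object type may use, as described in the section above.
# The keys are descriptive labels, not S2V `entity_type` values.
ALLOWED_MAPPINGS = {
    "Hub": {"business_key_mapping"},
    "Reference": {"business_key_mapping"},
    "Hub Satellite": {"business_key_mapping", "lookup_mapping"},
    "Link": {"business_key_mapping", "lookup_mapping"},
    "Non-Historized Link": {"business_key_mapping", "lookup_mapping"},
}


def check_mapping(object_type, source_definition):
    """Return the mapping keys in a source definition that the object type does not allow."""
    used = {
        key
        for key in ("business_key_mapping", "lookup_mapping")
        if key in source_definition
    }
    return sorted(used - ALLOWED_MAPPINGS[object_type])


print(check_mapping("Hub", {"lookup_mapping": {}}))   # ['lookup_mapping']
print(check_mapping("Link", {"lookup_mapping": {}}))  # []
```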
## Business key mapping (`business_key_mapping`)

**Key Considerations:**

- **`source_business_key`**: This property is used to differentiate identical business key values originating from distinct source systems, addressing **multi-master scenarios**. It's typically part of the business key formation when you're integrating data from multiple systems where the same natural key might exist in different contexts. The property **must be used only together with** `business_key_mapping`.
- **Reference Tables Exception**: For References, the `source_business_key` property is omitted from the `business_key_mapping`. This is because reference tables typically do not integrate data from multiple, conflicting sources in a multi-master fashion, which removes the need to distinguish identical keys by source.

### Example: Direct Business Key Mapping

The example below illustrates mapping a source column (`MATERIAL_PK`) directly to a business key (`MATERIAL_ID`).

```yaml
entity_sources:
  - urn:s2v:hub_source:src_1:
      entity_source: '(SOURCE_DATA, MATERIAL)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:yaml_interface'
      business_key_mapping:
        - MATERIAL_ID:
            - 'MATERIAL_PK'
          source_business_key: '' # Empty if not a multi-master scenario or not needed
```

A Hub can have multiple business keys, and it can map multiple source columns to one or more of them. The two examples below map the same source columns to a single business key (`concatenate_business_keys` set to `true`) and to multiple business keys (`concatenate_business_keys` set to `false`).

#### Example: Concatenating Multiple Source Columns into One Business Key

A Hub can derive its business key from multiple source columns, concatenating them into a single business key. In the example below, `SALES_ORDER_ID` is formed by concatenating `VBELN` and `POSNR` from the `VBAP` source, assuming `concatenate_business_keys` is set to `true` for the associated hub.

```yaml
entity_sources:
  - urn:s2v:hub_source:SAP_SE:
      entity_source: '(SAP_SE, VBAP)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      business_key_mapping:
        - SALES_ORDER_ID:
            - 'VBELN'
            - 'POSNR'
          source_business_key: ''
```

#### Example: Mapping Multiple Source Columns to Multiple Business Keys

Alternatively, if `concatenate_business_keys` is set to `false`, multiple source columns can be mapped to distinct business keys within the same object. This is useful when a single source record contains identifiers for different entities.

```yaml
entity_sources:
  - urn:s2v:hub_source:SAP_SE:
      entity_source: '(SAP_SE, VBAP)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      business_key_mapping:
        - SALES_ORDER_HEADER:
            - 'VBELN'
        - SALES_ORDER_ITEM:
            - 'POSNR'
          source_business_key: ''
```

**Tip:** When defining business key mappings for Links or satellite-only Hub Satellites, it's highly recommended to have the definitions of the connected Hubs open. This helps you match the business key mapping in the link with the specific business keys defined in its connected Hubs, preventing errors and ensuring proper key resolution.

## Lookup Mapping (`lookup_mapping`)

A `lookup_mapping` is used when the business keys for a target object (like a Link or a Hub Satellite) are not directly present in its immediate source data. Instead, these business keys must be obtained by joining with another source (typically the source of a connected Hub). To achieve this, you define:

- `entity_source_join_columns`: Columns from the current object's source that will be used in the join condition.
- `hub_source_join_columns`: Columns from the associated Hub's source that will be used in the join condition.
- `hub_source_urn`: This property points to the specific Hub's source configuration, identifying which source table to join against.

In this scenario, you can't specify `source_business_key` within the lookup mapping itself, because the process directly obtains the already prepared business key value, ready for hashing.
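The join that a `lookup_mapping` describes can be sketched in plain Python. This is an illustration under stated assumptions, not S2V's generated logic: the `resolve_business_keys` helper, the sample rows, the `CITY` column, and the choice to keep unmatched rows with a `None` key are all hypothetical.

```python
def resolve_business_keys(entity_rows, hub_rows, entity_source_join_columns,
                          hub_source_join_columns, business_key_column):
    """Attach the hub's business key to each entity row via a join (illustrative only)."""
    # Index the hub source rows by their join column values.
    hub_index = {
        tuple(row[col] for col in hub_source_join_columns): row[business_key_column]
        for row in hub_rows
    }
    resolved = []
    for row in entity_rows:
        join_key = tuple(row[col] for col in entity_source_join_columns)
        # Assumption: unmatched rows keep a None business key instead of being dropped.
        resolved.append({**row, business_key_column: hub_index.get(join_key)})
    return resolved


# Hypothetical sample data: the hub's source carries the business key CUSTOMER_ID.
hub_source = [
    {"CUSTOMER_PK": 1, "CUSTOMER_ID": "C-100"},
    {"CUSTOMER_PK": 2, "CUSTOMER_ID": "C-200"},
]
# The current object's source only carries a foreign key to the hub's source.
entity_source = [
    {"CUSTOMER_FK": 2, "CITY": "Hamburg"},
    {"CUSTOMER_FK": 9, "CITY": "Berlin"},
]

rows = resolve_business_keys(
    entity_source, hub_source, ["CUSTOMER_FK"], ["CUSTOMER_PK"], "CUSTOMER_ID"
)
for row in rows:
    print(row)
```

The first row resolves to `C-200`; the second has no matching `CUSTOMER_PK`, so its business key stays `None` in this sketch.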
#### Example: Lookup Mapping

This example shows a `lookup_mapping` where `CUSTOMER_FK` from the link's `CUSTOMER_ADDRESS` source is joined with `CUSTOMER_PK` from a Hub's source (identified by `hub_source_urn`) to derive the necessary business key. Notice that the link's `hub_source_urn` value `urn:s2v:hub_source:master_data_customer` matches the URN key of the hub's source definition in the second part of the block.

```yaml
### Example of link's entity_sources definition
entity_sources:
  - urn:s2v:hub_source:cust_addr_link_src:
      entity_source: '(SOURCE_DATA, CUSTOMER_ADDRESS)'
      source_filter: ''
      use_source_cdc_flag: true
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      lookup_mapping:
        hub_source_urn: 'urn:s2v:hub_source:master_data_customer'  # This URN defines which Hub's source to join with
        entity_source_join_columns:
          - 'CUSTOMER_FK'  # Join column(s) from the current entity's source
        hub_source_join_columns:
          - 'CUSTOMER_PK'  # Join column(s) from the hub's source

### Example of connected hub's entity_source we join with
entity_sources:
  - urn:s2v:hub_source:master_data_customer:
      entity_source: '(SOURCE_DATA, CUSTOMER)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      business_key_mapping:
        - CUSTOMER_ID:
            - 'CUSTOMER_ID'
      source_business_key: 'SAP_SE'
```

## Data Vault Entity Sources

S2V defines data sources for Data Vault objects using specific YAML structures. These structures vary by the Data Vault object type. This document details the common patterns:

- **`entity_source` (singular)**: For objects with a **single source definition**. This itself has variations in how the source is keyed.
- **`entity_sources` (plural)**: For objects that can be fed by **one or more source definitions** (presented as a list of URN-keyed maps).

### Common Properties within an Entity Source Definition

Regardless of whether it's a single `entity_source` or an item within an `entity_sources` list, each source definition contains a set of common properties that describe how to access and process the data from that particular source.

| Property | Type | Description |
| --- | --- | --- |
| `entity_source` (Tuple) | Tuple | Specifies the physical location of this source's data. Format: `(SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE)` or `(SOURCE_SCHEMA, SOURCE_TABLE)`. This property is present in all source definition structures, though its placement varies slightly. |
| `source_filter` | String | An SQL `WHERE` clause condition to filter data from this source. Example: `"STATUS = 'ACTIVE'"` |
| `use_source_cdc_flag` | Boolean | Indicates whether this source provides Change Data Capture (CDC) information (e.g., insert/update/delete flags). |
| `source_system_configuration_urn` | String | A URN linking to a predefined source system configuration in your `source_system_settings.yaml` file. Example: `'urn:s2v:source_setting:erp_config'` |
| `business_key_mapping` | List of Maps | Defines how the parent entity's business key(s) are populated from this source. See [Business Key & Lookup Mappings](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/bk_mapping). |
| `source_business_key` | String | Defines business key values originating from distinct source systems, addressing multi-master scenarios. This property is required only together with `business_key_mapping`. |
| `lookup_mapping` | Map | (Alternative to `business_key_mapping` for some entities) Defines how business keys are resolved via a lookup. See [Business Key & Lookup Mappings](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/yaml-properties/bk_mapping). |
| `connected_hub_relations` | List of Maps | (For Links/NHLs) Defines how the business keys of connected Hubs are sourced, using `business_key_mapping` or `lookup_mapping`. See Business Key & Lookup Mappings. |

## Single Source Objects (`entity_source`)

This top-level property is used when a Data Vault object is fed by a single conceptual source. The way this single source is defined can vary by the Data Vault object type.

### 1. Tuple-keyed

This structure is used when the single source is directly identified by its database, schema, and table tuple as the main key.

**Used by:**

- Hub Satellites (`hubsat`)
- References (`reference`)

**Structure:**

```yaml
# Example for a Hub Satellite
entity_source:
  (, , ):  # Key: ( DB, Schema, Table)
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:'
    business_key_mapping:
      - :
          - ''
    source_business_key: ''
```

**Key Components:**

- **Source Location Tuple**: Source of the data.
  - Format: `(SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE)` or `(SOURCE_SCHEMA, SOURCE_TABLE)`.
- **Properties**: Configuration for this source, including:
  - `source_filter`
  - `use_source_cdc_flag`
  - `source_system_configuration_urn`
  - `business_key_mapping` or `lookup_mapping`

**Example (Hub Satellite):**

```yaml
name: 'SAT_CUSTOMER'
entity_type: 'hubsat'
connected_entity: 'HUB_CUSTOMER'
entity_source:
  (SAP_SE, KNA1):  # Tuple-key identifying the source
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    business_key_mapping:
      - CUSTOMER_ID:  # Hub's business key name
          - 'KUNNR'  # Source column for the hub's business key
    source_business_key: ''  # for multi-master scenarios
```

### 2. URN-keyed `entity_source`

This structure is used when a Data Vault object has a single source definition, but this definition is encapsulated within a map where the key is a URN. This resembles a single item from the `entity_sources` list (see below).
**Used by:**

- Non-Historized Links (`non_historized_link`)

**Structure:**

```yaml
# Example for a Non-Historized Link
entity_source:
  urn:s2v:link_source::  # Key: URN for the single source
    entity_source: ', , '  # tuple for source location
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:'
    connected_hub_relations:
      # ... hub relations and their business key mappings ..
      - :  # Connected hub alias
          business_key_mapping:
            - :
                - ''
# ... other properties like non_historized_columns, historized_columns
```

**Key Components:**

- **URN Key**: URN (e.g., `urn:s2v:link_source:my_event_source`).
- **Properties**:
  - `entity_source`
  - `source_filter`
  - `use_source_cdc_flag`
  - `source_system_configuration_urn`
  - `connected_hub_relations`

**Example (Non-Historized Link):**

```yaml
name: 'EVENT'
entity_type: 'non_historized_link'
enable_refresh: true
connected_hubs:
  - HUB_USER: 'HUB_USER'
  - HUB_EVENT: 'HUB_EVENT'
entity_source:
  urn:s2v:link_source:src_1:  # URN Key
    entity_source: '(SOURCE_DATA, CLICK_EVENTS)'  # source location tuple
    source_filter: ''
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:yaml_interface'
    connected_hub_relations:
      - HUB_EVENT:
          business_key_mapping:
            - EVENT_BK:
                - 'EVENT_ID'
          source_business_key: ''
      - HUB_USER:
          business_key_mapping:
            - USER_BK:
                - 'USER_ID'
          source_business_key: ''
# non_historized_columns, historized_columns etc. would follow at the same level as entity_source
```

## Multi Source Objects (`entity_sources`)

This structure is used for Data Vault objects that can be fed by **one or more source definitions**. Each source definition is an item in a list, and each item is a map keyed by a unique URN.
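To make the "list of URN-keyed maps" shape concrete, here is a minimal Python sketch that walks such a list once it has been loaded into plain lists and dicts. It is an illustration only and not part of S2V: the helper names `parse_source_tuple` and `index_sources_by_urn`, the sample URNs, and the rule that each list item holds exactly one key are assumptions made for this example.

```python
from typing import Dict, List, Optional


def parse_source_tuple(text: str) -> Dict[str, Optional[str]]:
    """Split '(DB, SCHEMA, TABLE)' or '(SCHEMA, TABLE)' into named parts."""
    parts = [part.strip() for part in text.strip().strip("()").split(",")]
    if len(parts) == 3:
        database, schema, table = parts
    elif len(parts) == 2:
        database = None  # two-part form: no database given
        schema, table = parts
    else:
        raise ValueError(f"unexpected source tuple: {text!r}")
    return {"database": database, "schema": schema, "table": table}


def index_sources_by_urn(entity_sources: List[Dict]) -> Dict[str, Dict]:
    """Map each source URN to its parsed source location."""
    indexed: Dict[str, Dict] = {}
    for item in entity_sources:
        if len(item) != 1:  # assumption: one URN key per list item
            raise ValueError("each list item must hold exactly one URN key")
        urn, definition = next(iter(item.items()))
        if urn in indexed:
            raise ValueError(f"duplicate source URN: {urn}")
        indexed[urn] = parse_source_tuple(definition["entity_source"])
    return indexed


# Hypothetical, already-parsed `entity_sources` list with two sources.
entity_sources = [
    {"urn:s2v:hub_source:erp_system": {
        "entity_source": "(ERP_DB, PRODUCTS_SCHEMA, PRODUCT_MASTER)"}},
    {"urn:s2v:hub_source:legacy_system": {
        "entity_source": "(dbo, ITEMS)"}},
]

sources = index_sources_by_urn(entity_sources)
print(sources["urn:s2v:hub_source:legacy_system"])
```

Indexing by URN makes it easy to spot duplicate keys early, which matters because each URN is meant to identify exactly one source definition.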
**Used by:**

- Hubs (`hub`)
- Regular Links (`link`)

**Structure (for each item in the `entity_sources` list):**

```yaml
# Example for one source definition within a Hub or Link's entity_sources list
- urn:s2v:_source::  # Key: URN for this specific source
    entity_source: ', , '  # tuple for source location
    source_filter: ''
    use_source_cdc_flag: false
    source_system_configuration_urn: 'urn:s2v:source_setting:'
    # Object-specific properties follow, e.g.:
    # For Hubs:
    business_key_mapping:
      - :
          - ''
    source_business_key: ''  # for multi-master scenarios
    # For Links:
    connected_hub_relations:
      - :
          business_key_mapping:
            - :
                - ''
          source_business_key: ''
```

**Key Components (for each source definition in the list):**

- **Source URN Identifier (Map Key)**: A unique URN that identifies this specific source definition. The URN typically includes a prefix indicating the Data Vault object type (e.g., `urn:s2v:hub_source:my_erp_source`, `urn:s2v:link_source:sap_orders`).
- **`entity_source` (Tuple)**
- **`source_filter`**
- **`use_source_cdc_flag`**
- **`source_system_configuration_urn`**
- **Object-Specific Properties**:
  - For **Hubs**:
    - `business_key_mapping`
    - `source_business_key`
  - For **Links**:
    - `connected_hub_relations`

**Example (Hub with multiple sources):**

```yaml
name: 'HUB_PRODUCT'
entity_type: 'hub'
enable_refresh: true
concatenate_business_keys: false
requires_source_business_key: yes
target_business_key_columns:
  - 'PRODUCT_BK'
entity_sources:
  - urn:s2v:hub_source:erp_system:
      entity_source: '(ERP_DB, PRODUCTS_SCHEMA, PRODUCT_MASTER)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:erp_config'
      use_source_cdc_flag: false
      business_key_mapping:
        - PRODUCT_BK:
            - 'SKU'
      source_business_key: 'ERP_DB'
  - urn:s2v:hub_source:legacy_system:
      entity_source: '(LEGACY_DB, dbo, ITEMS)'
      source_filter: "STATUS = 'ACTIVE'"
      source_system_configuration_urn: 'urn:s2v:source_setting:legacy_config'
      use_source_cdc_flag: true
      business_key_mapping:
        - PRODUCT_BK:
            - 'ITEM_CODE'
      source_business_key: 'LEGACY_SYSTEM_IDENTIFIER'
```

**Example (Regular Link):**

```yaml
# ... (other link properties like name, entity_type, enable_refresh ...)
entity_sources:
  - urn:s2v:link_source:SAP_SD:
      entity_source: '(SOURCE_SAP, SD_SALES_ORDERS)'
      source_filter: "ORDER_TYPE = 'ZOR'"
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      use_source_cdc_flag: false
      connected_hub_relations:
        - HUB_CUSTOMER:
            business_key_mapping:
              - CUSTOMER_BK:
                  - 'KUNNR'
            source_business_key: ''
        - HUB_MATERIAL:
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
            source_business_key: ''
  # Potentially another source for the same Link could be added here
  # - 'urn:s2v:link_source:CRM_ORDERS':
  #     ...
```

**Important Note on Link Satellites:** Link Satellites (`linksat`) do **not** have their own `entity_source` or `entity_sources` property. They inherit their source context directly from their parent Link.

## Data Vault Shared Properties

This document outlines the common properties that are shared across all of your Data Vault object templates, regardless of their specific `entity_type`. These properties provide fundamental configurations for naming, type identification, and refresh behavior. Other object-specific properties are described in the respective object references.

## Shared Template Properties

All Data Vault object YAML templates include the following core properties:

| Property Name | Expected Type | Description |
| --- | --- | --- |
| `name` | string | The **unique name** of the Data Vault object. This name is used for identification and corresponds to the generated table name in your Data Vault. |
| `entity_type` | enumeration | Specifies the **type of Data Vault object** being defined by the template. Possible values are: `hub`, `link`, `non_historized_link`, `hubsat`, `linksat`, `reference` |
| `enable_refresh` | boolean | Controls whether the Data Vault object is **enabled for automatic refresh**. Set to `true` if this object receives new data, set to `false` if it reflects only historical data without any automatic updates. |
| `entity_source/s` | Entity Source | Defines **one or more sources** that feed data into the Data Vault object. Depending on whether the object is allowed to have multiple sources or not, its definition varies slightly. |

**Important Note on Entity Source for Link Satellites:** The `entity_source` property is **not present in Link Satellite templates**. This is because satellites associated with Links are designed to inherit their entity source directly from the Link itself, eliminating the need for separate source configuration within the satellite template.

Read more about the entity source properties in the next chapter.

## Stream2Vault Reference Guide

The References section provides in-depth details about Stream2Vault’s configuration, supported features, and available options. Use it as a go-to resource when you need specifics or want to explore advanced capabilities.
- [**CLI Commands Reference**](https://s2v.reeeliance.com/docs/detailed-guides/cli-guides/cli): Detailed documentation for each S2V command-line interface (CLI) command, including options and usage examples.
- [**Model Reference**](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/overview): In-depth information on the YAML structure, properties, and configuration options for defining Data Vault objects and other model components in S2V.
- [**User Management**](https://s2v.reeeliance.com/docs/detailed-guides/user-management-ad-groups): Guidance on managing user access, roles, and permissions within S2V, including integration with systems like Active Directory groups.
- [**Generated Code**](https://s2v.reeeliance.com/docs/detailed-guides/generated-files): An explanation of the output files, code structure, and deployment artifacts produced by the S2V `generate` command.

## User Management Groups

## Custom SQL Deployment

While the Deployment via Makefile offers a structured and recommended approach for deploying S2V-generated SQL code, there might be scenarios where a custom deployment strategy is preferred or necessary.

Custom deployment typically involves executing the generated SQL files on an on-demand basis, outside the provided Makefile orchestration. This can be suitable for:

- **Manual Execution:** Directly running individual SQL scripts or batches of scripts using a SQL client like SnowSQL or your preferred database IDE. This might be used for quick fixes, targeted updates, or during initial development and testing phases.
- **Integration with Existing CI/CD Pipelines:** If you have an established CI/CD pipeline that uses tools other than Make (e.g., Jenkins, GitLab CI, Azure DevOps, GitHub Actions), you can integrate the execution of the S2V-generated SQL scripts into your existing workflows. This would involve scripting the deployment steps using your pipeline's native capabilities.
- **Using Other Orchestration Tools:** Employing different workflow orchestration tools (e.g., Apache Airflow, Prefect) to manage the sequence and execution of SQL scripts.

**Considerations for Custom Deployment:**

- **Dependency Management:** You will be responsible for ensuring that SQL scripts are executed in the correct order, especially when deploying objects with dependencies (e.g., Hubs before Links, base tables before views). The generated Makefile often handles this sequencing, which you would need to replicate.
- **Configuration Management:** Connection details, variable substitutions (like database names, warehouse names), and other configurations managed by the `.ini` file in the Makefile approach will need to be handled by your custom process.
- **Error Handling and Logging:** Implementing robust error handling and logging mechanisms will be crucial for troubleshooting and monitoring your custom deployments.

Custom deployment provides maximum flexibility but requires careful planning to ensure reliability and maintainability.

## Data Vault Deployment Guide

## Data Vault Deployment Process

This document describes the deployment process for the Data Vault objects, which is orchestrated by Makefiles and executed using SnowSQL, Snowflake's command-line client.
Configuration, including connection parameters and script variables, is managed through an `.ini` file (e.g., `conf.ini`).

**Deployment Flow Summary**

- **Invocation:** You run a target (e.g., `make deploy_all`) on the top-level Makefile.
- **Setup:** The top-level Makefile configures the `SNOWSQL` command, specifying the `.ini` file and command-line variables like the release tag.
- **Navigation:** It changes the current directory to `$(OUTPUT_DIR)`, where the generated SQL scripts and the deployment-specific Makefile are located.
- **Delegation:** It executes the same target (e.g., `deploy_all`) on the Makefile within `$(OUTPUT_DIR)`.
- **Execution:** The Makefile inside `$(OUTPUT_DIR)` (generated by the tool) then:
  - Runs initialization scripts (e.g., creating schemas, tags).
  - Deploys Data Vault objects (Hubs, Links, Satellites) in the correct sequence, often layer by layer (PREP\_LAYER, MAIN\_LAYER).
  - Each SQL execution uses the `snowsql` command. SnowSQL reads the `.ini` file, connects to Snowflake, and substitutes all `&{variable_name}` placeholders in the SQL scripts with values from the `.ini` file before running them.
- **Error Handling:** If `exit_on_error = True` is active, any SQL error will stop the deployment for that target.

This deployment architecture offers a flexible and configurable method for managing your Data Vault. The clear separation of responsibilities (top-level Makefile for orchestration and environment setup, generated Makefile for detailed deployment steps, and `.ini` files for configuration) enhances manageability across various environments.

**Partial Deployments**

There's no special deployment process required for partial model deployments. The `s2v generate` command, with its `--include-objects` or `--exclude-objects` parameters, allows you to generate SQL scripts for only a subset of your Data Vault model.
Once the partial model is generated, you can use the standard deployment targets like `make deploy_all` to deploy only the selected objects. The generated Makefile in the `$(OUTPUT_DIR)` will only contain the code for these included objects.

## Core Components

### 1. Top-Level Makefile

The main Makefile (typically located at the root of the project) serves as the primary entry point for all deployment operations. Its main role is to delegate tasks to another Makefile that resides within the directory where the SQL scripts are generated by the code generation tool.

**Key Features:**

- **`OUTPUT_DIR` Variable**: This variable specifies the path to the generated code directory. This directory contains all the SQL scripts and a dedicated Makefile for their deployment.
- **Delegation of Tasks**: When a target like `make deploy_all` is executed from this top-level Makefile, it doesn't directly run the SQL files. Instead, it navigates to the `$(OUTPUT_DIR)` and executes the `make deploy_all` command on the Makefile found _within_ that directory. This pattern is consistent for other targets such as `deploy_dynamic_tables`, `deploy_tables`, and `refresh_dynamic_tables`.
- **`SNOWSQL` Command Configuration**: The Makefile defines how `snowsql` commands are constructed. This includes specifying the configuration file (derived from an `$(ENV)` variable, like `dev` for `conf.ini`) and passing variables such as a release identifier (`$(TAG)`) directly to SnowSQL. This tag is useful for versioning the deployed Data Vault objects.

Minimal Makefile example:

```makefile
# Note: Replace and with your actual project paths.
# Always execute targets in parallel, defaults to 20 parallel jobs
MAKEFLAGS += --jobs 20 --no-builtin-rules --no-keep-going

MODEL_DIR ?=
OUTPUT_DIR ?= $(MODEL_DIR)/
CURRENT_DIR := $(shell pwd)

# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ S2V                                                                        ║
# ╚════════════════════════════════════════════════════════════════════════════╝
TAG := $(shell git describe --tags --dirty --always)

# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ Deployment with SnowSQL and Azure CLI example                              ║
# ║ - requires privileges to access Key Vault                                  ║
# ╚════════════════════════════════════════════════════════════════════════════╝
# SNOWSQL example
export SNOWSQL := snowsql --config $(PWD)/conf/conf.ini --variable S2V_RELEASE_ID_DYNAMIC_TABLE=$(TAG) --variable S2V_RELEASE_ID_TABLE=$(TAG)

# ╔════════════════════════════════════════════════════════════════════════════╗
# ║ Default Makefile targets                                                   ║
# ╚════════════════════════════════════════════════════════════════════════════╝
.PHONY: deploy_all
deploy_all:
	$(MAKE) --directory $(OUTPUT_DIR) deploy_all

.PHONY: deploy_dynamic_tables
deploy_dynamic_tables:
	$(MAKE) --directory $(OUTPUT_DIR) deploy_dynamic_tables

.PHONY: deploy_tables
deploy_tables:
	$(MAKE) --directory $(OUTPUT_DIR) deploy_tables

.PHONY: refresh_dynamic_tables
refresh_dynamic_tables:
	$(MAKE) --directory $(OUTPUT_DIR) refresh_dynamic_tables
```

**Tip:** Leverage your top-level Makefile to streamline your S2V workflow:

- Environment Management: Easily switch between deployment environments (dev, test, prod) by changing the `ENV` variable, which can point to different `conf/.ini` files.
- S2V Command Orchestration: Add targets to run S2V commands like `s2v validate -i $(MODEL_DIR)` or `s2v generate -i $(MODEL_DIR) -o $(OUTPUT_DIR)` before deployment.
- Targeted Deployments: Create custom Make targets for deploying specific subsets of your Data Vault model (e.g., only Hubs, or objects related to a particular source system) by invoking specific targets in the `$(OUTPUT_DIR)/Makefile`.
- Pre/Post Deployment Hooks: Add custom shell commands or scripts to execute before or after deployment steps for tasks like information schema refresh or warehouse scaling.

### 2. The .ini Configuration File (e.g., dev.ini)

This file is essential for SnowSQL, providing connection details, operational options, and variables for substitution within the SQL scripts.

Example conf.ini Content:

```ini
[connections]
accountname = ...

[options]
exit_on_error = ...

[variables]
TARGET_DATABASE=
TARGET_WAREHOUSE=
...
```

**Info:** Running Make on Windows may require additional configuration. See [Run Make on Windows](https://s2v.reeeliance.com/docs/tutorials/tutorials/run-make-on-windows).

## SQL Deployment Overview

This section outlines the recommended methods for deploying the SQL code generated by Stream2Vault (S2V).
The primary approach, **Deployment via Makefile**, leverages Makefiles and SnowSQL for a structured and configurable deployment process. This method is detailed with examples, covering the orchestration of SQL script execution, environment configuration through `.ini` files, and variable substitution. It's designed for consistency and manageability across different deployment environments (dev, test, prod).

For scenarios requiring more flexibility or integration with existing CI/CD pipelines, a **Custom Deployment** approach can be adopted. This allows for on-demand execution of generated SQL files using various tools or processes.

- [**Deployment via Makefile**](https://s2v.reeeliance.com/docs/standard-deployment/make): Leverages Makefiles and SnowSQL for a structured and configurable deployment process.
- [**Custom Deployment**](https://s2v.reeeliance.com/docs/standard-deployment/custom): Allows for on-demand execution of generated SQL files using various tools or processes.

## Data Vault Configuration

This document outlines the **global configuration settings** for your data vault implementation, managed through the `data_vault_settings.yaml` file. These parameters define core behaviors and naming conventions across your data vault, ensuring consistency and proper functioning.

### Properties

The table below provides an overview of each property available in the `data_vault_settings.yaml` file, including its expected data type and a description of its purpose.

| Property Name | Expected Type | Description |
| --- | --- | --- |
| `data_vault_name` | string | The **name of your data vault**. |
| `is_business_key_case_sensitive` | boolean | Specifies if **business keys should be treated as case-sensitive**. Set to `true` if you need distinct entries for keys that differ only by case (e.g., "ABC" vs. "abc"). |
| `load_timestamp_column_name` | string | The **name for the load timestamp column in the target** data vault object. |
| `hashdiff_column_name` | string | The **name for the hash difference column in the target** data vault object. |
| `record_source_column_name` | string | The **name for the column indicating the source of the record in the target** data vault object. |
| `source_business_key_column_name` | string | The **name for the column storing the source system business key in the target** data vault object. |
| `cdc_flag_column_name` | string | The **name for the CDC (Change Data Capture) flag column in the target** data vault object. |
| `hashkey_delimiter` | string | The **delimiter used when concatenating multiple columns** to form a hash key. |
| `hash_key_column_prefix` | string | The **prefix to be added to hash key** column names. |
| `hashing_algorithm` | string | The **hashing algorithm** to use (e.g., "HASH", "MD5", "SHA1", "SHA2", "NOHASH"). |
| `use_binary_hashing_algorithm` | boolean | Specifies whether the chosen **hashing algorithm should produce binary output**. |
| `multi_source_databases` | boolean | Indicates if the **data vault integrates data from multiple source databases** (affects source tuple format in `entity_source` property). |

### Example

Below is an example of a `data_vault_settings.yaml` file with common configurations. You can adapt these settings to match your specific data vault environment and requirements.
```yaml
data_vault_name: 'My Data Vault Model'
is_business_key_case_sensitive: false
load_timestamp_column_name: 'LOAD_DATE'
hashdiff_column_name: 'HASH_DIFF'
record_source_column_name: 'RECORD_SOURCE'
hashkey_delimiter: '##'
hash_key_column_prefix: 'HKEY_'
source_business_key_column_name: 'SRC_BK'
cdc_flag_column_name: 'CDC_FLAG'
hashing_algorithm: 'HASH'
use_binary_hashing_algorithm: false
multi_source_databases: false
```

## Information Schema Overview

## Overview

This document describes the Information Schema file `information_schema.csv`, a crucial component for defining the metadata of your source systems. This metadata is essential for ensuring that all source columns referenced within your Data Vault model are correctly identified and available in the corresponding source database.

Ensure that the following mandatory columns are included in the file:

- `TABLE_SCHEMA`: The schema where the table resides.
- `TABLE_NAME`: The name of the table.
- `COLUMN_NAME`: The name of the column.
- `ORDINAL_POSITION`: The position of the column within its table.
- `DATA_TYPE`: The data type of the column (e.g., NUMBER, TEXT, TIMESTAMP).

### Snowflake

For Snowflake users, you can extract the necessary information schema details using the following SQL query.
```sql
USE SCHEMA .;
SELECT *
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = ''
ORDER BY TABLE_NAME, ORDINAL_POSITION
;
```

Tip: Automate the schema extraction via a Makefile.

## Source System Configuration

These settings define parameters for source systems integrated into your Data Vault. While you can create distinct configurations for each source, it's often practical to use a single set of settings for multiple similar systems (e.g., several SAP instances) or for a standardized staging area.

The core of the source configuration is an array of these settings stored in `source_system_settings.yaml`. Each item in this array is an object defined by a unique URN, which acts as a key to identify and associate the settings with the relevant source object(s) in your data vault model.

### Properties

The table below lists each property available in the `source_system_settings.yaml` file, with its expected data type and a description of its purpose.

| Property Name | Expected Type | Description |
| --- | --- | --- |
| (URN Key itself) | string | **A unique identifier for the source system configuration**. Must follow the pattern: `urn:s2v:source_setting:`. |
| `hashkey_escape_char` | string | The character used to **escape the `hashkey_delimiter`** if the delimiter character itself appears within a source value. This prevents misinterpretation during hash key concatenation. |
| `empty_value_is_null` | boolean | Specifies if **empty string values from the source should be treated as NULL values** in the Data Vault. Set to true to normalize empty strings to NULL. |
| `trim_whitespaces` | boolean | Specifies if **leading and trailing whitespace should be trimmed** from source column values before processing. Set to true to ensure data consistency. |
| `load_timestamp_column_name` | string | The **name of the column in the source system that represents the load timestamp**. |
| `cdc_flag_column_name` | string | The **name of the column in the source system that acts as a CDC flag** (e.g., indicating insert, update, delete). |
| `cdc_value_mapping` | object | An object that defines the **mapping between CDC operation types (insert, update, delete) and their corresponding values** found in the source system's `cdc_flag_column_name`. |
| `cdc_value_mapping.insert` | string | The **value in the source's CDC flag column that indicates an insert** operation. |
| `cdc_value_mapping.update` | string | The **value in the source's CDC flag column that indicates an update** operation. Can also be `"[delete_value]+[insert_value]"` for delete+insert logic. |
| `cdc_value_mapping.delete` | string | The **value in the source's CDC flag column that indicates a delete** operation. |
| `cdc_window_threshold` | integer | The **time window, in seconds, that must pass before a record is considered a true delete**. This is useful for handling soft deletes or asynchronous CDC processes where an update might be quickly followed by a delete. |

### Examples

Below is an example of a `source_system_settings.yaml` file with common configurations. You can adapt these settings to match your specific Data Vault environment and requirements.
```yaml
source_system_settings:
  # Source configuration
  - urn:s2v:source_setting:SAP:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'LOAD_DATE'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'I'
        update: 'D+I'
        delete: 'D'
      cdc_window_threshold: 1
  # Source configuration
  - urn:s2v:source_setting:SALESFORCE:
      hashkey_escape_char: '\\'
      empty_value_is_null: false
      trim_whitespaces: true
      load_timestamp_column_name: 'CDC_LOAD_TIMESTAMP'
      cdc_flag_column_name: 'CDC_FLAG'
      cdc_value_mapping:
        insert: 'INSERT'
        update: 'INSERT'
        delete: 'DELETE'
      cdc_window_threshold: 1
  # Additional source configurations can be added here...
  # ...
```

## Data Vault Hubs Overview

A Hub is a core component in the Data Vault 2.0 methodology, representing a central business concept or entity that is uniquely identified by one or more business keys. Hubs form the backbone of the Data Vault model, providing a stable, integrated list of unique business keys from across the enterprise. They are designed to be resilient to changes in source systems and business processes.

### Key Characteristics:

- **Business Key Focus:** Stores unique business keys (e.g., Customer ID, Product Code, Employee Number).
- **Integration Point:** Integrates business keys from various source systems.
- **Stability:** Designed to be stable over time; changes to descriptive data about the business entity are stored in Satellites.
- **Minimal Attributes:** Contains only the business key(s), a load timestamp, and a record source. It does not store descriptive attributes.

### Role in Data Vault:

Hubs are fundamental for establishing a single version of the truth for core business entities. They ensure that each unique business entity is represented only once, regardless of how many source systems might refer to it. This helps in creating a consistent and auditable data integration layer.

### Simple Hub Example:

This example defines a Hub for `CUSTOMER`, identified by a single business key `CUSTOMER_BK`.

```yaml
name: 'HUB_CUSTOMER'
entity_type: 'hub'
concatenate_business_keys: false
requires_source_business_key: false
enable_refresh: true
target_business_key_columns:
  - 'CUSTOMER_BK' # The business key column in the target Hub table
entity_sources:
  - urn:s2v:hub_source:SAP_MASTERDATA: # Unique URN for this source
      entity_source: (SAP_MASTERDATA, KNA1) # Source: (Schema, Table)
      source_filter: '' # Optional filter on the source
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # Link to global source settings
      business_key_mapping:
        - CUSTOMER_BK: # Target business key column name
            - 'KUNNR' # Source column from ERP Customers table
      source_business_key: '' # Optional: for multi-master scenarios
```

- For a detailed step-by-step guide on building a Hub, please refer to the [How to build a Hub?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-hub) tutorial.
- For a comprehensive guide on all available properties and detailed explanations for defining a Hub, please refer to the [Hub Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/hub/hub).
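The idea that a Hub holds each business key exactly once can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code S2V generates: the function names (`normalize_key`, `hub_hash_key`, `load_hub`) and the `HKEY_CUSTOMER` column name are hypothetical, MD5 is picked as one of the hashing options listed in the Data Vault settings, and the exact way S2V applies `trim_whitespaces`, `is_business_key_case_sensitive`, `hashkey_delimiter`, and `hashkey_escape_char` is assumed.

```python
import hashlib


def normalize_key(value, case_sensitive=False, trim=True):
    """Normalize one business key part (assumed behaviour, for illustration only)."""
    text = "" if value is None else str(value)
    if trim:
        text = text.strip()
    if not case_sensitive:
        text = text.upper()
    return text


def hub_hash_key(parts, delimiter="##", escape_char="\\", case_sensitive=False):
    """Join normalized key parts with the delimiter and hash the result with MD5."""
    escaped = []
    for part in parts:
        text = normalize_key(part, case_sensitive)
        # Escape the escape character first, then any delimiter inside the value,
        # so ("a", "b") and ("a##b",) do not collapse into the same string.
        text = text.replace(escape_char, escape_char + escape_char)
        text = text.replace(delimiter, escape_char + delimiter)
        escaped.append(text)
    return hashlib.md5(delimiter.join(escaped).encode("utf-8")).hexdigest()


def load_hub(rows, key_columns, record_source):
    """Return one hub record per distinct business key (first occurrence wins)."""
    hub = {}
    for row in rows:
        parts = [row[col] for col in key_columns]
        hkey = hub_hash_key(parts)
        hub.setdefault(hkey, {
            "HKEY_CUSTOMER": hkey,  # hypothetical hash key column name
            "CUSTOMER_BK": normalize_key(parts[0]),
            "RECORD_SOURCE": record_source,
        })
    return list(hub.values())


# Hypothetical source rows: the first two differ only by case and whitespace.
rows = [{"KUNNR": "c-100 "}, {"KUNNR": "C-100"}, {"KUNNR": "C-200"}]
hub_rows = load_hub(rows, ["KUNNR"], "SAP_MASTERDATA.KNA1")
print(len(hub_rows))
```

With case-insensitive keys and trimming enabled, the first two source rows map to the same Hub record, which is the deduplication behaviour described above.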
## Data Vault Links Overview

A Link represents a relationship, transaction, or association between two or more Hubs (core business entities). Links establish the connections that define how business entities interact. They are essential for modeling the "unit of work" or events that occur between business concepts.

### Key Characteristics:

- **Relationship Focus:** Captures relationships or transactions between business keys stored in Hubs.
- **Connects Hubs:** Contains foreign keys referencing the Hubs involved in the relationship. The primary key of a Link is typically a composite of the Hub hash keys it connects.
- **No Descriptive Attributes (Typically):** Standard Links do not store descriptive attributes about the relationship itself. Such attributes are stored in Link Satellites.
- **Granularity:** The granularity of a Link is defined by the combination of Hubs it connects.

### Role in Data Vault:

Links are crucial for understanding how different parts of the business are connected. They provide the context for transactions and interactions, enabling analysis across different business domains.
By separating relationships from descriptive data, Links contribute to the flexibility and scalability of the Data Vault model.

### Simple Link Example:

This example defines a Link `LNK_SALES_ORDER` that represents an item on an order, connecting `HUB_SALES_ORDER`, `HUB_MATERIAL`, `HUB_PROFIT_CENTER` and `HUB_CUSTOMER`.

```yaml
name: 'LNK_SALES_ORDER'
entity_type: 'link'
enable_refresh: True
connected_hubs:
  - CUSTOMER: 'HUB_CUSTOMER'
  - SALES_ORDER: 'HUB_SALES_ORDER'
  - PROFIT_CENTER: 'HUB_PROFIT_CENTER'
  - MATERIAL: 'HUB_MATERIAL'
entity_sources:
  - urn:s2v:link_source:SAP_OE: # Unique URN for this source
      entity_source: '(SAP_OE, 2LIS_11_VAITM)' # Source: (Schema, Table)
      source_filter: # Optional filter
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # Link to global source settings
      use_source_cdc_flag: False
      connected_hub_relations:
        - CUSTOMER:
            source_business_key: '' # Optional source system identifier
            business_key_mapping:
              - CUSTOMER_BK: # Business key name from HUB_CUSTOMER
                  - 'KUNNR' # Source column KUNNR participating in business key
        - SALES_ORDER:
            source_business_key: 'SAP_OE_SD'
            business_key_mapping:
              - SALES_ORDER_BK:
                  - 'VBELN'
                  - 'POSNR'
        - PROFIT_CENTER:
            source_business_key: ''
            business_key_mapping:
              - PROFIT_CENTER_BK:
                  - "PRCTR"
                  - "KOKRS"
        - MATERIAL:
            source_business_key: ''
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
```

- For a detailed step-by-step guide on building a Link, please refer to the [How to build a Link?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-link) tutorial.
- For a comprehensive guide on all available properties and detailed explanations for defining a Link, please refer to the [Link Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/link_attributes).
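To make the "composite of the Hub hash keys" idea concrete, here is a minimal Python sketch that derives one hash key per connected Hub and then a Link key from those Hub keys. It reuses the column mapping from the `LNK_SALES_ORDER` example, but everything else is an assumption for illustration: the helper names (`md5_hex`, `link_keys`), the choice of MD5, the `##` delimiter, the simple trim/upper-case normalization, and the sorted ordering of Hub keys are not taken from S2V's generated code, and the sample row values are invented.

```python
import hashlib

# Hub alias -> source columns, copied from the LNK_SALES_ORDER example mapping.
HUB_RELATIONS = {
    "CUSTOMER": ["KUNNR"],
    "SALES_ORDER": ["VBELN", "POSNR"],
    "PROFIT_CENTER": ["PRCTR", "KOKRS"],
    "MATERIAL": ["MATNR"],
}


def md5_hex(text):
    """Return the MD5 hex digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def link_keys(row, relations=HUB_RELATIONS, delimiter="##"):
    """Derive one hash key per connected Hub plus a composite Link hash key."""
    hub_keys = {
        alias: md5_hex(delimiter.join(str(row[col]).strip().upper() for col in columns))
        for alias, columns in relations.items()
    }
    # Build the Link key from the Hub keys in a fixed (sorted) alias order, so the
    # same combination of Hubs always yields the same Link key.
    link_key = md5_hex(delimiter.join(hub_keys[alias] for alias in sorted(hub_keys)))
    return hub_keys, link_key


# Invented sample row using the source columns named in the example.
row = {"KUNNR": "C-100", "VBELN": "0000004711", "POSNR": "10",
       "PRCTR": "PC01", "KOKRS": "1000", "MATNR": "M-01"}
hub_keys, link_key = link_keys(row)
print(sorted(hub_keys), link_key)
```

Changing any participating business key (for example `MATNR`) changes the Link key while leaving the other Hub keys untouched, which is why the Link's granularity follows the combination of Hubs it connects.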
## Non-Historized Links Overview

A Non-Historized Link (NHL) in Data Vault 2.0 is a specialized type of Link that captures the current state of a relationship between Hubs and can also store descriptive attributes about that relationship directly within itself, without requiring a separate Link Satellite. It is "non-historized" in the sense that it typically represents the latest known attributes of the relationship, though S2V allows for defining `historized_columns` within an NHL if needed.

### Key Characteristics:

- **Connects Hubs:** Like regular Links, it establishes relationships between two or more Hubs.
- **Stores Attributes Directly:** Unlike standard Links, NHLs can store descriptive attributes about the relationship (e.g., status of a connection, preference settings between two entities).
- **Single Source Definition:** In S2V, NHLs are typically fed from a single `entity_source` definition, similar to how Satellites are defined.

### Role in Data Vault:

NHLs are useful for modeling relationships where the attributes of the relationship itself are important and you want to query them directly alongside the relationship, often reflecting the current operational state.
They can simplify queries for current-state relationship attributes compared to joining a Link with its Satellite. However, if full historization of relationship attributes is paramount, a Link with an associated Link Satellite is the more traditional DV2.0 approach.

### Simple Non-Historized Link Example:

This example defines an NHL `NHL_SALES_ORDER_CONDITION_TYPES` that connects `HUB_SALES_ORDER` and `HUB_CONDITION_TYPE`.

```yaml
name: 'NHL_SALES_ORDER_CONDITION_TYPES'
enable_refresh: true
entity_type: 'non_historized_link'
connected_hubs:
  - SALES_ORDER: 'HUB_SALES_ORDER'
  - CONDITION_TYPE: 'HUB_CONDITION_TYPE'
entity_source:
  urn:s2v:link_source:2LIS_13_VDKON: # URN identifying this source configuration
    entity_source: '(SAP_OE, 2LIS_13_VDKON)' # Source: (Schema, Table)
    source_filter: '' # Optional filter
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    use_source_cdc_flag: true
    connected_hub_relations:
      - SALES_ORDER:
          business_key_mapping:
            - SALES_ORDER_BK: # Business key name from HUB_SALES_ORDER
                - 'VBELN' # Source column VBELN from table 2LIS_13_VDKON
                - 'POSNR' # Source column POSNR from table 2LIS_13_VDKON
          source_business_key: 'SAP_OE_SD' # Optional source system identifier
      - CONDITION_TYPE:
          business_key_mapping:
            - CONDITION_TYPE_BK:
                - 'KSCHL'
          source_business_key: 'SAP_OE'
    non_historized_columns:
      - 'VBELN'
      - 'POSNR'
      - 'KSCHL'
    historized_columns:
      - 'GBSTK'
      - 'WBSTK'
      - 'AEDAT'
      - 'KUNRG'
      - 'KUNAG'
      - 'BUKRS'
      - 'BZIRK'
```

- For a detailed step-by-step guide on building a Non-Historized Link, please refer to the [How to build a Non-Historized Link?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-nhl) tutorial.
- For a comprehensive guide on all available properties and detailed explanations for defining a Non-Historized Link, please refer to the [Non-Historized Link Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/link/nhl_attributes).

## Data Vault Reference Tables

A Reference Table in the context of Data Vault (and data warehousing in general) is used to store relatively static lookup or classification data. While not a core Data Vault 2.0 entity type like Hubs, Links, or Satellites, Reference tables play a crucial supporting role. They provide descriptive context to codes or keys found in other tables, making the data more understandable and usable for reporting and analysis.

### Key Characteristics:

- **Lookup Data:** Stores codes, descriptions, categories, types, statuses, etc. (e.g., country codes and names, order status descriptions, product category names).
- **Relatively Static:** Data in reference tables changes infrequently.
- **Enhances Readability:** Provides human-readable descriptions for codes used in operational systems or other Data Vault tables.
- **Simple Structure:** Typically has a simple key (the code) and one or more descriptive attributes.
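The lookup role described above can be sketched in Python as a plain code-to-description mapping applied to other rows. This is purely illustrative: the `describe` helper, the `_TEXT` suffix, and the sample status codes and descriptions are invented for the example and do not come from S2V or from any real source table.

```python
# Hypothetical reference data: status code -> human-readable description.
REF_ORDER_STATUS = {
    "A": "Not yet processed",
    "B": "Partially processed",
    "C": "Completely processed",
}


def describe(rows, code_column, reference, default="(unknown code)"):
    """Return copies of the rows with a description column added for the given code."""
    return [
        {**row, code_column + "_TEXT": reference.get(row[code_column], default)}
        for row in rows
    ]


# Invented rows carrying a status code; the second code has no reference entry.
orders = [
    {"VBELN": "0000004711", "GBSTK": "C"},
    {"VBELN": "0000004712", "GBSTK": "X"},
]
enriched = describe(orders, "GBSTK", REF_ORDER_STATUS)
print(enriched)
```

Keeping the descriptions in one small table means a changed wording is corrected in a single place rather than in every table that uses the code.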
### Role in Data Warehousing:

Reference tables are essential for enriching data and supporting business intelligence. They allow users to see meaningful descriptions instead of cryptic codes in their reports and dashboards. In a Data Vault context, they can be used to provide context to data loaded into Satellites or other informational marts built on top of the Raw Vault.

### Simple Reference Table Example:

This example defines a Reference table `REF_SAPSE_GENERAL_T003T`, which stores the text descriptions from the SAP table `T003T`.

```yaml
name: 'REF_SAPSE_GENERAL_T003T'
entity_type: 'reference'
enable_refresh: true
enable_history: false
ordering_columns:
skip_hashdiff_comparison: false
concatenate_business_keys: true
target_business_key_columns:
  - 'REF_SAPSE_GENERAL_T003T_BK'
entity_source:
  (SAP_SE, T003T): # Source: (Schema, Table)
    source_filter: # Optional filter
    use_source_cdc_flag: true
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    business_key_mapping:
      - REF_SAPSE_GENERAL_T003T_BK: # Business key column name
          - 'BLART' # Corresponding column in T003T table
          - 'MANDT' # Corresponding column in T003T table
          - 'SPRAS' # Corresponding column in T003T table
    historized_columns: # Attributes whose history is tracked
      - 'BLART'
      - 'MANDT'
      - 'SPRAS'
      - 'LTEXT'
    non_historized_columns: [] # Attributes whose history is NOT tracked (current value stored)
```

- For a detailed step-by-step guide on building a Reference, please refer to the [How to build a Reference?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-detailed-guides) tutorial.
- For a comprehensive guide on all available properties and detailed explanations for defining a Reference, please refer to the [Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/reference/reference).

## Data Vault Satellites Overview

A Satellite in Data Vault 2.0 stores descriptive attributes and context for a specific Hub or Link. Satellites are designed to track historical changes to these attributes over time. Each Satellite is attached to exactly one parent Hub or Link.

### Key Characteristics:

- **Descriptive Attributes:** Contains the contextual, descriptive data related to a Hub or Link (e.g., customer name, address, product description, order status).
- **Historical Tracking:** Designed to capture changes to attributes over time. Each change typically results in a new versioned record in the Satellite, identified by a load timestamp.
- **Single Parent:** A Satellite is always attached to one specific Hub or Link.

### Role in Data Vault:

Satellites provide the rich contextual information that makes the data meaningful.
By separating descriptive attributes from the structural Hubs and Links, Data Vault models achieve flexibility and scalability. When source systems change or new attributes are added, new Satellites can be added without impacting the existing core structure. This historical tracking is vital for auditability, trend analysis, and understanding data evolution.

There are two main types based on their parent:

- **Hub Satellites (`hubsat`):** Store attributes for core business entities (Hubs).
- **Link Satellites (`linksat`):** Store attributes for relationships or transactions (Links).

### Simple Hub Satellite Example:

This example defines a Hub Satellite `SAT_SAP_CEPC` that stores descriptive attributes for `HUB_PROFIT_CENTER`.

```yaml
name: 'SAT_SAP_CEPC'
entity_type: 'hubsat' # Could also be 'linksat' for a Link Satellite
enable_refresh: true
skip_hashdiff_comparison: false
ordering_columns: [] # Optional: for sequencing records with same load timestamp
connected_entity: 'HUB_PROFIT_CENTER' # Name of the parent Hub
entity_source:
  (SAP_MASTERDATA, CEPC): # Source: (Schema, Table)
    source_filter: '' # Optional filter
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    use_source_cdc_flag: false
    source_business_key: '' # Optional source system identifier
    business_key_mapping: # How to get the parent Hub's business key
      - PROFIT_CENTER_BK: # Business key column name in HUB_PROFIT_CENTER
          - "PRCTR" # Corresponding column in CEPC table
          - "KOKRS" # Corresponding column in CEPC table
    historized_columns: # Attributes whose history is tracked
      - 'ABTEI'
      - 'VERAK'
      - 'USNAM'
      - 'ERSDA'
      - 'WAERS'
    non_historized_columns: # Attributes whose history is NOT tracked (current value stored)
      - "PRCTR"
      - "KOKRS"
```

For a detailed step-by-step guide on building a Satellite, please refer to:

- The [How to build a Hub Satellite?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-hubsat) tutorial.
- The [How to build a Link Satellite?](https://s2v.reeeliance.com/docs/tutorials/tutorials/build-linksat) tutorial.

For a comprehensive guide on all available properties and detailed explanations for defining a Satellite, please refer to:

- [Hub Satellite Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/satellite/hub-satellite).
- [Link Satellite Reference](https://s2v.reeeliance.com/docs/detailed-guides/model-detailed-guides/satellite/link-satellite).

## S2V Tutorials Overview

This section provides practical tutorials to help you get started with S2V. You'll learn how to:

- **Understand and configure essential files**, such as data vault settings.
- **Grasp key S2V concepts**, like the definition and usage of entity sources for different Data Vault objects.
- **Use the S2V Command Line Interface (CLI)** for validating your models and generating the code, covering both default and custom project structures.

These tutorials are designed to guide you through common scenarios and build a foundational understanding of S2V's capabilities.
- [**Configuration Files**](https://s2v.reeeliance.com/docs/tutorials/configuration-files/data-vault-configuration): Explore how to set up Data Vault settings, source system configurations, and other essential S2V files.
- [**Data Vault Objects**](https://s2v.reeeliance.com/docs/tutorials/data-vault-objects/hub): Discover the structure and properties of different Data Vault objects like Hubs, Links, and Satellites.
- [**S2V Commands Overview**](https://s2v.reeeliance.com/docs/tutorials/s2v-commands): Get a high-level overview of the available S2V commands and their primary functions.
- [**Step-by-Step Guides**](https://s2v.reeeliance.com/docs/tutorials/tutorials/project-structure): Follow step-by-step guides for common S2V tasks, such as using CLI commands with different project structures.
- [**Sample Data**](https://s2v.reeeliance.com/docs/tutorials/sample-data): Access and understand sample datasets to practice with S2V tutorials and explore its features.

## S2V Command Overview

As you have seen so far, your data vault model is made up of various data vault objects and settings, all configured in individual YAML files and stored in a dedicated "model folder." You have also learned that S2V is a tool you interact with by typing commands, and you might have already used the [Login](https://s2v.reeeliance.com/docs/getting-started/login) command to get authorized.

What you haven't learned yet is what other commands S2V offers, how S2V makes sure your model follows all the data vault rules, or how to turn your model into deployable code. This section will guide you through the most commonly used S2V commands.
For more information about how to use them, the code they generate, or specific error messages, see the [S2V References](https://s2v.reeeliance.com/docs/detailed-guides/overview).

## Validate

As the name suggests, the `validate` command acts as a checker for your data vault model. It looks for any issues that might stop you from generating code, from simple problems like incorrect YAML formatting to more complex checks on how your data vault objects relate to each other.

### Usage

To run the `validate` command, you just need to tell it where your model files are located (using the `-i` flag).

```shell
s2v validate -i path/to/my-input-folder/
```

### Output

If no errors are found, the S2V tool will give you a success message, meaning you're ready to move on to generating your code. Even with a success message, the tool also shows any warnings it found, in case there are minor things to consider.

## Generate

Once you're happy with your data vault model (and ideally, you've checked it with the `validate` command), the next step is to `generate` the actual code. This is the code that will be deployed to your target database.

Think of the `generate` command as the builder. You give it your model, and it constructs all the necessary pieces of code based on that.

### Usage

Using the `generate` command is straightforward.
You need to tell it where your model files are and where you want the generated code to be saved.

```shell
s2v generate -i path/to/your-model-folder/ -o path/to/your-output-folder/
```

### Output

Each code generation run includes a validate step before proceeding with the generation. Depending on the validation result, it returns either a list of error messages or the generated code. More details about the generated code can be found in [Generated Files](https://s2v.reeeliance.com/docs/detailed-guides/generated-files).

## Visualize

The `visualize` command allows you to see and explore your data vault model as a network graph.

### Usage

```shell
s2v visualize -i path/to/my-input-folder/
```

## Sample Data Vault Models

This page provides sample data vault models used in the examples and tutorials.
These datasets are designed to illustrate different scenarios and data complexities.

## Available DV models

### 1. Default Project Structure

- **Description:** A sample Data Vault model where all S2V configuration files (`data_vault_settings.yaml`, `source_system_settings.yaml`, `information_schema.csv`) are located in the root of the `dv_model/` folder.
- **Download:** [Download Default Project Structure](https://s2v.reeeliance.com/assets/files/sample_data-d4fd14d9180b5b0a29b24ab7e637c3a1.zip)

### 2. Custom Project Structure

- **Description:** A sample Data Vault model demonstrating a custom project layout. Configuration files are organized into subdirectories: `data_vault_settings.yaml` and `source_system_settings.yaml` are in `dv_model/configuration/`, and information schema files (as `dev_inf_schema.csv`) are in `dv_model/sources/`.
- **Download:** [Download Custom Project Structure](https://s2v.reeeliance.com/assets/files/sample_data_custom-5a1801a54f3c47be3c9269273c02633c.zip)

## Building Hub Object

This tutorial will guide you through building a complete Hub object YAML definition step-by-step. A Hub represents a core business entity and stores a unique list of business keys. We'll build a Hub named `HUB_CUSTOMER`.

1. **Hub Name (`name`):** This is the unique identifier for your Hub object.

   ```yaml
   name: 'HUB_CUSTOMER' # 1. Hub Name
   ```

2. **Entity Type (`entity_type`):** Specifies that this object is a Hub.

   ```yaml
   name: 'HUB_CUSTOMER'
   entity_type: 'hub' # 2. Entity Type
   ```

3. **Concatenate Business Keys (`concatenate_business_keys`):** A boolean flag.

   - If `true`, and multiple source columns are mapped to business keys, their values will be concatenated before hashing to form the Hub's hash key. This implies that the combination of these source columns uniquely identifies the business entity. When `true`, `target_business_key_columns` _must_ list a single column representing this concatenated key.
   - If `false` (more common for Hubs with a single, clearly defined business key or when business keys are sourced independently and should not be combined), each business key defined in `target_business_key_columns` is processed individually.

   ```yaml
   name: 'HUB_CUSTOMER'
   entity_type: 'hub'
   concatenate_business_keys: false # 3. Concatenate Business Keys
   ```

4. **Requires Source Business Key (`requires_source_business_key`):** A boolean flag. If `true`, each source definition within `entity_sources` _must_ provide a non-empty `source_business_key` value. This is crucial for multi-master Hubs where the same business key might originate from different source systems, and you need to distinguish these instances.
If `false`, the `source_business_key` _must_ stay empty in all sources. ```codeBlockLines_e6Vv name: 'HUB_CUSTOMER' entity_type: 'hub' concatenate_business_keys: false requires_source_business_key: false # 4. Requires Source Business Key ``` 5. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the hub receives new data. Set to `true` if the object can receive new data, set to `false` if the object contains only historical records. ```codeBlockLines_e6Vv name: 'HUB_CUSTOMER' entity_type: 'hub' concatenate_business_keys: false requires_source_business_key: false enable_refresh: true # 5. Enable Refresh ``` 6. **Target Business Key Columns ( `target_business_key_columns`):** A list defining the names of the business key(s) in the target Hub. For a Hub representing a single business concept (like 'Material'), this list typically contains one column name (e.g., `CUSTOMER_BK`). ```codeBlockLines_e6Vv name: 'HUB_CUSTOMER' entity_type: 'hub' concatenate_business_keys: false requires_source_business_key: false enable_refresh: true target_business_key_columns: # 6. Target Business Key Columns - 'CUSTOMER_BK' ``` 7. **Entity Sources ( `entity_sources`):** This property defines the source(s) that feed data into this Hub. It's a list, meaning a Hub can be populated from multiple sources. We'll define one source first, then add more to illustrate a multi-source Hub. - **7.1. Source URN Identifier**: This is the unique URN key for this specific source definition within the list. The prefix is typically `hub_source`. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: # 7. Entity Sources - urn:s2v:hub_source:SAP_MASTERDATA: # 7.1. URN Identifier ``` - **7.2. Source Location ( `entity_source`):** Specifies the physical location of this source's data. - For multiple source databases, use a triple: `(source_database, source_schema, source_table)`. - For a single source database context: `(source_schema, source_table)`. 
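To make the two tuple shapes concrete, here is a minimal Python sketch that splits an `entity_source` string into named parts. The function name `parse_entity_source` and the returned dictionary keys are hypothetical and chosen only for illustration; how S2V itself parses this value is not documented here.

```python
def parse_entity_source(value: str) -> dict:
    """Split '(schema, table)' or '(database, schema, table)' into named parts.

    Illustrative only: this helper is an assumption, not part of S2V.
    """
    text = value.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError(f"expected a parenthesised tuple, got {value!r}")

    # Drop the parentheses and split on commas, trimming whitespace.
    parts = [part.strip() for part in text[1:-1].split(",")]
    if any(not part for part in parts):
        raise ValueError(f"empty element in {value!r}")

    if len(parts) == 2:
        keys = ("source_schema", "source_table")
    elif len(parts) == 3:
        keys = ("source_database", "source_schema", "source_table")
    else:
        raise ValueError(f"expected 2 or 3 elements, got {len(parts)}")
    return dict(zip(keys, parts))


if __name__ == "__main__":
    print(parse_entity_source("(SAP_MASTERDATA, KNA1)"))
```

The pair form is the one used throughout this tutorial; the triple form only adds the database name in front.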
```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:hub_source:SAP_MASTERDATA: entity_source: '(SAP_MASTERDATA, KNA1)' # 7.2. Source Location ``` - **7.3. Source Filter ( `source_filter`):** An optional SQL condition applied to the source data. You can use any valid SQL expression that can be placed in a WHERE clause. Leave it empty if no filters are required. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:hub_source:SAP_MASTERDATA: entity_source: '(SAP_MASTERDATA, KNA1)' source_filter: '' # 7.3. Source Filter (empty means no filter) ``` - **7.4. Source System Configuration URN ( `source_system_configuration_urn`):** Links this specific source to its system-wide settings. The connection is established via a URN that starts with the fixed prefix `urn:s2v:source_setting:` (for example, `urn:s2v:source_setting:SAP`). ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:hub_source:SAP_MASTERDATA: entity_source: '(SAP_MASTERDATA, KNA1)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # 7.4. Source System Configuration URN ``` - **7.5. Business Key Mapping ( `business_key_mapping`):** Defines how the `target_business_key_columns` (e.g., `CUSTOMER_BK`) are populated from the columns of this specific source. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:hub_source:SAP_MASTERDATA: entity_source: '(SAP_MASTERDATA, KNA1)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: # 7.5. Business Key Mapping - CUSTOMER_BK: # Target business key column name - 'KUNNR' # Source column from '(SAP_MASTERDATA, KNA1)' ``` - **7.6. Source Business Key ( `source_business_key`):** Used in multi-master Hubs (if `requires_source_business_key: true`) to identify the contributing source system for this key. If `requires_source_business_key` is `false`, it must stay an empty string.
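The interplay between `requires_source_business_key` (step 4) and `source_business_key` can be sketched as a small consistency check. This is a minimal Python illustration that assumes the Hub definition has already been loaded into plain dictionaries and lists; the function name `check_source_business_keys` is hypothetical and is not part of S2V, whose own `validate` command may check more or differently.

```python
from typing import List


def check_source_business_keys(hub: dict) -> List[str]:
    """Return one message per source that breaks the rule described in step 4.

    Illustrative only: this helper is an assumption, not an S2V API.
    """
    required = bool(hub.get("requires_source_business_key", False))
    problems = []
    for entry in hub.get("entity_sources", []):
        # Each list item is a one-key map: {source URN: settings}.
        for urn, settings in entry.items():
            value = (settings or {}).get("source_business_key") or ""
            if required and not value.strip():
                problems.append(f"{urn}: source_business_key must not be empty")
            elif not required and value.strip():
                problems.append(f"{urn}: source_business_key must stay empty")
    return problems


if __name__ == "__main__":
    hub = {
        "name": "HUB_CUSTOMER",
        "requires_source_business_key": False,
        "entity_sources": [
            {"urn:s2v:hub_source:SAP_MASTERDATA": {"source_business_key": ""}},
        ],
    }
    print(check_source_business_keys(hub))
```

An empty result means every source agrees with the flag; switching the flag to `true` without filling in the keys produces one message per source.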
```codeBlockLines_e6Vv name: 'HUB_CUSTOMER' entity_type: 'hub' concatenate_business_keys: false requires_source_business_key: false enable_refresh: true target_business_key_columns: - 'CUSTOMER_BK' entity_sources: - urn:s2v:hub_source:SAP_MASTERDATA: entity_source: '(SAP_MASTERDATA, KNA1)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: - CUSTOMER_BK: - 'KUNNR' source_business_key: '' # 7.6. Source Business Key ```
### Complete Hub YAML (Multi-Source Example)
Here is the complete YAML definition for our `HUB_CUSTOMER`, now showing multiple sources. This demonstrates how a Hub can integrate business keys from various systems.
```codeBlockLines_e6Vv
name: 'HUB_CUSTOMER'
entity_type: 'hub'
concatenate_business_keys: false
requires_source_business_key: false
enable_refresh: true
target_business_key_columns:
  - 'CUSTOMER_BK'
entity_sources:
  - urn:s2v:hub_source:SAP_MASTERDATA:
      entity_source: '(SAP_MASTERDATA, KNA1)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
      business_key_mapping:
        - CUSTOMER_BK:
            - 'KUNNR'
      source_business_key: ''
  - urn:s2v:hub_source:SFDC_SALESFORCE_ACCOUNT:
      entity_source: '(SFDC, SALESFORCE_ACCOUNT)'
      source_filter: ''
      source_system_configuration_urn: 'urn:s2v:source_setting:SALESFORCE'
      source_business_key: ''
      business_key_mapping:
        - CUSTOMER_BK:
            - 'ID'
```
## Build Hub Satellite
This tutorial will guide you through building a complete Hub Satellite object YAML definition step-by-step.
A Hub Satellite holds descriptive attributes for a parent Hub and tracks their history. We'll build a Hub Satellite named `SAT_SAP_CEPC` connected to a `HUB_PROFIT_CENTER`. 1. **Satellite Name ( `name`):** This is the unique identifier for your Hub Satellite object. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' # 1. Satellite Name ``` 2. **Entity Type ( `entity_type`):** Specifies that this object is a Hub Satellite. For Link Satellites, you would use `linksat`. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' entity_type: 'hubsat' # 2. Entity Type ``` 3. **Connected Entity ( `connected_entity`):** The name of the parent Hub (or Link for Link Satellites) to which this Satellite is attached. This Hub must be defined in your data vault model. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' entity_type: 'hubsat' connected_entity: 'HUB_PROFIT_CENTER' # 3. Connected Entity (Parent Hub) ``` 4. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the Satellite receives new data. Set to `true` if the object can receive new data (typical for Satellites), or `false` if it only contains historical records and will not be updated. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' entity_type: 'hubsat' connected_entity: 'HUB_PROFIT_CENTER' enable_refresh: true # 4. Enable Refresh ``` 5. **Skip Hashdiff Comparison ( `skip_hashdiff_comparison`):** A boolean flag. If `true`, S2V will skip comparing the hashdiff of incoming records with existing records in the Satellite. This means every incoming record for a given business key will be treated as a new version, regardless of whether its descriptive attributes have changed. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' entity_type: 'hubsat' connected_entity: 'HUB_PROFIT_CENTER' enable_refresh: true skip_hashdiff_comparison: false # 5. Skip Hashdiff Comparison ``` 6. 
**Ordering Columns ( `ordering_columns`):** A list of column names from the source that determine the sequence of records when multiple changes occur for the same business key at the same load datetime. This helps identify the latest change correctly, for example when an entire batch of records shares the same load datetime. It can be an empty list `[]` if not needed. ```codeBlockLines_e6Vv name: 'SAT_SAP_CEPC' entity_type: 'hubsat' connected_entity: 'HUB_PROFIT_CENTER' enable_refresh: true skip_hashdiff_comparison: false ordering_columns: [] # 6. Ordering Columns (empty list in this case) ``` 7. **Entity Source ( `entity_source`):** For Hub Satellites, this defines the single source that feeds data into this Satellite. It's a tuple-keyed map. - **7.1. Source Location Tuple (Key):** The key of the `entity_source` map is a tuple `(SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE)` or `(SOURCE_SCHEMA, SOURCE_TABLE)` specifying the physical location of the source data. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: # 7. Entity Source (SAP_MASTERDATA, CEPC): # 7.1. Source Location Tuple ``` - **7.2. Source Filter ( `source_filter`):** An optional SQL condition applied to the source data. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_MASTERDATA, CEPC): source_filter: '' # 7.2. Nested Source Filter ``` - **7.3. Use Source CDC Flag ( `use_source_cdc_flag`):** Boolean indicating if the source provides a CDC (Change Data Capture) flag. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_MASTERDATA, CEPC): source_filter: '' use_source_cdc_flag: false # 7.3. Nested Use Source CDC Flag ``` - **7.4. Source System Configuration URN ( `source_system_configuration_urn`):** Links this source to its global system settings. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_MASTERDATA, CEPC): source_filter: '' use_source_cdc_flag: false source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # 7.4.
Source System Config URN ``` - **7.5. Business Key Mapping ( `business_key_mapping`):** Defines how the business key(s) of the parent Hub ( `HUB_PROFIT_CENTER`) are identified from the columns of this Satellite's source table. This mapping ensures the Satellite record is correctly linked to its parent Hub record. - The key (e.g., `PROFIT_CENTER_BK`) is the name of the business key column in the parent `HUB_PROFIT_CENTER`. - The value (e.g., `PRCTR` and `KOKRS`) is the list of corresponding column(s) in the Satellite's source table ( `CEPC`). ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_MASTERDATA, CEPC): source_filter: '' use_source_cdc_flag: false source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: # 7.5. Business Key Mapping - PROFIT_CENTER_BK: # Business key column name in HUB_PROFIT_CENTER - 'PRCTR' # Source column from CEPC - "KOKRS" # Source column from CEPC ``` - **7.6. Source Business Key ( `source_business_key`):** If the parent Hub is a multi-master Hub, the Satellite must align with the Hub's source-specific settings and should therefore inherit the `source_business_key` value from the Hub. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_MASTERDATA, CEPC): source_filter: '' use_source_cdc_flag: false source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: - PROFIT_CENTER_BK: - 'PRCTR' - 'KOKRS' source_business_key: '' # 7.6. Source Business Key ``` **Lookup Mapping for Satellites:** Typically, a Hub Satellite's source is also one of the parent Hub's sources, and its data directly contains the business key of the parent Hub. In such cases, you use `business_key_mapping` to link them, as shown in the example above. However, sometimes the Satellite's source data might _not_ have the parent Hub's business key directly. Instead, it might have a different identifier (like a foreign key or an alternative ID).
If this identifier can be used to find the correct Hub business key by looking it up in one of the _parent Hub's own source tables_, then you should use `lookup_mapping` for the Satellite. With `lookup_mapping`, you tell S2V: 1. Which of the parent Hub's sources to look into (using the Hub's source URN). 2. Which column(s) from the Satellite's source to use for the lookup. 3. Which column(s) in the Hub's source table to match against. This allows S2V to resolve the correct parent Hub key for the Satellite record. For a detailed guide on both mapping types, please see the "Business Key and Lookup Mappings" documentation. 8. **Historized Columns ( `historized_columns`):** A list of column names from the source that contain descriptive attributes whose changes over time should be tracked in the Satellite. Each change to these attributes for a given business key will result in a new record in the Satellite. ```codeBlockLines_e6Vv # ... (previous properties) historized_columns: # 8. Historized Columns - 'ABTEI' - 'VERAK' - 'USNAM' - 'ERSDA' - 'WAERS' ``` 9. **Non-Historized Columns ( `non_historized_columns`):** A list of column names from the source that contain descriptive attributes whose current value should be stored, but whose history is not tracked. ```codeBlockLines_e6Vv # ... (previous properties) non_historized_columns: # 9. 
Non-Historized Columns - "PRCTR" - "KOKRS" ```
### Complete Hub Satellite YAML
Here is the complete YAML definition for our `SAT_SAP_CEPC` Hub Satellite:
```codeBlockLines_e6Vv
name: 'SAT_SAP_CEPC'
entity_type: 'hubsat'  # Could also be 'linksat' for a Link Satellite
enable_refresh: true
skip_hashdiff_comparison: false
ordering_columns: []  # Optional: for sequencing records with same load timestamp
connected_entity: 'HUB_PROFIT_CENTER'  # Name of the parent Hub
entity_source:
  (SAP_MASTERDATA, CEPC):  # Source: (Schema, Table)
    source_filter: ''  # Optional filter
    source_system_configuration_urn: 'urn:s2v:source_setting:SAP'
    use_source_cdc_flag: false
    source_business_key: ''  # Optional source system identifier
    business_key_mapping:  # How to get the parent Hub's business key
      - PROFIT_CENTER_BK:  # Business key column name in HUB_PROFIT_CENTER
          - "PRCTR"  # Corresponding column in CEPC table
          - "KOKRS"  # Corresponding column in CEPC table
historized_columns:  # Attributes whose history is tracked
  - 'ABTEI'
  - 'VERAK'
  - 'USNAM'
  - 'ERSDA'
  - 'WAERS'
non_historized_columns:  # Attributes whose history is NOT tracked (current value stored)
  - "PRCTR"
  - "KOKRS"
```
## Build Link Object Tutorial
This tutorial will guide you through building a complete Link object YAML definition step-by-step. A Link establishes relationships between two or more Hubs. We'll build a Link named `LNK_SALES_ORDER` that represents an item on an order, connecting `HUB_SALES_ORDER`, `HUB_MATERIAL`, `HUB_PROFIT_CENTER` and `HUB_CUSTOMER`. 1.
**Link Name ( `name`):** This is the unique identifier for your Link object. ```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' # 1. Link Name ``` 2. **Entity Type ( `entity_type`):** Specifies that this object is a Link. ```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' entity_type: 'link' # 2. Entity Type ``` 3. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the link receives new data. Set to `true` if the object can receive new data, set to `false` if the object contains only historical records. ```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' entity_type: 'link' enable_refresh: true # 3. Enable Refresh ``` 4. **Connected Hubs ( `connected_hubs`):** This property defines the hubs a link connects, specified as a list of key-value pairs. - **Key**: Represents the hub's alias, or the specific name of the relationship. This is necessary because a link can connect to the same hub multiple times, requiring unique aliases to distinguish each connection. The hub alias must be unique within the `connected_hubs` list. - **Value**: Represents the actual name of the hub, which must refer to an existing hub definition. - For this example, the list looks like this: ```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' entity_type: 'link' enable_refresh: true connected_hubs: # 4. Connected Hubs - CUSTOMER: 'HUB_CUSTOMER' # alias: hub name - SALES_ORDER: 'HUB_SALES_ORDER' - PROFIT_CENTER: 'HUB_PROFIT_CENTER' - MATERIAL: 'HUB_MATERIAL' ``` 5. **Entity Sources ( `entity_sources`):** This property defines the source(s) that feed data into this Link. For Links, it's a list, meaning a Link can be populated from multiple sources. We'll define one source for this example. - **5.1. Source URN Identifier**: This is the unique URN key for this specific source definition within the list. The prefix often indicates the DV object type (e.g., `hub_source`, `link_source`).
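As a side note, every URN in these tutorials has the shape `urn:s2v:` followed by a type segment and a name. The Python sketch below splits such a URN into those two parts. The shape is inferred from the examples on this page rather than from a published grammar, and the function name `parse_s2v_urn` is hypothetical.

```python
from typing import Tuple


def parse_s2v_urn(urn: str) -> Tuple[str, str]:
    """Return (type, name) for a URN such as 'urn:s2v:link_source:SAP_OE'.

    Illustrative only: the four-part shape is an assumption based on the
    examples in this tutorial, not an official S2V specification.
    """
    # Split into at most four parts so that a name may itself contain colons.
    parts = urn.split(":", 3)
    if len(parts) != 4 or parts[0] != "urn" or parts[1] != "s2v":
        raise ValueError(f"not an S2V URN: {urn!r}")
    urn_type, name = parts[2], parts[3]
    if not urn_type or not name:
        raise ValueError(f"missing type or name in {urn!r}")
    return urn_type, name


if __name__ == "__main__":
    print(parse_s2v_urn("urn:s2v:link_source:SAP_OE"))
    print(parse_s2v_urn("urn:s2v:source_setting:SAP"))
```

Separating the type from the name makes it easy to see, for instance, that a Link source and a source setting are different kinds of reference even when they share the name of a source system.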
```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' entity_type: 'link' enable_refresh: true connected_hubs: - CUSTOMER: 'HUB_CUSTOMER' - SALES_ORDER: 'HUB_SALES_ORDER' - PROFIT_CENTER: 'HUB_PROFIT_CENTER' - MATERIAL: 'HUB_MATERIAL' entity_sources: # 5. Entity Sources - urn:s2v:link_source:SAP_OE: # 5.1. Source URN Identifier ``` - **5.2. Source Location ( `entity_source`):** Specifies the physical location of this source's data. - For multiple source databases, use a triple: `(source_database, source_schema, source_table)`. - For a single source database context: `(source_schema, source_table)`. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:link_source:SAP_OE: entity_source: '(SAP_OE, 2LIS_11_VAITM)' # 5.2. Source Location ``` - **5.3. Source Filter ( `source_filter`):** An optional SQL condition applied to the source data. You can use any valid SQL expression that can be placed in a WHERE clause. Leave it empty if no filters are required. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:link_source:SAP_OE: entity_source: '(SAP_OE, 2LIS_11_VAITM)' source_filter: '' # 5.3. Source Filter ``` - **5.4. Source System Configuration URN ( `source_system_configuration_urn`):** Links this specific source to its system-wide settings. The connection is established via a URN that starts with the fixed prefix `urn:s2v:source_setting:` (for example, `urn:s2v:source_setting:SAP`). ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:link_source:SAP_OE: entity_source: '(SAP_OE, 2LIS_11_VAITM)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # 5.4. Source System Configuration URN ``` - **5.5. CDC Flag ( `use_source_cdc_flag`):** A boolean property indicating whether the source data contains a Change Data Capture (CDC) flag. - If `false`, an insert-only source is expected. - If `true`, the source is expected to capture other actions (e.g., deletes or updates).
In this case, ensure appropriate mappings are defined in the linked source system settings. ```codeBlockLines_e6Vv # ... (previous properties) entity_sources: - urn:s2v:link_source:SAP_OE: entity_source: '(SAP_OE, 2LIS_11_VAITM)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' use_source_cdc_flag: false # 5.5. CDC flag ``` - **5.6. Connected Hub Relations ( `connected_hub_relations`):** This crucial property defines how the business keys for each connected Hub (defined in step 4) are sourced from _this specific entity source_. It's a list, and each item corresponds to a Hub alias from the `connected_hubs` property. For each Hub alias, you define its `business_key_mapping`. This maps the Hub's business key (as named in the Hub's `target_business_key_columns`) to the corresponding column(s) in the current source table ( `2LIS_11_VAITM` in this case). If the source table doesn't contain columns that can be mapped directly to the Hub's business key, use `lookup_mapping`. Read more about the differences in the "Business Key and Lookup Mappings" documentation. ```codeBlockLines_e6Vv name: 'LNK_SALES_ORDER' entity_type: 'link' enable_refresh: true connected_hubs: - CUSTOMER: 'HUB_CUSTOMER' - SALES_ORDER: 'HUB_SALES_ORDER' - PROFIT_CENTER: 'HUB_PROFIT_CENTER' - MATERIAL: 'HUB_MATERIAL' entity_sources: - urn:s2v:link_source:SAP_OE: entity_source: '(SAP_OE, 2LIS_11_VAITM)' source_filter: '' source_system_configuration_urn: 'urn:s2v:source_setting:SAP' use_source_cdc_flag: false connected_hub_relations: # 5.6.
Connected Hub Relations - CUSTOMER: # Corresponds to alias in connected_hubs business_key_mapping: - CUSTOMER_BK: # Hub's business key name - 'KUNNR' # Source column from 2LIS_11_VAITM source_business_key: '' # Optional: for multi-master scenarios - SALES_ORDER: source_business_key: 'SAP_OE_SD' business_key_mapping: - SALES_ORDER_BK: - 'VBELN' - 'POSNR' - PROFIT_CENTER: source_business_key: '' business_key_mapping: - PROFIT_CENTER_BK: - "PRCTR" - "KOKRS" - MATERIAL: source_business_key: '' business_key_mapping: - MATERIAL_BK: - 'MATNR' ```
### Complete Link YAML
Here is the complete YAML definition for our `LNK_SALES_ORDER` Link:
```codeBlockLines_e6Vv
name: 'LNK_SALES_ORDER'
entity_type: 'link'
enable_refresh: true
connected_hubs:
  - CUSTOMER: 'HUB_CUSTOMER'  # alias: hub name
  - SALES_ORDER: 'HUB_SALES_ORDER'
  - PROFIT_CENTER: 'HUB_PROFIT_CENTER'
  - MATERIAL: 'HUB_MATERIAL'
entity_sources:
  - urn:s2v:link_source:SAP_OE:  # Unique URN for this source
      entity_source: '(SAP_OE, 2LIS_11_VAITM)'  # Source: (Schema, Table)
      source_filter: ''  # Optional filter
      source_system_configuration_urn: 'urn:s2v:source_setting:SAP'  # Link to global source settings
      use_source_cdc_flag: false
      connected_hub_relations:
        - CUSTOMER:
            source_business_key: ''  # Optional source system identifier
            business_key_mapping:
              - CUSTOMER_BK:  # Business key name from HUB_CUSTOMER
                  - 'KUNNR'  # Source column KUNNR participating in business key
        - SALES_ORDER:
            source_business_key: 'SAP_OE_SD'
            business_key_mapping:
              - SALES_ORDER_BK:
                  - 'VBELN'
                  - 'POSNR'
        - PROFIT_CENTER:
            source_business_key: ''
            business_key_mapping:
              - PROFIT_CENTER_BK:
                  - "PRCTR"
                  - "KOKRS"
        - MATERIAL:
            source_business_key: ''
            business_key_mapping:
              - MATERIAL_BK:
                  - 'MATNR'
```
## Build Link Satellite
This tutorial will guide you through building a complete Link Satellite object YAML definition step-by-step. A Link Satellite holds descriptive attributes for a parent Link and tracks their history, specifically in relation to one of the Link's sources. We'll build a Link Satellite named `LNK_SAT_SAP_OE_SALES_ORDER` connected to a parent Link `LNK_SALES_ORDER` and specifically to one of its sources. 1. **Satellite Name ( `name`):** This is the unique identifier for your Link Satellite object. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' # 1. Satellite Name ``` 2. **Entity Type ( `entity_type`):** Specifies that this object is a Link Satellite. For Hub Satellites, you would use `hubsat`. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' # 2. Entity Type ``` 3. **Connected Entity ( `connected_entity`):** The name of the parent Link to which this Satellite is attached. This Link must be defined elsewhere in your data vault model. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' connected_entity: 'LNK_SALES_ORDER' # 3. Connected Entity (Parent Link) ``` 4. **Connected Entity Source Reference ( `connected_entity_source_ref`):** This is a crucial property for Link Satellites. It's the URN of the specific source _within the parent Link_ ( `LNK_SALES_ORDER` in this case) from which this Satellite inherits the source attributes. The Link Satellite's attributes are contextual to this particular source of the Link. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' connected_entity: 'LNK_SALES_ORDER' connected_entity_source_ref: 'urn:s2v:link_source:SAP_OE' # 4. URN of the parent Link's source ``` 5. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the Satellite receives new data.
```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' connected_entity: 'LNK_SALES_ORDER' connected_entity_source_ref: 'urn:s2v:link_source:SAP_OE' enable_refresh: true # 5. Enable Refresh ``` 6. **Skip Hashdiff Comparison ( `skip_hashdiff_comparison`):** A boolean flag. If `true`, S2V will skip comparing the hashdiff of incoming records. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' connected_entity: 'LNK_SALES_ORDER' connected_entity_source_ref: 'urn:s2v:link_source:SAP_OE' enable_refresh: true skip_hashdiff_comparison: false # 6. Skip Hashdiff Comparison ``` 7. **Ordering Columns ( `ordering_columns`):** A list of column names from the source (defined in the parent Link's referenced source) that determine the sequence of records when multiple changes occur for the same relationship instance at the same load datetime. Can be an empty list `[]`. ```codeBlockLines_e6Vv name: 'LNK_SAT_SAP_OE_SALES_ORDER' entity_type: 'linksat' connected_entity: 'LNK_SALES_ORDER' connected_entity_source_ref: 'urn:s2v:link_source:SAP_OE' enable_refresh: true skip_hashdiff_comparison: false ordering_columns: [] # 7. Ordering Columns ``` 8. **Historized Columns ( `historized_columns`):** A list of column names from the parent Link's referenced source that contain descriptive attributes whose changes over time should be tracked in this Link Satellite. ```codeBlockLines_e6Vv # ... (previous properties) historized_columns: # 8. Historized Columns - 'ZMENG' - 'ZIEME' - 'SMENG' - 'LFMNG' ``` 9. **Non-Historized Columns ( `non_historized_columns`):** A list of column names from the parent Link's referenced source that contain descriptive attributes whose current value should be stored, but whose history is not tracked in this Link Satellite. ```codeBlockLines_e6Vv # ... (previous properties) non_historized_columns: # 9. 
Non-Historized Columns - 'KUNNR' - 'VBELN' - 'POSNR' - 'MATNR' - "PRCTR" - 'KOKRS' ```
### Complete Link Satellite YAML
Here is the complete YAML definition for our `LNK_SAT_SAP_OE_SALES_ORDER` Link Satellite:
```codeBlockLines_e6Vv
name: 'LNK_SAT_SAP_OE_SALES_ORDER'
entity_type: 'linksat'
connected_entity: 'LNK_SALES_ORDER'  # Name of the parent Link
connected_entity_source_ref: 'urn:s2v:link_source:SAP_OE'  # URN of the source within LNK_SALES_ORDER
enable_refresh: true
skip_hashdiff_comparison: false
ordering_columns: []  # Optional: for sequencing records
historized_columns:  # Attributes from the Link's source whose history is tracked
  - 'ZMENG'
  - 'ZIEME'
  - 'SMENG'
  - 'LFMNG'
non_historized_columns:  # Attributes from the Link's source, current value stored
  - 'KUNNR'
  - 'VBELN'
  - 'POSNR'
  - 'MATNR'
  - "PRCTR"
  - 'KOKRS'
```
**Info:** Remember that the actual source schema and table ( `SAP_OE`, `2LIS_11_VAITM`) are defined within the `urn:s2v:link_source:SAP_OE` source of the `LNK_SALES_ORDER` Link.
## Build NHL Tutorial
This tutorial will guide you through building a complete Non-Historized Link (NHL) object YAML definition step-by-step. An NHL captures relationships between Hubs and can also store descriptive attributes about that relationship, similar to a Satellite. Unlike regular Links, NHLs are fed from a single source. We'll build an NHL named `NHL_SALES_ORDER_CONDITION_TYPES` that connects `HUB_SALES_ORDER` and `HUB_CONDITION_TYPE` and stores the condition types applied to each sold item in the sales order. 1.
**NHL Name ( `name`):** This is the unique identifier for your NHL object. ```codeBlockLines_e6Vv name: 'NHL_SALES_ORDER_CONDITION_TYPES' # 1. NHL Name ``` 2. **Entity Type ( `entity_type`):** Specifies that this object is a Non-Historized Link. ```codeBlockLines_e6Vv name: 'NHL_SALES_ORDER_CONDITION_TYPES' entity_type: 'non_historized_link' # 2. Entity Type ``` 3. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the NHL receives new data or updates. ```codeBlockLines_e6Vv name: 'NHL_SALES_ORDER_CONDITION_TYPES' entity_type: 'non_historized_link' enable_refresh: true # 3. Enable Refresh ``` 4. **Connected Hubs ( `connected_hubs`):** This section lists the Hubs that this NHL connects, similar to a regular Link. Each Hub is given an alias, and you specify the name of the business key column from that Hub as it will appear in the NHL table. ```codeBlockLines_e6Vv name: 'NHL_SALES_ORDER_CONDITION_TYPES' entity_type: 'non_historized_link' enable_refresh: true connected_hubs: # 4. Connected Hubs - SALES_ORDER: 'HUB_SALES_ORDER' # Alias: Hub Name - CONDITION_TYPE: 'HUB_CONDITION_TYPE' # Alias: Hub Name ``` 5. **Entity Source ( `entity_source`):** For NHLs, this defines the single source that feeds data. It's a URN-keyed map, where the URN identifies this source definition. - **5.1. Source URN Identifier (Key):** The key of the `entity_source` map. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: # 5. Entity Source urn:s2v:link_source:2lis_13_vdkon: # 5.1. Source URN Identifier ``` - **5.2. Source Location ( `entity_source` tuple):** Specifies the physical location of this source's data. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: urn:s2v:link_source:2lis_13_vdkon: entity_source: '(SAP_OE, 2LIS_13_VDKON)' # 5.2. Source Location ``` - **5.3. Source Filter ( `source_filter`):** An optional SQL condition. ```codeBlockLines_e6Vv # ... 
(previous properties) entity_source: urn:s2v:link_source:2lis_13_vdkon: entity_source: '(SAP_OE, 2LIS_13_VDKON)' source_filter: '' # 5.3. Source Filter ``` - **5.4. Use Source CDC Flag ( `use_source_cdc_flag`):** Boolean for CDC. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: urn:s2v:link_source:2lis_13_vdkon: entity_source: '(SAP_OE, 2LIS_13_VDKON)' source_filter: '' use_source_cdc_flag: false # 5.4. Use Source CDC Flag ``` - **5.5. Source System Configuration URN ( `source_system_configuration_urn`):** Link to global source settings. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: urn:s2v:link_source:2lis_13_vdkon: entity_source: '(SAP_OE, 2LIS_13_VDKON)' source_filter: '' use_source_cdc_flag: false source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # 5.5. Source System Config URN ``` - **5.6. Connected Hub Relations ( `connected_hub_relations`):** Defines how the business keys for each connected Hub are sourced from this single entity source. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: urn:s2v:link_source:2lis_13_vdkon: entity_source: '(SAP_OE, 2LIS_13_VDKON)' source_filter: '' use_source_cdc_flag: false source_system_configuration_urn: 'urn:s2v:source_setting:SAP' connected_hub_relations: # 5.6. Connected Hub Relations - SALES_ORDER: business_key_mapping: - SALES_ORDER_BK: # Business key name from HUB_SALES_ORDER - 'VBELN' # Source column - 'POSNR' # Source column source_business_key: 'SAP_OE_SD' - CONDITION_TYPE: business_key_mapping: - CONDITION_TYPE_BK: # Business key name from HUB_CONDITION_TYPE - 'KSCHL' # Source column source_business_key: 'SAP_OE' ``` 6. **Historized Columns ( `historized_columns`):** A list of column names from the source that contain descriptive attributes of the relationship whose changes over time should be tracked. ```codeBlockLines_e6Vv # ... (previous properties) historized_columns: # 6. 
Historized Columns - 'GBSTK' - 'WBSTK' - 'AEDAT' - 'KUNRG' - 'KUNAG' - 'BUKRS' - 'BZIRK' ``` 7. **Non-Historized Columns ( `non_historized_columns`):** A list of column names from the source that contain descriptive attributes of the relationship whose current value should be stored, but whose history is not tracked. ```codeBlockLines_e6Vv # ... (previous properties) non_historized_columns: # 7. Non-Historized Columns - 'VBELN' - 'POSNR' - 'KSCHL' ```
### Complete Non-Historized Link YAML
Here is the complete YAML definition for our `NHL_SALES_ORDER_CONDITION_TYPES`: ```codeBlockLines_e6Vv name: 'NHL_SALES_ORDER_CONDITION_TYPES' entity_type: 'non_historized_link' enable_refresh: true connected_hubs: - SALES_ORDER: 'HUB_SALES_ORDER' - CONDITION_TYPE: 'HUB_CONDITION_TYPE' entity_source: urn:s2v:link_source:2lis_13_vdkon: # URN identifying this source configuration entity_source: '(SAP_OE, 2LIS_13_VDKON)' # Physical source source_filter: '' # Optional filter use_source_cdc_flag: false # CDC info from source source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # Global settings connected_hub_relations: # How to get Hub business keys from this source - SALES_ORDER: business_key_mapping: - SALES_ORDER_BK: # Business key name from HUB_SALES_ORDER - 'VBELN' # Source column VBELN from table 2LIS_13_VDKON - 'POSNR' # Source column POSNR from table 2LIS_13_VDKON source_business_key: 'SAP_OE_SD' # Optional source system identifier - CONDITION_TYPE: business_key_mapping: - CONDITION_TYPE_BK: - 'KSCHL' source_business_key: 'SAP_OE' historized_columns: # Attributes of the relationship whose history is tracked - 'GBSTK' - 'WBSTK' - 'AEDAT' - 'KUNRG' - 'KUNAG' - 'BUKRS' - 'BZIRK' non_historized_columns: # Attributes of the relationship, current value stored - 'VBELN' - 'POSNR' - 'KSCHL' ```
## Reference Object YAML Tutorial
This tutorial will guide you through building a complete Reference object YAML definition step-by-step. Reference tables store lists of codes, categories, or other relatively static data used for lookups or classification. We'll build a Reference table named `REF_SAPSE_GENERAL_T003T` based on the example structure. 01. **Reference Table Name ( `name`):** This is the unique identifier for your Reference table. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' # 1. Reference Table Name ``` 02. **Entity Type ( `entity_type`):** Specifies that this object is a Reference table. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' # 2. Entity Type ``` 03. **Concatenate Business Keys ( `concatenate_business_keys`):** A boolean flag. If `true`, and multiple source columns are mapped to the `target_business_key_columns`, their values will be concatenated to form the business key of the reference table. If `false`, each mapped source column would typically correspond to a separate target business key column (if multiple are defined). For reference tables, the key is often simple or a natural key from the source. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' concatenate_business_keys: true # 3. Concatenate Business Keys ``` 04. **Enable Refresh ( `enable_refresh`):** A boolean flag indicating whether the reference object receives new data. Set to `true` if the object can receive new data, set to `false` if the object contains only historical records. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' concatenate_business_keys: true enable_refresh: true # 4. 
Enable Refresh ``` 05. **Target Business Key Columns ( `target_business_key_columns`):** A list defining the names of the business key(s) in the target Reference. If `concatenate_business_keys` is `true`, this lists a single column that will hold the concatenated key. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' concatenate_business_keys: true enable_refresh: true target_business_key_columns: # 5. Target Business Key Columns - 'REF_SAPSE_GENERAL_T003T_BK' ``` 06. **Entity Source ( `entity_source`):** For Reference tables, this defines the single source that feeds data. It's a tuple-keyed map, similar to Hub Satellites. - **6.1. Source Location Tuple (Key):** The key of the `entity_source` map is a tuple `(SOURCE_DATABASE, SOURCE_SCHEMA, SOURCE_TABLE)` or `(SOURCE_SCHEMA, SOURCE_TABLE)` specifying the physical location of the source data. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: # 6. Entity Source (SAP_SE, T003T): # 6.1. Source Location Tuple ``` - **6.2. Source Filter ( `source_filter`):** An optional SQL condition applied to the source data. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_SE, T003T): source_filter: '' # 6.2. Source Filter (empty means no filter) ``` - **6.3. Use Source CDC Flag ( `use_source_cdc_flag`):** Boolean indicating if the source provides a CDC (Change Data Capture) flag. For reference data, this might be `true` if the source indicates active/inactive records. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_SE, T003T): source_filter: '' use_source_cdc_flag: true # 6.3. Use Source CDC Flag ``` - **6.4. Source System Configuration URN ( `source_system_configuration_urn`):** Links this source to its global system settings. ```codeBlockLines_e6Vv # ... (previous properties) entity_source: (SAP_SE, T003T): source_filter: '' use_source_cdc_flag: true source_system_configuration_urn: 'urn:s2v:source_setting:SAP' # 6.4. 
Source System Config URN ``` - **6.5. Business Key Mapping ( `business_key_mapping`):** Defines how the `target_business_key_columns` (e.g., `REF_SAPSE_GENERAL_T003T_BK`) are populated from the columns of this source. Since `concatenate_business_keys` is `true` in our example, multiple source columns ( `BLART`, `MANDT`, `SPRAS`) are mapped to the single target key. ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' concatenate_business_keys: true enable_refresh: true target_business_key_columns: - 'REF_SAPSE_GENERAL_T003T_BK' entity_source: (SAP_SE, T003T): source_filter: '' use_source_cdc_flag: true source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: # 6.5. Business Key Mapping - REF_SAPSE_GENERAL_T003T_BK: - 'BLART' - 'MANDT' - 'SPRAS' ``` **Info:** You might notice that, unlike Hubs or Links, the `business_key_mapping` for a Reference table does not include a `source_business_key` property. This is because Reference tables are inherently single-source and standalone entities. The `source_business_key` is primarily used in multi-master scenarios (e.g., for Hubs) to differentiate the origin of a business key when it could come from multiple systems. Since a Reference table draws its data from one defined source, this distinction is not necessary. 07. **Historized Columns ( `historized_columns`):** For standard Reference tables, this is typically a list of the columns participating in the business key, as Reference tables usually don't track history in the same way Satellites do. In our example it also includes the descriptive column `LTEXT`. ```codeBlockLines_e6Vv # ... (previous properties) historized_columns: # 7. Historized Columns - 'BLART' - 'MANDT' - 'SPRAS' - 'LTEXT' ``` 08. **Non-Historized Columns ( `non_historized_columns`):** This list contains descriptive attributes from the source that are not part of the business key but should be included in the Reference table. Their current values are stored. If the source for these columns changes, the Reference table record is updated. 
In the example, this list is empty because every relevant column, including the descriptive `LTEXT`, is already listed under `historized_columns`. Descriptive fields whose history should not be tracked would go here. ```codeBlockLines_e6Vv # ... (previous properties) non_historized_columns: [] # 8. Non-Historized Columns ``` 09. **Ordering Columns ( `ordering_columns`):** A list of column names from the source that determine the sequence of records if multiple records from the source map to the same business key and have the same load datetime. This helps in picking the "latest" or a specific version if duplicates arise. ```codeBlockLines_e6Vv # ... (previous properties) ordering_columns: [] # 9. Ordering Columns ``` 10. **Skip Hashdiff Comparison ( `skip_hashdiff_comparison`):** A boolean flag. If `true`, S2V will skip comparing the hashdiff of incoming records. For reference tables, this is often `false` (default) to ensure only actual changes are processed if descriptive attributes are present in `non_historized_columns`. If all data is in the key, or if you always want to overwrite, this could be `true`. ```codeBlockLines_e6Vv # ... (previous properties) skip_hashdiff_comparison: false # 10. 
Skip Hashdiff Comparison ```
### Complete Reference Table YAML
Here is the complete YAML definition for our `REF_SAPSE_GENERAL_T003T` Reference table: ```codeBlockLines_e6Vv name: 'REF_SAPSE_GENERAL_T003T' entity_type: 'reference' enable_refresh: true enable_history: false ordering_columns: [] skip_hashdiff_comparison: false concatenate_business_keys: true target_business_key_columns: - 'REF_SAPSE_GENERAL_T003T_BK' entity_source: (SAP_SE, T003T): # Source: (Schema, Table) source_filter: '' # Optional filter use_source_cdc_flag: true source_system_configuration_urn: 'urn:s2v:source_setting:SAP' business_key_mapping: - REF_SAPSE_GENERAL_T003T_BK: # Business key column name - 'BLART' # Corresponding column in T003T table - 'MANDT' # Corresponding column in T003T table - 'SPRAS' # Corresponding column in T003T table historized_columns: # Attributes whose history is tracked - 'BLART' - 'MANDT' - 'SPRAS' - 'LTEXT' non_historized_columns: [] # Attributes whose history is NOT tracked (current value stored) ```
## S2V Project Structure Guide
This tutorial guides you through using the `s2v validate` and `s2v generate` commands with both default and custom S2V project structures. We'll use a sample project and assume you are running commands from the directory where you've unzipped this project. 
## Default Project Structure
In a default project structure, S2V expects all essential configuration files to reside in the root of your input model folder (e.g., `dv_model/`). **Key Configuration Files** (typically found in the root of `dv_model/`): - `data_vault_settings.yaml` - `source_system_settings.yaml` - `information_schema.csv` For more details on these files, refer to the tutorials on Data Vault Settings, Source System Settings, and Information Schema.
### Step-by-Step Guide
**1.** **Download Sample Data:** - Download the sample project (e.g., `sample_data.zip`) from the [Sample Data page](https://s2v.reeeliance.com/docs/tutorials/sample-data). **2.** **Unzip the Project:** - Extract the contents of `sample_data.zip` to a new folder. This folder will contain a `dv_model/` subdirectory with the sample Data Vault model. **3.** **Navigate to Project Directory:** - Open your command-line interface (CLI). - Navigate into the directory where you unzipped the sample project (i.e., the directory that now contains `dv_model/`). **4.** **Run `validate`:** - Use the `s2v validate` command, providing the path to your input folder. ```codeBlockLines_e6Vv s2v validate -i dv_model/ ``` - S2V will automatically find the configuration files within the `dv_model/` folder. If successful, you'll see a success message. **5.** **Run `generate`:** - Once validation is successful, you can generate SQL scripts and the deployment scripts. ```codeBlockLines_e6Vv s2v generate -i dv_model/ -o output_files/ ``` - This command will save the generated code into an `output_files/` directory within your current working directory. 
**6.** **Explore Results:** - Check the `output_files/` directory for the generated code.
## Custom Project Structure
Sometimes, your project might have a custom structure where configuration files are not in the root of the input model folder, or you might have multiple information schema files for different environments.
### Step-by-Step Guide
**1.** **Prepare Sample Data with Custom Structure:** - Start with the `dv_model/` folder as prepared for the "Default Project Structure" tutorial. - Alternatively, you can download a pre-configured custom project structure example from the [Sample Data page](https://s2v.reeeliance.com/docs/tutorials/sample-data) to follow along. - Modify its structure as follows: - Create a `configuration` subfolder inside `dv_model/`. - Move `dv_model/data_vault_settings.yaml` to `dv_model/configuration/data_vault_settings.yaml`. - Move `dv_model/source_system_settings.yaml` to `dv_model/configuration/source_system_settings.yaml`. - Create a `sources` subfolder inside `dv_model/`. - Move `dv_model/information_schema.csv` to `dv_model/sources/dev_inf_schema.csv` (renaming it to illustrate using a custom schema file name). - (Optional) You might also have other environment-specific schema files like `uat_inf_schema.csv` and `prod_inf_schema.csv` in the `sources/` folder. 
Your `dv_model/` directory should now look something like this: ```codeBlockLines_e6Vv dv_model/ ├── HUBS/hub_*.yaml ├── LINKS/link_*.yaml ├── SATELLITES/sat_*.yaml ├── REFERENCES/ref_*.yaml configuration/ ├── data_vault_settings.yaml └── source_system_settings.yaml sources/ ├── dev_inf_schema.csv ├── uat_inf_schema.csv └── prod_inf_schema.csv ``` **2.** **Run `validate`:** - Run the validate command as before: ```codeBlockLines_e6Vv s2v validate -i dv_model/ ``` - You will encounter errors because S2V cannot find the configuration files in their default root location within `dv_model/`. The expected error messages would be similar to: ```codeBlockLines_e6Vv [ERROR] data_vault_settings: FileNotFound:Missing data vault settings. Please include data_vault_settings.yaml in the data vault folder. [ERROR] information_schema: FileNotFound:Missing information schema. Please include information_schema.csv in the data vault folder. [ERROR] source_system_settings: FileNotFound:Missing source system settings. Please include source_system_settings.yaml in the data vault folder. 
``` **3.** **Run `validate` with Specific Paths:** - To correctly validate a project with a custom structure, you need to provide the paths to your configuration files using the corresponding options: ```codeBlockLines_e6Vv s2v validate -i dv_model/ \ --data-vault-settings-path dv_model/configuration/data_vault_settings.yaml \ --source-system-settings-path dv_model/configuration/source_system_settings.yaml \ --information-schema-path dv_model/sources/dev_inf_schema.csv ``` **4.** **Run `generate` with Specific Paths:** - Similarly, for the `generate` command, you'll need to specify these paths: ```codeBlockLines_e6Vv s2v generate -i dv_model/ \ -o output_files/ \ --data-vault-settings-path dv_model/configuration/data_vault_settings.yaml \ --source-system-settings-path dv_model/configuration/source_system_settings.yaml \ --information-schema-path dv_model/sources/dev_inf_schema.csv ``` **5.** **Explore Results:** - Inspect the `output_files/` directory for the generated code.
This approach gives you the flexibility to organize your S2V projects as needed, especially in more complex scenarios or when integrating with existing project structures. For a full list of options for each command, you can always use `s2v validate --help` or `s2v generate --help`. 
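The difference between the default and custom invocations above comes down to which path options are passed. As a minimal sketch (not part of S2V itself), the hypothetical Python helpers below assemble the argument list for `s2v validate` or `s2v generate` using only the options shown in this tutorial, and report which default configuration files are missing from the model root. The names `build_s2v_command`, `missing_default_files`, and `DEFAULT_FILES` are assumptions made for illustration.

```python
from pathlib import Path

# Path options from this tutorial, mapped to the default file names
# S2V looks for in the root of the input model folder.
DEFAULT_FILES = {
    "--data-vault-settings-path": "data_vault_settings.yaml",
    "--source-system-settings-path": "source_system_settings.yaml",
    "--information-schema-path": "information_schema.csv",
}


def missing_default_files(model_dir):
    """List default config files that are absent from the model root."""
    root = Path(model_dir)
    return [name for name in DEFAULT_FILES.values() if not (root / name).is_file()]


def build_s2v_command(subcommand, model_dir, overrides=None, output_dir=None):
    """Return the argument list for `s2v validate` or `s2v generate`.

    `overrides` maps a path option to a custom file location; an option
    is only emitted when a custom path is given for it.
    """
    if subcommand not in ("validate", "generate"):
        raise ValueError(f"unsupported subcommand: {subcommand}")
    command = ["s2v", subcommand, "-i", str(model_dir)]
    if output_dir is not None:
        command += ["-o", str(output_dir)]
    for option, path in (overrides or {}).items():
        if option not in DEFAULT_FILES:
            raise ValueError(f"unknown option: {option}")
        command += [option, str(path)]
    return command


if __name__ == "__main__":
    custom_paths = {
        "--data-vault-settings-path": "dv_model/configuration/data_vault_settings.yaml",
        "--source-system-settings-path": "dv_model/configuration/source_system_settings.yaml",
        "--information-schema-path": "dv_model/sources/dev_inf_schema.csv",
    }
    # Print the command instead of running it, so the sketch works without S2V installed.
    print(" ".join(build_s2v_command("validate", "dv_model/", custom_paths)))
```

With the `s2v` CLI installed, the returned list could be handed to `subprocess.run(command, check=True)`. Keeping the options in a dictionary makes it straightforward to swap in another environment's schema file, such as `uat_inf_schema.csv`.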
## Install GNU Make on Windows
This guide explains how to install GNU Make on Windows using the GnuWin32 package. This method is useful if you prefer not to use package managers like Chocolatey or Scoop, or if they are not available in your environment.
### 1\. Download and Install Make from GnuWin32
- Navigate to the [GnuWin32 page for Make](https://gnuwin32.sourceforge.net/packages/make.htm). - Download the setup program (e.g., `make-3.81-setup.exe` or the latest version available). - Run the downloaded installer and follow the on-screen prompts. The default installation directory is typically `C:\Program Files (x86)\GnuWin32`. Make a note of this path for the next step.
### 2\. Add Make to the System PATH
To run `make` from any directory in PowerShell or Command Prompt, you need to add its installation directory to your system's PATH environment variable. 
- In the Windows search bar, type "environment variables" and select "Edit the system environment variables." - In the System Properties window that opens, click the "Environment Variables..." button. - In the Environment Variables window, under the "System variables" section, find and select the variable named `Path`. - Click the "Edit..." button. - Click "New" and add the path to the `bin` directory of your GnuWin32 installation. If you used the default installation path, this will be `C:\Program Files (x86)\GnuWin32\bin`. - Click "OK" on all open dialog windows to save the changes. ![img](https://s2v.reeeliance.com/assets/images/windows-4c4d60865e41db789e5a7cf8d8bbd1ad.png)
### 3\. Verify the Installation
- Open a **new** PowerShell or Command Prompt window. (Changes to environment variables only take effect in new terminal sessions.) - Type the following command and press Enter: ```codeBlockLines_e6Vv make --version ``` - You should see output displaying the GNU Make version, similar to: ```codeBlockLines_e6Vv GNU Make 3.81 Copyright (C) 2006 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. ```
### 4\. Makefile Considerations (Optional)
Some Makefiles, especially those written for Unix-like systems, might use `$(shell pwd)` to determine the current directory. The `pwd` command may not be available by default on Windows, which can cause errors. The standard and portable Make variable for getting the current directory is `$(CURDIR)`. 
If you encounter a Makefile that uses `BASE_DIR := $(shell pwd)` or similar, it's best to change it to `BASE_DIR := $(CURDIR)` for better cross-platform compatibility.
## YAML Data Serialization
YAML (a recursive acronym for "YAML Ain't Markup Language") is a human-readable data serialization language. It is commonly used for configuration files and in applications where data is being stored or transmitted. YAML aims to be easily readable by humans and straightforward for software to parse. Stream2Vault uses YAML extensively for defining your Data Vault model objects (Hubs, Links, Satellites, etc.) and their sources.
## Basic Structure
YAML uses indentation to denote structure and relies on a few core concepts: 1. **Key-Value Pairs (Mappings):** The most fundamental structure in YAML is the key-value pair, also known as a mapping or dictionary. ```codeBlockLines_e6Vv key: "value" name: John Doe age: 30 isStudent: false ``` **Tip:** Quoting string values with single `'` or double `"` quotes is **optional**. 
The following mappings are equivalent: ```codeBlockLines_e6Vv name: Jane Doe name: 'Jane Doe' name: "Jane Doe" ``` 2. **Lists (Sequences):** Lists (or arrays/sequences) are denoted by a hyphen ( `-`) followed by a space. Each item in the list is on a new line with the same indentation. ```codeBlockLines_e6Vv fruits: - Apple - Banana - Orange # Or an inline list (less common for complex items) colors: [Red, Green, Blue] ``` 3. **Nested Structures:** YAML allows for complex data structures by nesting mappings and sequences. Indentation is crucial here. ```codeBlockLines_e6Vv person: name: Jane Doe age: 28 address: street: Mariendorfer Damm city: Berlin hobbies: - cooking - coding ``` 4. **Scalars (Data Types):** YAML automatically detects common data types: - **Strings:** `hello world`, `'quoted string'`, `"double-quoted string"` - **Numbers:** `123` (integer), `3.14` (float) - **Booleans:** `true`, `false` (parsers following YAML 1.1 also treat `yes`, `no`, `on`, `off` as booleans) - **Nulls:** `null` or `~` - **Dates/Timestamps:** `2023-10-27`, `2023-10-27T10:30:00Z` (ISO 8601 format) 5. **Comments:** Lines beginning with a `#` are comments and are ignored by the parser. ```codeBlockLines_e6Vv # This is a comment setting: value # This is an inline comment ``` 6. **Indentation:** - **Crucial!** Indentation (using spaces, not tabs) defines the structure. - The number of spaces for indentation must be consistent within the same block. Two spaces are a common convention. **Tip:** - **Consistent Indentation:** Use spaces, not tabs. The most common convention is 2 spaces per indent level. Be consistent throughout your file. - **Validate Your YAML:** Use a YAML linter or validator (many online tools and IDE extensions are available) to catch syntax errors early. This is especially helpful when you're starting out. - **Quoting Strings:** - Strings don't always need quotes. However, quote strings if they: - Contain special characters (e.g., `:`, `{`, `}` or others). 
- Could be misinterpreted as another type (e.g., `true`, `false`, `null`, numbers, `yes`, `no`). - Need to preserve leading/trailing whitespace.
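The quoting rules in the tip above can be sketched as a small check. The Python snippet below is a deliberately conservative heuristic, not a YAML parser and not something S2V provides: the helper names `needs_quotes` and `to_yaml_scalar`, the reserved-word list, and the set of special characters are all assumptions chosen for illustration.

```python
import re

# Words a YAML parser may read as booleans or nulls rather than strings.
# (yes/no/on/off are only treated as booleans by YAML 1.1 parsers.)
RESERVED_WORDS = {"true", "false", "null", "~", "yes", "no", "on", "off"}

# The characters named in the tip, plus other YAML indicator characters.
# This set is intentionally broad: quoting too often is harmless.
SPECIAL_CHARS = set(":{}[],#&*!|>'\"%@`")

# Plain integers and floats such as 30, -7, 3.14 or 1e5.
NUMBER_PATTERN = re.compile(r"[-+]?(\d+\.?\d*|\.\d+)([eE][-+]?\d+)?")


def needs_quotes(value: str) -> bool:
    """Heuristic: return True if a string value should be quoted in YAML."""
    if value == "" or value != value.strip():
        return True  # empty, or has leading/trailing whitespace to preserve
    if value.lower() in RESERVED_WORDS:
        return True  # would otherwise be read as a boolean or null
    if NUMBER_PATTERN.fullmatch(value):
        return True  # would otherwise be read as a number
    return any(ch in SPECIAL_CHARS for ch in value)


def to_yaml_scalar(value: str) -> str:
    """Single-quote a value when needed, doubling embedded single quotes."""
    if needs_quotes(value):
        return "'" + value.replace("'", "''") + "'"
    return value


if __name__ == "__main__":
    for sample in ["Jane Doe", "true", "30", "key: value", " padded "]:
        print(f"name: {to_yaml_scalar(sample)}")
```

Single quotes are used in the sketch because they match the style of the S2V examples (for instance `name: 'REF_SAPSE_GENERAL_T003T'`) and need only one escape rule: a literal `'` is written as `''`.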