
Understanding the Generated Code Output

Stream2Vault (S2V) generates a structured set of SQL scripts and Makefiles designed to facilitate the deployment and management of your Data Vault 2.0 objects in Snowflake. This document explains the organization of these generated files.

Output Directory Structure

All generated code is placed within a main output directory (e.g., GENERATED_CODE/). This directory contains:

  • An INIT_DYNAMIC_TABLE/ (and/or INIT_TABLE/) folder for initialization scripts.
  • A DYNAMIC_TABLE/ (and/or TABLE/) folder containing the SQL definitions for your Data Vault entities.
  • A root Makefile to orchestrate the deployment.
  • Several auxiliary CSV and JSON files providing metadata and lineage information.

Simplified Directory Tree Example

Here's a simplified view of the typical directory structure, focusing on a Hub and its Satellite to illustrate the pattern:

GENERATED_CODE/
├── DYNAMIC_TABLE/
│   ├── HUB/
│   │   └── HUB_MATERIAL/
│   │       ├── MAIN_LAYER/
│   │       │   └── HUB_MATERIAL.sql
│   │       ├── PREP_LAYER/
│   │       │   ├── HUB_MATERIAL[urn_s2v_hub_source_src_1].sql
│   │       │   └── HUB_MATERIAL[urn_s2v_hub_source_src_2].sql
│   │       ├── REFRESH_LAYER/
│   │       │   └── HUB_MATERIAL.sql
│   │       └── STREAMS_LAYER/
│   │           └── HUB_MATERIAL.sql
│   ├── HUB_SAT/
│   │   └── SAT_CDC_FLAG/
│   │       ├── MAIN_LAYER/
│   │       │   └── SAT_CDC_FLAG.sql
│   │       ├── PREP_LAYER/
│   │       │   └── SAT_CDC_FLAG[(SOURCE_DATA,CUSTOMER)].sql
│   │       ├── REFRESH_LAYER/
│   │       │   └── SAT_CDC_FLAG.sql
│   │       └── STREAMS_LAYER/
│   │           └── SAT_CDC_FLAG.sql
│   ├── LINK_SAT/                     # For Link Satellites
│   ├── LOOKUP/                       # For Lookup Tables
│   ├── NON_HISTORIZED_LINK/          # For Non-Historized Links
│   ├── REFERENCE/                    # For Reference Tables
│   ├── REGULAR_LINK/                 # For Regular Links
│   └── STATUS_TRACKING_SATELLITE/    # For Status Tracking Satellites
├── INIT_DYNAMIC_TABLE/
│   ├── CREATE_SCHEMAS.sql
│   ├── CREATE_TAGS.sql
│   └── GHOST_RECORD.sql
├── Makefile
├── dependencies.json
├── entity_relationships.csv
├── lineage.csv
└── source_system_urn_check.csv

Core Components

1. Makefiles

  • A Makefile is generated in the root of the output directory, and individual Makefiles are generated in each subdirectory.
  • For more details on how this Makefile is used, refer to the Deployment via Makefile documentation.

2. INIT_<TABLE_TYPE>/ Folder

This folder contains essential one-time setup scripts:

  • CREATE_SCHEMAS.sql: SQL to create the necessary database schemas
  • CREATE_TAGS.sql: SQL to create Snowflake tags used for versioning or classification of Data Vault objects
  • GHOST_RECORD.sql: SQL to create the GHOST_RECORD table. This table provides standardized unknown/missing key records, crucial for maintaining referential integrity in Data Vault models.
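For illustration, these initialization scripts typically contain plain DDL statements driven by the same placeholder variables described in section 4. The sketch below is an assumption of what they might roughly contain; the tag name, table columns, and any variable not listed in section 4 are hypothetical, not the exact code S2V emits:

-- CREATE_SCHEMAS.sql: create the target schemas if they do not already exist
CREATE SCHEMA IF NOT EXISTS &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE};

-- CREATE_TAGS.sql: tags used to version or classify Data Vault objects
CREATE TAG IF NOT EXISTS &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.S2V_VERSION;   -- hypothetical tag name

-- GHOST_RECORD.sql: a table holding standardized unknown/missing key records
CREATE TABLE IF NOT EXISTS &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.GHOST_RECORD (
    GHOST_RECORD_TYPE  VARCHAR,        -- hypothetical columns, e.g. 'UNKNOWN' / 'ERROR'
    DESCRIPTION        VARCHAR,
    LOAD_TIMESTAMP     TIMESTAMP_NTZ
);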

3. DYNAMIC_TABLE/ or TABLE/ Folder

  • This top-level folder distinguishes between code generated for Snowflake Dynamic Tables and regular Snowflake Tables. The choice depends on your S2V configuration and Data Vault implementation strategy.

    Within this folder, the structure is hierarchical:

    1. Entity Type Folders (e.g., HUB, HUBSAT):

      • Objects are grouped by their Data Vault entity type. Common entity type folders you will see include:
        • HUB: For Hub objects.
        • REGULAR_LINK: For Regular Link objects.
        • HUBSAT: For Hub Satellite objects.
        • LINKSAT: For Link Satellite objects.
        • NON_HISTORIZED_LINK: For Non-Historized Link objects.
        • REFERENCE: For Reference tables.
        • STATUS_TRACKING_SATELLITE: Automatically generated for each Link to track the status of relationship instances.
        • LOOKUP_TABLE: Generated when lookup_mapping is used in an entity's definition.
    2. Entity Name Folders (e.g., HUB_MATERIAL, SAT_MATERIAL_DETAILS):

      • Each specific Data Vault object defined in your YAML metadata will have its own folder under its entity type.
    3. Layer Folders (e.g., PREP_LAYER, MAIN_LAYER):

      • These folders organize the SQL scripts based on their role in the data processing and deployment pipeline, particularly for Dynamic Table implementations:
        • PREP_LAYER (Preparation Layer):
          • Contains SQL scripts to create "preparation" Dynamic Tables, one for each source feeding a main Data Vault object.
        • MAIN_LAYER (Main/Core Layer):
          • Contains the SQL script for the primary Data Vault Dynamic Table (Hub, Link, Satellite).
        • STREAMS_LAYER (for CDC on Dynamic Tables):
          • Contains SQL to create a Snowflake Stream on the main Data Vault Dynamic Table to capture changes.
        • REFRESH_LAYER (Initializing Dynamic Tables):
          • Contains SQL to trigger the refresh of the main Data Vault Dynamic Table (see the sketch below).
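
As an illustration of how the streams and refresh layers relate to the main layer, the generated scripts for HUB_MATERIAL might look roughly like the following sketch (the stream name is an assumption; only the variables described in section 4 are taken from the generated code):

-- STREAMS_LAYER/HUB_MATERIAL.sql: capture changes on the main Dynamic Table
CREATE OR REPLACE STREAM &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL_STREAM
  ON DYNAMIC TABLE &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL;

-- REFRESH_LAYER/HUB_MATERIAL.sql: trigger a refresh of the main Dynamic Table
ALTER DYNAMIC TABLE &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL REFRESH;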

Auto-generated Objects:

  • Status Tracking Satellites (STS): An STS is automatically generated for every Link object. Its purpose is to track the status and history of the relationships managed by the Link. The STS's SATELLITE_PAYLOAD column includes the values of the source columns that participate in the relationships, together with additional fields.
  • Lookup Tables: These are technical objects generated when a lookup_mapping is defined for an entity. They facilitate lookup operations during data processing but are not considered part of the main consumable Data Vault layer.

4. SQL File Content Characteristics

The generated SQL files share common characteristics:

  • Variables: SQL files use placeholder variables (e.g., &{TARGET_DATABASE}, &{TARGET_SCHEMA_DYNAMIC_TABLE}, &{TARGET_LAG}). These are replaced at deployment time by SnowSQL, using values from your .ini configuration file.
  • CREATE OR REPLACE DYNAMIC TABLE ... / CREATE OR REPLACE TABLE ...: The core DDL statements.
    • For Dynamic Tables, this includes definitions for TARGET_LAG, WAREHOUSE, REFRESH_MODE, and INITIALIZE.
  • AS SELECT ...: The query that defines the logic for populating the table.
    • Prep Layer: Selects from source tables and applies transformations
    • Main Layer: Typically unions data from corresponding prep layer DTs
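
Putting these characteristics together, a main-layer script such as MAIN_LAYER/HUB_MATERIAL.sql might look roughly like the following sketch. The column names, the warehouse variable, the REFRESH_MODE/INITIALIZE values, and the prep-table names are assumptions for illustration; only &{TARGET_DATABASE}, &{TARGET_SCHEMA_DYNAMIC_TABLE}, and &{TARGET_LAG} are the variables described above:

CREATE OR REPLACE DYNAMIC TABLE &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL
  TARGET_LAG   = '&{TARGET_LAG}'
  WAREHOUSE    = &{TARGET_WAREHOUSE}       -- hypothetical variable name
  REFRESH_MODE = INCREMENTAL
  INITIALIZE   = ON_CREATE
AS
-- Main layer: union the corresponding prep-layer Dynamic Tables, one per source
SELECT HUB_MATERIAL_HASH_KEY, MATERIAL_ID, RECORD_SOURCE, LOAD_TIMESTAMP     -- hypothetical columns
  FROM &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL_PREP_SRC_1
UNION
SELECT HUB_MATERIAL_HASH_KEY, MATERIAL_ID, RECORD_SOURCE, LOAD_TIMESTAMP
  FROM &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL_PREP_SRC_2;

At deployment time, SnowSQL substitutes the &{...} placeholders with the values from your .ini configuration file before executing the script.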

5. Auxiliary Metadata Files

Several other files are generated in the root of the output directory to provide insights into your model:

  • dependencies.json (Optional):
    • A JSON representation of dependencies between Data Vault objects, useful for understanding model structure or custom deployment orchestration.
  • entity_relationships.csv (Optional):
    • A CSV file detailing parent-child relationships within the Data Vault model (e.g., which Satellites belong to which Hubs/Links).
  • lineage.csv (Optional):
    • A CSV file providing data lineage information, tracing data from source columns to their target columns in the Data Vault.
  • source_system_urn_check.csv (Optional):
    • A CSV file that helps in verifying the consistency and usage of source system URNs across the model.