Understanding the Generated Code Output
Stream2Vault (S2V) generates a structured set of SQL scripts and Makefiles designed to facilitate the deployment and management of your Data Vault 2.0 objects in Snowflake. This document explains the organization of these generated files.
Output Directory Structure
All generated code is placed within a main output directory (e.g., GENERATED_CODE/). This directory contains:
- An INIT_DYNAMIC_TABLE/ (and/or INIT_TABLE/) folder for initialization scripts.
- A DYNAMIC_TABLE/ (and/or TABLE/) folder containing the SQL definitions for your Data Vault entities.
- A root Makefile to orchestrate the deployment.
- Several auxiliary CSV and JSON files providing metadata and lineage information.
Simplified Directory Tree Example
Here's a simplified view of the typical directory structure, focusing on a Hub and its Satellite to illustrate the pattern:
GENERATED_CODE/
├── DYNAMIC_TABLE/
│ ├── HUB/
│ │ ├── HUB_MATERIAL/
│ │ │ ├── MAIN_LAYER/
│ │ │ │ └── HUB_MATERIAL.sql
│ │ │ ├── PREP_LAYER/
│ │ │ │ ├── HUB_MATERIAL[urn_s2v_hub_source_src_1].sql
│ │ │ │ ├── HUB_MATERIAL[urn_s2v_hub_source_src_2].sql
│ │ │ ├── REFRESH_LAYER/
│ │ │ │ └── HUB_MATERIAL.sql
│ │ │ └── STREAMS_LAYER/
│ │ │ └── HUB_MATERIAL.sql
│ ├── HUB_SAT/
│ │ ├── SAT_CDC_FLAG/
│ │ │ ├── MAIN_LAYER/
│ │ │ │ └── SAT_CDC_FLAG.sql
│ │ │ ├── PREP_LAYER/
│ │ │ │ └── SAT_CDC_FLAG[(SOURCE_DATA,CUSTOMER)].sql
│ │ │ ├── REFRESH_LAYER/
│ │ │ │ └── SAT_CDC_FLAG.sql
│ │ │ └── STREAMS_LAYER/
│ │ │ └── SAT_CDC_FLAG.sql
│ ├── LINK_SAT/ # For Link Satellites
│ ├── LOOKUP/ # For Lookup Tables
│ ├── NON_HISTORIZED_LINK/ # For Non-Historized Links
│ ├── REFERENCE/ # For Reference Tables
│ ├── REGULAR_LINK/ # For Regular Links
│ └── STATUS_TRACKING_SATELLITE/ # For Status Tracking Satellites
├── INIT_DYNAMIC_TABLE/
│ ├── CREATE_SCHEMAS.sql
│ ├── CREATE_TAGS.sql
│ └── GHOST_RECORD.sql
├── Makefile
├── dependencies.json
├── entity_relationships.csv
├── lineage.csv
└── source_system_urn_check.csv
Core Components
1. Makefiles
- A Makefile is generated in the root of the output directory, and individual Makefiles are generated in each subdirectory.
- For more details on how these Makefiles are used, refer to the Deployment via Makefile documentation.
2. INIT_<TABLE_TYPE>/ Folder
This folder contains essential one-time setup scripts:
- CREATE_SCHEMAS.sql: SQL to create the necessary database schemas.
- CREATE_TAGS.sql: SQL to create Snowflake tags used for versioning or classification of Data Vault objects.
- GHOST_RECORD.sql: SQL to create the GHOST_RECORD table. This table provides standardized unknown/missing key records, crucial for maintaining referential integrity in Data Vault models.
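The exact contents of GHOST_RECORD.sql are produced by S2V; the sketch below only illustrates the general idea of a ghost-record setup script. The table layout, column names, and the use of &{TARGET_SCHEMA_DYNAMIC_TABLE} here are assumptions for illustration, not the generated code.

```sql
-- Illustrative sketch only; the generated GHOST_RECORD.sql may differ.
-- &{TARGET_DATABASE} and &{TARGET_SCHEMA_DYNAMIC_TABLE} are SnowSQL substitution
-- variables resolved from the .ini configuration file at deployment time.
CREATE TABLE IF NOT EXISTS &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.GHOST_RECORD (
    HASH_KEY       VARCHAR(32),   -- assumed placeholder key column
    DESCRIPTION    VARCHAR(100),  -- e.g. 'UNKNOWN'
    LOAD_TIMESTAMP TIMESTAMP_NTZ
);

-- One standardized "unknown" record that other Data Vault objects can reference
-- when a source delivers a missing or unresolvable business key.
INSERT INTO &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.GHOST_RECORD
SELECT MD5('UNKNOWN'), 'UNKNOWN', CURRENT_TIMESTAMP();
```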
3. DYNAMIC_TABLE/ or TABLE/ Folder
This top-level folder distinguishes between code generated for Snowflake Dynamic Tables and regular Snowflake Tables. The choice depends on your S2V configuration and Data Vault implementation strategy.
Within this folder, the structure is hierarchical:
- Entity Type Folders (e.g., HUB, HUBSAT): Objects are grouped by their Data Vault entity type. Common entity type folders you will see include:
  - HUB: For Hub objects.
  - REGULAR_LINK: For Regular Link objects.
  - HUBSAT: For Hub Satellite objects.
  - LINKSAT: For Link Satellite objects.
  - NON_HISTORIZED_LINK: For Non-Historized Link objects.
  - REFERENCE: For Reference tables.
  - STATUS_TRACKING_SATELLITE: Automatically generated for each Link to track the status of relationship instances.
  - LOOKUP_TABLE: Generated when lookup_mapping is used in an entity's definition.
- Entity Name Folders (e.g., HUB_MATERIAL, SAT_MATERIAL_DETAILS): Each specific Data Vault object defined in your YAML metadata has its own folder under its entity type.
- Layer Folders (e.g., PREP_LAYER, MAIN_LAYER): These folders organize the SQL scripts by their role in the data processing and deployment pipeline, particularly for Dynamic Table implementations (a sketch of how these layers fit together follows at the end of this section):
  - PREP_LAYER (Preparation Layer): Contains SQL scripts to create "preparation" Dynamic Tables, one for each source feeding a main Data Vault object.
  - MAIN_LAYER (Main/Core Layer): Contains the SQL script for the primary Data Vault Dynamic Table (Hub, Link, or Satellite).
  - STREAMS_LAYER (CDC on Dynamic Tables): Contains SQL to create a Snowflake Stream on the main Data Vault Dynamic Table to capture changes.
  - REFRESH_LAYER (Initializing Dynamic Tables): Contains SQL to trigger a refresh of the main Data Vault Dynamic Table.
- Auto-generated Objects:
  - Status Tracking Satellites (STS): An STS is automatically generated for every Link object. Its purpose is to track the status and history of the relationships managed by the Link. The STS's SATELLITE_PAYLOAD column includes the values of the source columns that participate in the relationship, together with additional fields.
  - Lookup Tables: These are technical objects generated when a lookup_mapping is defined for an entity. They facilitate lookup operations during data processing but are not considered part of the main consumable Data Vault layer.
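To make the relationship between the layers concrete, here is a hedged sketch of how the four layers of a single Hub typically fit together. All object, column, and source names below are illustrative assumptions; the generated scripts use the names defined in your metadata and configuration.

```sql
-- PREP_LAYER: one preparation Dynamic Table per source feeding the Hub (names assumed).
CREATE OR REPLACE DYNAMIC TABLE PREP_HUB_MATERIAL_SRC_1
  TARGET_LAG = '&{TARGET_LAG}'
  WAREHOUSE  = MY_WAREHOUSE                 -- assumed warehouse name
AS
SELECT MD5(TO_VARCHAR(MATERIAL_NUMBER)) AS HUB_MATERIAL_HASH_KEY,
       MATERIAL_NUMBER
FROM   SOURCE_DATA.MATERIAL_SRC_1;          -- assumed source table

-- MAIN_LAYER: the Hub itself, unioning all of its preparation Dynamic Tables.
-- (PREP_HUB_MATERIAL_SRC_2 would be defined analogously from its own source.)
CREATE OR REPLACE DYNAMIC TABLE HUB_MATERIAL
  TARGET_LAG = '&{TARGET_LAG}'
  WAREHOUSE  = MY_WAREHOUSE
AS
SELECT * FROM PREP_HUB_MATERIAL_SRC_1
UNION
SELECT * FROM PREP_HUB_MATERIAL_SRC_2;

-- STREAMS_LAYER: a Snowflake Stream on the main Dynamic Table to capture changes.
CREATE OR REPLACE STREAM HUB_MATERIAL_STREAM ON DYNAMIC TABLE HUB_MATERIAL;

-- REFRESH_LAYER: trigger a refresh of the main Dynamic Table.
ALTER DYNAMIC TABLE HUB_MATERIAL REFRESH;
```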
4. SQL File Content Characteristics
The generated SQL files share common characteristics:
- Variables: SQL files use placeholder variables (e.g., &{TARGET_DATABASE}, &{TARGET_SCHEMA_DYNAMIC_TABLE}, &{TARGET_LAG}). These are replaced at deployment time by SnowSQL, using values from your .ini configuration file.
- CREATE OR REPLACE DYNAMIC TABLE ... / CREATE OR REPLACE TABLE ...: The core DDL statements. For Dynamic Tables, this includes definitions for TARGET_LAG, WAREHOUSE, REFRESH_MODE, and INITIALIZE.
- AS SELECT ...: The query that defines the logic for populating the table.
  - Prep Layer: Selects from source tables and applies transformations.
  - Main Layer: Typically unions data from the corresponding prep layer Dynamic Tables.
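For illustration, a generated main-layer script might look roughly like the sketch below. The placeholder variables and Dynamic Table parameters match those listed above; the object names, the prep-table names, and the &{TARGET_WAREHOUSE} variable are assumptions, and the real generated file may differ.

```sql
-- Illustrative sketch of a MAIN_LAYER file (not the exact S2V output).
-- &{...} placeholders are substituted by SnowSQL at deployment time
-- using values from the .ini configuration file.
CREATE OR REPLACE DYNAMIC TABLE &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.HUB_MATERIAL
  TARGET_LAG   = '&{TARGET_LAG}'
  WAREHOUSE    = &{TARGET_WAREHOUSE}   -- variable name assumed for illustration
  REFRESH_MODE = AUTO
  INITIALIZE   = ON_CREATE
AS
-- Main layer logic: union the prep-layer Dynamic Tables, one per source (names assumed).
SELECT * FROM &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.PREP_HUB_MATERIAL_SRC_1
UNION
SELECT * FROM &{TARGET_DATABASE}.&{TARGET_SCHEMA_DYNAMIC_TABLE}.PREP_HUB_MATERIAL_SRC_2;
```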
5. Auxiliary Metadata Files
Several other files are generated in the root of the output directory to provide insights into your model:
- dependencies.json (Optional): A JSON representation of dependencies between Data Vault objects, useful for understanding model structure or custom deployment orchestration.
- entity_relationships.csv (Optional): A CSV file detailing parent-child relationships within the Data Vault model (e.g., which Satellites belong to which Hubs/Links).
- lineage.csv (Optional): A CSV file providing data lineage information, tracing data from source columns to their target columns in the Data Vault.
- source_system_urn_check.csv (Optional): A CSV file that helps verify the consistency and usage of source system URNs across the model.