Initialize Your First Project

This guide will help you set up and run your first Stream2Vault (S2V) project.

Before you start, please ensure:

  • You have the Stream2Vault client installed (see Installation).
  • You have met all Prerequisites.
  • You have logged in with s2v login -c <AUTHENTICATION.JSON>.

Step 1: Get Your Project Files

The easiest way to start is by using a pre-configured sample project. Alternatively, you can create the minimal files manually.

Option A: Use a Pre-Configured Sample Project

  1. Download Sample Project:
  2. Extract the Project:
    • Extract the contents of the downloaded ZIP file into a new folder on your computer. Let's call this folder my_first_s2v_project/.
    • Inside my_first_s2v_project/, you should find a subfolder, often named dv_model/ or similar, containing all the necessary YAML files and configuration. For this guide, we'll assume it's named dv_model/.
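
The extraction step can also be scripted. The following is a minimal sketch, not part of S2V itself: the function name extract_sample_project and the ZIP file path are hypothetical, and it relies only on the Python standard library. It unpacks the archive into the project folder and reports which directories contain YAML files, which helps confirm what the model subfolder is actually called.

```python
import zipfile
from pathlib import Path


def extract_sample_project(zip_path: str, target_dir: str = "my_first_s2v_project") -> list[Path]:
    """Extract the ZIP into target_dir and return the directories that hold YAML files.

    Hypothetical helper for illustration; it is not an S2V command.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # Every directory below target that directly contains at least one .yaml file.
    yaml_dirs = {path.parent for path in target.rglob("*.yaml")}
    return sorted(yaml_dirs)
```

Calling extract_sample_project("sample_project.zip") (the archive name is a placeholder) should list a path such as my_first_s2v_project/dv_model if the sample follows the layout described above.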

Option B: Create a Minimal Project Manually

If you prefer to start from scratch:

  1. Create Project Folders:

    • Create a main project folder, for example, my_first_s2v_project/.
    • Inside my_first_s2v_project/, create a subfolder named dv_model/. This will be your input model directory.
  2. Create Essential Configuration Files inside dv_model/:

    • data_vault_settings.yaml: Defines global settings for your Data Vault generation. See Data Vault Settings.
    • source_system_settings.yaml: Defines settings for your source systems. See Source System Settings.
    • information_schema.csv: Contains metadata about your source tables (columns, data types). S2V uses this to validate your model definitions against actual source structures. If you are not using sample data, you might need to manually create or obtain the information schema. See Information Schema.
  3. Create a Simple Data Vault Object File:

    • You can create your own Hub definition file (e.g., hub_customer.yaml) in the dv_model/ directory using the template below.
    • Alternatively, follow the Build a Hub Tutorial for a guided example.
    • Ensure the source table and columns you reference here exist in your information_schema.csv.
# dv_model/hub.yaml
name: '<HUB_NAME>'
entity_type: 'hub'
enable_refresh: true
concatenate_business_keys: false
requires_bussines_key: false

target_business_key_columns:
  - '<BUSINESS_KEY_NAME>'
entity_sources:
  - urn:s2v:hub_source:src_customers:
      entity_source: '(DATABASE_NAME, SCHEMA_NAME, TABLE_NAME)'
      source_system_configuration_urn: 'urn:s2v:source_setting:<SOURCE_SYSTEM_NAME>'
      business_key_mapping:
        - <BUSINESS_KEY_NAME>:
            - '<SOURCE_COLUMN_NAME>'
      source_business_key: ''
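
Before running S2V, it can help to check by hand that the source columns a Hub references are present in information_schema.csv. The sketch below is an illustration only and makes a loud assumption: that the CSV has a header row with TABLE_NAME and COLUMN_NAME columns, as is common for information schema exports. The actual layout S2V expects is described in Information Schema and may differ. The function name missing_columns is hypothetical.

```python
import csv


def missing_columns(schema_csv: str, table: str, columns: list[str]) -> list[str]:
    """Return the entries of `columns` that the CSV does not list for `table`.

    Assumes (not confirmed by the S2V docs) header names TABLE_NAME and COLUMN_NAME.
    """
    known = set()
    with open(schema_csv, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            # Normalise header names and values so the comparison ignores case.
            record = {
                key.strip().upper(): (value or "").strip().upper()
                for key, value in row.items()
                if key is not None
            }
            if record.get("TABLE_NAME") == table.upper():
                known.add(record.get("COLUMN_NAME", ""))
    return [column for column in columns if column.upper() not in known]
```

An empty result means every referenced column was found for that table; any names returned are the ones to correct in either the Hub file or the CSV.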

Step 2: Understand the Default Project Structure

By default, S2V expects all your configuration files (the ones created in Step 1) to be in the root directory of your project (e.g., my_first_s2v_project/). All other YAML files, such as Data Vault object YAMLs, can be stored anywhere in the project directory. For more information, refer to the Project Structure Tutorial.

Example Project Structure

my_first_s2v_project/         # This is your input project directory (-i parameter)
├── data_vault_settings.yaml
├── source_system_settings.yaml
├── information_schema.csv
└── dv_model/
    ├── hub_customer.yaml     # Example DV object file
    └── ...                   # Other DV object YAML files (links, satellites, etc.)
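
The layout above can be checked with a few lines of Python before pointing S2V at the directory. This is a minimal sketch under the assumptions stated in this guide (the three configuration file names, located in the project root). The function name check_project is hypothetical and is not part of the S2V client.

```python
from pathlib import Path

# Configuration file names as listed in this guide.
CONFIG_FILES = (
    "data_vault_settings.yaml",
    "source_system_settings.yaml",
    "information_schema.csv",
)


def check_project(root: str) -> tuple[list[str], list[Path]]:
    """Return (config files missing from root, other YAML files found below root)."""
    base = Path(root)
    missing = [name for name in CONFIG_FILES if not (base / name).is_file()]
    # Any YAML file that is not one of the root-level configuration files
    # is treated here as a Data Vault object file.
    object_files = sorted(
        path
        for path in base.rglob("*.yaml")
        if not (path.parent == base and path.name in CONFIG_FILES)
    )
    return missing, object_files
```

For example, check_project("my_first_s2v_project") returns an empty first list when all three configuration files are in place, and the second list shows which object files were found, wherever they sit in the project.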