
Snowflake Iceberg Dynamic Tables

Stream2Vault supports Snowflake Iceberg Dynamic Tables as an alternative output technology to standard Snowflake Dynamic Tables. Iceberg tables store data in the open Apache Iceberg table format on external object storage (e.g. AWS S3, Azure ADLS, GCS), while still being orchestrated and queried through Snowflake.

When to Use Iceberg vs Standard Dynamic Tables

| Consideration | Standard Dynamic Tables (sf_dynamic_tables) | Iceberg Dynamic Tables (sf_iceberg_dynamic_tables) |
| --- | --- | --- |
| Storage | Snowflake-managed internal storage | External object storage (S3 / ADLS / GCS) |
| Data portability | Snowflake only | Open format; readable by Spark, Flink, Trino, and others |
| Cross-engine sharing | Not directly | Yes — external engines can read the same files |
| Timestamp precision | TIMESTAMP_LTZ(9) (nanoseconds) | TIMESTAMP_LTZ(6) (microseconds) |
| Semi-structured columns | Flexible OBJECT / VARIANT | Typed OBJECT(...) — explicit schema required |
| Setup complexity | Lower | Requires an external volume pre-configured in Snowflake |

Choose Iceberg when you need open-format interoperability — for example, sharing your Data Vault layer with a Spark-based analytics platform without duplicating data.

Prerequisites

Before generating Iceberg code, two objects must be configured in your Snowflake account:

1. External Volume

An External Volume is a Snowflake object that connects Snowflake to external object storage (S3, ADLS, or GCS). It defines where Iceberg data files will be physically written.

Refer to the Snowflake documentation on External Volumes for setup instructions.
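
As a rough illustration, an S3-backed external volume is created along these lines. The volume name, bucket path, and IAM role ARN below are placeholders for this sketch, not values S2V requires:

```sql
-- Hypothetical example: an S3-backed external volume.
-- Substitute your own name, bucket path, and role ARN.
CREATE EXTERNAL VOLUME my_s3_ext_volume
  STORAGE_LOCATIONS = (
    (
      NAME = 'my-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://my-bucket/iceberg/'
      STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/my-iceberg-role'
    )
  );
```

The volume name chosen here is what you will later pass to S2V via the EXTERNAL_VOLUME .ini variable.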

2. Iceberg Catalog

An Iceberg Catalog tracks the metadata of your Iceberg tables (schema, snapshots, partitions). S2V-generated tables use Snowflake as the catalog (CATALOG = 'SNOWFLAKE'), which means Snowflake manages the table metadata automatically — no external catalog service (e.g. Glue, Polaris) is required unless you need cross-engine writes.

If your organisation uses an external catalog (e.g. AWS Glue, Apache Polaris), consult your Snowflake administrator to ensure the external volume and catalog are configured to share the same storage path before deploying.

Refer to the Snowflake documentation on Iceberg Catalogs for further details.

How to Generate Iceberg Code

Pass -t sf_iceberg_dynamic_tables to the generate command:

s2v generate -i path/to/my-input-folder/ \
-o path/to/my-output-folder/ \
-t sf_iceberg_dynamic_tables

Your YAML model files remain unchanged — the technology flag only controls which SQL is emitted.

Additional .ini Variables Required

Standard Dynamic Table deployments use a SnowSQL .ini file with variables such as TARGET_DATABASE, TARGET_WAREHOUSE, and TARGET_LAG. Iceberg deployments require two additional variables:

| Variable | Description | Example |
| --- | --- | --- |
| EXTERNAL_VOLUME | Name of the Snowflake External Volume to use for Iceberg storage | my_s3_ext_volume |
| ICEBERG_BASE_LOCATION | Base path prefix inside the external volume | dv_iceberg/ |

Add these to your .ini file alongside the existing variables:

[connections]
...

[variables]
TARGET_DATABASE = <TARGET_DATABASE>
TARGET_SCHEMA_DYNAMIC_TABLE = <TARGET_SCHEMA_DYNAMIC_TABLE>
TARGET_WAREHOUSE = <TARGET_WAREHOUSE>
...
EXTERNAL_VOLUME = <EXTERNAL_VOLUME> # Iceberg-specific
ICEBERG_BASE_LOCATION = <ICEBERG_BASE_LOCATION> # Iceberg-specific

Differences in Generated SQL

The Iceberg generator produces SQL that is structurally identical to standard Dynamic Tables, with the following differences applied to every generated object:

1. ICEBERG keyword in the DDL

-- Standard Dynamic Table
CREATE OR REPLACE DYNAMIC TABLE "&{TARGET_DATABASE}"."DV"."HUB_CUSTOMER"

-- Iceberg Dynamic Table
CREATE OR REPLACE DYNAMIC ICEBERG TABLE "&{TARGET_DATABASE}"."DV"."HUB_CUSTOMER"

2. Additional Iceberg storage properties

Iceberg tables require three additional properties (EXTERNAL_VOLUME, CATALOG, and BASE_LOCATION) alongside the standard Dynamic Table properties:

TARGET_LAG = '&{TARGET_LAG}'
WAREHOUSE = &{TARGET_WAREHOUSE}
REFRESH_MODE = INCREMENTAL
INITIALIZE = ON_SCHEDULE
EXTERNAL_VOLUME = '&{EXTERNAL_VOLUME}'
CATALOG = 'SNOWFLAKE'
BASE_LOCATION = '<data_vault_name>/HUB_CUSTOMER/'

  • EXTERNAL_VOLUME — injected from your .ini at deploy time.
  • CATALOG = 'SNOWFLAKE' — always SNOWFLAKE for S2V-managed Iceberg tables.
  • BASE_LOCATION — automatically derived from your data_vault_name setting in data_vault_settings.yaml and the object name.
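
Assembled from the fragments above, a generated Iceberg hub DDL looks roughly like the following. This is a sketch with the column list and query body abbreviated; the exact output depends on your model:

```sql
CREATE OR REPLACE DYNAMIC ICEBERG TABLE "&{TARGET_DATABASE}"."DV"."HUB_CUSTOMER"
  TARGET_LAG = '&{TARGET_LAG}'
  WAREHOUSE = &{TARGET_WAREHOUSE}
  REFRESH_MODE = INCREMENTAL
  INITIALIZE = ON_SCHEDULE
  EXTERNAL_VOLUME = '&{EXTERNAL_VOLUME}'
  CATALOG = 'SNOWFLAKE'
  BASE_LOCATION = '<data_vault_name>/HUB_CUSTOMER/'
AS
SELECT
  -- hub columns (hash key, business key, load metadata) ...
  ...
;
```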

3. Reduced timestamp precision

Iceberg format supports up to microsecond precision (6), whereas Snowflake native storage supports nanoseconds (9):

-- Standard
"LOAD_DATE" TIMESTAMP_LTZ(9)

-- Iceberg
"LOAD_DATE" TIMESTAMP_LTZ(6)
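
The practical effect is that sub-microsecond digits are truncated when data lands in an Iceberg table. A quick way to see the difference is to cast the same nanosecond literal at both precisions:

```sql
-- The (6) cast drops the last three fractional digits.
SELECT
  '2024-01-01 12:00:00.123456789'::TIMESTAMP_LTZ(9) AS native_precision,
  '2024-01-01 12:00:00.123456789'::TIMESTAMP_LTZ(6) AS iceberg_precision;
```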

4. Typed semi-structured columns

Iceberg requires explicit type definitions for semi-structured columns. In S2V-generated code this affects the RECORD_SOURCE column and, in Status Tracking Satellites, the SATELLITE_PAYLOAD column:

-- Standard (OBJECT without schema)
,"RECORD_SOURCE" OBJECT

-- Iceberg (typed OBJECT with explicit schema)
,"RECORD_SOURCE" OBJECT("schema" VARCHAR, "table" VARCHAR)
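
Querying the typed column works much as before. The sketch below assumes Snowflake's colon field-access notation, which applies to both semi-structured and structured OBJECT columns; the table name is illustrative:

```sql
-- Hypothetical query: reading fields of the typed RECORD_SOURCE column.
SELECT
  "RECORD_SOURCE":"schema"::VARCHAR AS source_schema,
  "RECORD_SOURCE":"table"::VARCHAR  AS source_table
FROM "HUB_CUSTOMER";
```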