You can include CSV or Parquet files directly in your packages.

Why Embed Data?

Embedding data files in your packages is valuable when you need to:
  • Package sample data - Example datasets for testing or demos
  • Build standalone models - Models that don’t require database connections
  • Version control data - Keep data synchronized with model changes in your package
Embedded files are loaded into DuckDB when your package is published, making them queryable in your models.
Currently, embedded data files work best for standalone models. Support for querying embedded data alongside database connections (e.g., joining embedded lookup tables with warehouse data) is coming soon.

Adding Data Files

File Structure

Create a data/ folder in your package directory and add your CSV or Parquet files:
my-package/
├── publisher.json
├── ecommerce.malloy
└── data/
    ├── country_codes.csv
    ├── product_categories.parquet
    └── exchange_rates.csv

Supported File Formats

  • CSV files (.csv) - Comma-separated values with header row
  • Parquet files (.parquet) - Columnar binary format, efficient for larger datasets

Referencing Embedded Data in Models

Use duckdb.table() with the file's path inside your package (for example, data/country_codes.csv) to reference embedded files in your Malloy models:
// Reference a CSV file
source: country_codes is duckdb.table('data/country_codes.csv') extend {
  dimension:
    country_code is code
    country_name is name
    region is geographic_region
}

// Reference a Parquet file
source: product_categories is duckdb.table('data/product_categories.parquet') extend {
  dimension:
    category_id is id
    category_name is name
    parent_category is parent_id
}
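
Once an embedded file is defined as a source, it can be queried like any other Malloy source. The following is a minimal sketch that extends the country_codes example above with a measure and a view; the names country_count and countries_by_region are illustrative assumptions, not part of the original package, and the column names are taken from the example above.

```malloy
// Sketch only: country_count and countries_by_region are hypothetical names.
source: country_codes is duckdb.table('data/country_codes.csv') extend {
  dimension:
    country_code is code
    country_name is name
    region is geographic_region

  // Count the rows (countries) in the embedded CSV
  measure:
    country_count is count()

  // Reusable view: number of countries per region, largest first
  view: countries_by_region is {
    group_by: region
    aggregate: country_count
    order_by: country_count desc
  }
}

// Run the view against the embedded data
run: country_codes -> countries_by_region
```

Defining the aggregation as a view inside the source keeps the query logic alongside the embedded data, so the model and its data stay versioned together in the package.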

Next Steps

Build a Model

Learn how to build semantic models in VS Code