Warning: 🚧 Work in Progress. This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.
Data Modeling Standards
Medallion architecture
The entire data platform follows a three-layer architecture:
| Layer | Dataset | Responsibility | Transformations |
|---|---|---|---|
| Raw | `raw_*` | Raw data in its original source format | None (ingestion only) |
| Stage | `stage_*` | Clean, standardized data | Typing, deduplication, normalization |
| Gold | `gold_*` | Modeled data for consumption | Aggregations, joins, business metrics |
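The division of responsibilities between the layers can be illustrated with a small in-memory sketch. The sample rows and the functions `to_stage` and `to_gold` are hypothetical and only mimic what each layer is responsible for; they are not the platform's actual pipeline code. The field names are borrowed from the fundraising example in the star schema section.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal

# Raw layer (hypothetical sample): rows kept exactly as ingested,
# i.e. strings, stray whitespace, inconsistent casing, duplicates.
RAW_ROWS = [
    {"fund_id": " F001 ", "reference_date": "2024-01-31", "raised_amount": "1000.50"},
    {"fund_id": "F001", "reference_date": "2024-01-31", "raised_amount": "1000.50"},
    {"fund_id": "f002", "reference_date": "2024-02-29", "raised_amount": "250"},
]


def to_stage(raw_rows):
    """Stage layer (hypothetical helper): typing, normalization, deduplication."""
    seen = set()
    staged = []
    for row in raw_rows:
        record = (
            row["fund_id"].strip().upper(),             # normalization
            date.fromisoformat(row["reference_date"]),  # typing: string -> date
            Decimal(row["raised_amount"]),              # typing: string -> exact decimal
        )
        if record in seen:                              # deduplication after normalization
            continue
        seen.add(record)
        staged.append(
            {"fund_id": record[0], "reference_date": record[1], "raised_amount": record[2]}
        )
    return staged


def to_gold(staged_rows):
    """Gold layer (hypothetical helper): a business metric, total raised per fund."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at Decimal('0')
    for row in staged_rows:
        totals[row["fund_id"]] += row["raised_amount"]
    return dict(totals)


stage = to_stage(RAW_ROWS)
gold = to_gold(stage)
print(len(stage), gold)
```

The second raw row only becomes a duplicate once the stage layer has normalized `fund_id`, which is why deduplication belongs after typing and normalization rather than in the raw layer.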
Fact × dimension tables (star schema)
For analytical datasets in the gold layer:
```
fact_fundraising                  dim_fund
├── fund_id (FK) ───────────────→ ├── fund_id (PK)
├── reference_date                ├── fund_name
├── raised_amount                 ├── strategy
└── _load_date                    └── inception_date

                                  dim_period
                                  ├── date (PK)
                                  ├── year
                                  ├── month
                                  └── quarter
```
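A typical star-schema query joins the fact table to each dimension on its key and then aggregates. The sketch below does this with in-memory rows. It assumes that `fact_fundraising.reference_date` joins to `dim_period.date`, which the diagram implies but does not draw; the fund names, strategies, and amounts are made up, and `period_row` and `raised_by_strategy_and_quarter` are hypothetical helpers.

```python
from collections import defaultdict
from datetime import date

# Dimension table keyed by its primary key (illustrative rows).
dim_fund = {
    "F001": {"fund_name": "Alpha Growth", "strategy": "Growth", "inception_date": date(2020, 1, 1)},
    "F002": {"fund_name": "Beta Credit", "strategy": "Credit", "inception_date": date(2021, 6, 1)},
}


def period_row(d):
    """Build a dim_period row; the quarter is derived from the month (1-4)."""
    return {"date": d, "year": d.year, "month": d.month, "quarter": (d.month - 1) // 3 + 1}


# Fact table: one row per fund per reference date (illustrative values).
fact_fundraising = [
    {"fund_id": "F001", "reference_date": date(2024, 1, 31), "raised_amount": 1000},
    {"fund_id": "F001", "reference_date": date(2024, 4, 30), "raised_amount": 500},
    {"fund_id": "F002", "reference_date": date(2024, 2, 29), "raised_amount": 250},
]

# For brevity, dim_period only holds the dates that appear in the facts.
dim_period = {f["reference_date"]: period_row(f["reference_date"]) for f in fact_fundraising}


def raised_by_strategy_and_quarter(facts):
    """Join each fact row to dim_fund and dim_period, then sum raised_amount."""
    totals = defaultdict(int)
    for fact in facts:
        fund = dim_fund[fact["fund_id"]]             # fund_id (FK) -> dim_fund.fund_id (PK)
        period = dim_period[fact["reference_date"]]  # assumed: reference_date -> dim_period.date
        key = (fund["strategy"], period["year"], period["quarter"])
        totals[key] += fact["raised_amount"]
    return dict(totals)


print(raised_by_strategy_and_quarter(fact_fundraising))
```

Keeping descriptive attributes such as `strategy` and `quarter` in the dimensions means the fact table stores only keys, measures, and load metadata. A production date dimension usually covers a continuous calendar range rather than only the dates present in the facts.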
Mandatory rules for the gold layer
- Every fact table must have a surrogate primary key named `{entity}_id`, generated with `GENERATE_UUID()`.
- Every table must have load metadata columns: `_load_date TIMESTAMP` and `_source STRING`.
- Tables larger than 1 GB must be partitioned by a date column.
- Tables must be clustered on the columns most frequently used in `WHERE` filters and `JOIN` conditions.
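These rules lend themselves to an automated check, for example before a gold table is deployed. The sketch below is hypothetical: `TableSpec`, its fields (including `frequent_filter_columns`), and `check_gold_rules` are illustrative names, and it assumes 1 GB means 10**9 bytes. It validates declared table metadata only; it does not inspect a real BigQuery table. `new_surrogate_key` mirrors what `GENERATE_UUID()` returns (a random version-4 UUID as a lowercase string).

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional

ONE_GB = 10**9  # assumption: decimal gigabyte
REQUIRED_METADATA = {"_load_date": "TIMESTAMP", "_source": "STRING"}
DATE_TYPES = ("DATE", "TIMESTAMP", "DATETIME")


@dataclass
class TableSpec:
    """Hypothetical description of a gold table (e.g. parsed from a schema file)."""
    name: str
    columns: dict                           # column name -> SQL type
    is_fact: bool = False
    entity: Optional[str] = None            # used to derive the {entity}_id surrogate key
    size_bytes: int = 0
    partition_column: Optional[str] = None
    cluster_columns: list = field(default_factory=list)
    frequent_filter_columns: list = field(default_factory=list)  # most used in WHERE/JOIN


def check_gold_rules(spec):
    """Return a list of rule violations; an empty list means the spec complies."""
    errors = []
    if spec.is_fact and f"{spec.entity}_id" not in spec.columns:
        errors.append(f"missing surrogate key {spec.entity}_id")
    for col, col_type in REQUIRED_METADATA.items():
        if spec.columns.get(col) != col_type:
            errors.append(f"missing or mistyped metadata column {col} {col_type}")
    if spec.size_bytes > ONE_GB:
        if spec.partition_column is None:
            errors.append("table > 1 GB must be partitioned")
        elif spec.columns.get(spec.partition_column) not in DATE_TYPES:
            errors.append("partition column must be a date column")
    missing = [c for c in spec.frequent_filter_columns if c not in spec.cluster_columns]
    if missing:
        errors.append(f"not clustered on frequently filtered columns: {missing}")
    return errors


def new_surrogate_key():
    """Python analogue of GENERATE_UUID(): a random UUID v4 as a string."""
    return str(uuid.uuid4())


# Usage: a fact table that forgot the _source metadata column.
spec = TableSpec(
    name="fact_fundraising",
    columns={"fundraising_id": "STRING", "fund_id": "STRING", "reference_date": "DATE",
             "raised_amount": "NUMERIC", "_load_date": "TIMESTAMP"},
    is_fact=True, entity="fundraising", size_bytes=5 * ONE_GB,
    partition_column="reference_date", cluster_columns=["fund_id"],
    frequent_filter_columns=["fund_id"],
)
print(check_gold_rules(spec))
```

Returning a list of violations instead of raising on the first one lets a single run report every problem in a table definition.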