Warning

🚧 Work in Progress: This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.

Data Modeling Standards


Medallion architecture

The entire data platform follows a three-layer architecture:

Layer | Dataset | Responsibility                     | Transformations
Raw   | raw_*   | Raw data, original source format   | None (ingestion only)
Stage | stage_* | Clean, standardized data           | Typing, deduplication, normalization
Gold  | gold_*  | Modeled data for consumption       | Aggregations, joins, business metrics
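
As an illustration of the naming convention, the sketch below maps a dataset name to its layer by prefix. The function name `layer_for_dataset` and the example dataset names are hypothetical and not part of the platform; only the three prefixes come from the table above.

```python
from typing import Optional

# Dataset prefix -> layer name, taken from the table above.
LAYER_PREFIXES = {"raw_": "Raw", "stage_": "Stage", "gold_": "Gold"}


def layer_for_dataset(dataset: str) -> Optional[str]:
    """Return the layer a dataset belongs to, or None if no prefix matches."""
    for prefix, layer in LAYER_PREFIXES.items():
        if dataset.startswith(prefix):
            return layer
    return None


# Hypothetical dataset names, for illustration only.
for name in ("raw_crm", "stage_crm", "gold_fundraising", "sandbox_tmp"):
    print(name, "->", layer_for_dataset(name))
```

A check like this could run in CI to flag datasets that fall outside the three layers, although how naming is actually enforced is not specified on this page.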

Fact × dimension tables (star schema)

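Analytical datasets in the gold layer are modeled as a star schema, in which fact tables reference dimension tables through foreign keys. For example: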

fact_fundraising          dim_fund
├── fund_id (FK) ──────→  ├── fund_id (PK)
├── reference_date        ├── fund_name
├── raised_amount         ├── strategy
└── _load_date            └── inception_date

dim_period
├── date (PK)
├── year
├── month
└── quarter
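
The `dim_period` attributes can all be derived from the `date` key. The sketch below shows one way to do that, assuming one row per calendar day and calendar quarters (January to March is quarter 1). The function names `period_row` and `build_dim_period` are hypothetical, and the diagram does not state how `dim_period` is populated or which fact column joins to it.

```python
from datetime import date, timedelta
from typing import Dict, Iterator


def period_row(d: date) -> Dict[str, object]:
    """Derive the dim_period attributes for a single date."""
    return {
        "date": d,
        "year": d.year,
        "month": d.month,
        # Calendar quarter: months 1-3 -> 1, 4-6 -> 2, 7-9 -> 3, 10-12 -> 4.
        "quarter": (d.month - 1) // 3 + 1,
    }


def build_dim_period(start: date, end: date) -> Iterator[Dict[str, object]]:
    """Yield one dim_period row per day from start to end, inclusive."""
    current = start
    while current <= end:
        yield period_row(current)
        current += timedelta(days=1)


rows = list(build_dim_period(date(2024, 1, 1), date(2024, 12, 31)))
print(len(rows), rows[0], rows[-1])
```

If the platform uses fiscal rather than calendar quarters, the quarter expression would need to change accordingly.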

Mandatory rules for gold layer

  • Every fact table must have a surrogate primary key named {entity}_id, populated with GENERATE_UUID()
  • Every table must have load metadata columns: _load_date TIMESTAMP and _source STRING
  • Tables larger than 1 GB must be partitioned by a date column
  • Tables must be clustered on the columns most frequently used in WHERE and JOIN clauses
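
The rules above can be sketched as an automated check. This is a minimal sketch under stated assumptions, not an existing platform tool: `TableSpec` and `check_gold_rules` are hypothetical names, the entity name is assumed to be the table name without its `fact_` prefix, "1 GB" is read as 10^9 bytes, and the column types in the usage example are assumed. The clustering rule is only checked for presence, because query frequency cannot be inferred from a schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

ONE_GB = 10 ** 9  # assumption: "1 GB" read as 10^9 bytes
DATE_TYPES = {"DATE", "DATETIME", "TIMESTAMP"}
LOAD_METADATA = {"_load_date": "TIMESTAMP", "_source": "STRING"}


@dataclass
class TableSpec:
    """Hypothetical description of a gold table (not a platform API)."""
    name: str
    columns: Dict[str, str]  # column name -> declared type
    size_bytes: int = 0
    partition_column: Optional[str] = None
    cluster_columns: List[str] = field(default_factory=list)


def check_gold_rules(spec: TableSpec) -> List[str]:
    """Return one message per violated rule; an empty list means compliant."""
    problems: List[str] = []

    # Rule 1: fact tables need a surrogate key named {entity}_id.
    if spec.name.startswith("fact_"):
        key = spec.name[len("fact_"):] + "_id"
        if key not in spec.columns:
            problems.append(f"missing surrogate key {key}")

    # Rule 2: load metadata columns must exist with the stated types.
    for column, expected in LOAD_METADATA.items():
        if spec.columns.get(column) != expected:
            problems.append(f"missing or mistyped {column} {expected}")

    # Rule 3: tables over 1 GB must be partitioned by a date column.
    if spec.size_bytes > ONE_GB:
        if spec.columns.get(spec.partition_column) not in DATE_TYPES:
            problems.append("table over 1 GB is not partitioned by a date column")

    # Rule 4: at least one clustering column must be declared.
    if not spec.cluster_columns:
        problems.append("no clustering columns declared")

    return problems


# Usage: a compliant version of fact_fundraising (column types are assumed).
fact = TableSpec(
    name="fact_fundraising",
    columns={
        "fundraising_id": "STRING",
        "fund_id": "STRING",
        "reference_date": "DATE",
        "raised_amount": "NUMERIC",
        "_load_date": "TIMESTAMP",
        "_source": "STRING",
    },
    size_bytes=5 * ONE_GB,
    partition_column="reference_date",
    cluster_columns=["fund_id"],
)
print(check_gold_rules(fact))
```

Under this check, a table with only the columns drawn in the star schema diagram would be flagged for a missing `fundraising_id` and a missing `_source` column, so the diagram is best read as a simplified illustration.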