> 🚧 **Work in Progress:** This page is under construction; content may be incomplete or subject to change. To contribute, see the contribution guide.
# Data Architecture
## Principles
- Single source of truth: BigQuery is the central repository; data must not be replicated into local silos
- Medallion architecture: three data maturity layers (raw → stage → gold)
- Quality before consumption: validations in the stage layer are mandatory
- Cataloging: every production dataset must be registered in the Data Catalog
- Access governance: access is granted via AD groups; service-account credentials must not be shared
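The "quality before consumption" principle can be sketched as a minimal pre-promotion check. The rule names, field names, and thresholds below are illustrative assumptions, not the team's actual validation suite:

```python
# Minimal sketch of stage-layer validations (illustrative only; the real
# rules live in the pipeline code, not in this document).

def validate_stage_rows(rows: list[dict]) -> list[str]:
    """Return a list of violations; an empty list means the batch may be promoted."""
    errors = []
    seen_keys = set()
    for i, row in enumerate(rows):
        # Typing check: 'amount' must already be numeric once staged.
        if not isinstance(row.get("amount"), (int, float)):
            errors.append(f"row {i}: 'amount' is not numeric")
        # Deduplication check: the business key must be unique in the batch.
        key = row.get("id")
        if key in seen_keys:
            errors.append(f"row {i}: duplicate id {key!r}")
        seen_keys.add(key)
    return errors
```

A pipeline would run checks like these after loading into stage and block the stage → gold step while the returned list is non-empty.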
## Data lake layers
| Layer | Standard dataset | Content | Retention | Access |
|---|---|---|---|---|
| Raw | project.raw_domain | Raw data, no transformation, original format | 90 days | Data engineering only |
| Stage | project.stage_domain | Cleaned, typed, deduplicated data | 1 year | Data engineering |
| Gold | project.gold_domain | Modeled data, ready for consumption | Permanent | Analysts, BI, APIs |
## Dataset naming standard
See Standards > Naming for full conventions.
Summary:
```
{layer}_{domain}_{subdomain}
```

Examples:

```
raw_investments_fundraising
stage_finance_accounts_payable
gold_corporate_headcount
```
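The naming standard can be checked mechanically. The segment character set (lowercase snake_case) is an assumption here, since the full conventions live in Standards > Naming:

```python
import re

# {layer}_{domain}_{subdomain}: layer is one of the three medallion layers;
# further segments are lowercase snake_case. Subdomain segments are treated
# as optional so layer+domain names (e.g. raw_investments) also pass.
NAME_RE = re.compile(r"^(raw|stage|gold)_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)*$")

def is_valid_dataset_name(name: str) -> bool:
    """True if `name` follows the {layer}_{domain}_{subdomain} standard."""
    return NAME_RE.fullmatch(name) is not None
```

A check like this fits naturally in CI or in the Data Catalog registration step, rejecting datasets before they reach production.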
## Technology stack
| Component | Technology | Use |
|---|---|---|
| Data lake / DW | BigQuery (GCP) | Storage and analytical queries |
| Orchestration | Airflow (Cloud Composer) | Pipeline scheduling and dependencies |
| Ingestion APIs | Cloud Run | REST ingestion for systems without native connectors |
| Low-code integration | N8N | External API and webhook ingestion |
| Transformation | SQL (BigQuery) + dbt (under evaluation) | Stage → gold transformations |
| BI | (fill in — Looker / Power BI / etc.) | Dashboards and reports |
| AI Models | Vertex AI / Cloud Run | Production models |
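The raw → stage → gold flow that the orchestrator schedules can be sketched as a dependency graph. The task names below are hypothetical; the real pipelines are Airflow DAGs on Cloud Composer, and this only illustrates the ordering:

```python
from graphlib import TopologicalSorter

# Hypothetical task chain for one domain: each task depends on the layer
# before it, mirroring the raw -> stage -> gold maturity flow.
graph = {
    "ingest_raw_finance": set(),
    "build_stage_finance": {"ingest_raw_finance"},
    "validate_stage_finance": {"build_stage_finance"},
    "publish_gold_finance": {"validate_stage_finance"},
}

order = list(TopologicalSorter(graph).static_order())
```

In Airflow the same shape would be expressed with task dependencies (`ingest >> build >> validate >> publish`); the topological sort here just makes the ordering explicit.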