Warning
🚧 Work in Progress: This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.
BigQuery / Data Lake
| Field | Value |
|---|---|
| Function | Centralized data lake and data warehouse |
| Status | ✅ Production |
| Owning squad | Data Squad |
| Platform | Google Cloud Platform — BigQuery |
| Runbook | View runbook |
Overview
BigQuery is Patria's central data platform, per ADR-001. All data ingestion, transformation, and consumption pass through it.
Medallion architecture
flowchart LR
  subgraph Ingestion
    S1[Operational systems]
    S2[External APIs]
    S3[Files / SharePoint]
  end
  subgraph BigQuery
    RAW[dataset_raw Raw data]
    STAGE[dataset_stage Cleaned data]
    GOLD[dataset_gold Modeled data]
  end
  subgraph Consumption
    BI[BI / Dashboards]
    API[Internal APIs]
    ML[AI Models]
  end
  Ingestion --> RAW --> STAGE --> GOLD --> Consumption
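The layer naming in the diagram can be captured in a small helper. The following is a minimal sketch, assuming the dataset names follow the `dataset_<layer>` pattern shown above; the function names, the project ID `example-project`, and the table name `orders` are hypothetical placeholders, not real resources.

```python
from typing import Optional

# Layer order taken from the diagram: raw -> stage -> gold.
LAYERS = ("raw", "stage", "gold")


def dataset_for(layer: str) -> str:
    """Return the dataset name for a medallion layer (assumed pattern: dataset_<layer>)."""
    if layer not in LAYERS:
        raise ValueError(f"unknown layer: {layer!r}")
    return f"dataset_{layer}"


def next_layer(layer: str) -> Optional[str]:
    """Return the layer that data is promoted to next, or None after gold."""
    idx = LAYERS.index(layer)  # raises ValueError for unknown layers
    return LAYERS[idx + 1] if idx + 1 < len(LAYERS) else None


def table_ref(project: str, layer: str, table: str) -> str:
    """Build a fully qualified `project.dataset.table` reference."""
    return f"{project}.{dataset_for(layer)}.{table}"


if __name__ == "__main__":
    # Placeholder project and table names, for illustration only.
    print(table_ref("example-project", "raw", "orders"))
    print(next_layer("raw"))
```

A helper like this keeps layer names in one place, so a pipeline step can derive its source and target tables from the layer alone.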
Datasets in production
| Dataset | Layer | Domain | Update frequency |
|---|---|---|---|
| (fill in) | raw / stage / gold | (fill in) | (fill in) |
Orchestration
Pipelines are orchestrated with Airflow on Cloud Composer. See Pipelines.
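To illustrate the ordering an orchestrator enforces across the layers, here is a minimal sketch using Python's standard-library `graphlib` as a stand-in. It is not Airflow code, and the task names (`load_raw`, `clean_stage`, `model_gold`) are hypothetical; the actual DAG definitions are covered on the Pipelines page.

```python
from graphlib import TopologicalSorter

# Map each task to the set of tasks it depends on.
# Task names are hypothetical and mirror the raw -> stage -> gold flow.
DEPENDENCIES = {
    "load_raw": set(),
    "clean_stage": {"load_raw"},
    "model_gold": {"clean_stage"},
}


def run_order(deps: dict) -> list:
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for task in run_order(DEPENDENCIES):
        print(task)
```

In Airflow the same ordering would typically be declared as dependencies between tasks in a DAG, with the scheduler deciding when each task runs.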
Access
- Access via Identity Federation with Entra ID
- Access groups managed in Entra ID
- Requests via ServiceNow