Warning

🚧 Work in Progress: This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.

BigQuery / Data Lake

Field | Value
Function | Centralized data lake and data warehouse
Status | ✅ Production
Owning squad | Data Squad
Platform | Google Cloud Platform (BigQuery)
Runbook | View runbook

Overview

BigQuery is the central data platform at Patria, as per ADR-001. All data ingestion, transformation, and consumption passes through this platform.


Medallion architecture

flowchart LR
    subgraph Ingestion
        S1[Operational systems]
        S2[External APIs]
        S3[Files / SharePoint]
    end

    subgraph BigQuery
        RAW[dataset_raw<br/>Raw data]
        STAGE[dataset_stage<br/>Cleaned data]
        GOLD[dataset_gold<br/>Modeled data]
    end

    subgraph Consumption
        BI[BI / Dashboards]
        API[Internal APIs]
        ML[AI Models]
    end

    Ingestion --> RAW --> STAGE --> GOLD --> Consumption
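
To make the three layers concrete, here is a minimal in-memory sketch of what each hop in the flow is meant to do: raw data lands untouched, the stage step cleans it, and the gold step models it for consumption. The sample rows, the field names (order_id, amount, country), and the functions to_stage and to_gold are all hypothetical; in practice these steps would run as transformations inside BigQuery rather than in Python.

```python
from collections import defaultdict

# Hypothetical rows as they might land in the raw layer:
# untyped strings, a duplicate, and one unparseable value.
RAW_ROWS = [
    {"order_id": "1", "amount": " 10.50 ", "country": "br"},
    {"order_id": "1", "amount": " 10.50 ", "country": "br"},  # duplicate
    {"order_id": "2", "amount": "n/a", "country": "US"},      # bad amount
    {"order_id": "3", "amount": "4.00", "country": "us"},
]


def to_stage(raw_rows):
    """Clean raw rows: cast types, normalize values, drop duplicates and bad rows."""
    seen = set()
    staged = []
    for row in raw_rows:
        try:
            amount = float(row["amount"].strip())
        except ValueError:
            continue  # skip rows whose amount cannot be parsed
        order_id = int(row["order_id"])
        if order_id in seen:
            continue  # keep only the first occurrence of each order
        seen.add(order_id)
        staged.append(
            {"order_id": order_id, "amount": amount, "country": row["country"].upper()}
        )
    return staged


def to_gold(stage_rows):
    """Model cleaned rows into a consumption-ready aggregate: total amount per country."""
    totals = defaultdict(float)
    for row in stage_rows:
        totals[row["country"]] += row["amount"]
    return dict(totals)


stage = to_stage(RAW_ROWS)
gold = to_gold(stage)
print(gold)
```

The point of the split is that the raw layer stays a faithful copy of the source, so cleaning rules in the stage step can be changed and replayed without re-ingesting.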

Datasets in production

Dataset | Layer | Domain | Update frequency
(fill in) | raw / stage / gold | (fill in) | (fill in)
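
One possible way to draft this inventory is to list the datasets in the project and infer each layer from its name. The sketch below assumes a naming convention where dataset names end in _raw, _stage, or _gold (suggested by the diagram labels, but not confirmed here). The inventory function expects an object that behaves like google.cloud.bigquery.Client, whose list_datasets() method yields items with a dataset_id attribute; FakeClient and the "scratch" dataset are stand-ins so the example runs without credentials.

```python
from collections import namedtuple

LAYERS = ("raw", "stage", "gold")


def layer_of(dataset_id):
    """Return the layer implied by the dataset name suffix, or None if it matches none."""
    for layer in LAYERS:
        if dataset_id.endswith("_" + layer):
            return layer
    return None


def inventory(client):
    """Map every dataset visible to the client to its inferred layer."""
    return {d.dataset_id: layer_of(d.dataset_id) for d in client.list_datasets()}


# Stand-in for a real BigQuery client (hypothetical data, for illustration only).
FakeDataset = namedtuple("FakeDataset", "dataset_id")


class FakeClient:
    def list_datasets(self):
        return [
            FakeDataset("dataset_raw"),
            FakeDataset("dataset_stage"),
            FakeDataset("dataset_gold"),
            FakeDataset("scratch"),
        ]


result = inventory(FakeClient())
print(result)
```

Datasets that map to None do not follow the assumed convention and would need to be classified by hand. Domain and update frequency cannot be inferred from names and still have to be filled in by the owning squad.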

Orchestration

Pipelines are managed via Airflow (Cloud Composer). See Pipelines.
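
As an illustration of what the orchestrator enforces, the sketch below derives a run order from layer dependencies using only the Python standard library. It is not an Airflow DAG, and the task names (load_raw, build_stage, build_gold) are hypothetical; in Airflow the same ordering would typically be declared between tasks with the >> operator.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
DEPENDENCIES = {
    "load_raw": set(),
    "build_stage": {"load_raw"},
    "build_gold": {"build_stage"},
}

# static_order() yields tasks so that every dependency precedes its dependents.
run_order = list(TopologicalSorter(DEPENDENCIES).static_order())
print(run_order)  # ['load_raw', 'build_stage', 'build_gold']
```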


Access

  • Access via Identity Federation with Entra ID
  • Access groups managed in Entra ID
  • Requests via ServiceNow
