Warning

🚧 Work in Progress: This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.

Data Quality


Quality dimensions

Dimension | Description | Check
Completeness | Required fields filled | % of nulls in critical columns
Uniqueness | No duplicates by primary key | COUNT DISTINCT vs COUNT
Validity | Values within expected domains | Range and enum checks
Consistency | Data coherent across tables | Referential integrity checks
Timeliness | Data within the update SLA | MAX(load_date) vs NOW()
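
As an illustration of the Validity and Consistency rows, the following is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The tables `customers` and `orders`, their columns, the allowed status values, and the helper function names are all hypothetical and only serve to show the shape of a range/enum check and a referential integrity check.

```python
import sqlite3

# Hypothetical domain rule, for illustration only.
ALLOWED_STATUSES = ("new", "paid", "cancelled")


def count_invalid_rows(conn: sqlite3.Connection) -> int:
    """Validity: count orders with a negative amount or a status outside the enum.

    NULL values are not counted here; they belong to the completeness check.
    """
    placeholders = ", ".join("?" for _ in ALLOWED_STATUSES)
    query = (
        "SELECT COUNT(*) FROM orders "
        f"WHERE amount < 0 OR status NOT IN ({placeholders})"
    )
    return conn.execute(query, ALLOWED_STATUSES).fetchone()[0]


def count_orphan_rows(conn: sqlite3.Connection) -> int:
    """Consistency: count orders whose customer_id has no match in customers."""
    query = (
        "SELECT COUNT(*) FROM orders o "
        "LEFT JOIN customers c ON c.id = o.customer_id "
        "WHERE c.id IS NULL"
    )
    return conn.execute(query).fetchone()[0]


def build_demo_db() -> sqlite3.Connection:
    """Create a small in-memory database with deliberately bad rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL, status TEXT)"
    )
    conn.executemany("INSERT INTO customers (id) VALUES (?)", [(1,), (2,)])
    conn.executemany(
        "INSERT INTO orders (id, customer_id, amount, status) VALUES (?, ?, ?, ?)",
        [
            (10, 1, 25.0, "paid"),     # clean row
            (11, 2, -5.0, "new"),      # invalid: negative amount
            (12, 2, 40.0, "shipped"),  # invalid: status outside the enum
            (13, 99, 10.0, "paid"),    # orphan: customer 99 does not exist
        ],
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print("invalid rows:", count_invalid_rows(demo))
    print("orphan rows:", count_orphan_rows(demo))
```

Both helpers return a row count, so the pass condition is the same as for the other dimensions: a result of 0.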

Mandatory checks in the stage layer

Every pipeline that feeds the stage layer must implement:

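-- Replace stage_table, key_field, and id with the actual table and column names.
-- (TABLE is a reserved word in SQL, so it cannot be used as a table name as-is.)

-- 1. Completeness: critical fields must not be NULL
SELECT COUNT(*) AS null_keys FROM stage_table WHERE key_field IS NULL;
-- expected result: 0

-- 2. Uniqueness: no duplicate primary keys
-- (COUNT(DISTINCT id) ignores NULLs, so rows with a NULL id also count as duplicates)
SELECT COUNT(*) - COUNT(DISTINCT id) AS duplicates FROM stage_table;
-- expected result: 0

-- 3. Timeliness: the latest load must fall within the update SLA
SELECT MAX(load_date) AS last_load FROM stage_table;
-- expected result: last_load is within the SLA window relative to NOW()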
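
One way a pipeline might automate these three checks is sketched below, using Python's built-in sqlite3 module. This is an assumption-laden illustration, not a prescribed implementation: the function name `run_stage_checks`, its parameters, the `sla` window, and the convention that `load_date` is stored as an ISO-8601 UTC string are all hypothetical.

```python
import sqlite3
from datetime import datetime, timedelta, timezone


def run_stage_checks(conn, table, key_field, id_field, sla, now=None):
    """Run the three mandatory checks and return a pass/fail flag per check.

    Table and column names cannot be bound as SQL parameters, so they are
    interpolated into the query text; they must come from trusted configuration.
    """
    now = now or datetime.now(timezone.utc)

    # 1. Completeness: no NULLs in the critical field.
    null_keys = conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {key_field} IS NULL"
    ).fetchone()[0]

    # 2. Uniqueness: total rows minus distinct ids must be zero.
    duplicates = conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {id_field}) FROM {table}"
    ).fetchone()[0]

    # 3. Timeliness: the latest load must be no older than the SLA window.
    # Assumes load_date holds ISO-8601 strings with a UTC offset, which
    # sort correctly as text when they share the same format.
    last_load_raw = conn.execute(
        f"SELECT MAX(load_date) FROM {table}"
    ).fetchone()[0]
    if last_load_raw is None:
        fresh = False  # an empty table never satisfies the SLA
    else:
        fresh = now - datetime.fromisoformat(last_load_raw) <= sla

    return {
        "completeness": null_keys == 0,
        "uniqueness": duplicates == 0,
        "timeliness": fresh,
    }


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stage_orders (id INTEGER, customer_key TEXT, load_date TEXT)"
    )
    conn.execute(
        "INSERT INTO stage_orders VALUES (1, 'a', ?)",
        (datetime.now(timezone.utc).isoformat(),),
    )
    print(run_stage_checks(conn, "stage_orders", "customer_key", "id",
                           sla=timedelta(hours=24)))
```

Passing `now` explicitly keeps the timeliness check deterministic in tests; in a scheduled job it can be left at its default.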

Check catalog per dataset

Dataset | Check | Frequency | Alert
(fill in) | (fill in) | (fill in) | (fill in)