Warning
🚧 Work in Progress: This page is currently under construction. Content may be incomplete or subject to change. To contribute, see the contribution guide.
Data Quality
Quality dimensions
| Dimension | Description | Check |
|---|---|---|
| Completeness | Required fields filled | % of nulls in critical columns |
| Uniqueness | No duplicates by primary key | COUNT(DISTINCT id) vs COUNT(*) |
| Validity | Values within expected domains | Range and enum checks |
| Consistency | Data coherent across tables | Referential integrity checks |
| Timeliness | Data within the update SLA | MAX(load_date) vs NOW() |
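The validity and consistency rows of the table can be sketched as follows. This is a minimal illustration using Python's built-in `sqlite3` module. The `orders` and `customers` tables, the allowed `status` values, and the rule that `amount` must be non-negative are all assumptions made up for the example, not part of any real schema.

```python
import sqlite3

# Hypothetical tables for illustration only; not taken from any real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY);
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, status TEXT, amount REAL);
    INSERT INTO customers (id) VALUES (1), (2);
    INSERT INTO orders VALUES
        (10, 1, 'paid',    25.0),
        (11, 2, 'unknown', 40.0),   -- status outside the assumed enum
        (12, 3, 'open',    -5.0);   -- negative amount, orphan customer_id
""")


def scalar(sql: str) -> int:
    """Run a query that returns a single count and return that value."""
    return conn.execute(sql).fetchone()[0]


# Validity: enum check (assumed allowed values) and range check (assumed amount >= 0).
invalid_status = scalar(
    "SELECT COUNT(*) FROM orders WHERE status NOT IN ('open', 'paid', 'cancelled')"
)
invalid_amount = scalar("SELECT COUNT(*) FROM orders WHERE amount < 0")

# Consistency: orders whose customer_id has no matching row in customers.
orphans = scalar("""
    SELECT COUNT(*)
    FROM orders o
    LEFT JOIN customers c ON c.id = o.customer_id
    WHERE c.id IS NULL
""")

print(invalid_status, invalid_amount, orphans)
```

Each query counts violations, so the expected result for a clean dataset is 0 in all three cases. Note that `NOT IN` does not flag rows where `status` is NULL; those are a completeness problem and need their own check.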
Mandatory checks in the stage layer
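Every pipeline that feeds the stage layer must implement the following checks (replace the placeholders table, key_field, id, and load_date with the dataset's actual names):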
-- 1. Check completeness of critical fields
SELECT COUNT(*) FROM table WHERE key_field IS NULL;
-- expected result: 0
-- 2. Check for duplicates
SELECT COUNT(*) - COUNT(DISTINCT id) AS duplicates FROM table;
-- expected result: 0
-- 3. Check timeliness
SELECT MAX(load_date) FROM table;
-- expected result: within SLA
Check catalog per dataset
| Dataset | Check | Frequency | Alert |
|---|---|---|---|
| (fill in) | (fill in) | (fill in) | (fill in) |
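One possible way to turn a catalog row into the three mandatory stage-layer checks is sketched below, again with Python's built-in `sqlite3`. Everything named here is hypothetical: the `CatalogEntry` class, the `run_checks` function, the `stage_orders` table, the 24-hour SLA, and the alert channel are placeholders for illustration, and `load_date` is assumed to be stored as an ISO 8601 string in UTC.

```python
import sqlite3
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class CatalogEntry:
    """One row of the check catalog (all values used below are hypothetical)."""
    dataset: str        # table name in the stage layer
    key_field: str      # critical column that must not be NULL
    id_field: str       # primary-key column that must be unique
    max_age: timedelta  # update SLA applied to MAX(load_date)
    frequency: str      # how often the checks are scheduled
    alert: str          # where failures are reported


def run_checks(conn, entry, now):
    """Return {check name: passed} for the three mandatory checks."""
    t = entry.dataset  # identifiers come from the trusted catalog, never from user input
    nulls = conn.execute(
        f"SELECT COUNT(*) FROM {t} WHERE {entry.key_field} IS NULL"
    ).fetchone()[0]
    dupes = conn.execute(
        f"SELECT COUNT(*) - COUNT(DISTINCT {entry.id_field}) FROM {t}"
    ).fetchone()[0]
    latest = conn.execute(f"SELECT MAX(load_date) FROM {t}").fetchone()[0]
    # An empty table has no load_date at all, which counts as stale.
    fresh = latest is not None and now - datetime.fromisoformat(latest) <= entry.max_age
    return {"completeness": nulls == 0, "uniqueness": dupes == 0, "timeliness": fresh}


# Sample data with one NULL key field and one duplicated id.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE stage_orders (id INTEGER, customer_id INTEGER, load_date TEXT);
    INSERT INTO stage_orders VALUES
        (1, 7,    '2024-01-01T06:00:00+00:00'),
        (2, NULL, '2024-01-02T06:00:00+00:00'),
        (2, 9,    '2024-01-02T06:00:00+00:00');
""")

entry = CatalogEntry(
    dataset="stage_orders", key_field="customer_id", id_field="id",
    max_age=timedelta(hours=24), frequency="daily", alert="#data-quality",
)
# A fixed "now" keeps the example deterministic; a real run would use the current time.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)

results = run_checks(conn, entry, now)
for check, passed in results.items():
    if not passed:
        print(f"[{entry.alert}] {entry.dataset}: {check} check failed")
```

With the sample rows above, the completeness and uniqueness checks fail and the timeliness check passes, because the latest load is 6 hours old against a 24-hour SLA. Table and column names are interpolated into the SQL text, so this approach is only safe when the catalog itself is trusted.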