Catalog & Governance
If you don't trust your data, you can't use it. The Unified Catalog is the central nervous system of the platform, providing active metadata tracking, interactive data dictionaries, automatic lineage mapping, and stringent SLA contracts.
Automated Indexing
Every table generated in the Lakehouse is immediately documented. Columns are analyzed textually by AI to suggest clear business descriptions.
Column-Level Lineage
Trace the exact journey of a metric. If `fct_revenue` drops, trace it visually back up the graph to find the exact broken Airflow task or upstream API.
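As a rough illustration of what walking the graph upstream involves, the sketch below runs a breadth-first traversal over a small, hand-written lineage map. The node names, the `UPSTREAM` dictionary, and both helper functions are hypothetical and exist only for this example; the catalog's real lineage store and API are not described here.

```python
from collections import deque

# Hypothetical column-level lineage: each node maps to its direct upstream sources.
UPSTREAM = {
    "gold.fct_revenue.amount": ["silver.orders.amount_usd"],
    "silver.orders.amount_usd": ["bronze.orders_raw.amount", "bronze.fx_rates.rate"],
    "bronze.orders_raw.amount": ["airflow:ingest_orders"],
    "bronze.fx_rates.rate": ["api:fx_provider"],
}


def trace_upstream(node, graph):
    """Return every ancestor of `node`, nearest first (breadth-first order)."""
    seen = []
    queue = deque(graph.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited via another path
        seen.append(current)
        queue.extend(graph.get(current, []))
    return seen


def root_sources(node, graph):
    """Ancestors with no upstream of their own: the tasks or APIs to inspect first."""
    return [n for n in trace_upstream(node, graph) if not graph.get(n)]


if __name__ == "__main__":
    for source in root_sources("gold.fct_revenue.amount", UPSTREAM):
        print(source)
```

In this toy graph the candidates for a revenue drop are one ingestion task and one external API, which is the kind of shortlist the visual trace is meant to produce.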
PII Security Masking
LLMs scan schemas during discovery. Columns resembling email addresses, SSNs, or credit card numbers are tagged `Restricted` and hashed before landing in the Bronze layer.
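The tag-then-hash flow can be sketched in a few lines. Note the substitution: this example uses simple column-name patterns in place of the LLM-based detection described above, and the patterns, function names, and salt handling are all assumptions made for illustration.

```python
import hashlib
import re

# Hypothetical name patterns standing in for LLM-based detection.
PII_PATTERNS = [
    re.compile(r"e[-_]?mail", re.IGNORECASE),
    re.compile(r"(^|_)ssn(_|$)|social_security", re.IGNORECASE),
    re.compile(r"credit_?card|card_?number|(^|_)cc(_|$)", re.IGNORECASE),
]


def tag_columns(columns):
    """Map each column name to 'Restricted' if it looks like PII, else None."""
    return {
        name: "Restricted" if any(p.search(name) for p in PII_PATTERNS) else None
        for name in columns
    }


def mask_value(value, salt):
    """Replace a value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def mask_record(record, tags, salt):
    """Hash every non-null value whose column is tagged 'Restricted'."""
    return {
        key: mask_value(value, salt)
        if tags.get(key) == "Restricted" and value is not None
        else value
        for key, value in record.items()
    }


if __name__ == "__main__":
    row = {"user_id": 7, "email": "ada@example.com", "cc_number": "4111111111111111"}
    tags = tag_columns(row.keys())
    print(mask_record(row, tags, salt="demo-salt"))
```

Because the hash is deterministic for a given salt, masked columns can still be joined or counted downstream without exposing the raw values.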
Enforcing Data Contracts
Engineers often face situations where software teams push migrations (e.g., dropping a legacy column or changing a string to an integer) that crash downstream BI dashboards. Data Contracts prevent this by declaring the expected schema and tests for a table and enforcing them before data moves downstream.
schema_contract:
  table: silver.dim_users
  owner: data_platform_team
  validations:
    - column: user_id
      type: uuid
      tests: [not_null, unique]
    - column: signup_date
      type: timestamp
      tests: [not_in_future]
  enforcement_mode: BLOCK

- WARN Mode: Notifies Slack channels and adds a 'Warning' flag in the catalog, but allows the pipeline to succeed.
- BLOCK Mode: Quarantines the bad records in a dead-letter queue and requires human intervention, so they cannot corrupt the Gold layer.
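To make the two modes concrete, here is a minimal sketch of how a contract like the one above might be evaluated against incoming records. It is not the platform's implementation: the `CONTRACT` dictionary mirrors the YAML by hand, the function names are hypothetical, timestamps are assumed to be timezone-aware `datetime` objects, and Slack notification and the dead-letter queue are represented by plain Python lists.

```python
import uuid
from datetime import datetime, timezone

# Hand-written mirror of the YAML contract (assumption: loaded elsewhere).
CONTRACT = {
    "table": "silver.dim_users",
    "validations": [
        {"column": "user_id", "type": "uuid", "tests": ["not_null", "unique"]},
        {"column": "signup_date", "type": "timestamp", "tests": ["not_in_future"]},
    ],
    "enforcement_mode": "BLOCK",
}


def _is_uuid(value):
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def violations(record, contract, seen, now):
    """Return a list of human-readable problems for one record."""
    problems = []
    for rule in contract["validations"]:
        col, tests = rule["column"], rule["tests"]
        value = record.get(col)
        if value is None:
            if "not_null" in tests:
                problems.append(f"{col}: null")
            continue
        if rule["type"] == "uuid" and not _is_uuid(value):
            problems.append(f"{col}: not a uuid")
        if rule["type"] == "timestamp" and not isinstance(value, datetime):
            problems.append(f"{col}: not a timestamp")
            continue
        if "unique" in tests:
            if value in seen.setdefault(col, set()):
                problems.append(f"{col}: duplicate")
            seen[col].add(value)
        if "not_in_future" in tests and value > now:
            problems.append(f"{col}: in the future")
    return problems


def enforce(records, contract, now=None):
    """Split records into (passed, quarantined, warnings) per enforcement_mode."""
    now = now or datetime.now(timezone.utc)
    seen, passed, quarantined, warnings = {}, [], [], []
    for record in records:
        problems = violations(record, contract, seen, now)
        if not problems:
            passed.append(record)
        elif contract["enforcement_mode"] == "BLOCK":
            quarantined.append((record, problems))  # stand-in for the dead-letter queue
        else:  # WARN: record the problem but let the record through
            warnings.append(problems)
            passed.append(record)
    return passed, quarantined, warnings
```

Calling `enforce(records, CONTRACT)` in BLOCK mode keeps failing records out of the passed set, while the same contract with `enforcement_mode` set to `WARN` lets them through and only collects the warnings.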
© 2026 DataFlow AI Docs