
Catalog & Governance

If you don't trust your data, you can't use it. The Unified Catalog is the central nervous system, providing active metadata tracking, interactive data dictionaries, automatic lineage mapping, and strict SLA contracts.

Automated Indexing

Every table generated in the Lakehouse is documented as soon as it is created. An AI model analyzes the text of each column to suggest a clear business description.

Column-Level Lineage

Trace the exact journey of a metric. If `fct_revenue` drops, follow it visually back up the lineage graph to find the specific broken Airflow task or upstream API.
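
As a rough illustration of the upstream walk described above, the sketch below does a breadth-first traversal over a small lineage graph. The graph contents, the node naming scheme (`airflow:...`, `api:...`), and the functions `trace_upstream` and `root_sources` are all hypothetical and are not the catalog's actual API.

```python
from collections import deque

# Hypothetical lineage graph: each node maps to the nodes it is derived from.
# Names are illustrative only and are not taken from the product.
LINEAGE = {
    "gold.fct_revenue.amount": ["silver.orders.total", "silver.refunds.amount"],
    "silver.orders.total": ["bronze.orders_raw.total"],
    "silver.refunds.amount": ["bronze.refunds_raw.amount"],
    "bronze.orders_raw.total": ["airflow:ingest_orders"],
    "bronze.refunds_raw.amount": ["api:payments_provider"],
}


def trace_upstream(column, lineage):
    """Return every upstream node of `column`, nearest first (breadth-first)."""
    seen = []
    queue = deque(lineage.get(column, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already visited via another path
        seen.append(node)
        queue.extend(lineage.get(node, []))
    return seen


def root_sources(column, lineage):
    """Return upstream nodes with no parents of their own (e.g. tasks, APIs)."""
    return [n for n in trace_upstream(column, lineage) if n not in lineage]


if __name__ == "__main__":
    for source in root_sources("gold.fct_revenue.amount", LINEAGE):
        print(source)
```

The root sources are the candidates to inspect first when a metric drops, since every intermediate column ultimately depends on them.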

PII Security Masking

LLMs scan schemas during discovery. Columns that resemble email addresses, Social Security numbers (SSNs), or credit card numbers are tagged `Restricted` and hashed before landing in the Bronze layer.
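
The tag-then-hash flow can be sketched as follows. This is a minimal illustration under stated assumptions: it swaps the LLM-based detection for a simple column-name pattern, and the pattern, the salt handling, and the helper names (`tag_columns`, `hash_value`, `mask_record`) are hypothetical rather than the product's implementation.

```python
import hashlib
import re

# Assumption: a name-based heuristic stands in for the LLM scan described above.
PII_NAME_PATTERN = re.compile(
    r"(email|ssn|social_security|credit_card|card_number)", re.IGNORECASE
)


def tag_columns(column_names):
    """Map each column name to 'Restricted', or None if it does not look like PII."""
    return {
        name: "Restricted" if PII_NAME_PATTERN.search(name) else None
        for name in column_names
    }


def hash_value(value, salt):
    """Return a salted SHA-256 hex digest of a value."""
    return hashlib.sha256((salt + str(value)).encode("utf-8")).hexdigest()


def mask_record(record, tags, salt):
    """Hash the values of Restricted columns; leave all other values untouched."""
    return {
        col: hash_value(val, salt)
        if tags.get(col) == "Restricted" and val is not None
        else val
        for col, val in record.items()
    }


if __name__ == "__main__":
    tags = tag_columns(["user_id", "email_address", "plan"])
    row = {"user_id": 1, "email_address": "a@example.com", "plan": "pro"}
    print(mask_record(row, tags, salt="example-salt"))
```

Name matching alone produces false positives (a column called `classname` contains `ssn`) and misses PII stored under neutral names, which is one reason to inspect schemas and values with a more capable classifier.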

Enforcing Data Contracts

Data engineers often find that software teams push migrations (e.g., dropping a legacy column or changing a string to an integer) that crash downstream BI dashboards. Data Contracts prevent this by declaring the expected schema and tests for each table and enforcing them in the pipeline.

schema_contract:
  table: silver.dim_users
  owner: data_platform_team
  validations:
    - column: user_id
      type: uuid
      tests: [not_null, unique]
    - column: signup_date
      type: timestamp
      tests: [not_in_future]
  enforcement_mode: BLOCK

The `enforcement_mode` setting controls what happens when a validation fails:

• WARN Mode: Notifies Slack channels and adds a 'Warning' flag in the catalog, but allows the pipeline to succeed.
• BLOCK Mode: Quarantines the bad records in a dead-letter queue and requires human intervention, so they cannot corrupt the Gold layer.
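
To make the two modes concrete, here is a minimal sketch of how a contract like the one above might be evaluated record by record. It is an illustration only: the contract is rewritten as a Python dict to avoid a YAML dependency, the dead-letter queue is a plain list, and `record_violations` and `enforce` are hypothetical names rather than part of the product. It also assumes timestamps are timezone-aware `datetime` objects.

```python
import uuid
from datetime import datetime, timezone

# The contract from above as a plain dict (assumption: same fields, no YAML parser).
CONTRACT = {
    "table": "silver.dim_users",
    "validations": [
        {"column": "user_id", "type": "uuid", "tests": ["not_null", "unique"]},
        {"column": "signup_date", "type": "timestamp", "tests": ["not_in_future"]},
    ],
    "enforcement_mode": "BLOCK",
}


def _is_uuid(value):
    try:
        uuid.UUID(str(value))
        return True
    except ValueError:
        return False


def record_violations(record, contract, seen_values, now):
    """Return a list of human-readable violations for one record."""
    problems = []
    for rule in contract["validations"]:
        col = rule["column"]
        value = record.get(col)
        if value is None:
            if "not_null" in rule["tests"]:
                problems.append(f"{col}: null")
            continue
        if rule["type"] == "uuid" and not _is_uuid(value):
            problems.append(f"{col}: not a uuid")
            continue
        if rule["type"] == "timestamp" and not isinstance(value, datetime):
            problems.append(f"{col}: not a timestamp")
            continue
        if "unique" in rule["tests"]:
            if value in seen_values.setdefault(col, set()):
                problems.append(f"{col}: duplicate")
            seen_values[col].add(value)
        if "not_in_future" in rule["tests"] and value > now:
            problems.append(f"{col}: in the future")
    return problems


def enforce(records, contract, now=None):
    """Split records into (passed, dead_letter, warnings) per enforcement_mode."""
    now = now or datetime.now(timezone.utc)
    passed, dead_letter, warnings = [], [], []
    seen_values = {}
    for record in records:
        problems = record_violations(record, contract, seen_values, now)
        if not problems:
            passed.append(record)
        elif contract["enforcement_mode"] == "BLOCK":
            # Quarantine: the record never reaches downstream layers.
            dead_letter.append({"record": record, "violations": problems})
        else:
            # WARN: keep the record but surface the violations.
            warnings.append({"record": record, "violations": problems})
            passed.append(record)
    return passed, dead_letter, warnings
```

In this sketch a WARN run returns every record in `passed` alongside a list of warnings to report, while a BLOCK run holds failing records in `dead_letter` until someone reviews them.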

© 2026 DataFlow AI Docs