Data Observability
Full-stack visibility into your data workflows. The Observability component acts as an MRI scanner for your infrastructure, tracking pipeline latencies, compute performance, query costs, and data health in real time.
SLA & Freshness Monitoring
Measure how recently and how reliably essential tables and metrics are updated. Set strict SLA thresholds (e.g., "the Gold Revenue table must refresh by 9am EST") and track the modeled probability of on-time delivery for each asset.
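A freshness check of this kind can be expressed as a staleness window per table. The sketch below is illustrative only: the table names, the `SLA_DEADLINES` map, and the `is_within_sla` helper are assumptions, not part of the product API.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical SLA config: table -> maximum acceptable age of the last refresh.
SLA_DEADLINES = {
    "gold_revenue": timedelta(hours=24),
}

def is_within_sla(table: str, last_refreshed: datetime, now: datetime) -> bool:
    """Return True if the table's last refresh is within its staleness window."""
    max_age = SLA_DEADLINES[table]
    return (now - last_refreshed) <= max_age
```

In practice the `last_refreshed` timestamp would come from warehouse metadata or pipeline run logs rather than being passed in directly.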
Infrastructure Telemetry
Stream granular metrics regarding your underlying compute clusters (Spark memory utilization, Flink checkpointing duration, and Kafka consumer lag).
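Consumer lag, one of the metrics above, is simply the gap between the partition's log-end offset and the consumer group's committed offset. A minimal sketch, assuming offsets have already been fetched (the function name and input shape are illustrative):

```python
def consumer_lag(end_offsets: dict, committed: dict) -> dict:
    """Per-partition lag: log-end offset minus the last committed offset.

    Partitions with no committed offset are treated as fully lagging.
    """
    return {p: end_offsets[p] - committed.get(p, 0) for p in end_offsets}
```

A real collector would pull `end_offsets` and `committed` from the Kafka admin/consumer APIs on an interval and emit the result as a gauge.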
FinOps Cost Optimization
Track actual dollar spend down to the specific user query or pipeline node scale-out event. AI agents flag unused intermediate tables and overly expensive full table scans.
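One way to picture the full-scan flagging is a filter over an attributed query log. This is a sketch under assumed field names (`query_id`, `full_scan`, `cost_usd`); the real query-log schema is not specified here.

```python
def flag_expensive_scans(query_log: list, cost_threshold: float) -> list:
    """Return ids of queries that did a full scan and exceeded the cost threshold."""
    return [
        q["query_id"]
        for q in query_log
        if q.get("full_scan") and q["cost_usd"] > cost_threshold
    ]
```

The same pattern extends to unused intermediate tables: join per-table write costs against read counts and flag tables with spend but no downstream readers.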
Pipeline Health & Alerts
Observability integrates natively with the Data Contracts engine. If thousands of blank customer records suddenly stream in during a Black Friday event, the engine will halt the specific partition flow, quarantine the anomalies, and trigger severity 1 PagerDuty alarms while permitting healthy pipelines around it to continue.
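The quarantine step described above amounts to routing each record by its contract check while letting valid records flow on. A minimal sketch, assuming a blank `customer_id` is the contract violation (the field name and the severity flag are illustrative; a production system would page on-call instead of returning a boolean):

```python
def route_batch(records: list):
    """Split a batch into healthy and quarantined records.

    Returns (healthy, quarantined, severity_1) where severity_1 indicates
    that at least one record violated the contract.
    """
    healthy, quarantined = [], []
    for rec in records:
        (healthy if rec.get("customer_id") else quarantined).append(rec)
    severity_1 = len(quarantined) > 0  # trigger point for a Sev-1 alert
    return healthy, quarantined, severity_1
```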
Automated Performance Tuning
The Observability suite doesn't just display colorful graphs — it actively optimizes the system.
- Spark Shuffle Optimization: If the agent detects excessive data spill during complex joins, it will preemptively rewrite the cluster configuration to assign higher memory nodes before the next DAG run.
- Index Suggestions: If `dim_users` is constantly queried via full scans across Snowflake, the AI suggests optimal clustering push-downs.
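The shuffle-tuning loop above can be sketched as a rule over job metrics: when spill dominates shuffle output, propose a larger executor memory before the next run. The 20% spill threshold, the metric names, and the memory cap are illustrative assumptions, not documented product behavior; only `spark.executor.memory` is a real Spark config key.

```python
def propose_spark_config(metrics: dict, current_mem_gb: int) -> dict:
    """Propose a config override when shuffle spill to disk is excessive.

    Doubles executor memory (capped at 64 GB) if spilled bytes exceed
    20% of shuffle write bytes; otherwise proposes no change.
    """
    spill_ratio = metrics["shuffle_spill_bytes"] / max(metrics["shuffle_write_bytes"], 1)
    if spill_ratio > 0.2:
        return {"spark.executor.memory": f"{min(current_mem_gb * 2, 64)}g"}
    return {}
```

An agent would apply the returned override to the cluster spec ahead of the next DAG run, then re-check the spill metrics to confirm the change helped.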
© 2026 DataFlow AI Docs