Pipelines Engine

The DataFlow AI Pipelines Engine orchestrates data movement and transformation across your entire architecture. It transparently unifies batch processing (via Apache Spark) and real-time streaming (via Apache Flink) into a single visual interface or code-first API.

Visual DAG Builder

Construct Directed Acyclic Graphs (DAGs) using our infinite canvas. Drag and drop sources, transformations, ML scripts, and sinks. The engine automatically computes dependencies and resolves the execution order.
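The dependency resolution described above amounts to a topological sort of the DAG. A minimal sketch using Python's standard library (the node names are illustrative, not part of the DataFlow AI API):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline DAG: each node maps to the upstream
# nodes it depends on. Sources have no dependencies.
dag = {
    "postgres_source": set(),
    "mask_pii":        {"postgres_source"},
    "stripe_source":   set(),
    "join_payments":   {"mask_pii", "stripe_source"},
    "snowflake_sink":  {"join_payments"},
}

# static_order() yields a valid execution order: every node
# appears only after all of its upstream dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

Independent branches (here, the Postgres and Stripe sources) have no ordering constraint between them, which is what lets the engine schedule them in parallel.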

AI Prompt to Pipeline

Don't want to drag nodes? Just describe what you want. E.g., "Pull orders from Postgres, mask the phone number, join with Stripe payments, and sink to Snowflake." The AI generates the Airflow/dbt codebase instantly.
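As a rough illustration, one transformation step generated from the prompt above ("mask the phone number") might look like the following. The function name and masking policy are assumptions for the sketch, not the actual generated code:

```python
import re

def mask_phone(value: str) -> str:
    """Mask all but the last two digits of a phone number
    (illustrative masking policy)."""
    digits = re.sub(r"\D", "", value)  # strip formatting characters
    return "*" * (len(digits) - 2) + digits[-2:]

print(mask_phone("+1 (555) 123-4867"))  # *********67
```

In the generated codebase, a step like this would typically land as a task in the Airflow DAG or as a dbt model, depending on where the data lives.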

Under the Hood

Intelligent Computing Engines

Unlike basic ELT tools, DataFlow AI dynamically routes computations based on latency requirements:

  • Serverless PySpark (Micro-Batch): Designed for heavy, stateful transformations spanning TBs of historical data. Features dynamic auto-scaling per step.
  • Apache Flink (Continuous Stream): Ideal for Kafka CDC pipelines requiring sub-second latencies. Supports stateful aggregations (e.g., sliding window counts).
  • dbt Core (Pushdown SQL): If transforming inside a warehouse (Snowflake, BigQuery), the pipeline pushes standard SQL via dbt, avoiding massive network egress costs.
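The routing logic can be pictured as a simple decision rule. This is an illustrative sketch of the idea, not DataFlow AI's actual heuristic:

```python
def route_engine(latency_ms: int, data_in_warehouse: bool) -> str:
    """Pick a compute engine from latency and data locality
    (illustrative routing rule)."""
    if data_in_warehouse:
        return "dbt"      # push SQL down, avoid network egress
    if latency_ms < 1000:
        return "flink"    # sub-second continuous streaming
    return "pyspark"      # micro-batch for heavy historical loads

print(route_engine(200, data_in_warehouse=False))  # flink
```

In practice the router would also weigh state size, source connectors, and cost, but latency and data locality are the dominant signals.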

Intelligent Auto-Healer

If upstream columns are dropped or renamed, traditional orchestrators simply fail. DataFlow AI's Auto-Healer intercepts the failure, reads the metadata diff, and rewrites the transformation logic on the fly via AI. It then runs a dry-test and resumes the pipeline, sending an alert to the Data Steward.
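The schema-diff step can be sketched with fuzzy column matching. This is a simplified stand-in for the Auto-Healer's AI rewrite, using name similarity only; the real healer also dry-runs the rewritten transform before resuming:

```python
from difflib import get_close_matches

def heal_schema(expected: list[str], actual: list[str]) -> dict[str, str]:
    """Map each expected column to its closest match in the new
    upstream schema (illustrative; threshold is an assumption)."""
    mapping = {}
    for col in expected:
        match = get_close_matches(col, actual, n=1, cutoff=0.5)
        if match:
            mapping[col] = match[0]
    return mapping

# Upstream renamed "customer_id" -> "cust_id" and "phone" -> "phone_number"
print(heal_schema(["customer_id", "phone"],
                  ["cust_id", "phone_number", "email"]))
```

Columns with no plausible match would fall through to the alerting path rather than being guessed.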

Code Export & Versioning

We do not believe in vendor lock-in. Any pipeline built visually is compiled into standard, version-controlled Python (Apache Airflow/Prefect) and SQL (dbt). You can export the entire generated codebase as plain, editable project files at any time.

© 2026 DataFlow AI Docs