
Python SDK Hooks

The dataflow-ai-sdk pip package lets you trigger and monitor pipeline executions from existing Jupyter Notebooks or standard Apache Airflow environments.

Declarative Graph Registration

Data scientists building machine learning models in Jupyter Notebooks frequently need to pull sanitized arrays from the Silver or Gold data layer without touching the frontend UI.

The dataflow-ai-sdk library (installed via pip install dataflow-ai-sdk) returns Pandas-compatible DataFrames fetched over the internal network, so algorithms always train on the freshest available CDC state.
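As a rough illustration of the shape involved, a table fetch of this kind typically maps a column/row API payload into records that pandas.DataFrame.from_records can consume. The function and payload below are a minimal sketch under assumed names (rows_as_dicts, SAMPLE_RESPONSE), not the real SDK surface:

```python
import json

# Hypothetical response shape for a Silver-layer table fetch; the real
# wire format of the SDK is not documented here, so this is an assumption.
SAMPLE_RESPONSE = json.dumps({
    "table": "silver.orders",
    "columns": ["order_id", "amount"],
    "rows": [[1, 9.99], [2, 24.50]],
})

def rows_as_dicts(payload: str) -> list:
    """Convert a columns/rows payload into a list of records, the shape
    that pandas.DataFrame.from_records accepts directly."""
    body = json.loads(payload)
    return [dict(zip(body["columns"], row)) for row in body["rows"]]

records = rows_as_dicts(SAMPLE_RESPONSE)
```

From here, pandas.DataFrame.from_records(records) yields the Pandas-compatible frame described above.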

Airflow Directed Acyclic Graphs (DAGs)

If your organization orchestrates everything centrally through Apache Airflow, including non-data tasks, you can embed the managed Python operator to trigger downstream DataFlow executions.

from datetime import datetime

from airflow import DAG
from dataflow_ai.airflow import DataFlowOperator

with DAG(
    dag_id="dataflow_silver",
    start_date=datetime(2026, 1, 1),
    schedule=None,
) as dag:
    trigger_silver_build = DataFlowOperator(
        task_id="build_silver_layer",
        pipeline_id="pl_abc123",
        wait_for_completion=True,
        fail_on_data_contract_breach=True,
    )

Native Standard Output Hooking

When wait_for_completion=True is set, the operator polls the background deployment cluster until the run finishes. It intercepts PySpark print log events and channels them into your Airflow task's stdout, so your central DevOps monitoring retains full observability over pipeline runs.
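Conceptually, the poll-and-forward behavior described above looks like the loop below. This is a self-contained sketch, not the operator's actual implementation: poll_status and fetch_new_logs stand in for SDK internals and are simulated here with canned values.

```python
import sys
from collections import deque

def forward_logs(poll_status, fetch_new_logs, out=sys.stdout):
    """Poll until the run reaches a terminal state, writing any new log
    lines to `out` so Airflow captures them in the task's stdout."""
    while True:
        for line in fetch_new_logs():
            out.write(line + "\n")
        status = poll_status()
        if status in ("SUCCEEDED", "FAILED"):
            return status

# Simulated run: one in-flight poll, then success, with a log batch each.
states = deque(["RUNNING", "SUCCEEDED"])
log_batches = deque([["stage 1 done"], ["stage 2 done"]])
status = forward_logs(
    poll_status=lambda: states.popleft(),
    fetch_new_logs=lambda: log_batches.popleft() if log_batches else [],
)
```

Because the lines are written to the task's stdout as they arrive, downstream log aggregation sees them in near real time rather than only after the run completes.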


© 2026 DataFlow AI Docs