Python SDK Hooks
The dataflow-sdk pip package lets you trigger and manage pipeline executions from existing Jupyter notebooks or standard Apache Airflow environments.
Declarative Graph Registration
Data scientists building machine learning models in Jupyter notebooks frequently need to pull sanitized data from the Silver or Gold layer without going through the frontend UI.
The library (installed via pip install dataflow-ai-sdk) returns pandas-compatible DataFrames fetched over the internal network, so models train on the freshest available CDC state.
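A notebook workflow along these lines is sketched below. The client class, its constructor arguments, and the get_dataset method are illustrative assumptions, not documented dataflow-ai-sdk API; the client is stubbed with in-memory data so the sketch runs standalone.

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DataFlowClient:
    # Hypothetical stand-in for the SDK client; name, fields, and methods
    # are assumptions to illustrate the pattern, not the real API.
    base_url: str
    token: str
    # In-memory stand-in for the Silver/Gold data layer.
    _layers: dict = field(default_factory=lambda: {
        ("silver", "orders"): [{"order_id": 1, "amount": 9.99}],
    })

    def get_dataset(self, layer: str, name: str) -> list[dict[str, Any]]:
        """Return sanitized records for a dataset in the given layer.

        A real client would return a pandas-compatible DataFrame
        fetched over the internal network rather than plain dicts.
        """
        return self._layers[(layer, name)]

client = DataFlowClient(base_url="https://dataflow.internal", token="<token>")
rows = client.get_dataset("silver", "orders")
print(len(rows), rows[0]["amount"])  # → 1 9.99
```

In a real notebook the returned DataFrame could be passed straight into a training loop; the stub above only demonstrates the fetch call shape.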
If your organization orchestrates all workloads, including non-data tasks, centrally through Apache Airflow, you can embed our managed Python operator to trigger downstream DataFlow executions.
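The operator pattern can be sketched as follows. The operator name, its constructor arguments, and the triggering logic are assumptions for illustration; BaseOperator is stubbed here so the sketch runs standalone, whereas a real DAG would import it from airflow.models and call the actual SDK.

```python
class BaseOperator:
    # Stub for airflow.models.BaseOperator so this sketch is self-contained.
    def __init__(self, task_id: str):
        self.task_id = task_id

class DataFlowTriggerOperator(BaseOperator):
    # Hypothetical managed operator; name and parameters are assumptions.
    def __init__(self, task_id: str, pipeline_id: str,
                 wait_for_completion: bool = False):
        super().__init__(task_id=task_id)
        self.pipeline_id = pipeline_id
        self.wait_for_completion = wait_for_completion

    def execute(self, context: dict) -> str:
        # A real implementation would call the SDK to start the pipeline;
        # here the run id is faked to keep the sketch runnable.
        run_id = f"run-{self.pipeline_id}"
        if self.wait_for_completion:
            pass  # a real operator would poll the run until a terminal state
        return run_id  # Airflow pushes the return value to XCom

op = DataFlowTriggerOperator(task_id="trigger_pipeline",
                             pipeline_id="daily_etl")
print(op.execute({}))  # → run-daily_etl
```

In an actual DAG file the operator instance would be created inside a `with DAG(...)` block and wired to upstream tasks like any other Airflow operator.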
Native Standard Output Hooking
When wait_for_completion=True is passed, the operator polls the background deployment cluster, intercepts PySpark print and log events, and forwards them to the Airflow task's stdout, so your central DevOps monitoring retains full observability.
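The polling-and-forwarding behaviour can be sketched like this. The poll method, its return shape, and the FakeRun class are assumptions standing in for the real deployment cluster API; only the forwarding pattern itself is the point.

```python
import sys
import time

class FakeRun:
    # Simulates a remote pipeline run: two log lines, terminal on the
    # second poll. A real run handle would come from the SDK.
    def __init__(self):
        self._logs = ["starting PySpark job", "stage 1/1 complete"]
        self._polls = 0

    def poll(self):
        """Return (state, new_log_lines) since the previous poll."""
        self._polls += 1
        lines, self._logs = self._logs[:1], self._logs[1:]
        state = "SUCCESS" if not self._logs and self._polls > 1 else "RUNNING"
        return state, lines

def wait_for_completion(run, interval: float = 0.01) -> str:
    # Poll until a terminal state, forwarding each new log line to
    # stdout so Airflow's task log captures it.
    while True:
        state, lines = run.poll()
        for line in lines:
            sys.stdout.write(line + "\n")
        if state in ("SUCCESS", "FAILED"):
            return state
        time.sleep(interval)

print(wait_for_completion(FakeRun()))  # → SUCCESS (after both log lines)
```

Because the lines land on the task's stdout, no extra log-shipping configuration is needed on the Airflow side.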
© 2026 DataFlow AI Docs