Deploying to AWS
Deploy the DataFlow AI execution plane directly into your private AWS VPC. Securely leverage native integrations with AWS MSK for Kafka streaming and AWS EMR for Apache Spark batch processing.
AWS Control Plane Architecture
While DataFlow AI provides a completely managed SaaS interface, enterprise customers handling heavily regulated, sensitive healthcare (HIPAA) or financial (SOC2) data natively prefer deploying the Compute Engine straight into their Amazon Web Services environment.
This dual-plane approach guarantees that the DataFlow AI centralized control systems instruct the execution graphs, but your highly sensitive PII datasets strictly never leave your own controlled AWS network topology. All computations are securely executed entirely inside your personal boundaries.
IAM Cross-Account Trust
DataFlow AI requires a highly constrained Cross-Account IAM Role to provision transient EMR nodes inside your VPC. We strongly enforce the principles of least privilege. The exact JSON IAM policy required for secure provisioning is available inside your workspace settings.
Native Integrated Compute Services
To minimize cross-network egress fees, DataFlow AI seamlessly intercepts native AWS data mechanisms rather than relying on external clusters to perform heavy analysis.
Amazon MSK (Managed Kafka)
DataFlow AI automatically manages and routes your Kafka brokers directly into sub-second Flink CDC streaming applications instantly without the enormous overhead of manual ZooKeeper administrative operations.
Amazon EMR (Elastic MapReduce)
Transient PySpark clusters spin up automatically mere seconds before your massive data pipeline jobs are slated to begin. They process the required transformations natively against S3, and turn down instantly to ensure your AWS bill remains completely optimized.
AWS S3 & Glue Data Catalog
Native ingestion directly into S3 object storage seamlessly registers schema definitions backwards to the AWS Glue Data Catalog. This directly implies that DataFlow AI's generated catalog intelligently mirrors exactly what your other AWS machine learning products observe natively.
© 2026 DataFlow AI Docs