Step Functions vs Apache Airflow: Which One Fits Your Orchestration Needs?
In modern cloud-native applications, orchestration tools play a vital role in managing and automating workflows. But they differ a lot in how much visibility they give you at the task level.
In this post, I’ll walk you through a real-world architectural decision I had to make: choosing between AWS Step Functions and Apache Airflow (via Amazon MWAA) for a workflow orchestration platform that required:
Detailed task-level observability
Minimal vendor lock-in
Clean separation of business and platform logic
The Problem: Missing Task-Level Events in Step Functions
I started with AWS Step Functions because it fits so well in the AWS ecosystem. It’s serverless, scales automatically, and is deeply integrated with services like Lambda, SQS, SNS, and more.
However, one thing stood out: it doesn’t emit events for individual tasks. The events it publishes to EventBridge describe status changes of the execution as a whole (e.g., running, succeeded, failed, timed out, aborted). If you want to be notified when each task starts and completes, or to track how long each step takes, you have to build that yourself.
To work around it, you’d either need to:
Parse the execution logs in CloudWatch, which arrive with a delay and are not reliable for real-time monitoring.
Modify each Lambda function (or task handler) to emit events to something like EventBridge.
Both approaches come with drawbacks. Log parsing always lags behind the execution, and instrumenting every Lambda with event logic breaks the clean separation between business logic and platform responsibilities, especially when those functions are developed by external vendors.
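To make the second workaround concrete, here is a minimal sketch of what that instrumentation could look like as a decorator. It assumes the client passed in behaves like boto3.client("events") and exposes put_events; the source name and the detail types (TaskStarted, TaskSucceeded, TaskFailed) are hypothetical names I picked for illustration, not anything Step Functions defines.

```python
import functools
import json
import time


def emits_task_events(events_client, source="example.workflow", bus_name="default"):
    """Wrap a Lambda-style handler so it reports its own start and end.

    events_client is anything with a put_events(Entries=[...]) method,
    for example boto3.client("events"). The source and detail-type
    strings are hypothetical names chosen for this sketch.
    """

    def send(detail_type, detail):
        # One entry per call; a real setup would also inspect the response
        # for failed entries.
        events_client.put_events(
            Entries=[
                {
                    "Source": source,
                    "DetailType": detail_type,
                    "Detail": json.dumps(detail),
                    "EventBusName": bus_name,
                }
            ]
        )

    def decorator(handler):
        @functools.wraps(handler)
        def wrapper(event, context):
            task = handler.__name__
            started = time.monotonic()
            send("TaskStarted", {"task": task})
            try:
                result = handler(event, context)
            except Exception as exc:
                elapsed = time.monotonic() - started
                send("TaskFailed", {"task": task, "error": repr(exc), "seconds": elapsed})
                raise  # let Step Functions see the original failure
            elapsed = time.monotonic() - started
            send("TaskSucceeded", {"task": task, "seconds": elapsed})
            return result

        return wrapper

    return decorator
```

Even in this compact form, every function still has to import and apply the decorator, and its execution role needs permission to put events. That is exactly the coupling I wanted to avoid for vendor-owned code.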
Apache Airflow to the Rescue?
That’s when I considered Apache Airflow, specifically the managed version from AWS: Amazon MWAA.
Here’s what makes Airflow compelling:
Every task instance’s state is tracked in a metadata database: queued, running, success, failed, and so on.
It has a built-in web UI that gives a clear view of DAG runs and task durations.
You can use callbacks like on_success_callback and on_failure_callback to emit custom events or trigger downstream workflows.
In short: Airflow provides native task-level observability, without requiring business logic functions to be aware of it.
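Here is a rough sketch of how the callback approach can look. It assumes Airflow 2.x, where a callback receives a context dictionary containing the task instance; publish is a hypothetical placeholder for whatever event bus you use, and vendor_transform in the wiring comments stands in for a business function that knows nothing about any of this.

```python
import json


def task_event_from_context(context, outcome):
    """Build a small event payload from an Airflow callback context."""
    ti = context["task_instance"]
    return {
        "dag_id": ti.dag_id,
        "task_id": ti.task_id,
        "run_id": context.get("run_id"),
        "outcome": outcome,
        "try_number": ti.try_number,
        # May be None, depending on when the callback fires in your version.
        "duration_seconds": getattr(ti, "duration", None),
    }


def publish(event):
    """Hypothetical sink: replace with a call to your own event bus."""
    print(json.dumps(event))


def on_task_success(context):
    publish(task_event_from_context(context, "succeeded"))


def on_task_failure(context):
    publish(task_event_from_context(context, "failed"))


# Wiring, shown as comments so the snippet runs without Airflow installed:
#
# from datetime import datetime
# from airflow import DAG
# from airflow.operators.python import PythonOperator
#
# with DAG(
#     dag_id="example_pipeline",
#     start_date=datetime(2024, 1, 1),
#     schedule=None,  # Airflow 2.4+; older 2.x releases use schedule_interval
#     default_args={
#         "on_success_callback": on_task_success,
#         "on_failure_callback": on_task_failure,
#     },
# ):
#     PythonOperator(task_id="transform", python_callable=vendor_transform)
```

Because the callbacks are attached through default_args, every task in the DAG reports its outcome while the task code itself stays untouched.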
What About Cost?
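Cost is where Step Functions shines, especially for lightweight, high-frequency workflows. Standard workflows are billed per state transition, so you pay only when something actually runs, while an MWAA environment is billed by the hour whether or not any DAG is running.
For that kind of workload, Step Functions is significantly cheaper, but you give up native task-level observability and flexibility.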
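If you want to run the numbers for your own workload, a back-of-the-envelope model is enough to find the break-even point. The sketch below is deliberately simplified: it assumes Standard workflows billed per state transition and an MWAA environment billed per hour plus optional extra worker hours, and it ignores free tiers, Express workflows, Lambda charges, and storage. All prices are parameters you would fill in from the current AWS pricing pages; none are hard-coded here.

```python
def step_functions_monthly_cost(executions, transitions_per_execution,
                                price_per_transition):
    """Monthly cost of Standard workflows under a per-transition price."""
    return executions * transitions_per_execution * price_per_transition


def mwaa_monthly_cost(environment_hourly_rate, hours_in_month=730,
                      extra_worker_hours=0.0, worker_hourly_rate=0.0):
    """Monthly cost of an always-on environment plus extra worker hours."""
    return (environment_hourly_rate * hours_in_month
            + extra_worker_hours * worker_hourly_rate)


def break_even_executions(transitions_per_execution, price_per_transition,
                          mwaa_cost):
    """Executions per month at which Step Functions costs as much as MWAA."""
    return mwaa_cost / (transitions_per_execution * price_per_transition)
```

Below the break-even number of executions, the per-transition model wins; above it, the fixed environment cost starts to pay for itself.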
Step Functions or Airflow — Which Should You Choose?
Here’s a quick decision guide:
Choose Step Functions if:
You’re building within AWS
You want a fully managed, serverless experience
Your workflows are straightforward
You can live with basic observability or are okay with adding wrappers
Choose Airflow if:
You need task-level tracking out of the box
You want to emit custom events per task without modifying core logic
You’re okay managing some infrastructure (even with MWAA, you still choose an environment size and handle Airflow version upgrades)
You’re not tightly coupled to AWS and value flexibility
Final Thoughts
This isn’t a matter of one tool being better than the other. It’s about trade-offs:
Simplicity and cost vs. control and visibility.
If you’ve tackled similar challenges or have creative ways to extend Step Functions for better observability, I’d love to hear your thoughts.
