Case Study:

Billing Platform: Data Orchestration With Dagster

Services Rendered: DevOps + Infrastructure | Cloud Architecture | Workflow Automation

Challenge:

Our client, a Series B billing platform, approached us to improve their data orchestration platform setup. Although their system was operational using AWS Step Functions – a service for orchestrating multi-step cloud workflows – they found this setup to be fragile and unreliable.

The client’s chief complaint was that they could not test changes in a staging environment that closely resembled production. Their production databases were very large and sustained roughly 5k write operations per second, so replicating that data into staging was infeasible. As a result, staging tests ran without realistic data, and changes were in effect validated for the first time in production.

In addition to these testing limitations, AWS Step Functions presented other challenges: brittle interactions, difficult debugging, subpar logging, and poor developer support. Data crashes and production errors were frequent, and many of them could have been avoided with a more robust setup.

Solution:

Our team evaluated three major options – Airflow, Prefect and Dagster – along with a proposal for an internally built solution. We recommended a “buy” rather than “build” approach to leverage the strengths of established platforms, accelerate deployment, and ease developer burden. Airflow was then eliminated from consideration because it had testing limitations similar to those of AWS Step Functions.

We ultimately aligned on Dagster as the best of the remaining options for this client’s needs. The main advantage was Dagster’s branch-based deployments, which allowed the client to run a parallel test pipeline in production against live data without impacting users.
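To make the idea of a parallel test pipeline concrete, here is a minimal sketch of one way a pipeline could read live data while keeping test writes isolated. It is an illustration under stated assumptions, not the client’s implementation: the schema names (`billing`, `scratch_…`) and the `resolve_targets` helper are hypothetical, and the environment variable names are assumptions modeled on Dagster’s hosted offering that should be verified against its documentation.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Targets:
    """Where a pipeline run reads from and writes to (hypothetical schema names)."""
    read_schema: str
    write_schema: str


def resolve_targets(env: Mapping[str, str]) -> Targets:
    """Pick read/write schemas based on whether this run is a branch deployment.

    Assumption: the platform sets DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT to "1"
    and DAGSTER_CLOUD_DEPLOYMENT_NAME to the deployment's name.
    """
    is_branch = env.get("DAGSTER_CLOUD_IS_BRANCH_DEPLOYMENT") == "1"
    if not is_branch:
        # Production run: read and write the real schema.
        return Targets(read_schema="billing", write_schema="billing")

    # Branch run: still read live data, but write to an isolated scratch
    # schema so users never see test output.
    name = env.get("DAGSTER_CLOUD_DEPLOYMENT_NAME", "branch")
    safe = "".join(c if c.isalnum() else "_" for c in name.lower())
    return Targets(read_schema="billing", write_schema=f"scratch_{safe}")
```

In practice the mapping passed in would be `os.environ`; taking it as a parameter keeps the function easy to unit test.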

Our team was pleased with how straightforward Dagster was to implement, helped by clear documentation and very attentive vendor support. We had the first pipeline running in production in less than a month, and it was easy to set up good automated testing. Velocity increased further after that first month as we reused the same patterns in subsequent pipeline setups.

While the AWS Step Functions workflows were controlled by multiple repositories in multiple locations, our Dagster implementation centralized most code into a single monorepo. This helps prevent unnecessary errors, reduces incident frequency, and prevents knowledge siloing.

AWS Step Functions Alerts

The diagram above shows how alerting on AWS Step Functions requires a lot of extra (and expensive) infrastructure merely to find out whether a step function failed. In contrast, the Dagster setup enables teams to create custom, detailed alerts without any extra infrastructure. These alerts can be triggered on data issues (outliers, datatype violations, etc.) as well as on whole-pipeline failures, which gives greater insight when diagnosing why something went wrong.
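As an illustration of the kind of data-level checks mentioned above, the following sketch flags datatype violations and outliers in a batch of records. It is plain Python rather than Dagster’s API, and the `amount` field, the `Alert` type, the `check_amounts` helper, and the 10x-median outlier rule are all hypothetical choices made for the example. In a real deployment, logic like this would sit inside the orchestrator’s own check and alerting mechanism.

```python
from dataclasses import dataclass
from statistics import median
from typing import Any


@dataclass
class Alert:
    kind: str    # "datatype" or "outlier"
    row: int     # index of the offending record
    detail: str  # human-readable explanation


def check_amounts(rows: list[dict[str, Any]], field: str = "amount",
                  max_ratio: float = 10.0) -> list[Alert]:
    """Return alerts for non-numeric values and for values far above the median."""
    alerts: list[Alert] = []
    numeric: list[tuple[int, float]] = []

    # Pass 1: datatype violations (bool is excluded even though it is an int).
    for i, row in enumerate(rows):
        value = row.get(field)
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            alerts.append(Alert("datatype", i, f"{field}={value!r} is not numeric"))
        else:
            numeric.append((i, float(value)))

    # Pass 2: outliers, defined here as more than max_ratio times the median.
    if numeric:
        med = median(v for _, v in numeric)
        for i, v in numeric:
            if med > 0 and v > max_ratio * med:
                alerts.append(Alert("outlier", i, f"{field}={v} exceeds {max_ratio}x median {med}"))

    return alerts
```

An empty result means the batch passed both checks; a non-empty one can be routed to whatever notification channel the team uses.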

The multi-tab, cluttered AWS UI (left) made debugging difficult compared with Dagster’s user-friendly presentation (right)

Our team did all of the initial setup and deployment: connecting databases, setting up repositories, and establishing patterns. We migrated three high-impact, business-critical pipelines to the new platform and provided comprehensive documentation, enabling the client’s own engineering teams to take over and migrate the rest of the company’s pipelines efficiently. We also set up a cookie-cutter template so that engineers can easily create new pipelines that follow our tested best practices. Since then, the client has successfully migrated all targeted pipelines to Dagster.

Transformation:

With the transition to Dagster, the client can now thoroughly test data pipeline changes, resulting in a more stable, less error-prone platform with streamlined debugging and faster deployment of changes and new features. Their engineering team can also leverage established patterns to efficiently add new pipelines and steps, freeing up time for feature development that drives the growth of the company.

Let Aprime help you overcome your challenges

and build your core technology

Are you ready to accelerate?