Introduction
Sometimes a model needs to make predictions on batches of samples rather than one at a time. This kind of requirement usually comes with the following specification:
- The job must run frequently, on a recurring schedule.
In this post, we’ll see how to turn that requirement into a pipeline that works in production.
The problem
To make this concrete, let’s imagine a scenario where the data science team has built a scoring model. Stakeholders say they need to score our new users every week, and our task is to put the model into production.
The approach
To tackle the problem, we will build a Python CLI that takes a command and runs the scoring over the users. We also want the scoring to run only when it is needed. To do that, we use the following tools:
- S3
- ECS
- Typer
- Docker
- Airflow (MWAA)
- GitHub Actions
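To give a feel for the CLI part of this approach, here is a minimal sketch of a Typer command that scores one batch and skips the work when results already exist. Everything specific in it is an assumption made for illustration: the `score` command name, the `batch_id` argument, the `--output-dir` and `--force` options, the `should_run` helper, and the local `scores` folder that stands in for an S3 location. The model call is a placeholder, not the real scoring code.

```python
from pathlib import Path

import typer

app = typer.Typer(help="Hypothetical batch-scoring CLI (illustrative sketch).")


@app.callback()
def main() -> None:
    """Keep `score` as an explicit subcommand even though it is the only one."""


def should_run(output_path: Path, force: bool) -> bool:
    """Return True when the batch still needs scoring, or a re-run is forced."""
    return force or not output_path.exists()


@app.command()
def score(
    batch_id: str = typer.Argument(..., help="Batch identifier, e.g. 2024-W01."),
    output_dir: Path = typer.Option(
        Path("scores"), help="Where results are written (local stand-in for S3)."
    ),
    force: bool = typer.Option(
        False, "--force", help="Re-score even if results already exist."
    ),
) -> None:
    """Score one batch of users, skipping batches that were already scored."""
    output_path = output_dir / f"{batch_id}.csv"

    if not should_run(output_path, force):
        typer.echo(f"Skipping {batch_id}: already scored.")
        raise typer.Exit()

    output_dir.mkdir(parents=True, exist_ok=True)
    # Placeholder: the real pipeline would load the users and call the model here.
    output_path.write_text("user_id,score\n")
    typer.echo(f"Scored batch {batch_id} -> {output_path}")


if __name__ == "__main__":
    app()
```

Assuming the file is saved as `cli.py`, running `python cli.py score 2024-W01` would score the batch once, a second identical call would be skipped, and adding `--force` would score it again. A scheduler can then call the same command every week without worrying about duplicate runs.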