Disclaimer
This post is a work in progress, but I would like to share it with you.
Introduction
One of the data engineering team's main objectives is to ensure data reliability and scalability; to achieve that, the infrastructure plays a key role.
Data Architecture
DevOps and software engineering (SWE) best practices play a fundamental role, and we try to apply both in data engineering (DE).
I have used the following architecture for at least one year. The idea behind it focuses on:
- TDD
- Quick deploy
- Decoupling SQL scripts
- Continuous Integration and Deployment
- Production and Development environments
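As an illustration of how TDD and decoupled SQL scripts can fit together, here is a minimal sketch. It assumes that "decoupling" means keeping each query in its own .sql file instead of embedding it in Python code, which is only one reading of the list above. The file name `daily_totals.sql`, the `sales` table, and the helper functions are all hypothetical, and SQLite stands in for whatever database is really used.

```python
import sqlite3
import tempfile
from pathlib import Path


def load_sql(path: Path) -> str:
    """Read a query from a standalone .sql file."""
    return path.read_text(encoding="utf-8")


def run_query(connection: sqlite3.Connection, sql: str) -> list:
    """Execute a query and return all rows."""
    return connection.execute(sql).fetchall()


def test_daily_totals() -> None:
    """Check the query against a tiny in-memory fixture."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical query file; in a real repo it would be versioned.
        sql_file = Path(tmp) / "daily_totals.sql"
        sql_file.write_text(
            "SELECT day, SUM(amount) FROM sales GROUP BY day ORDER BY day;",
            encoding="utf-8",
        )
        connection = sqlite3.connect(":memory:")
        connection.execute("CREATE TABLE sales (day TEXT, amount INTEGER)")
        connection.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [("2020-01-01", 10), ("2020-01-01", 5), ("2020-01-02", 7)],
        )
        rows = run_query(connection, load_sql(sql_file))
        connection.close()
    assert rows == [("2020-01-01", 15), ("2020-01-02", 7)]
```

Because the query lives in its own file, it can be reviewed and tested without touching the Python code that runs it.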
How does it work?
We follow the Gitflow Workflow, which allows us to work on different requirements at the same time.
Every time somebody opens a PR, the following checks run automatically (they can also be run locally):
- Flake8
- pydocstyle
- mypy
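To run the same checks locally, a small wrapper script is one option. The following is a minimal sketch, assuming the three tools are installed in the active Python environment and that the code lives in a hypothetical `src` directory. The function names are illustrative and are not part of the setup described here.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Assumption: the project code lives under "src"; adjust to your layout.
DEFAULT_TARGET = "src"


def build_commands(target: str = DEFAULT_TARGET) -> List[List[str]]:
    """Return one command line per check, invoked as Python modules."""
    return [
        [sys.executable, "-m", "flake8", target],
        [sys.executable, "-m", "pydocstyle", target],
        [sys.executable, "-m", "mypy", target],
    ]


def _run(command: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(command, check=False).returncode


def run_checks(
    target: str = DEFAULT_TARGET,
    runner: Callable[[Sequence[str]], int] = _run,
) -> bool:
    """Run every check, even after a failure, and report overall success."""
    results = [runner(command) == 0 for command in build_commands(target)]
    return all(results)
```

Calling `run_checks()` from the repository root returns `True` only when all three tools exit with code 0. The `runner` parameter exists so the function can be tested without the tools installed.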
Somebody on the team has to review our PR before it can be merged into the development instance.
Finally, if it all works OK, we make a PR from DEV to MASTER. (The master branch only accepts PRs from the DEV branch.)
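The branch rule can be expressed as a small check. This is only a sketch of the policy described here, assuming the branches are literally named `dev` and `master`. The function `is_pr_allowed` is hypothetical; in practice a rule like this is usually enforced by the Git hosting platform or a CI step rather than by hand-written code.

```python
# Assumption: exact branch names; the post refers to them as DEV and MASTER.
DEV_BRANCH = "dev"
MASTER_BRANCH = "master"


def is_pr_allowed(source: str, target: str) -> bool:
    """Return True if a PR from source into target respects the branch rule."""
    if target == MASTER_BRANCH:
        # master only accepts PRs that come from the dev branch.
        return source == DEV_BRANCH
    # The post states no restriction for other targets, so allow them here.
    return True
```

For example, `is_pr_allowed("feature/new-report", "master")` is rejected, while the same feature branch can still target `dev`.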