AWS Data Engineer
Posted on Oct 31, 2020 by Trust In Soda
AWS Data Engineer - Belgium (On-Site) - 11 Month Contract
Length: 11 Months
Location: Brussels, Belgium
Soda's key client in Belgium is working on a central policy support platform that collects the energy consumption data of all buildings and infrastructure used by the public sector, with the goal of reporting on how consumption evolves over time. The platform will allow the client to visualize savings measures, grants and funds, succession tools and the 2030 targets.
The platform is set up on Amazon Web Services and uses various PaaS and IaaS solutions.
The data lake/data warehouse, used for all kinds of reporting and analysis purposes, relies on the following services and tools:
- Data pipelines and EMR for loading and processing data
- S3 for storing files
- Amazon Redshift (DWH)
- Tableau as a reporting environment
- Athena for querying flat files in the data lake
- SQL Server as the back-end database of the Terra web application
- Lambda for running back-end services
- DMS for synchronizing data, including from SQL Server to Redshift
For example, the consumption data delivered daily is processed by an ETL job written in Python and Spark, which merges the new consumption data with the pre-existing data in the Redshift DWH.
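To give candidates a feel for this kind of daily merge, here is a minimal sketch in plain Python rather than Spark. It is an illustration only, not the client's actual pipeline: the function name `merge_consumption`, the record fields (`building_id`, `date`, `kwh`) and the rule that a new record replaces an existing one with the same building and date are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Record = Dict[str, object]
Key = Tuple[str, str]


def merge_consumption(existing: List[Record], incoming: List[Record]) -> List[Record]:
    """Upsert incoming records into existing ones, keyed by (building_id, date).

    Assumption for this sketch: a record in `incoming` overrides any
    existing record with the same key; all other records are kept.
    """
    merged: Dict[Key, Record] = {}
    for row in existing:
        merged[(row["building_id"], row["date"])] = row
    for row in incoming:
        # Later (incoming) rows win over pre-existing rows with the same key.
        merged[(row["building_id"], row["date"])] = row
    # Return rows in a stable order: by building, then by date.
    return [merged[key] for key in sorted(merged)]


# Hypothetical sample data: one corrected reading and one new building.
existing = [
    {"building_id": "B1", "date": "2020-10-29", "kwh": 120.0},
    {"building_id": "B1", "date": "2020-10-30", "kwh": 115.0},
]
incoming = [
    {"building_id": "B1", "date": "2020-10-30", "kwh": 118.5},
    {"building_id": "B2", "date": "2020-10-30", "kwh": 42.0},
]

result = merge_consumption(existing, incoming)
for row in result:
    print(row)
```

In a Spark and Redshift setting the same idea would typically be expressed as a join or a staged upsert on the key columns; the dictionary here only stands in for that keyed lookup.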
In addition, the data warehouse is enriched by linking it to many other data sources. The code is version-controlled with Git, and the release process (dev, stg, prod) runs on MS DevOps, which is also used to track sprints, user stories and tasks.
Responsibilities:
- Manage the existing stack of data flows and pipelines in the AWS cloud environment, which uses, among other things, S3, EMR, Spark, Redshift, Lambda and Python.
- Build complex data models to generate further analytical insight.
- Write high-quality code to further develop the data platform.
- Link/integrate new data sources of various origins into the existing solution.
- Analyse the existing processes and advise where performance gains and cost savings can be achieved.
- Implement data quality controls and alert stakeholders so that the necessary actions can be taken.
Skills and experience:
- Data Pipelines
- SQL Server
- Agile Scrum methodology