Dutch Speaking Data Engineer
Posted on Nov 22, 2020 by Applause IT Ltd
- You will manage the existing stack of data flows and pipelines in the AWS cloud environment, which uses S3, EMR, Spark, Redshift, Lambda, Python, and more.
- You will further develop complex data models to generate additional analytic insight.
- You will write high-quality code to further develop the data platform, keeping it scalable and easy to maintain.
- You will link/integrate new data sources from various origins into the existing solution.
- You analyse existing processes and advise where performance gains and cost savings can be achieved.
- You implement data quality controls and alert stakeholders so that the necessary actions can be taken.

All projects within the VEB are developed according to the Agile Scrum methodology.
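The data quality controls mentioned above could, as a minimal sketch, look like the following. The schema (`meter_id`, `kwh`) and the threshold are assumptions for illustration only; in practice such checks would run inside the AWS pipeline and feed an alerting channel.

```python
# Hypothetical sketch of a batch data quality check: rows with missing or
# out-of-range values are separated out so stakeholders can be alerted.

def check_usage_rows(rows, max_kwh=100_000):
    """Return (valid_rows, issues) for a batch of usage records.

    Each row is a dict with 'meter_id' and 'kwh' keys (illustrative schema).
    """
    valid, issues = [], []
    for i, row in enumerate(rows):
        if row.get("meter_id") is None:
            issues.append((i, "missing meter_id"))
        elif row.get("kwh") is None or not (0 <= row["kwh"] <= max_kwh):
            issues.append((i, "kwh out of range"))
        else:
            valid.append(row)
    return valid, issues

batch = [
    {"meter_id": "M1", "kwh": 12.5},
    {"meter_id": None, "kwh": 3.0},
    {"meter_id": "M2", "kwh": -1.0},
]
valid, issues = check_usage_rows(batch)
# 'issues' would then be forwarded to stakeholders (e.g. via e-mail or SNS).
```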
As a data engineer, you will work closely with the other members of the multidisciplinary team. You will report to the team's project manager.
- Higher education (Master or Bachelor) with technical/engineering or business economics background or equivalent through experience.
- Experience with project-oriented work and data warehouse modelling, or demonstrable training in data warehouse modelling.
- At least 3 years of experience in data warehouse modelling
- Language requirement: Dutch at European CEFR level C2
- Advanced knowledge of planning, designing and building a data warehouse application using tools such as Cognos or DataStage.
- Good knowledge of data warehousing concepts, processes and architectures.
- Knowledge of data warehouse security aspects.
- Knowledge of building the following data warehouse components: data integration (ETL), reporting & analysis (BI), databases and metadata.
- Determining data loading and data refreshing strategies.

The Terra platform is set up on Amazon Web Services and uses various PaaS and IaaS solutions. The data lake/data warehouse that is used for all kinds of reporting and analysis purposes uses, among others, the following AWS services:
- Data pipelines and EMR for loading and processing data.
- S3 for file storage
- Amazon Redshift (DWH)
- Tableau as reporting environment
- Athena for querying flat files in the data lake
- SQL Server as back-end database of the Terra web application
- Lambda for running back-end services
- DMS for synchronising data, e.g. from SQL Server to Redshift.

For example, to process the usage data that are supplied on a daily basis, Python and Spark ETL jobs merge the new usage data with the existing data in the Redshift DWH.
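The merge step described above follows the usual upsert pattern: a new daily record replaces any existing row with the same key, and records for new keys are appended. The plain-Python sketch below only illustrates that logic; the key columns and schema are assumptions, and in production this would run as a Spark job writing to Redshift rather than operating on in-memory lists.

```python
# Illustrative upsert: new daily usage records overwrite existing rows with
# the same (meter_id, date) key; rows with new keys are appended.
# Schema and key columns are assumed for illustration only.

def merge_usage(existing, new, key=("meter_id", "date")):
    merged = {tuple(r[k] for k in key): r for r in existing}
    for r in new:  # later records overwrite earlier ones with the same key
        merged[tuple(r[k] for k in key)] = r
    return list(merged.values())

existing = [{"meter_id": "M1", "date": "2020-11-21", "kwh": 10.0}]
new = [
    {"meter_id": "M1", "date": "2020-11-21", "kwh": 10.5},  # corrected reading
    {"meter_id": "M1", "date": "2020-11-22", "kwh": 9.8},   # new day
]
rows = merge_usage(existing, new)
```

In Redshift itself, the same effect is typically achieved by loading the new batch into a staging table, deleting matching keys from the target, and inserting the staged rows.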
In addition, links are made with numerous other data sources to enrich the data warehouse. Git is used for version control of the code, and the release process (dev, stg, prod) runs on MS DevOps, which is also used for tracking sprints, user stories and tasks.