Data Engineer - Spark/Python
Posted on Jun 2, 2021 by Gazelle Global Consulting
My client seeks a data engineer to help build new data pipelines and improve existing ones.
You should be comfortable working with large or fast-moving data, have a solid understanding of distributed processing frameworks, and bring a software engineering mindset.
This is not a data scientist role. You are not expected to know statistics, business analytics, or the Python libraries used for building ML models.
This is not a cloud or DevOps role.
This is not a Python/Scala/Java programmer role. It would be good if you have used Python in Spark programming, but you are not expected to do general-purpose Python development.
The role involves knowing and coding against big data: transforming data within the pipeline, scheduling data pipelines, and writing performant big data pipelines. If you have not done this using Spark, this is not the role for you.
Overall 7 to 12 years of IT experience, with extensive experience in Big Data, Analytics, and ETL technologies.
Minimum 2 to 4 years of experience in Spark programming using Python, Scala, or Java.
Application development background in big data, along with knowledge of analytics libraries and big data computing libraries.
Hands-on experience in coding, designing, and developing complex data pipelines using big data technologies.
Experience developing applications on Big Data; designing and building highly scalable data pipelines.
Experience with Python, SQL databases, Spark, and non-relational databases.
Responsible for ingesting data from files, streams, and databases, and processing it using Spark and Python.
Develop PySpark programs for data cleaning and processing.
Responsible for designing and developing a distributed, high-volume, high-velocity, multi-threaded event processing system.
Nice To Have Skills:
Experience in Palantir
Knowledge of CI/CD pipelines, Git, and Jenkins
Experience working with large datasets
Proficiency in reading and understanding enterprise-grade PySpark code
For immediate consideration, please forward your latest CV.