Senior Data Solutions Engineer

Posted on Sep 19, 2024 by Epsilon
Westminster, CO
Engineering
Immediate Start
Annual Salary
Full-Time
Job Description

This Senior Data Developer position will focus on designing, developing, and supporting our Hadoop data solutions in Spark and Python (PySpark) while working with other components of the Hadoop ecosystem such as HDFS, Hive, Hue, Impala, Jupyter, and Airflow. A successful candidate will work closely with business and portfolio leads to understand requirements, then design and build innovative data solutions.

Duties and Responsibilities

Design and develop solutions centered on PySpark, Python, and the Hadoop framework.

Work with gigabytes/terabytes of data, understanding the challenges of transforming and enriching such large datasets.

Automate the integration of new data sources and evaluate their impact on existing products.

Leverage AI/LLMs to improve data matching techniques

Identify opportunities to migrate solutions from On-Prem to AWS Cloud

Provide effective solutions to address business problems, both strategic and tactical.

Collaborate with team members, project managers, business analysts, and QA teams to conceptualize, estimate, and develop new solutions and enhancements.

Work closely with stakeholders to define and refine the big data platform to achieve sales, product, and strategic objectives.

Collaborate with other technology teams and architects to define and develop cross-function technology stack interactions.

Read, extract, transform, stage and load data to multiple targets, including Hadoop and Oracle.

Develop scripts around the Hadoop framework to automate processes and existing flows.

Modify existing programming/code for new requirements.

Estimate work and track progress through the SDLC with JIRA/Confluence.

Unit testing and debugging. Perform root cause analysis (RCA) for any failed processes.

Convert business requirements into technical design specifications and execute on them.

Participate in code reviews and keep applications/code base in sync with version control (Git/Bitbucket).

Communicate effectively, stay self-motivated, and work independently while remaining fully aligned within a distributed team environment.
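The read, extract, transform, stage, and load duties above follow a familiar pattern. As a minimal sketch, the snippet below uses plain Python with the standard library's sqlite3 as a stand-in for the Hadoop/Oracle targets (in the role itself this would be PySpark jobs writing to HDFS/Hive and Oracle), and the enrichment rule in `transform` is purely illustrative.

```python
# Minimal extract-transform-stage-load sketch. sqlite3 stands in for the
# real Hadoop/Oracle targets; the transform rule is a hypothetical example.
import sqlite3

def transform(row):
    """Enrich a raw record: normalize the name and derive a revenue band."""
    name, revenue = row
    band = "high" if revenue >= 1000 else "low"
    return (name.strip().title(), revenue, band)

# Extract: raw records as they might arrive from an upstream source.
raw_rows = [("  alice smith ", 1500), ("BOB JONES", 400)]

# Stage/load: write the transformed records into the target table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, revenue INTEGER, band TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                 (transform(r) for r in raw_rows))

# Query the loaded data back out.
high_value = conn.execute(
    "SELECT name FROM customers WHERE band = 'high'").fetchall()
print(high_value)  # [('Alice Smith',)]
```

In a PySpark pipeline the same shape appears as a read, a `DataFrame` transformation, and a write to the target table, with the transform expressed in Spark SQL or DataFrame operations rather than a Python loop.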

Required Skills

Bachelor's degree in Computer Science (or Engineering equivalent) or Master's degree, with 5+ years of experience in big data ingestion, transformation, and staging using the following technologies/principles/methodologies:

Analysis, design, and implementation experience with Hadoop distributed frameworks, including Python and Spark (Spark SQL, PySpark), HDFS, Hive, Impala, Hue, Cloudera Hadoop, Zeppelin, Jupyter, etc.

Extensive experience with large volumes of data (measured in Terabytes/Billions of Transactions)

AWS experience (especially related to porting solutions from on-prem to cloud)

AI project work, and exposure to leveraging LLMs/Machine Learning

Proficient knowledge of SQL with any RDBMS

Familiarity with RDD and Data Frames within Spark

Working knowledge of data analytics

Troubleshooting and complex problem-solving skills

Working knowledge of Linux/Unix environments and comfort with Unix Shell scripts (ksh, bash)

Basic Hadoop administration knowledge

DevOps knowledge is an advantage

Ability to work within deadlines and effectively prioritize and execute on tasks

Strong communication skills (verbal and written) with ability to communicate across teams, internal and external at all levels
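The data matching work mentioned in the duties builds on record-similarity techniques like the one sketched below. This is only an illustrative baseline using the standard library's difflib; production matching at this scale would run inside Spark and, per the posting, may lean on ML/LLM-based matchers instead. The candidate names and threshold are made up for the example.

```python
# Illustrative fuzzy record matching with stdlib difflib; a baseline for the
# kind of data-matching work described above, not a production technique.
from difflib import SequenceMatcher

def best_match(name, candidates, threshold=0.6):
    """Return the candidate most similar to `name`, or None below threshold."""
    scored = [(SequenceMatcher(None, name.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= threshold else None

candidates = ["Acme Corporation", "Globex Inc", "Initech LLC"]
print(best_match("ACME Corp.", candidates))  # Acme Corporation
```

In Spark, the same scoring function could be applied per-record via a UDF or a DataFrame cross-join on blocking keys, which is where the RDD/DataFrame familiarity listed above comes in.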

Certifications

Any of these:

CCA Spark and Hadoop Developer.

MapR Certified Spark Developer (MCSD).

MapR Certified Hadoop Developer (MCHD).

HDP Certified Apache Spark Developer.

HDP Certified Developer.

Preferred Skills

Technical: Working knowledge of Oracle databases and PL/SQL.

Hadoop administration and DevOps.

Non-Technical: Good analytical thinking and problem-solving skills.

Ability to diagnose and troubleshoot problems quickly.

Motivated to learn new technologies, applications, and domains.

Possess an appetite for learning through exploration and reverse engineering.

Strong time management skills.

Ability to take full ownership of tasks and projects.

Behavioral Attributes: Team player with excellent interpersonal skills.

Good verbal and written communication.

Possess a can-do attitude to overcome any kind of challenge.

Salary Range: $75,000.00 - $(phone number removed)

The application deadline for this job posting is 11/5/2024.

Reference: 201922697

https://jobs.careeraddict.com/post/95432970
