Staff Software Engineer
Posted on Oct 7, 2024 by ServiceNow
San Diego, CA
IT
Immediate Start
Annual Salary
Full-Time
Job Description
Key Responsibilities:
Develop and maintain APIs using NVIDIA Triton Inference Server for scalable deployment of Large Language Models (LLMs).
Implement and optimize pre-processing and post-processing pipelines tailored for LLMs to improve accuracy and efficiency.
Work with Retrieval-Augmented Generation (RAG) frameworks to enhance the model’s response generation capabilities.
Collaborate with data scientists, software engineers, and product teams to integrate and deploy ML solutions into production.
Troubleshoot and resolve issues related to model inference, performance, and scalability.
Reference: 199168249
https://jobs.careeraddict.com/post/95999651