GPU Infrastructure Engineer

Full Time · Engineering

Bengaluru, Karnataka, India

Company Overview

Sarvam.ai is a pioneering generative AI startup headquartered in Bengaluru, India. Our mission is to make generative AI accessible and impactful for Bharat. Founded by a team of AI experts, Sarvam.ai is dedicated to developing cost-effective, high-performance AI agents tailored for the Indian market, enabling enterprises to tap into new opportunities and foster deeper customer connections. Join us in reshaping AI for India and beyond.

Job Summary

We are looking for a GPU Infrastructure Engineer to build and manage high-performance AI infrastructure, optimize model deployment, and streamline model inferencing at scale. You will work across cloud and on-prem GPU clusters, manage model CI/CD pipelines, and optimize AI workloads for efficiency and performance. This role requires expertise in GPU-accelerated computing, cloud infrastructure, offline deployments, and AI workload monitoring.

Key Responsibilities

  • Design, deploy, and optimize GPU infrastructure for AI workloads in cloud and on-prem environments.

  • Manage and scale GPU instances across AWS, Azure, GCP, and on-prem setups.

  • Build and maintain model CI/CD pipelines for efficient model training, deployment, and inferencing.

  • Optimize deep learning models for inference using TensorRT, ONNX, or Nvidia NVCF (see the sketch after this list).

  • Implement and manage Cloud PTU (Provisioned Throughput Unit) capacity and offline deployments for AI models.

  • Develop and optimize data pipelines to support real-time and batch AI processing.

  • Deploy and monitor AI models in production, ensuring high availability, low latency, and cost efficiency.

  • Implement logging, monitoring, and alerting solutions to track model performance and infrastructure health.

  • Work closely with full-stack engineers, ML engineers, and data teams to ensure seamless model deployment.
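
To give candidates a concrete flavor of the model-optimization work above, here is a minimal sketch of exporting a PyTorch model to ONNX ahead of TensorRT compilation. The placeholder model, shapes, and file name are illustrative assumptions, not a prescribed stack.

```python
# Minimal sketch: export a PyTorch model to ONNX for optimized GPU serving.
# The placeholder model and shapes are illustrative assumptions; a real
# pipeline would export production checkpoints and hand the ONNX file to
# TensorRT for engine building.
import torch
import torchvision.models as models

model = models.resnet50(weights=None).eval()   # placeholder model
dummy_input = torch.randn(1, 3, 224, 224)      # example NCHW input

torch.onnx.export(
    model,
    dummy_input,
    "resnet50.onnx",                           # hypothetical output path
    input_names=["input"],
    output_names=["logits"],
    dynamic_axes={"input": {0: "batch"}, "logits": {0: "batch"}},
    opset_version=17,
)
```

The exported graph can then be compiled into a TensorRT engine (for example with `trtexec --onnx=resnet50.onnx --fp16`) and benchmarked against the eager-mode baseline.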

Must-Have Skills and Qualifications

  • Educational Background: Bachelor's degree in Computer Science, Engineering, or a related field (2024/2025 graduates).

  • GPU & Model Deployment: Hands-on experience with Nvidia GPUs, CUDA, TensorRT, and AI model inferencing.

  • Cloud & Infra Management: Experience deploying and managing AI workloads on AWS, Azure, or GCP.

  • Model CI/CD: Proficiency in setting up continuous integration and deployment pipelines for ML models.

  • Offline & Edge AI Deployments: Understanding of running models in offline and on-device environments.

  • Infrastructure as Code (IaC): Experience with Terraform, Kubernetes, and Docker.

  • Monitoring & Alerts: Experience with tools like Prometheus, Grafana, ELK Stack, or CloudWatch.

  • Full-Stack & Data Pipelines: Familiarity with backend APIs, data processing workflows, and ML pipelines.

  • Version Control & Collaboration: Strong Git skills and experience working in cross-functional teams.
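
As a brief illustration of the monitoring expectations above, the following sketch exposes inference latency and queue depth as Prometheus metrics via the prometheus_client library. The metric names and the simulated serve loop are illustrative assumptions.

```python
# Minimal sketch: expose inference metrics for Prometheus to scrape.
# Metric names and the simulated workload are illustrative assumptions.
import random
import time

from prometheus_client import Gauge, Histogram, start_http_server

INFERENCE_LATENCY = Histogram(
    "model_inference_latency_seconds",   # hypothetical metric name
    "Time spent serving one inference request",
)
QUEUE_DEPTH = Gauge(
    "model_request_queue_depth",         # hypothetical metric name
    "Requests currently waiting for a GPU slot",
)

def handle_request() -> None:
    # Stand-in for a real inference call, timed into the histogram.
    with INFERENCE_LATENCY.time():
        time.sleep(random.uniform(0.01, 0.05))   # simulate GPU work

if __name__ == "__main__":
    start_http_server(8000)   # metrics at http://localhost:8000/metrics
    while True:
        QUEUE_DEPTH.set(random.randint(0, 10))   # simulated queue depth
        handle_request()
```

A scrape target like this would typically feed Grafana dashboards and alerting rules for latency and availability objectives.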

Good to Have

  • Experience with Nvidia NVCF for model optimization.

  • Knowledge of LLM inference frameworks such as vLLM, DeepSpeed, Nvidia Triton, or Hugging Face TGI.

  • Familiarity with FP16/INT8 quantization and pruning techniques for model optimization (see the sketch after this list).

  • Exposure to serverless AI inference (AWS Lambda, SageMaker, Azure ML).

  • Contributions to open-source AI infrastructure projects or a strong GitHub portfolio showcasing ML model deployment expertise.
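
To illustrate the quantization item above, here is a minimal sketch of dynamic INT8 quantization in PyTorch; the toy model is an illustrative assumption, and for GPU serving FP16 (e.g. `model.half()` or TensorRT's `--fp16` flag) is the more common path.

```python
# Minimal sketch: dynamic INT8 quantization of a model's Linear layers.
# The toy model is an illustrative assumption; real workloads would
# quantize production checkpoints and validate accuracy and latency
# before deployment. Dynamic quantization as shown targets CPU inference.
import torch
import torch.nn as nn

model = nn.Sequential(            # placeholder for a real network
    nn.Linear(512, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
).eval()

quantized = torch.quantization.quantize_dynamic(
    model,
    {nn.Linear},                  # quantize only Linear layers
    dtype=torch.qint8,            # INT8 weights
)

x = torch.randn(1, 512)
print(quantized(x).shape)         # same interface, smaller Linear kernels
```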
