Researcher, Vision

Job type: Full Time · Department: Engineering · Work type: On-Site

Bengaluru, Karnataka, India

About Sarvam

Sarvam is building the bedrock of Sovereign AI for India: a full-stack sovereign AI platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam is backed by Lightspeed, Peak XV, and Khosla Ventures. It works with leading enterprises and public institutions and partners with India's leading brands, including Tata Capital, SBI Life, CRED, IDFC, and LIC.

About the Role

You will work across the full lifecycle of vision-language model (VLM) development — data, training, evaluation, and production. The team's scope will evolve as the field does; we want researchers who are comfortable with that and can lead.

What You'll Do

Research vision-language architectures — encoders, fusion mechanisms, pretraining objectives, and scaling behaviour
Design training methods (pretraining, SFT, RLHF, DPO) adapted for multilingual VLMs
Investigate data strategies — what mixtures, quality signals, and synthetic data approaches actually move the needle
Build evaluation frameworks and benchmarks, especially for Indic multimodal tasks
Study model failure modes, robustness, and interpretability
Work closely with engineers to ensure ideas are testable at scale — prototype fast, then validate properly
Engage with the broader research community through open-source contributions and collaborations

What We're Looking For

Deep understanding of vision-language models — training dynamics, architecture tradeoffs, and failure modes
Track record of good research — through publications, technical reports, or impactful shipped work
Rigorous experimental design — able to isolate variables and draw defensible conclusions
Strong PyTorch skills, with the ability to run experiments end to end
Intellectual range — willing to work across data, training, and evaluation problems

Bonus Points

PhD/Master's with relevant research experience in ML, Computer Vision, NLP, or a related field
Research papers published at A/A* venues
Experience with multilingual or low-resource language modelling
Familiarity with document understanding, OCR, or structured visual prediction
Experience with large-scale data curation and its effect on model quality

Why Sarvam?

Sarvam is a fast-moving, high talent-density team building full-stack AI for India, working on problems that push the frontiers of AI with real population-scale impact.

Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar
High ownership and high impact, from day one
Everything we do is AI-first, from the way we build and ship to the way we think about problems
You can work on problems that could change how an entire country learns, works, and communicates

If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.
