Researcher, Vision

Job type: Full Time · Department: Engineering · Work type: On-Site

Bengaluru, Karnataka, India

About Sarvam

Sarvam is building the bedrock of sovereign AI for India: a full-stack platform spanning research, models, infrastructure, and applications, with a singular focus on making AI genuinely work for India. Sarvam works with leading enterprises and public institutions and is backed by Lightspeed, Peak XV, and Khosla Ventures. Its partners include leading Indian brands such as Tata Capital, SBI Life, CRED, IDFC, and LIC.

About the Role

You will work across the full lifecycle of vision-language model (VLM) development: data, training, evaluation, and production. The team's scope will evolve as the field does; we want researchers who are comfortable with that shift and can lead through it.

What You'll Do

  • Research vision-language architectures — encoders, fusion mechanisms, pretraining objectives, and scaling behaviour

  • Design training methods (pretraining, SFT, RLHF, DPO) adapted for multilingual VLMs

  • Investigate data strategies — what mixtures, quality signals, and synthetic data approaches actually move the needle

  • Build evaluation frameworks and benchmarks, especially for Indic multimodal tasks

  • Study model failure modes, robustness, and interpretability

  • Work closely with engineers to ensure ideas are testable at scale — prototype fast, then validate properly

  • Engage with the broader research community through open-source contributions and collaborations

What We're Looking For

  • Deep understanding of vision-language models — training dynamics, architecture tradeoffs, and failure modes

  • Track record of good research — through publications, technical reports, or impactful shipped work

  • Rigorous experimental design — able to isolate variables and draw defensible conclusions

  • Strong PyTorch skills, with the ability to run experiments end to end

  • Intellectual range — willing to work across data, training, and evaluation problems

Bonus Points

  • PhD/Master's with relevant research experience in ML, Computer Vision, NLP, or a related field

  • Research papers published at A/A* venues

  • Experience with multilingual or low-resource language modelling

  • Familiarity with document understanding, OCR, or structured visual prediction

  • Experience with large-scale data curation and its effect on model quality

Why Sarvam?

Sarvam is a fast-moving, high talent-density team building full-stack AI for India, working on problems that push the frontiers of AI with real population-scale impact.

  • Work alongside researchers, engineers, builders, and business leaders who move fast and hold each other to a very high bar

  • High ownership and high impact, from day one

  • Everything we do is AI-first, from the way we build and ship to the way we think about problems

  • Work on problems that could change how an entire country learns, works, and communicates

If you want to work on problems at the frontier of AI in India, Sarvam is the place to be.
