Backend Software Engineer: Inference
Job type: Full Time · Department: Backend Engineering · Work type: Remote
San Mateo, California, United States
Archetype AI is developing the world's first AI platform for the physical world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world: a real-time, multimodal LLM that transforms real-world data into insights and knowledge people can interact with naturally. It will help people in their daily lives, not just online, because it understands the real-time physical environment and everything that happens in it.
Backed by deep-tech venture funds in Silicon Valley, Archetype AI is currently at the Series A stage and progressing rapidly toward its next milestone. This is a rare opportunity to join an exciting AI team at the beginning of its journey, in the heart of Silicon Valley.
Our team is headquartered in San Mateo, California, with team members throughout the US and Europe.
We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don't see a role below that exactly fits you, contact us directly with your resume via jobs<at>archetypeai<dot>io.
We’re looking for a highly motivated backend engineer with extensive experience in designing and developing performant, scalable, and resilient inference services.
You’ll work closely with researchers, ML engineers, and product teams to bring cutting-edge AI capabilities into production—at scale, with reliability, and under real-world constraints.
This is an opportunity to own key services across our inference platform, from intelligent request routing to fleet-wide orchestration across diverse AI accelerators, and to contribute to some of the most advanced real-time AI serving systems in production today.
Architect, implement, and maintain distributed inference serving systems that support high-throughput, low-latency model serving across multiple AI accelerator families and cloud platforms.
Enable breakthrough research by providing scientists with high-performance inference infrastructure to develop next-generation models.
Continuously optimize inference performance—including batching, caching, and request routing strategies—to maximize compute efficiency under explosive customer growth.
Build tooling and observability to monitor system health, identify bottlenecks, and proactively resolve instability.
Introduce new techniques, architectures, and best practices to push the limits of scalability, efficiency, and reliability.
Own problems end-to-end—from design to deployment—with a strong bias toward quality, automation, and continuous improvement.
Balance rapid iteration on early-stage systems with long-term maintainability and architectural soundness.
Contribute to a culture of engineering excellence, mentorship, and team-first collaboration.
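One of the optimization techniques named above, dynamic request batching, can be sketched roughly as follows. This is a minimal single-process illustration, not Archetype AI's implementation; all names (`DynamicBatcher`, `model_fn`, `submit`) are hypothetical, and a real serving system would batch asynchronously per accelerator with bounded queues:

```python
import queue
import threading
import time

class DynamicBatcher:
    """Toy dynamic batcher: groups incoming requests into batches,
    flushing when the batch is full or a wait deadline expires."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_ms=10):
        self.model_fn = model_fn            # runs inference on a list of inputs
        self.max_batch_size = max_batch_size
        self.max_wait = max_wait_ms / 1000.0
        self.requests = queue.Queue()       # (input, result_slot) pairs

    def submit(self, x):
        """Called by request handlers; blocks until the batched result is ready."""
        slot = {"done": threading.Event()}
        self.requests.put((x, slot))
        slot["done"].wait()
        return slot["result"]

    def run_forever(self):
        """Batching loop: drain up to max_batch_size requests per cycle."""
        while True:
            batch = [self.requests.get()]   # block until at least one request
            deadline = time.monotonic() + self.max_wait
            while len(batch) < self.max_batch_size:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    break
                try:
                    batch.append(self.requests.get(timeout=remaining))
                except queue.Empty:
                    break
            inputs = [x for x, _ in batch]
            outputs = self.model_fn(inputs)  # one batched forward pass
            for (_, slot), y in zip(batch, outputs):
                slot["result"] = y
                slot["done"].set()
```

The trade-off this illustrates is latency versus throughput: a longer `max_wait_ms` yields fuller batches and better accelerator utilization, at the cost of added tail latency for the first request in each batch.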
7+ years of professional software engineering experience, with a focus on inference.
Deep understanding of machine learning systems at scale, including load balancing, request routing, and traffic management.
Experience with inference optimization, batching, and caching strategies.
Ability to design APIs and service interfaces for real-time and latency-sensitive use cases.
Experience building and operating production-grade systems at scale in cloud environments (e.g., Azure, AWS, GCP).
Strong debugging, instrumentation, and observability skills across distributed systems.
Demonstrated ownership of complex technical problems and ability to learn and adapt quickly.
Proven track record of scaling systems through rapid growth and rebuilding or refactoring for new demands.
Experience building systems that degrade gracefully under load: backpressure, rate limiting, circuit breaking, bulkheading, and queuing.
Strong understanding of failure modes in distributed systems and mitigation techniques.
Proven experience owning high-availability services (e.g., SLOs, incident response, on-call), including capacity planning and load testing.
Proficiency in multiple programming languages (e.g., Rust, C++, Python).
Experience designing internal tools or platforms to support developer productivity and experimentation.
Strong product intuition, and ability to collaborate closely with cross-functional teams including research and design.
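Of the graceful-degradation techniques listed above, rate limiting is the simplest to sketch. Below is a minimal token-bucket limiter that sheds excess load rather than queuing it; this is an illustrative example under hypothetical names (`TokenBucket`, `allow`), not a description of any particular production system, which would combine it with backpressure, bounded queues, and circuit breaking:

```python
import time

class TokenBucket:
    """Toy token-bucket rate limiter: admits a request only if a token
    is available, shedding excess load instead of queuing it."""

    def __init__(self, rate_per_sec, burst):
        self.rate = rate_per_sec        # tokens replenished per second
        self.capacity = burst           # maximum bucket size (burst allowance)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        """Refill tokens based on elapsed time, then try to spend one."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                    # caller rejects, e.g. with HTTP 429
```

A handler would call `allow()` before doing work and fail fast when it returns `False`, keeping latency bounded for the traffic that is admitted.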
Ownership – You take initiative, follow through, and care deeply about quality and outcomes.
Motivation – You’re driven to solve complex problems and continuously raise the bar for yourself and your team.
Excellence – You bring discipline, clarity, and rigor to your craft—and help others do the same.
Collaboration – You work well with others, mentor generously, and contribute to a high-trust, high-performance culture.