Site Reliability Engineer

Full Time · Backend Engineer · Remote

Palo Alto, California, United States

About Archetype AI

Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.

Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A and is progressing rapidly to develop technology for its next stage. This presents a once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of its journey, located in the heart of Silicon Valley.

Our team is headquartered in Palo Alto, California, with team members throughout the US and Europe.

We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role that exactly fits you below, you can contact us directly with your resume via jobs<at>archetypeai<dot>io.

About the Role

As a Site Reliability Engineer (SRE) at Archetype AI, you will be responsible for designing, scaling, and maintaining the infrastructure that powers our AI-driven products. You will collaborate with backend engineers and ML researchers to ensure that our distributed platforms are fault-tolerant, performant, and highly available.

Core Responsibilities

  • Design, build, and operate highly available distributed systems.

  • Collaborate with engineering and ML teams to ensure reliable deployment of backend services (in Rust, C++ or similar).

  • Implement monitoring, alerting, and observability solutions across infrastructure.

  • Automate deployments, scaling, and infrastructure provisioning using infrastructure-as-code.

  • Diagnose and resolve performance bottlenecks, system outages, and production incidents.

  • Support AI/ML infrastructure for training and serving models at scale, including GPU clusters, pipelines, and inference services.

  • Contribute to infrastructure architecture, standards, and operational best practices.

Minimum Qualifications

  • 5+ years of experience as an SRE, DevOps Engineer, or Systems Engineer.

  • Strong expertise in distributed systems, fault-tolerant architectures, and large-scale production environments.

  • Proficiency in Rust, C++, or another backend language, with a willingness to learn.

  • Solid experience with Kubernetes, containers, and cloud platforms (AWS, GCP, Azure).

  • Hands-on experience with monitoring and observability tools (Prometheus, Grafana, ELK, OpenTelemetry).

  • Experience with data pipelines, messaging systems, and streaming technologies (Kafka, Pulsar, etc.).

  • Familiarity with AI/ML infrastructure (training pipelines, GPU clusters, inference systems).

  • Strong debugging and problem-solving skills and an automation mindset (Terraform, Ansible, Pulumi, scripting).

  • Excellent communication and collaboration skills.

Preferred Qualifications

  • Experience with real-time or low-latency systems.

  • Open-source contributions to distributed systems or infrastructure projects.

  • Knowledge of security best practices for distributed environments.

  • Experience with edge or embedded systems and sensor-based infrastructure.

  • Background in multimodal data fusion or physical-world perception systems.

What We Value

  • Ownership – You take initiative, follow through, and care deeply about quality and outcomes.

  • Motivation – You’re driven to solve complex problems and continuously raise the bar for yourself and your team.

  • Excellence – You bring discipline, clarity, and rigor to your craft—and help others do the same.

  • Collaboration – You work well with others, mentor generously, and contribute to a high-trust, high-performance culture.
