QA Engineer – Distributed Systems & AI
Full Time · Backend Engineer · Remote
Palo Alto, California, United States
Archetype AI is developing the world's first AI platform to bring AI into the real world. Formed by an exceptionally high-caliber team from Google, Archetype AI is building a foundation model for the physical world, a real-time multimodal LLM for real life, transforming real-world data into valuable insights and knowledge that people will be able to interact with naturally. It will help people in their real lives, not just online, because it understands the real-time physical environment and everything that happens in it.
Supported by deep tech venture funds in Silicon Valley, Archetype AI is currently pre-Series A, progressing rapidly to develop technology for their next stage. This presents a unique and once-in-a-lifetime opportunity to be part of an exciting AI team at the beginning of their journey, located in the heart of Silicon Valley.
Our team is headquartered in Palo Alto, California, with team members throughout the US and Europe.
We are actively growing, so if you are an exceptional candidate excited to work on the cutting edge of physical AI and don’t see a role that exactly fits you below you can contact us directly with your resume via jobs<at>archetypeai<dot>io.
We’re seeking a Quality Assurance (QA) Engineer with deep experience in testing complex distributed systems. You will help ensure the reliability, correctness, performance, and security of our AI platform by designing robust QA strategies and automation pipelines.
In this role, you’ll go beyond traditional QA, validating production-grade systems that support high-throughput inference and GPU clusters,. You’ll collaborate across engineering, product, and research to ensure what we build performs as expected—at scale and under pressure.
Design and implement comprehensive test plans for large-scale distributed systems and cloud services.
Build automated end-to-end tests, including performance, integration, stress, and regression testing.
Ensure system robustness by validating fault tolerance, failover recovery, and scalability in cloud-native environments (Kubernetes, etc.).
Collaborate with engineers to embed quality gates into CI/CD pipelines and deployment processes.
Debug system-level issues involving concurrency, race conditions, distributed state, or resource contention.
Develop and monitor key QA metrics.
Proactively raise quality standards across engineering through documentation, best practices, and code/test reviews.
7+ years of QA, test automation, or systems engineering experience for backend or distributed systems.
Deep knowledge of QA methodologies and system-level testing for large-scale services.
Experience writing automated tests in Python, JS; familiarity with test frameworks like PyTest or Robot.
Strong understanding of distributed systems principles (e.g., consistency, partition tolerance, replication).
Hands-on experience with container orchestration (e.g., Kubernetes), observability tools (e.g., Prometheus, Grafana), and cloud platforms (AWS, Azure, GCP).
Experience testing ML inference platforms, GPU workloads, or high-performance computing environments.
Familiarity with chaos testing, fault injection, and performance benchmarking tools.
Understanding of security testing practices in cloud and distributed environments.
Proven ability to scale QA infrastructure as systems and teams grow.
Ownership – You take initiative, follow through, and care deeply about quality and outcomes.
Motivation – You’re driven to solve complex problems and continuously raise the bar for yourself and your team.
Excellence – You bring discipline, clarity, and rigor to your craft—and help others do the same.
Collaboration – You work well with others, mentor generously, and contribute to a high-trust, high-performance culture.
Autofill application
Save time by importing your resume in one of the following formats: .pdf or .docx.