Principal Engineer
Job type: Full Time · Department: Engineering (R&D) · Work type: Hybrid
San Francisco, California, United States
In an unmarked building somewhere in Silicon Valley, a small team of engineers is working on what could become one of the most transformative technologies in enterprise computing: autonomous infrastructure that manages itself. Sailplane, backed by AI kingmaker Khosla Ventures (OpenAI's first investor) and seed specialist True Ventures, is building a "self-driving cloud" - intelligent agents capable of autonomously managing the largest and most advanced AI infrastructure on the planet.
Sailplane is solving one of the most complex challenges in modern computing: autonomous management of massive AI data centers. We are creating intelligent agents that operate rack-scale systems worth millions of dollars. Think Waymo for cloud infrastructure.
"We're building million-dollar agents," explains co-founder Sam Ramji, who previously led product at Google Cloud Platform and brought Linux to Microsoft. "These aren't consumer-grade chatbots - they're sophisticated autonomous systems managing rack-scale hardware worth millions per unit."
Sailplane is an early-stage AI infrastructure startup. Expect to wear many hats: writing code and evaluations, building infrastructure, and working with customers. You will bring a startup mindset and be eager to take ownership of projects, navigate ambiguity, and move quickly to solve challenging problems in a fast-paced environment.
As a Principal Engineer, you'll be a first-class generalist and leader moving across Sailplane agents and infrastructure. You are a hands-on technical leader who can turn AI research into a developer-friendly platform for infrastructure and platform teams. This is a senior technical individual contributor role focused on hands-on coding, systems thinking, and prototyping. You will act as a technical multiplier rather than a people manager.
This hybrid position reports to the CEO. You are expected to work from our downtown San Francisco office 2 to 3 days per week.
What you'll do
Design and evolve the control plane for agents: planning and execution loops, workflows, callbacks, and state models
Implement sandboxing, dry-run/preview modes, invariants, approvals, and rollback strategies so agents can safely change real infrastructure and applications
Take hierarchical planning / agent research prototypes and build production-grade agents, services and APIs around them (auth, rate limits, quotas, retries)
Anticipate production system needs (security, networking, SLOs/SLAs) and instrument for reliability and usage via observability (logs, traces, guardrails)
Guide architecture trade-offs and reason about IP boundaries as an engineering design dimension
Build, deploy, monitor, and operate LLMs in production, on-premises, across diverse customer environments, and implement MLOps best practices (CI/CD pipelines, containerization, continuous monitoring) to ensure reliable performance
Lead through influence, setting engineering standards for code quality, testing, documentation, and on-call practices
Partner with founders to prioritize what to build next based on customer needs, market demand and technical implementation
What you'll bring
10+ years of software engineering experience, including time operating at Staff/Principal scope or equivalent
Deep experience in designing and building AI systems, large-scale distributed systems, infra/devtools, SaaS platforms, or other safety-critical automation domains
Strong proficiency in at least one backend language (e.g., Go, Rust, Java, C++), plus comfort with Python for glue/ML/agent work
Hands-on with cloud platforms (AWS/GCP/Azure), containers, CI/CD, and modern deployment practices (Kubernetes or equivalent)
Track record of building systems and shipping with reliability and safety in mind
Build a high-trust, inclusive, feedback-driven engineering culture that values autonomy, learning, and accountability
Excellent written and verbal communication skills, with the ability to influence across technical and business stakeholders
How you work
Customer-centric: you like talking to users and grounding decisions in real workflows
Calm, pragmatic, and low-ego: you can operate with incomplete information and still make clear decisions
Versatile: you enjoy moving between high-level architecture and low-level debugging when something breaks in production
Benefits
Comprehensive Health, Dental, and Vision coverage beginning on the first day for employees and their families, paid 100% by Sailplane
Equity grant participation
Flexible PTO with no accrual or set annual cap, plus 15 paid holidays per year
Health and Wellness stipend ($3,000 annually) to help support your personal health goals
AI tools stipend ($1,200 annually) to encourage hands-on familiarity with emerging tools
12 weeks of paid parental leave