Fly (YC W20) Is Hiring Site Reliability Engineers

Fly takes container images and converts them into fleets of Firecracker VMs, running on our own hardware around the world. It’s easy on Fly to run applications close to users, no matter where they are in the world. Try it out! If you’ve got a working container already, it can be running here in less than 10 minutes:

We’ve got a lot of fun ops challenges here. We run the HashiCorp stack (Nomad, Consul, and Vault), plus Firecracker, plus WireGuard, which is what our network fabric is built on. Our users drive the platform through a Rails-based GraphQL API. We host a heavy-duty Prometheus-style metrics cluster, an Elasticsearch cluster for logging, a monitoring system using Sensu Go, BGP4 peering with BIRD… the list goes on.

We’re hiring SRE-types to help us manage this stuff and keep it running smoothly. The role includes:

* Intense observability and monitoring, so that Kurt only gets paged during his on-calls when something important happens.

* Coordinating deployments of new infrastructure across a fleet of servers with custom kernel and networking configurations.

* Enabling us to quickly ship new features to prod with canaries or blues and greens or whatever the cool kids are doing, because some of what we deploy right now is scary enough to slow us down a bit.

We’re a small, almost entirely technical team. Ops and dev are tightly integrated, and devs don’t throw things over the wall expecting ops to magically keep them running.

We’re remote, in Chicago, Montreal, Colorado, Virginia, Utah, Wisconsin, and London.

We all share an on-call rotation, which is a company value that won’t be changing any time soon.

We’re weird about hiring. We’re deeply skeptical of both resumes and interviews. We believe in aptitude, and in discovering and developing talent. Regardless of your background, we’re interested in hearing from you; you can’t waste our time. More about the role and our hiring process here:

Or just reach out:

