[Remote] Senior Site Reliability Engineer

Worldwide Salaried Open

Note: This is a remote position open to candidates in the USA. Doghouse Recruitment is seeking a Senior/Staff Site Reliability Engineer to join its client's team, which is building a cloud platform for high-throughput, compute-heavy workloads. The role involves owning production reliability, defining SLIs/SLOs, and improving deployment safety in a bare-metal environment.

Responsibilities

  • Define SLIs/SLOs
  • Run error budget conversations
  • Ship changes that reduce incidents and improve latency (p95/p99)
  • Build automation to kill toil
  • Improve deployment safety (canary/rollback)
  • Turn observability into signal rather than noise

Skills

  • Extensive production engineering experience running bare-metal / on-prem / data center infrastructure (not public cloud only)
  • Deep hands-on expertise in Linux systems debugging and performance (CPU, memory, I/O, kernel-level behaviors)
  • Strong understanding of networking (DNS/TCP/TLS, latency, packet loss, congestion, troubleshooting under load)
  • Strong Kubernetes experience beyond manifests: scheduler behavior, autoscaling edge cases, kubelet pressure/evictions, etcd/control plane
  • Experience with Terraform, Docker, Helm, and modern CI/CD practices
  • Strong coding skills in Go and/or Python, beyond automation scripting; real engineering capability is a must
  • Experience in low-latency environments

Company Overview

  • Doghouse Recruitment handles recruitment for technology teams, positioning itself as an alternative to agencies that flood inboxes with mismatched candidates. It was founded in 2015 and is headquartered in Amsterdam, North Holland, NL, with a workforce of 11-50 employees. Its website is http://www.doghouse.nl.