[Remote] Staff AI Engineer - Grafana Ops, AI/ML | USA | Remote
Note: This is a remote role open to candidates in the USA. Grafana Labs, the company behind the open observability cloud, is seeking a Staff AI Engineer to help develop AI-driven features for its observability tools. The role involves building high-performance AI solutions, collaborating with cross-functional teams, and taking ownership of projects to enhance user experience and incident management.
Responsibilities
- Build and deliver AI solutions: Take ownership of developing high-performance AI features to help users detect, triage, and resolve incidents using observability data and tools
- Rapid experimentation and iteration: Implement a highly iterative process where you quickly prototype, test, and validate with real users, including shipping and evolving LLM- or agent-powered workflows for incident lifecycle management and automated analysis tasks
- Collaborate cross-functionally: Work with data analysts, product managers, and designers to shape AI-driven product features, including integration of agentic components with internal tools, alerting systems, runbooks, and developer workflows
- Utilize AI tools effectively: Use AI and automation tools to enhance both product functionality and your own development workflows
- Effective communication: You'll be working in a highly dynamic and collaborative environment, so we need someone who can communicate effectively and contribute across teams
- Ownership and impact: Take full ownership of the AI solutions you develop, ensuring they are not only innovative but also scalable, maintainable, and aligned with real user workflows
Skills
- Experience with LLMs, prompt engineering, and building applications powered by GenAI
- Proven track record of delivering software that reached production and is actively used
- Exposure to working in cloud-native environments (e.g., AWS, Google Cloud Platform, Azure)
- Experience using observability tools to understand and troubleshoot system behavior
- Experience building or working with agent frameworks or multi-agent workflows
- Experience with infrastructure and DevOps tooling for deployments: Kubernetes, Docker, Terraform, or similar
- Familiarity with model fine-tuning techniques
- Experience building observability tooling
Benefits
- Benefits include equity, a bonus (if applicable), and the other benefits listed in this section.
- All of our roles include Restricted Stock Units (RSUs), giving every team member ownership in Grafana Labs' success.
- We believe in shared outcomes: RSUs help us stay aligned and invested as we scale globally.
- 100% Remote, Global Culture - As a remote-only company, we bring together talent from around the world, united by a culture of collaboration and shared purpose.
- Scaling Organization - Tackle meaningful work in a high-growth, ever-evolving environment.
- Transparent Communication - Expect open decision-making and regular company-wide updates.
- Innovation-Driven - Autonomy and support to ship great work and try new things.
- Open Source Roots - Built on community-driven values that shape how we work.
- Empowered Teams - High trust, low ego culture that values outcomes over optics.
- Career Growth Pathways - Defined opportunities to grow and develop your career.
- Approachable Leadership - Transparent execs who are involved, visible, and human.
- Passionate People - Join a team of smart, supportive folks who care deeply about what they do.
- In-Person Onboarding - We want you to thrive from day one, so you'll join your fellow new 'Grafanistas' to learn all about what we do and how we do it.
- Balance is Key - We operate a global annual leave policy of 30 days per annum, of which 3 days are reserved for Grafana Shutdown Days to allow the team to really disconnect. We will comply with local legislation where applicable.
- We invest heavily in developer productivity. You can use modern AI coding assistants as part of your daily workflow (your choice of tools, within security guidelines), backed by a company-funded usage budget so you can iterate quickly without unnecessary friction.
- You'll also have access to frontier models (e.g., GPT-Codex 5/3, Claude Opus 4.6, Gemini 3 Pro).