[Remote] Senior Cloud DevOps & Infrastructure Engineer

Worldwide Salaried Open

Note: This is a remote position open to candidates in the USA. Diverse Lynx is seeking a Senior Cloud DevOps & Infrastructure Engineer with a focus on GCP and AI. The role involves designing, deploying, and maintaining secure, scalable cloud infrastructure on a multi-cloud platform (GCP primary), implementing GitOps best practices, and supporting AI/ML workloads.

Responsibilities

  • Infrastructure as Code (IaC): Architect and provision production-grade infrastructure using Terraform. Manage state files and modules, and ensure infrastructure immutability
  • AI/ML: Work with LLMs in a multi-cloud environment
  • Kubernetes & Containerization: Design and manage clusters. Create and optimize Dockerfiles (multi-stage builds, distroless/hardened images). Manage complex deployments using Helm charts
  • CI/CD & GitOps: Build end-to-end CI/CD pipelines using GitLab CI. Implement GitOps workflows to synchronize infrastructure and application state
  • Design, configure, and manage scalable and secure cloud infrastructure for MLOps
  • AI Infrastructure Support: Configure and maintain environments suitable for AI/ML workloads (GPU node pools, LLM integration, large model serving, high-performance storage)
  • Production Support & Troubleshooting: Act as the primary escalation point for deployment failures and network and infrastructure issues. Perform Root Cause Analysis (RCA)
  • Security & Compliance: Implement 'Secure by Design' principles
  • Network & Identity: Apply good knowledge of network security, identity and privileged access management, and landing zone concepts for cloud platforms (Azure, AWS)
  • Multi-Cloud Strategy: While GCP is primary, maintain and support secondary environments in AWS (and potentially Azure) to ensure business continuity

Skills

  • 6-8 years of experience in Cloud Infrastructure & DevOps Engineering
  • Expert in Kubernetes, Terraform, and GitLab CI/CD
  • Experience supporting AI/ML workloads
  • Architect and provision production-grade infrastructure using Terraform
  • Experience with LLMs in a multi-cloud environment
  • Design and manage Kubernetes clusters
  • Create and optimize Dockerfiles (multi-stage builds, distroless/hardened images)
  • Manage complex deployments using Helm charts
  • Build end-to-end CI/CD pipelines using GitLab CI
  • Implement GitOps workflows to synchronize infrastructure and application state
  • Design, configure, and manage scalable and secure cloud infrastructure for MLOps
  • Configure and maintain environments suitable for AI/ML workloads (GPU node pools, LLM integration, large model serving, high-performance storage)
  • Act as the primary escalation point for deployment failures and network and infrastructure issues
  • Perform Root Cause Analysis (RCA)
  • Implement 'Secure by Design' principles
  • Good knowledge of network security, identity and privileged access management, and landing zone concepts for cloud platforms (Azure, AWS)
  • Maintain and support secondary environments in AWS (and potentially Azure)
  • Deep expertise in GCP (Compute Engine, GKE, Cloud Storage, IAM)
  • Strong working knowledge of AWS (EC2, EKS, S3, IAM)
  • Programming experience (Python required; Java, C#, or JavaScript is a plus)
  • Advanced proficiency in Kubernetes
  • Ability to write and manage custom Helm charts
  • Experience with Ingress Controllers (NGINX), Service Mesh, and Autoscaling (HPA/VPA/Cluster Autoscaler)
  • Expert-level knowledge of GitLab CI/CD (writing .gitlab-ci.yml, runners, artifacts, caching)
  • Understanding of GitOps principles
  • Strong hands-on experience with Terraform for provisioning cloud resources across multiple environments (Dev/Stage/Prod)
  • Proficiency in Bash/Shell scripting and Python
  • Strong Linux administration skills
  • Experience setting up monitoring with cloud-native tools, Prometheus, and Grafana
  • Experience with Azure Cloud infrastructure
  • Knowledge of Identity Providers (Keycloak, Azure AD/Entra ID) and OIDC integration
  • Experience with Service Mesh
  • Understanding of ITIL processes (Incident/Change Management) and tools like ServiceNow, JIRA
  • Basic understanding of Python/Flask/FastAPI applications to assist developers in troubleshooting

Company Overview

  • Diverse Lynx is a WBENC- and NMSDC-certified partner that helps organizations turn diversity goals into measurable impact through staffing and contingent workforce solutions. It was founded in 2002 and is headquartered in Princeton, New Jersey, US, with a workforce of 1001-5000 employees. Its website is http://www.diverselynx.com.
Company H1B Sponsorship

  • Diverse Lynx has a track record of offering H1B sponsorships, with 1 in 2024 and 1 in 2021. Please note that this does not guarantee sponsorship for this specific role.