
Sr Manager, Cloud Infrastructure Engineer, Scientific Computing and HPC

Worldwide Salaried Open

About the position

ROLE SUMMARY

Pfizer is committed to the application of computational science in drug discovery and development. As part of this mission, we have recently embarked on a large-scale migration of our computational infrastructure to the cloud. This role leverages extensive experience in cloud engineering and DevOps and requires a hands-on approach to designing and delivering robust High Performance Computing (HPC) solutions that support computational workloads across the organization. We are seeking an experienced individual to drive architecture, infrastructure automation, migration, and operational excellence. You will collaborate with HPC engineers and scientific computing specialists to develop scalable cloud-native infrastructure that underpins modernization of the scientific computing platform.

Responsibilities

  • Design, implement, operate, and own robust and dependable infrastructure for HPC and ML/AI workloads in a cloud environment (AWS/GCP).
  • Lead containerization, deployment, and operation of user- and admin-facing HPC platforms (Slurm, Open OnDemand, Prometheus/Grafana, batch and distributed computing platforms) across cloud environments.
  • Translate stakeholder input into robust, high-performance, scalable, cost-effective computing platforms.
  • Partner with HPC specialists (engineers, administrators, and users) to capture institutional knowledge and manual processes in IaC workflows, transforming ad-hoc deployment practices into reproducible, version-controlled, automated procedures.
  • Develop and maintain infrastructure automation using IaC tools like Terraform and CloudFormation to ensure repeatable environment provisioning and scaling.
  • Create reusable Terraform modules.
  • Develop and enforce standards.
  • Be a driver for implementing and maintaining all cloud infrastructure using IaC tools.
  • Operationalize containerized solutions using Docker and Kubernetes.
  • Own the full lifecycle of infrastructure management, from provisioning to operations, support, updating, and teardown of production computing platforms.
  • Perform troubleshooting, system analysis, and benchmarking to resolve issues and maintain a high-performance environment.
  • Develop and maintain monitoring, logging, and alerting for the infrastructure (e.g., CloudWatch, Prometheus/Grafana).
  • Design new dashboards, workflows, and utilities to improve observability, cost monitoring, workload efficiency, and the user or administrator experience.
  • Document architecture, deployment processes, and operational procedures.
  • Partner closely with team members to support delivery of scientific computing services including user support, Linux administration, operations, job scheduling, application management, and resource optimization.

Requirements

  • B.S. in computer science, life science, data science or similar fields.
  • 6+ years of experience in cloud infrastructure engineering with a proven track record of developing and supporting robust IaC deployments.
  • Experience managing scientific computing workloads in an enterprise environment.
  • Advanced experience with at least one of AWS or GCP, including knowledge of core compute and storage services relevant to HPC.
  • Solid understanding of cloud networking, identity, and security controls.

Nice-to-haves

  • Prior experience with HPC deployment utilities including AWS ParallelCluster, AWS Parallel Computing Service, and Google Cloud Cluster Toolkit.
  • Proficiency with distributed computing environments, especially EKS/GKE/Kubernetes.
  • Familiarity with HPC environments, job schedulers (Slurm), HPC application containers (Docker, Singularity, Apptainer) and NVIDIA GPU computing.
  • Demonstrated breadth of leadership experience and capabilities, including the ability to influence and collaborate with peers, develop and coach others, and oversee and guide the work of colleagues to achieve meaningful outcomes and create business impact.

Benefits

  • Participation in Pfizer’s Global Performance Plan with a bonus target of 17.5% of base salary, and eligibility to participate in our share-based long-term incentive program
  • 401(k) plan with Pfizer Matching Contributions and an additional Pfizer Retirement Savings Contribution
  • Paid vacation, holiday, and personal days
  • Paid caregiver/parental and medical leave
  • Health benefits including medical, prescription drug, dental, and vision coverage
