Infrastructure/GPU Engineer

Worldwide Salaried Open

Cognizant is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in enterprise environments.

This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the candidate's skills and qualifications. Applications will be accepted until 10/21/2025.

Key Responsibilities

System Design & Deployment

  • Help right-size GPU investments.
  • Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
  • Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).
  • Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.
  • Collaborate on rack planning and airflow optimization.

Cluster & Infrastructure Management

  • Configure and manage Slurm Workload Manager for job scheduling.
  • Deploy and maintain cluster orchestration tools.
  • Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.
  • Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.
  • Knowledge of Run:ai, ClearML, or a similar platform.

Networking & Performance Optimization

  • Design and validate network topologies, including IPMI management networks, internal/external networks, and InfiniBand fabrics.
  • Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.
  • Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

Monitoring & Troubleshooting

  • Implement system health checks and diagnostics across compute, storage, and network layers.
  • Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

Required Skills & Qualifications

Technical Expertise

  • Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.
  • Strong Linux system administration and shell scripting skills.
  • Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).
  • Familiarity with containerization (Docker), orchestration (Kubernetes), and automation tools (Ansible, Redfish).

Preferred Qualifications

  • Experience with BBCM and DGX BasePOD/SuperPOD configuration.
  • Certifications from NVIDIA or an equivalent OEM.
