[Remote] Software Engineer - GPU Kernels
Note: This is a remote position open to candidates in the USA. Baseten powers AI solutions for leading firms such as Notion and OpenEvidence and is seeking a GPU Kernel Engineer to improve AI model performance. The role focuses on designing high-performance GPU kernels and optimizing computation for machine learning operations, directly impacting production systems for millions of users.
Responsibilities
- Design and implement high-performance GPU kernels for key ML operations, including matrix multiplications, attention mechanisms, and mixture-of-experts routing
- Write and optimize code using CUDA, PTX assembly, and architecture-specific techniques
- Apply advanced performance optimization methods such as memory coalescing, warp-level programming, tensor core acceleration, and compute/memory overlap
- Implement cutting-edge features like quantization (FP8/FP4), sparsity, and compute/communication overlap
- Identify and resolve performance bottlenecks using tools like Nsight Systems, Nsight Compute, and Torch Profiler
- Collaborate with research teams to productionize theoretical advancements
- Contribute to internal and open-source GPU libraries
- Present technical contributions at industry conferences (e.g., NVIDIA GTC, AWS re:Invent)
Skills
- Strong understanding of GPU architecture and programming paradigms: memory hierarchy (global, shared, registers, L1/L2 cache); thread/block/grid organization; synchronization techniques and race condition mitigation
- Proficient in C++ and GPU performance profiling tools
- Knowledge of the CUDA C++ API; memory access patterns and bandwidth optimization; numerical precision and quantization strategies; and modern GPU features (e.g., tensor cores, async operations)
- Experience with Transformer models and attention optimization (e.g., Flash Attention)
- Familiarity with GPU kernel libraries: CUTLASS, Triton, Thrust, CUB
- Background in GEMM tuning and distributed/multi-GPU compute
- Contributions to open-source GPU projects
- Research publications or conference presentations on GPU performance
Benefits
- Competitive compensation, including meaningful equity
- 100% coverage of medical, dental, and vision insurance for employee and dependents
- Flexible PTO policy, including a company-wide Winter Break (our offices are closed from Christmas Eve to New Year's Day!)
- Paid parental leave
- Fertility and family-building stipend through Carrot
- Company-facilitated 401(k)
- Exposure to a variety of ML startups, offering unparalleled learning and networking opportunities