Research Scientist

Worldwide Salaried Open

Active Inference Benchmarking Researcher

Description

Overview

Contribute to the design, implementation, and evaluation of benchmarking frameworks for uncertainty-aware autonomy—specifically active inference—within a teleoperation-augmented robotics platform. This role focuses on quantifying how probabilistic decision-making improves human-in-the-loop scalability, safety under uncertainty, and autonomous productivity across real-world robotic systems.

Key Responsibilities

1. Active Inference Benchmark Design & Execution

  • Co-design and implement benchmarking protocols comparing active inference agents to:
    • Conventional reinforcement learning (RL) baselines
    • RL systems augmented with uncertainty estimation
  • Evaluate performance across:
    • Data efficiency
    • Safety under distribution shift
    • Directed exploration
    • Sim-to-real robustness
    • Teleoperation scaling efficiency
    • Explainability

2. Teleoperation-Aware Evaluation Framework

  • Integrate benchmarking into a standardized teleoperation control protocol where agents decide when to:
    • Continue autonomous execution
    • Request human takeover under a constrained intervention budget
  • Develop metrics capturing:
    • Human scalability (operator-to-robot ratio, intervention allocation efficiency)
    • Safety under uncertainty (timeliness and selectivity of handovers)
    • Autonomous work efficiency (task completion under limited supervision)

3. Platform Integration (Teleoperation Stack)

  • Align benchmarking workloads with the broader teleoperation platform architecture:
    • On-robot control and safety systems
    • Near-edge inference (uncertainty estimation, planning, intervention logic)
    • Cloud-based training, analytics, and fleet management
  • Ensure benchmarks reflect real system constraints:
    • Latency budgets
    • Network degradation and connectivity loss
    • Multi-robot resource sharing

4. Embodiment Ladder Evaluation

  • Execute experiments across a staged pipeline:
    • Tier 1: Controlled simulation (e.g., MuJoCo environments)
    • Tier 2: High-fidelity robotic simulation (e.g., RLBench, ManiSkill)
    • Tier 3: Real-world or dataset-driven validation
  • Maintain consistency via a shared teleoperation surrogate (expert policy / planner) to emulate human intervention

5. Uncertainty & Intervention Analysis

  • Quantify and analyze:
    • Calibration of uncertainty signals
    • Intervention precision/recall
    • Learning from intervention (post-handover improvement)
    • Stability across repeated autonomy–human control cycles
  • Compare which of the following best optimizes teleoperation efficiency:
    • Native probabilistic approaches (active inference)
    • Retrofitted uncertainty (ensembles, Bayesian heads, etc.)
    • Heuristic baselines

6. Systems & Scaling Insights

  • Profile compute and system behavior of active inference workloads within the teleoperation stack:
    • World model rollouts
    • Posterior inference
    • Intervention decision logic
  • Contribute to:
    • Near-edge workload allocation strategies
    • Fleet scaling models (robots per server)
    • Latency vs. safety tradeoffs

7. Deliverables

  • Reproducible benchmarking suite and datasets
  • Technical reports and whitepapers
  • Conference publications (robotics / ML / systems venues)
  • Design recommendations for teleoperation and autonomy stacks
  • Cross-team guidance for infrastructure, controls, and ML teams

Success Criteria

  • Demonstrated improvement in the tradeoff between intervention efficiency and safety
  • Measurable gains in operator scaling (robots per human)
  • Robust performance under distribution shift and real-world noise
  • Clear evidence of when and why uncertainty-aware methods outperform baselines

About the Company

Noumenal Labs is a deep tech AI company closing performance gaps in outdoor robotics. Our uncertainty-aware systems learn and adapt in real time, positioning Noumenal as a core software layer for next-generation robotic hardware operating in uncharted domains.
