Research Scientist

Worldwide Salaried Open

Active Inference Benchmarking Researcher

Description

Overview

Contribute to the design, implementation, and evaluation of benchmarking frameworks for uncertainty-aware autonomy—specifically active inference—within a teleoperation-augmented robotics platform. This role focuses on quantifying how probabilistic decision-making improves human-in-the-loop scalability, safety under uncertainty, and autonomous productivity across real-world robotic systems.

Key Responsibilities

1. Active Inference Benchmark Design & Execution

Co-design and implement benchmarking protocols comparing active inference agents to:
  Conventional reinforcement learning (RL) baselines
  RL systems augmented with uncertainty estimation
Evaluate performance across:
  Data efficiency
  Safety under distribution shift
  Directed exploration
  Sim-to-real robustness
  Teleoperation scaling efficiency
  Explainability

2. Teleoperation-Aware Evaluation Framework

Integrate benchmarking into a standardized teleoperation control protocol where agents decide when to:
  Continue autonomous execution
  Request human takeover under a constrained intervention budget
Develop metrics capturing:
  Human scalability (operator-to-robot ratio, intervention allocation efficiency)
  Safety under uncertainty (timeliness and selectivity of handovers)
  Autonomous work efficiency (task completion under limited supervision)
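
For illustration only, the continue-or-handover decision described above can be sketched as a simple threshold rule under a fixed intervention budget. This is not the platform's protocol: the class name `HandoverPolicy`, the threshold value, the budget size, and the per-step uncertainty scores are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass
class HandoverPolicy:
    """Hypothetical rule: request takeover when uncertainty is high and budget remains."""
    threshold: float  # assumed uncertainty level that triggers a request
    budget: int       # assumed maximum number of takeovers per episode
    used: int = 0     # takeovers requested so far

    def decide(self, uncertainty: float) -> str:
        # Hand over only if the signal exceeds the threshold AND budget is left.
        if uncertainty > self.threshold and self.used < self.budget:
            self.used += 1
            return "request_takeover"
        return "continue"


def run_episode(uncertainties, policy):
    """Apply the policy to a sequence of per-step uncertainty scores."""
    return [policy.decide(u) for u in uncertainties]


# Made-up uncertainty trace for one episode.
policy = HandoverPolicy(threshold=0.7, budget=2)
actions = run_episode([0.1, 0.9, 0.8, 0.95, 0.2], policy)

# Fraction of steps executed without a human: one crude proxy for
# autonomous work efficiency under limited supervision.
autonomy_fraction = actions.count("continue") / len(actions)
print(actions, autonomy_fraction)
```

In this toy trace the fourth step has the highest uncertainty but gets no handover because the budget is already spent, which is the kind of allocation inefficiency the metrics above are meant to expose.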

3. Platform Integration (Teleoperation Stack)

Align benchmarking workloads with the broader teleoperation platform architecture:
  On-robot control and safety systems
  Near-edge inference (uncertainty estimation, planning, intervention logic)
  Cloud-based training, analytics, and fleet management
Ensure benchmarks reflect real system constraints:
  Latency budgets
  Network degradation and connectivity loss
  Multi-robot resource sharing

4. Embodiment Ladder Evaluation

Execute experiments across a staged pipeline:
  Tier 1: Controlled simulation (e.g., MuJoCo environments)
  Tier 2: High-fidelity robotic simulation (e.g., RLBench, ManiSkill)
  Tier 3: Real-world or dataset-driven validation
Maintain consistency via a shared teleoperation surrogate (expert policy / planner) to emulate human intervention

5. Uncertainty & Intervention Analysis

Quantify and analyze:
  Calibration of uncertainty signals
  Intervention precision/recall
  Learning from intervention (post-handover improvement)
  Stability across repeated autonomy–human control cycles
Compare which of the following best optimizes teleoperation efficiency:
  Native probabilistic approaches (active inference)
  Retrofitted uncertainty (ensembles, Bayesian heads, etc.)
  Heuristic baselines
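
As a hedged sketch of two of the quantities listed above, the snippet below computes intervention precision/recall and a binned expected calibration error (ECE). It assumes each step carries a ground-truth label for whether a handover was actually needed (for example, from the teleoperation surrogate or hindsight failure labels) and that the agent reports a confidence in [0, 1]. The function names, the bin count, and the data are illustrative assumptions.

```python
def intervention_precision_recall(requested, needed):
    """Precision: share of requested handovers that were needed.
    Recall: share of needed handovers that were requested."""
    tp = sum(1 for r, n in zip(requested, needed) if r and n)
    fp = sum(1 for r, n in zip(requested, needed) if r and not n)
    fn = sum(1 for r, n in zip(requested, needed) if not r and n)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def expected_calibration_error(confidences, correct, n_bins=5):
    """Binned ECE: weighted mean gap between average confidence and accuracy."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(mean_conf - accuracy)
    return ece


# Made-up labels: did the agent request a handover, and was one needed?
requested = [True, True, False, False, True]
needed = [True, False, True, False, True]
precision, recall = intervention_precision_recall(requested, needed)

# Made-up confidences and outcomes for a calibration check.
ece = expected_calibration_error([0.9, 0.9, 0.1, 0.1], [True, False, False, False])
print(precision, recall, ece)
```

The same two functions could be applied unchanged to native probabilistic agents, retrofitted uncertainty estimators, and heuristic baselines, so that the comparison rests on identical metric definitions.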

6. Systems & Scaling Insights

Profile compute and system behavior of active inference workloads within the teleoperation stack:
  World model rollouts
  Posterior inference
  Intervention decision logic
Contribute to:
  Near-edge workload allocation strategies
  Fleet scaling models (robots per server)
  Latency vs. safety tradeoffs

7. Deliverables

Reproducible benchmarking suite and datasets
Technical reports and whitepapers
Conference publications (robotics / ML / systems venues)
Design recommendations for teleoperation and autonomy stacks
Cross-team guidance for infrastructure, controls, and ML teams

Success Criteria

Demonstrated improvement in the tradeoff between intervention efficiency and safety
Measurable gains in operator scaling (robots per human)
Robust performance under distribution shift and real-world noise
Clear evidence of when and why uncertainty-aware methods outperform baselines

About the Company

Noumenal Labs is a deep tech AI company closing performance gaps in outdoor robotics. Our uncertainty-aware systems learn and adapt in real time, positioning Noumenal as a core software layer for next-generation robotic hardware operating in uncharted domains.
