News · 2026-06-19

A robot that runs its own experiments — and sometimes fails when it matters

At NVIDIA's robotics research lab, a team has built something they call ENPIRE: a system in which an AI coding agent takes full control of a physical robotic arm, designs its own experiments, writes the code to run them, watches the robot execute, and decides whether each attempt succeeded. If it didn't, the agent revises its approach and tries again. No human in the loop during the experiment.

In the most striking demo from the paper, the robot picks up a graphics card and seats it in a motherboard's PCIe slot under the direction of one of the tested agents, which in some trials was Claude Code, Anthropic's coding assistant. This requires fine motor precision: the card has to be aligned, held at the right angle, and pressed with enough force to seat the connector without breaking it. No human guides the arm during the attempt.

The headline success figure deserves a careful reading, because its meaning depends on how it was measured. The reported rate, described as near-perfect across tasks, is measured with up to eight attempts per task. The robot tries something, fails, the system resets the workspace automatically, and the agent revises its approach and tries again. The per-attempt success rate on harder tasks is considerably lower. This matters for interpretation: "near-perfect success with up to eight tries" is a very different capability from "gets it right the first time." The near-perfect number measures retry-and-recovery robustness, which is valuable, but it's not the same as reliable single-shot execution.
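To see how far an eight-attempt figure can sit from a single-shot one, here is a minimal arithmetic sketch. It assumes every attempt succeeds independently with the same probability, which is a simplification: in ENPIRE the agent revises its approach between attempts, so real attempts are not independent. The per-attempt rates below are illustrative values, not numbers from the paper, and the function name is hypothetical.

```python
def success_within_k(per_attempt_rate: float, attempts: int) -> float:
    """Chance of at least one success in `attempts` independent tries.

    Assumption (not from the paper): each try succeeds independently
    with the same probability `per_attempt_rate`.
    """
    if not 0.0 <= per_attempt_rate <= 1.0:
        raise ValueError("per_attempt_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # Failing overall means failing every single try.
    return 1.0 - (1.0 - per_attempt_rate) ** attempts


if __name__ == "__main__":
    # Illustrative per-attempt rates only; the paper's figures are not used here.
    for rate in (0.3, 0.5, 0.8):
        print(f"per-attempt {rate:.0%} -> within 8 tries {success_within_k(rate, 8):.1%}")
```

Under this simplified model, a task that works only half the time on any one try still succeeds within eight tries about 99.6% of the time. That is why the attempt budget belongs next to any headline success rate.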

The sim-to-real gap is also visible in the results. Two of the three agents tested struggled with a task when moved from a simulated physics environment to actual hardware. This gap — between how robots behave in clean simulation (where physics is idealized and repeatable) and how they behave on real hardware (where surfaces have friction, parts don't quite align as expected, and lighting varies) — is one of the oldest problems in robotics. ENPIRE doesn't solve it. The agents that worked well in simulation didn't all transfer cleanly to the physical robot.

What the paper contributes is a proof of concept for a particular research automation setup, with some components that are genuinely novel. The critical infrastructure pieces are: a robotic arm with a mounted camera, automated mechanisms for resetting the workspace between experiments (so the agent doesn't need a human to move things back to the starting state), and a vision-based success checker that uses a separate visual model to assess whether the robot completed the task. These three things together enable autonomous iteration — try, evaluate, reset, revise, repeat — at a pace no human-supervised experiment could match.
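The try, evaluate, reset, revise, repeat cycle can be sketched as a small control loop. This is a sketch of the idea as the article describes it, not ENPIRE's actual code: every name here (`iterate`, `execute`, `check_success`, `reset`, `revise`, `Outcome`) is hypothetical, and the robot, success checker, and reset mechanism are replaced by toy stand-ins so the example runs on its own. The default of eight attempts mirrors the attempt budget mentioned above.

```python
from dataclasses import dataclass
from typing import Callable, TypeVar

Plan = TypeVar("Plan")
Obs = TypeVar("Obs")


@dataclass
class Outcome:
    succeeded: bool
    attempts_used: int


def iterate(
    plan: Plan,
    execute: Callable[[Plan], Obs],         # hypothetical: run the plan on the robot
    check_success: Callable[[Obs], bool],   # hypothetical: vision-based success checker
    reset: Callable[[], None],              # hypothetical: automated workspace reset
    revise: Callable[[Plan, Obs], Plan],    # hypothetical: agent rewrites its approach
    max_attempts: int = 8,
) -> Outcome:
    """Try, evaluate, reset, revise, repeat, up to max_attempts times."""
    for attempt in range(1, max_attempts + 1):
        observation = execute(plan)
        if check_success(observation):
            return Outcome(True, attempt)
        reset()                             # restore the starting state, no human needed
        plan = revise(plan, observation)
    return Outcome(False, max_attempts)


# Toy stand-ins: the "plan" is an alignment offset in millimetres,
# and the task succeeds only when the offset reaches zero.
resets = []
outcome = iterate(
    plan=3,
    execute=lambda offset_mm: offset_mm,
    check_success=lambda observed: observed == 0,
    reset=lambda: resets.append(1),
    revise=lambda offset_mm, observed: offset_mm - 1 if observed > 0 else offset_mm + 1,
)
print(outcome, "resets:", len(resets))
```

In this toy run the loop needs four attempts and three resets. The point of the structure is that `check_success` and `reset` are the two pieces a human would otherwise have to provide between attempts.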

The authors note honestly that the automated reset and success verification are "still hand-built per task." To use ENPIRE for a new experiment, the NVIDIA team has to design a new reset mechanism specific to that experiment, and a new visual evaluation protocol specific to that task. Making these general rather than task-specific is the missing piece. A general-purpose reset and verification system — one that could work across arbitrary tabletop manipulation tasks without per-task engineering — would be the real unlock for open-ended robot self-improvement. Right now, what exists is a sophisticated framework for the tasks the team has already built infrastructure for.

The coding agents in ENPIRE are off-the-shelf AI tools: they do parameter tuning, experiment selection, and code generation. They don't develop new learning algorithms or discover new physics. That's still a significant capability: automated experiment management at the pace agents work could meaningfully accelerate certain types of robotics research. But it's closer to automated lab management than to the broader vision of a robot that improves itself through unconstrained open-ended exploration.

For the AI-interested observer, the GPU-insertion demo is a fair window into where physical AI is in 2026: impressive in carefully designed scenarios, still fragile when something unexpected changes, and requiring more tries than the headline suggests. Progress is real. The asterisks are also real, and they matter for calibrating expectations.

Primary source, verified: read the paper →