News · 2026-06-29

NVIDIA's new method stops AI dream-worlds from breaking the laws of physics

A research team that includes NVIDIA has introduced PhysisForcing, a training method that makes video-generating 'world models' obey physics far more reliably. The goal is to turn AI-generated video from something that merely looks plausible into something a robot can trust enough to plan with.

The background: a world model is an AI that learns how an environment behaves, so it can imagine what will happen next. A promising version uses video generation - the model literally generates a short clip of a predicted future, like a robot daydreaming the next few seconds before it acts. The trouble is that video generators are trained to make footage that looks convincing, not footage that is physically correct. So they hallucinate: an object being grasped quietly changes shape, a hand passes through a surface, two things touch and the result makes no physical sense. A movie that looks great but breaks the rules of reality is useless as a planning tool, because the robot would be planning around events that can't actually occur.

PhysisForcing's approach is to diagnose precisely where the physics breaks and aim the training there. The researchers traced two main culprits: moving objects deforming in impossible ways, and implausible correlations between things over space and time - especially at the moment of contact, when one object meets another. They then added two targeted training signals. The first, a pixel-level trajectory alignment loss, watches reference points on objects and forces the model's internal features to keep their motion consistent and smooth, so objects move like solid bodies rather than melting blobs. The second, a semantic-level relational alignment loss, uses a separate frozen video-understanding model as a referee to keep the relationships between objects coherent - so when two things interact, the interaction stays believable. The key idea is to concentrate the supervision on the 'physics-informative regions,' the parts of the frame where physics actually matters, rather than spreading effort evenly across every pixel.
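To make the two training signals concrete, here is a minimal sketch in plain Python. It is an illustration of the general idea, not the paper's implementation: the function names, the data layout, the squared-difference penalty along each tracked point, and the cosine-similarity comparison against the frozen referee are all assumptions made for this example. The optional weights stand in for the focus on 'physics-informative regions'. Real training code would run on tensors inside a deep-learning framework.

```python
import math

# Hypothetical sketch: names, shapes and penalties are assumptions,
# not taken from the PhysisForcing paper.

def trajectory_alignment_loss(features, tracks, weights=None):
    """Penalise feature drift along tracked points.

    features[t][row][col] is a feature vector (list of floats) for frame t.
    tracks[n][t] is the (row, col) position of point n in frame t.
    weights[n] up-weights points in physics-informative regions.
    Needs at least two frames per track.
    """
    if weights is None:
        weights = [1.0] * len(tracks)
    total = 0.0
    for track, w in zip(tracks, weights):
        # Read the feature under the tracked point in every frame.
        sampled = [features[t][r][c] for t, (r, c) in enumerate(track)]
        # Squared change between consecutive frames along the track.
        steps = [
            sum((b - a) ** 2 for a, b in zip(prev, cur))
            for prev, cur in zip(sampled, sampled[1:])
        ]
        total += w * sum(steps) / len(steps)
    return total / sum(weights)


def cosine_gram(tokens):
    """Pairwise cosine similarity between token vectors."""
    unit = []
    for tok in tokens:
        norm = math.sqrt(sum(x * x for x in tok)) or 1.0  # avoid divide-by-zero
        unit.append([x / norm for x in tok])
    return [[sum(a * b for a, b in zip(u, v)) for v in unit] for u in unit]


def relational_alignment_loss(model_tokens, reference_tokens):
    """Match the model's token-to-token relations to a frozen referee's.

    Both inputs hold one vector per region or object, in the same order.
    The two models may use different feature sizes, because only the
    similarity matrices are compared.
    """
    g_model = cosine_gram(model_tokens)
    g_ref = cosine_gram(reference_tokens)
    k = len(g_model)
    return sum(
        (g_model[i][j] - g_ref[i][j]) ** 2 for i in range(k) for j in range(k)
    ) / (k * k)


if __name__ == "__main__":
    # Two frames, a 1x2 grid, one feature channel. Point 0 changes, point 1 is stable.
    feats = [[[[0.0], [5.0]]], [[[2.0], [5.0]]]]
    tracks = [[(0, 0), (0, 0)], [(0, 1), (0, 1)]]
    print(trajectory_alignment_loss(feats, tracks, weights=[3.0, 1.0]))
    print(relational_alignment_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [1.0, 0.0]]))
```

In this toy setup the first loss is zero when a tracked point keeps the same features from frame to frame, and the second is zero when the model relates its regions to one another the same way the referee does. In training, terms like these would be added to the generator's usual objective.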

The analogy: imagine teaching an animator who draws gorgeous frames but keeps letting characters' hands pass through tables. Instead of critiquing every line, you put two coaches on the specific failures - one watching that objects keep their shape as they move, one watching that contacts between objects look real. The drawings stay beautiful but stop breaking physics.

The results back it up. Across several benchmarks for embodied video generation, PhysisForcing consistently improved the base models. More tellingly, when it was plugged into a system where a robot uses the world model to plan and then act, the success rate of the full loop climbed from about one in six attempts to roughly one in four (from around 17% to around 25%), with downstream improvements in actual robot manipulation. Physically honest imagination, in other words, makes for better planning.

Why it matters: world models are one of the most active frontiers in AI right now, seen as a path toward robots and agents that can reason about the physical world rather than just react to it. But a simulator you can't trust is worse than no simulator. PhysisForcing pairs naturally with another recent finding - that world-model hallucinations cluster in the gaps of a model's training data - giving researchers both a way to make the physics better and a way to predict where it'll still go wrong.

The honest caveat is in those numbers. Going from one-in-six to one-in-four is real, meaningful progress - but it still means the imagined plan fails about three times out of four. 'Physically plausible' is also measured on benchmarks that only approximate true physics, so the model is being graded against an imperfect rulebook. World-model-driven robotics is clearly improving; it is nowhere near solved.
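For readers who want the arithmetic behind that caveat, here is a short worked example. The fractions are the article's rounded figures ('about one in six', 'roughly one in four'), not exact values from the paper, and the variable names are chosen for this illustration.

```python
from fractions import Fraction

# Rounded success rates as reported in this article (approximate).
before = Fraction(1, 6)
after = Fraction(1, 4)

absolute_gain = after - before          # gain in success rate
relative_gain = absolute_gain / before  # gain relative to the starting rate
failure_after = 1 - after               # share of attempts that still fail

print(f"success before: {float(before):.1%}")
print(f"success after:  {float(after):.1%}")
print(f"absolute gain:  {float(absolute_gain):.1%} of attempts")
print(f"relative gain:  {float(relative_gain):.0%}")
print(f"still failing:  {float(failure_after):.0%}")
```

On these rounded figures the success rate rises by half in relative terms, yet the gain is only about eight more successes per hundred attempts, and three in four attempts still fail.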

Primary source: arXiv 2606.28128