Ground Truth.
AI, checked against the source.
← earlier2026-06-192026-06-20 →

The hidden escape hatch in AI safety controls

2026-06-19

Researchers at Hong Kong Polytechnic University show that clamping an AI safety feature — like one that controls refusals — doesn't remove the behavior. It hides in the part of the model's internal state that the safety tool throws away, and can be recovered while the monitored feature looks perfectly controlled.

safety · interpretability · mechanistic-interpretability

Your AI judge might be reliable — and still be wrong

2026-06-19

The largest audit of AI language model judges to date — 21 judges, over half a million grading decisions — finds that standard reliability metrics are inflated by roughly a third, that the same judge can score differently on different benchmarks, and that high consistency and severe bias can coexist in the same system.

evaluation · llm-judges · rlhf · methodology

Turn the camera away, and the AI's world freezes

2026-06-19

A new benchmark tests whether video AI systems can track what happens to parts of a scene the camera isn't currently showing. Across 23 models, the answer is mostly no — and making the models larger made the problem worse, not better.

world-models · video-generation · robotics · benchmarks

A robot that runs its own experiments — and sometimes fails when it matters

2026-06-19

NVIDIA researchers gave AI coding agents full control of a physical robot lab — including automated reset and vision-based success checking. One agent inserted a graphics card into a motherboard. The headline success rate is real but requires a close read.

robotics · agents · automation · hardware

A tiny image-fixer keeps up with a model fifty times its size

2026-06-19

Filling in the missing parts of an image usually takes a huge model. This one is a small fraction of the size and far faster, yet matches a system far bigger than it.

image-generation · efficiency · inpainting

What if a word were a rotation? A more mathematical way to build AI

2026-06-19

A fresh, abstract idea: treat what a model attends to not as plain lists of numbers but as geometric moves like rotations — so useful symmetries come 'for free.' Elegant and early. (A deeper, technical read.)

architecture · theory · technical

Faster AI training by quietly cloning the model

2026-06-19

Teaching a model with rewards is slow because it has to write out endless practice answers. A new trick: make a cheap, shrunk-down copy of the model to crank those out faster.

rl-post-training · efficiency · training

An AI that could rewrite its own words — and gained nothing from it

2026-06-19

A different style of text AI can go back and change any word at any point as it writes. Given that power, it didn't actually produce better writing. A clean negative result.

diffusion · language-models · negative-result

Crediting an AI for the right steps — without a second model to judge them

2026-06-19

When you reward an AI for a good final answer, it's hard to know which of its steps earned the credit. The usual fix is training a second 'judge' model. This skips that.

rl-post-training · credit-assignment · training

Giving an AI real spatial tools instead of letting it guess

2026-06-19

Vision AIs are surprisingly bad at precise 'where is this in 3D space' questions. This one stops guessing and calls dedicated spatial tools, while keeping a memory across views.

agents · vision · spatial-reasoning

Do robots even need to imagine the movie?

2026-06-19

The common belief is that a robot needs to imagine a video of what happens next to plan. A new method says no — imagine a single still frame, and don't even fully draw it.

robotics · world-models · efficiency · vision

Reliable, and still wrong

2026-06-19

Using one AI to grade another is now common — but the biggest audit yet shows these graders are consistent without being correct. A judge that always picks "answer A" scores perfectly on consistency.

evaluation · llm-as-judge · benchmarks

A coding assistant ran a real robot

2026-06-19

An AI coding agent read the research, wrote the control code, watched it fail, and fixed it — seating a graphics card into a motherboard by itself. The honest catch: most of the success is retrying.

agents · robotics · coding-agents · autonomy

The little words that keep AI from getting boring

2026-06-19

Rewarding a reasoning model too hard makes it repetitive — and the casualties are tiny words like "but" and "instead" that let it branch to a better thought. A near-free fix protects them.

rl-post-training · reasoning · training

Turn around, and the world disappears

2026-06-19

AI video models that are supposed to "understand" a 3D scene only remember what's on screen — pan away and back, and things have reset. Bigger models are worse at it.

world-models · video-generation · robotics · benchmark

The safety switch that doesn't actually work

2026-06-19

A control that's supposed to force an AI to refuse harmful requests gets bypassed while it's switched on — the bad behavior hides in the part of the tool that gets thrown away.

interpretability · safety · sparse-autoencoders

← earlier2026-06-192026-06-20 →