research

Everything on Ground Truth tagged “research” — 40 items.

The best AI agents still fail most real, long computer tasks News

A wave of new benchmarks agrees on an uncomfortable result: even top models finish only a small slice of realistic, multi-hour computer and coding jobs.

Meta reads full sentences from brain waves - without surgery News

A new version of Meta's brain-to-text system decodes typed sentences from magnetic brain signals far more accurately than before, closing much of the gap with implanted electrodes.

Knowing when to quit is a skill AI agents badly lack News

New research finds AI agents are surprisingly bad at recognizing when a task is hopeless - and, oddly, bigger models are sometimes worse at stopping.

Anthropic's Claude Science puts a whole lab bench inside the AI News

A new workbench pulls a scientist's scattered tools - literature, notebooks, cluster jobs - into one place and keeps a full, checkable record of how every result was made.

A 35-billion-parameter agent that punches like a trillion-parameter model News

Shanghai AI Lab argues you can reach giant-model performance on long tasks not by adding parameters, but by training on much longer chains of real work.

This AI predicts how objects move by tracking shapes, not pixels News

PhysiFormer forecasts physical motion as real 3D meshes in space - and recovers rigidity and momentum without anyone hand-coding the laws of physics.

Image generators can't plan. This one bolts on a brain that can. News

Qwen-Image-Agent wraps planning, reasoning, and memory around a text-to-image model so it can break a hard request into steps - and the local-AI crowd immediately asked whether it runs on a gaming GPU.

An AI's hallucinations turned out to be a map with blank spots News

Researchers showed that when a world-model AI imagines impossible futures, it's usually in places it barely saw in training - and that you can predict and fix those blind spots cheaply.

AI video has a consistency problem. This model targets it. News

DomainShuttle goes after the tug-of-war in subject-driven text-to-video: keeping a specific character or object recognizable across frames while still letting the scene move freely.

A wave of new methods trains AI without a human answer key News

Several research groups landed on the same idea at once - improve a model by learning from its own attempts instead of expensive human labels - and the field is debating whether it really removes the labeling burden or just hides it.

Why teaching AI agents to use tools keeps blowing up in training News

A new paper pins the sudden collapse of multi-step tool-use training on runaway probabilities in a few control tokens, and shows that mixing in supervised examples stabilizes it.

Why making an AI think out loud helps it remember facts, even when the thinking is nonsense News

Google Research found that reasoning traces help a model recall facts partly just by buying it extra computation, so even repeating 'let me think' helps, though hallucinated steps backfire.

A huge study finds AI is more persuasive than trained, paid human experts News

Across nearly 19,000 conversations, AI outargued incentivized human experts and raised real donations far more effectively, but its edge collapsed when slowed to human speed.

When AI safety training withholds what could help you News

A pre-registered study finds heavily safety-trained models give doctors medical information they refuse to give ordinary people, even though the facts are identical.

What should an AI agent remember about you, and what leaks when it does? News

Researchers are asking whether AI agents are ready for real long-term memory, just as another study shows how much an agent's memory can quietly give away about the people it served.

What does your AI actually remember about you? News

Two new studies stop trusting that agent 'memory' works and start measuring it directly, with results that carry a privacy sting.

One model that listens, sees, and talks back in real time News

Wan-Streamer collapses the usual chain of separate speech and video tools into a single model built for live, two-way conversation.

NVIDIA shrinks video generation down to real time News

A new NVIDIA recipe distills slow video-generating AI into a fast version that can stream frames live and react to your actions.

Anthropic's own data says the best coders gain the most from AI News

By studying hundreds of thousands of real coding sessions, Anthropic found that experienced engineers get more out of AI assistants, not less, a direct challenge to the idea that AI levels the playing field.

A safety switch an AI agent can't reach News

Researchers propose putting an agent's safety controls outside the agent itself, so a misbehaving AI structurally cannot turn them off.

A language model that writes by erasing, and now keeps up with the classics News

Almost every chatbot writes one word at a time, left to right. A newly released model of real size writes the way image AIs paint, refining a whole passage at once, and finally holds its own.

A language model that doesn't write left to right News

iLLaDA is an 8-billion-parameter model that generates text by refining a blurry whole rather than one word at a time, and it's catching up to the mainstream.

This model's job is to make better training data for other models News

DataClaw0 turns the grind of cleaning and labeling training data into a learned skill -- a small model that refines raw, messy multimodal streams into dense, purpose-built lessons.

Sometimes the AI Knew the Better Answer a Few Layers Early News

A new paper finds that a model's final layer can actually muddy an answer its middle layers had right -- and that reading the answer out a little early can claw back ability lost to safety training.

DeepMind Sketches Four Roads From Human-Level AI to Superintelligence News

A new report from senior DeepMind researchers lays out four ways AI could push past human-level ability -- and argues the leap is more likely to be a steady climb than a single dramatic jump.

Can an AI agent match real published science? A new test says: rarely News

NatureBench pits coding agents against the published state-of-the-art from Nature-family papers. Even the best agents beat the bar on only a small minority of tasks -- mostly by reframing, not inventing.

Can an AI Agent Reproduce Real Science? A New Test Says: Rarely News

A new benchmark points coding agents at the actual computational results behind ninety papers in top journals. The strongest models matched the published science on fewer than one in five.

An open project publishes the recipe for training capable AI agents News

OpenThoughts-Agent releases its full data-curation pipeline, dataset, and experiments -- showing that what an agent learns from matters more than raw size, and letting anyone reproduce it.

Alibaba's new models let AI agents practice in a world they imagine News

Qwen-AgentWorld trains a model to simulate the environment an agent acts in, then uses that simulation as a cheap, controllable place to learn -- reporting gains beyond training in the real thing.

AI Agents Are Learning to Build the Worlds They Train In News

Three new open research projects point the same way: instead of only learning what to do, agents are learning to simulate the environment itself, so they can practice in their own imagination.

A small but elegant idea: putting 'experts' inside the attention layer News

Grouped Query Experts brings the mixture-of-experts trick into attention, activating only half a model's query heads per token while matching the full version -- at least at small scale.

A Classic Efficiency Trick Just Moved Into a New Part of the AI News

For years, the committee-of-specialists design that keeps big models fast lived in one layer of the network. A clean new result shows it works in the attention layer too, halving some of the work for free.

A big study finds AI more persuasive than professional human persuaders News

Across roughly nineteen thousand real conversations, AI systems drove far more charitable donations than trained human canvassers -- shifting the question to 'on whose behalf.'

Researchers turn the internet's hobbyist art 'filters' into training fuel News

Cleanly separating 'what's in a picture' from 'what style it's in' usually needs scarce data. A new method mines the huge public library of community-made style add-ons instead.

An image generator that catches and corrects its own errors mid-draw News

Image-generating models often quietly break the very rule they were told to follow. A new method trains them to notice that error as they work and steer back on target.

AI builds a single 3D object that shows two different things from two angles News

A new training-free method generates 3D visual illusions — one sculpture that reads as completely different objects depending on where you stand — in minutes instead of hours.

A robot hand learns to open things by reasoning about touch, not video News

New research teaches multi-finger robot hands to manipulate things with moving parts — handles, drawers, hinges — by focusing on contact points, and the approach stays steady even without touch sensors.

A 61-author paper argues AI leaderboards quietly mislead everyone News

A large industry-led study makes a blunt case: the rankings everyone cites to pick the 'best' AI agent don't survive contact with the real world.

DeerFlow Tool

ByteDance's open-source agent harness that breaks a long task into specialist sub-agents running in parallel, executes code safely in sandboxes, keeps memory across sessions, and produces reports, slides, and pages; built on LangChain and works with multiple model providers.

Claude Science Tool

An AI workbench that unifies literature search, notebooks, statistics, and cluster compute, and keeps a reproducible record behind every figure. Beta on Mac and Linux.