A senator says a banned AI broke into nearly all NSA systems in hours
New testimony reframes the Mythos export ban: a top general reportedly told a senator the model breached almost all classified systems in a red-team test, not in weeks but in hours.
Alibaba's new models let AI agents practice in a world they imagine
Qwen-AgentWorld trains a model to simulate the environment an agent acts in, then uses that simulation as a cheap, controllable place to learn -- with reported gains beyond those from training in the real environment.
This model's job is to make better training data for other models
DataClaw0 turns the grind of cleaning and labeling training data into a learned skill -- a small model that refines raw, messy multimodal streams into dense, purpose-built lessons.
An open project publishes the recipe for training capable AI agents
OpenThoughts-Agent releases its full data-curation pipeline, dataset, and experiments -- showing that what an agent learns from matters more than raw size, and letting anyone reproduce it.
Uber reportedly burned through its whole 2026 AI coding budget in four months
The clearest enterprise cost figure yet for AI coding agents: Uber's CTO is reported to have said the company exhausted its Claude Code budget in a third of the year.
A small but elegant idea: putting 'experts' inside the attention layer
Grouped Query Experts brings the mixture-of-experts trick into attention, activating only half a model's query heads per token while matching the full version -- at least at small scale.
Anthropic gives AI agents their own work accounts, not yours
Anthropic's new 'agent identity' model lets Claude agents hold their own scoped accounts for tools like GitHub and Slack, tied to channels -- instead of borrowing a human employee's login.
Can an AI agent match real published science? A new test says: rarely
NatureBench pits coding agents against the published state-of-the-art from Nature-family papers. Even the best agents beat the bar on only a small minority of tasks -- mostly by reframing, not inventing.
Google promised Gemini 3.5 Pro in June. June is almost over.
Google said its next flagship would arrive in June; with days left, it's still in limited preview. The timing is awkward -- it overlaps a gap where another Western flagship is also unavailable.
An AI Reportedly Broke Into Nearly All of the NSA's Classified Systems in Hours
A senator says the head of the NSA told him a top AI model walked through almost all of America's classified systems in hours during a controlled test, reframing last week's government shutdown of the model.
AI Agents Are Learning to Build the Worlds They Train In
Three new open research projects point the same way: instead of only learning what to do, agents are learning to simulate the environment itself, so they can practice in their own imagination.
Microsoft's CEO Says the AI Industry Has Not Earned the Right to Do This
In a Wall Street Journal interview, Satya Nadella named OpenAI and Anthropic -- two companies Microsoft has poured billions into -- and warned that an economy reshaped by a handful of AI models will not survive politically.
A Coding AI Ran Through Uber's Yearly Budget in Four Months
Uber gave Claude Code to about 5,000 engineers, who loved it. By April the company had burned through its entire 2026 AI budget, exposing how badly old software pricing fits new agent tools.
A Classic Efficiency Trick Just Moved Into a New Part of the AI
For years, the committee-of-specialists design that keeps big models fast lived in one layer of the network. A clean new result shows it works in the attention layer too, halving some of the work for free.
Can an AI Agent Reproduce Real Science? A New Test Says: Rarely
A new benchmark points coding agents at the actual computational results behind ninety papers in top journals. The strongest models matched the published science on fewer than one in five of them.
Anthropic Gives Its AI Agents Their Own Logins, Not Yours
As AI agents start working in teams alongside people, the old 'the bot acts as you' model breaks down. Anthropic's answer: give each agent its own scoped account in every system it touches.
The Model Ban Is Quietly Redrawing the AI Map
Two weeks after the US pulled its top models off the market, a Chinese open model sits atop the global download charts and the community is busy rebuilding the banned capability in the open.
DeepMind Sketches Four Roads From Human-Level AI to Superintelligence
A new report from senior DeepMind researchers lays out four ways AI could push past human-level ability -- and argues the leap is more likely to be a steady climb than a single dramatic jump.
Samsung Banned ChatGPT in 2023. Now It's Giving It to 125,000 Workers.
After barring ChatGPT over a data leak three years ago, Samsung has reversed course and rolled OpenAI's enterprise tools out across its workforce -- a vivid sign that the corporate holdouts are capitulating.
Sometimes the AI Knew the Better Answer a Few Layers Early
A new paper finds that a model's final layer can actually muddy an answer its middle layers had right -- and that reading the answer out a little early can claw back ability lost to safety training.