Ground Truth.
AI, checked against the source.
← 2026-07-03 · 2026-07-04 · later →

GPT-5.5 Codex Keeps Cutting Its Own Reasoning Off at Exactly 516 Tokens

2026-07-04

A GitHub analysis of 390,195 coding-session responses found that GPT-5.5 cuts off its own reasoning at exactly 516 tokens disproportionately often, a pattern likely caused by a batching bug rather than an intentional change.

OpenAI · GPT-5.5 · Codex · inference · reliability
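One way to check for a cutoff like this is to compare how many responses stop at a given reasoning length with how many stop at nearby lengths. The sketch below is illustrative only: the function name `spike_ratio`, the window size, and the synthetic data are assumptions, not the method or figures from the GitHub analysis.

```python
from collections import Counter


def spike_ratio(lengths, target, window=5):
    """Ratio of the count at `target` to the mean count of its neighbours."""
    counts = Counter(lengths)
    neighbours = [
        counts.get(target + d, 0)
        for d in range(-window, window + 1)
        if d != 0
    ]
    baseline = sum(neighbours) / len(neighbours)
    return counts.get(target, 0) / baseline if baseline else float("inf")


# Synthetic reasoning lengths for illustration only: a flat background
# of 10 responses per length, plus 90 extra responses at 516 tokens.
lengths = list(range(500, 531)) * 10 + [516] * 90
print(spike_ratio(lengths, 516))  # 10.0
```

A ratio near 1 means the length is unremarkable; a large ratio at a single value is the kind of signature the item describes.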

A Flask Creator Says Anthropic's Newest Models Got Worse at Using Tools

2026-07-04

Flask creator Armin Ronacher found that Anthropic's newest models, Opus 4.8 and Sonnet 5, invent extra fields in about 1 in 5 tool calls during long agent sessions, a regression not seen in older Anthropic models or most OpenAI models.

Anthropic · tool use · agents · Opus 4.8 · reliability
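Invented fields of this kind can be caught by comparing each tool call's arguments with the fields the tool declares. The sketch below assumes tools are described with a JSON Schema style `properties` map; the `read_file` schema and the example call are hypothetical and do not come from Ronacher's write-up.

```python
def extra_fields(schema, arguments):
    """Return argument names that the tool's schema does not declare."""
    declared = set(schema.get("properties", {}))
    return sorted(set(arguments) - declared)


# Hypothetical tool schema in JSON Schema style.
read_file_schema = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "offset": {"type": "integer"},
    },
    "required": ["path"],
}

# Hypothetical model output: "encoding" is not a declared field.
call = {"path": "app.py", "offset": 0, "encoding": "utf-8"}
print(extra_fields(read_file_schema, call))  # ['encoding']
```

Counting the calls for which this returns a non-empty list, divided by total calls, gives a rate comparable to the "about 1 in 5" figure.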

Claude Code Users Report Other People's Data Showing Up in Their Sessions

2026-07-04

Two new GitHub issues describe unexpected data appearing in Claude Code sessions. In one confirmed case, another user's live server credentials leaked into a session and were used without authorization.

Anthropic · Claude Code · security · privacy · agents

An AI Agent Screened 2.4 Million Crystals and Found Four New Superconductors

2026-07-04

An AI system called ElementsClaw screened 2.4 million candidate crystal structures and flagged four new materials that a lab has since synthesized and confirmed are genuine superconductors.

Alibaba · AI for science · materials · superconductors · agents

Three Popular Ways to Train Reasoning AIs Turn Out to Be One Formula

2026-07-04

A new proof shows that three widely used reinforcement-learning recipes for training reasoning models (GRPO, Dr. GRPO, and DAPO) are all just different operations on a single number: the spread of rewards within a group of sampled answers.

reinforcement learning · GRPO · training · reasoning · theory
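To make the role of the reward spread concrete, here is a simplified sketch of how each recipe is commonly described as treating the standard deviation of rewards within a group. It is not the proof from the item, and it leaves out much of each method: DAPO also changes clipping and loss aggregation, and Dr. GRPO also changes length normalization. The use of population standard deviation and the `eps` term are assumptions.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-6):
    """GRPO: centre on the group mean, then divide by the group spread."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def dr_grpo_advantages(rewards):
    """Dr. GRPO: centre on the group mean, with no division by the spread."""
    mu = mean(rewards)
    return [r - mu for r in rewards]


def dapo_keep_group(rewards):
    """DAPO-style dynamic sampling: discard groups whose spread is zero."""
    return pstdev(rewards) > 0


group = [1.0, 0.0, 0.0, 1.0]       # two correct, two incorrect answers
print(grpo_advantages(group))       # roughly +1 / -1
print(dr_grpo_advantages(group))    # [0.5, -0.5, -0.5, 0.5]
print(dapo_keep_group([1.0] * 4))   # False: every answer got the same reward
```

Read this way, the three differ in whether they divide by the spread, ignore it, or use it as a filter.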

Why Asking an AI the Same Question 10,000 Times Barely Helps

2026-07-04

A new analysis shows that sampling many answers from an AI and picking the most common one hits a hard ceiling because the samples are correlated, not independent, so thousands of extra tries can be worth only a couple of genuinely new ones.

reasoning · test-time compute · sampling · evaluation · theory
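The ceiling can be illustrated with the textbook effective-sample-size formula for equally correlated samples, n_eff = n / (1 + (n - 1) * rho), which tends to 1 / rho as n grows. This is a sketch under stated assumptions: the correlation value of 0.5 is chosen for illustration, and the analysis in the item may use a different model.

```python
def effective_samples(n, rho):
    """Effective number of independent samples among n samples
    that share a pairwise correlation rho (design-effect formula)."""
    return n / (1 + (n - 1) * rho)


# rho = 0.5 is an assumed value, not a figure from the analysis.
for n in (10, 100, 10_000):
    print(n, round(effective_samples(n, 0.5), 3))
```

With rho = 0.5 the result never exceeds 2 however large n gets, which matches the item's point that thousands of tries can be worth only a couple of new ones. With rho = 0 the formula returns n, the independent case.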

Samsung's Trick Makes a Tiny 4B Agent Nearly Match a Model 18 Times Bigger

2026-07-04

Samsung R&D UK and Queen Mary University of London published DuoMem, a distillation method that took a 4-billion-parameter agent from a 4.3 percent task-success rate to 77.9 percent, nearly matching a 72-billion-parameter teacher model's 87.1 percent.

Samsung · on-device · distillation · agents · efficiency

Microsoft Starts a 2.5 Billion Dollar Company Just to Get Businesses Using AI

2026-07-04

Microsoft launched a new business called Frontier Company, backed by a $2.5 billion investment and staffed by roughly 6,000 people, whose sole job is deploying AI inside client organizations rather than just selling software or consulting on it.

Microsoft · enterprise · AI adoption · industry · business

OpenAI Is Reportedly in Early Talks to Give the US Government a 5% Stake

2026-07-04

OpenAI is reportedly in early discussions about the US government taking roughly a 5% stake in the company, worth about $42.6 billion at its current $852 billion valuation, ahead of its planned September 2026 IPO.

OpenAI · policy · government · Sam Altman · business

Cloudflare Now Lets Sites Block AI Training Bots While Keeping Search

2026-07-04

Cloudflare has split its AI bot controls into three separately toggleable categories (search, live agent, and training crawlers), free for all sites, with training and agent bots blocked by default on ad-supported pages starting September 15, 2026.

Cloudflare · web · AI crawlers · publishers · policy

NVIDIA Starts Taking a Share of Its Cloud Partners' Revenue, Not Just Selling Them Chips

2026-07-04

NVIDIA announced a new arrangement where it earns a share of the cloud revenue its partners generate from NVIDIA-supported data center capacity, on top of its usual chip sales, with first partners Sharon AI and Firmus building campuses totaling hundreds of thousands of GPUs.

NVIDIA · data centers · cloud · infrastructure · business
