Ground Truth.
AI, checked against the source.

News · 2026-06-30

Anthropic's Claude Science puts a whole lab bench inside the AI

Anthropic has released Claude Science, a beta AI workbench that collapses literature search, code, statistics, and compute into a single conversational workspace. The tool keeps large or sensitive datasets on the researcher's own infrastructure and attaches a full reproducibility trail — code, environment, and step history — to every output it produces.

Key facts

Many scientists lose hours each day stitching together separate tools for literature search, coding, statistics, and cluster computing. Claude Science replaces that plumbing with one environment driven by conversation. You describe what you want to investigate, and the system drafts a plan, pulls relevant papers, writes and runs the analysis code, sets up and scales the computing job (on your laptop, a shared cluster, or rented cloud machines), and produces the figures and draft text. Large or sensitive datasets, such as private patient or genomic data, stay on your own infrastructure rather than being uploaded.

The feature that separates this from a fancy chatbot is auditability. When Claude Science hands you a chart, it also hands you the exact code, the software environment, and the full history of steps that produced it. In science, a result nobody can reproduce is barely a result at all, and the reproducibility crisis of the last decade was fueled partly by analyses that lived in one person's head or in one uncommented script. By keeping the whole chain attached to every output, the tool makes each finding checkable by default. It also runs a separate reviewer agent whose explicit job is to sanity-check citations and calculations. That is a direct swing at the well-known tendency of AI models to invent plausible-sounding references, a failure we explain in "Why does AI make things up?"

Under the hood it works less like one big brain and more like a team. A coordinating agent sits on top and delegates to more than sixty specialized skills and connectors, which are tools tuned for specific domains such as genomics, protein science, and chemistry. It can also query hundreds of specialized scientific databases and connect to outside toolkits, including one from NVIDIA aimed at biology. This division of labor, one generalist manager routing work to many narrow specialists, is a pattern showing up across serious agent systems; our lesson on what makes an AI an agent covers why it tends to beat a single monolithic model on messy real-world work.
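To make the manager-and-specialists idea concrete, here is a minimal Python sketch of the general routing pattern. It is not Claude Science's implementation or API: the skill functions, the `Coordinator` class, and the domain names are hypothetical stand-ins, and a real agent system would likely let a model decide the routing rather than use a fixed lookup table.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical specialist "skills": each one handles a single narrow domain.
# These stand-ins just report which specialist handled the task.
def genomics_skill(task: str) -> str:
    return f"genomics: {task}"

def protein_skill(task: str) -> str:
    return f"protein: {task}"

def chemistry_skill(task: str) -> str:
    return f"chemistry: {task}"

# Hypothetical general-purpose handler for anything no specialist covers.
def generalist(task: str) -> str:
    return f"generalist: {task}"

@dataclass
class Coordinator:
    """Hypothetical manager that routes each step to the specialist for its domain."""
    skills: Dict[str, Callable[[str], str]]
    fallback: Callable[[str], str]

    def delegate(self, domain: str, task: str) -> str:
        # Use the registered specialist if one exists; otherwise fall back.
        handler = self.skills.get(domain, self.fallback)
        return handler(task)

    def run_plan(self, plan: List[Tuple[str, str]]) -> List[str]:
        # Execute a plan of (domain, task) steps in order and collect the results.
        return [self.delegate(domain, task) for domain, task in plan]

coordinator = Coordinator(
    skills={
        "genomics": genomics_skill,
        "protein": protein_skill,
        "chemistry": chemistry_skill,
    },
    fallback=generalist,
)

results = coordinator.run_plan([
    ("genomics", "design gene-editing screen"),
    ("protein", "score binding"),
    ("statistics", "fit model"),  # no specialist registered, so the generalist takes it
])
print(results)
```

The point of the design is that the coordinator only decides who handles each step. Each specialist stays narrow and replaceable, and anything unmatched still gets handled instead of failing.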

Beta users reported large speedups on tasks like designing gene-editing screens and analyzing single-cell sequencing data, and one neuroscientist at the Allen Institute said a long-form scientific review that would normally take about two years came together in under a month. Anthropic is pairing the launch with discounted seats for academic and nonprofit labs and up to thirty thousand dollars in compute credits, plus additional cloud credits, for selected projects — a clear play to seed adoption where the most interesting science happens and budgets are tightest.

The genuine caveat is that speed and rigor pull against each other, and this is exactly where AI-for-science has burned people before. A tool that drafts a two-year review in a month is thrilling right up until a fabricated citation or a subtly wrong statistical choice slips through and gets built upon. The reviewer agent is a real safeguard, but it is an AI checking an AI, and it can miss things a domain expert would catch instantly. The reproducibility trail is the strongest idea here precisely because it lets a human go back and audit — but only if the human actually does. The risk is not that Claude Science produces bad science; it is that it produces science fast enough that the human verification step feels optional. For now it is a beta, aimed at experts who can smell a wrong answer, and that is the right audience. The test will be whether the auditability it is built around gets used as a habit, or quietly skipped in the rush to the next result.


Primary source, verified: read the paper →

Key questions

What does Claude Science actually do?

It brings a researcher's separate tools (literature search, code notebooks, statistics, and compute clusters) into one AI-driven workspace, and it saves the code, software environment, and full step history behind every figure so results can be reproduced. It is aimed at scientists, not general users.

Who can use it and what does it cost?

It is in beta on Mac and Linux for Pro, Max, Team, and Enterprise plans. Academic and nonprofit labs get discounted seats, and selected projects can receive up to thirty thousand dollars in compute credits plus additional cloud credits.

Does it just make up citations like a chatbot might?

It includes a separate reviewer agent whose job is to check citations and calculations, an attempt to catch the fabrication problem, though users still need to verify important claims themselves.

Topics: Anthropic · science · agents · tools · research
