News · 2026-06-29

Microsoft's new memory system lets AI agents remember more by storing less

Microsoft Research has released Memora, a memory system for AI agents, along with public code on GitHub. Its pitch is counterintuitive: agents can remember more if they store and search their memories more cleverly, rather than just hoarding everything.

The problem it solves is one anyone who has used a chatbot for a long project has felt. Today's language models are fundamentally forgetful: in each session they know only what is in front of them in the context window, and once a conversation gets long enough, early details fall off the edge. There are two common fixes, and both have flaws. One is to stuff the entire history back in every time, which gets expensive fast and actually degrades quality, since models lose track of details buried in a huge wall of text. The other is to aggressively summarize the past, which is cheap but throws away the specific details you might need later. You're stuck choosing between remembering everything badly and keeping only a blurry sketch.

Memora's idea is to separate two jobs that have been jammed together: what you store, and how you find it. For each memory, it keeps the full, rich content (call it the memory's body), but it also attaches a tiny label, a six-to-eight-word phrase that captures the gist, plus a few context-aware tags it calls 'cue anchors.' Crucially, when the agent searches its memory, it searches only the tiny labels, not the full bodies. Once it finds the right label, it pulls up the full detail behind it.
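To make the store-versus-find split concrete, here is a minimal Python sketch. The names `MemoryEntry`, `MemoryStore`, and `search` are hypothetical, and the word-overlap scoring is a toy stand-in: the article doesn't say how Memora matches a query to a label, and a real system would more likely use embeddings or a language model. What the sketch does show is the key property: the query is compared only against each short label, and the full body comes back only after a label wins.

```python
from __future__ import annotations

from dataclasses import dataclass

PUNCT = ".,!?'\""


def tokens(text: str) -> set[str]:
    """Lowercased word set: a toy stand-in for real semantic matching."""
    return {w.strip(PUNCT).lower() for w in text.split() if w.strip(PUNCT)}


@dataclass
class MemoryEntry:
    label: str  # short gist phrase (six to eight words in Memora's design)
    body: str   # full, detailed content; never scored during search


class MemoryStore:
    def __init__(self) -> None:
        self.entries: list[MemoryEntry] = []

    def add(self, label: str, body: str) -> MemoryEntry:
        entry = MemoryEntry(label, body)
        self.entries.append(entry)
        return entry

    def search(self, query: str) -> MemoryEntry | None:
        """Score the query against labels only; return the best entry, body included."""
        q = tokens(query)
        best, best_score = None, 0.0
        for entry in self.entries:
            label_words = tokens(entry.label)
            union = q | label_words
            # Jaccard overlap between query words and label words.
            score = len(q & label_words) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = entry, score
        return best


store = MemoryStore()
store.add("database migration plan for billing service",
          "Full notes: move billing tables in three phases, starting with invoices.")
store.add("team prefers tabs over spaces", "Full notes: style discussion from the kickoff.")

hit = store.search("what was the billing migration plan")
print(hit.body if hit else "no match")
```

Note that a query for a word that appears only in a body, such as "phases", finds nothing here, because bodies are never searched.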

The analogy is a library card catalog. You don't find a book by speed-reading every volume on the shelves; you flip through the index cards, each a few lines long, until you land on the right one, then go pull the actual book. Memora gives every memory a card. And because new information on an existing topic can be merged into the card that already covers it, the system avoids the fragmentation that plagues simpler memory tools, where the same subject ends up scattered across dozens of disconnected entries. A 'policy-guided retriever' can also hop from one card to related ones through those cue anchors, chasing a chain of connected memories the way a person follows a train of thought. That makes it a more capable cousin of retrieval-augmented generation (RAG), the standard technique for letting models look things up.
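The merge-and-hop behavior can be sketched the same way. Everything here is hypothetical: `CardCatalog`, its `upsert` and `hop` methods, and the 0.5 `MERGE_THRESHOLD` are illustrative choices, not Memora's API. The sketch merges a new detail into an existing card when the labels overlap enough, and otherwise files a new card. It then follows shared cue anchors breadth-first for a fixed number of hops. That fixed rule stands in for Memora's policy-guided retriever, which the article describes as prompted or trained rather than hard-coded.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


def words(text: str) -> set[str]:
    return {w.lower() for w in text.split()}


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of label words: a toy stand-in for real topic matching."""
    wa, wb = words(a), words(b)
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class Card:
    label: str                                       # short gist phrase
    body: list[str]                                  # accumulated details
    anchors: set[str] = field(default_factory=set)   # cue anchors


class CardCatalog:
    MERGE_THRESHOLD = 0.5  # hypothetical cutoff for "same topic"

    def __init__(self) -> None:
        self.cards: list[Card] = []

    def upsert(self, label: str, detail: str, anchors: set[str]) -> Card:
        """Merge into the closest card if it covers the same topic, else add a new card."""
        best = max(self.cards, key=lambda c: similarity(c.label, label), default=None)
        if best is not None and similarity(best.label, label) >= self.MERGE_THRESHOLD:
            best.body.append(detail)   # new detail joins the existing card
            best.anchors |= anchors    # and its cue anchors are unioned in
            return best
        card = Card(label, [detail], set(anchors))
        self.cards.append(card)
        return card

    def hop(self, start: Card, max_hops: int = 2) -> list[Card]:
        """Breadth-first walk to cards sharing a cue anchor, at most max_hops away."""
        seen = {id(start)}
        order = [start]
        frontier = deque([(start, 0)])
        while frontier:
            card, depth = frontier.popleft()
            if depth == max_hops:
                continue
            for other in self.cards:
                if id(other) not in seen and card.anchors & other.anchors:
                    seen.add(id(other))
                    order.append(other)
                    frontier.append((other, depth + 1))
        return order


catalog = CardCatalog()
plan = catalog.upsert("billing service database migration plan",
                      "Phase 1: copy tables.", {"billing", "postgres"})
catalog.upsert("billing service database migration plan update",
               "Phase 2 moved to May.", {"billing"})  # merged into `plan`
catalog.upsert("postgres upgrade scheduled for the weekend",
               "Upgrade the primary cluster.", {"postgres", "ops"})
catalog.upsert("on-call rotation for the ops team", "Rotation changes weekly.", {"ops"})
print([card.label for card in catalog.hop(plan, max_hops=2)])
```

Capping `max_hops` keeps a chain of associations from wandering across the entire store; a learned policy could instead decide at each step whether another hop is worth taking.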

The reported results are strong. On benchmarks that test whether an AI can recall facts from long, sprawling conversations, Memora claims a new best score, beating popular memory systems such as Mem0 as well as plain retrieval. More striking is the efficiency: it cuts token use by up to 98 percent compared with the stuff-everything-in approach, and it stores roughly half as many entries per conversation as Mem0, because merging beats fragmenting. The retriever can be hand-prompted or trained into a small dedicated model so it runs cheaply.

Why it matters: durable memory is the missing piece for agents that work alongside you over weeks or months, such as a coding assistant that remembers your project's history or a workplace tool that accumulates institutional knowledge. Doing that without re-paying for the entire history on every turn is what makes long-term collaboration economically practical, and an open implementation means others can build on it directly.

The honest caveat is that the '98 percent fewer tokens' figure is measured against the most wasteful baseline: dumping the full context in every time. Against other smart memory systems, the margin is real but much narrower, and memory benchmarks have been a fast-moving, somewhat gameable target where today's record rarely lasts. The good news is that the code is public, so Memora's claims can be checked independently. For anyone tracking what an AI agent should remember, it's a concrete, testable step rather than another closed black box.
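The baseline caveat is easier to see with numbers. The per-query token counts below are invented for illustration and are not taken from the paper; they simply show how one system can post a 98 percent reduction against full-context replay while saving only about a third against another memory system.

```python
def reduction_percent(baseline_tokens: int, system_tokens: int) -> float:
    """Percentage of tokens saved relative to a chosen baseline."""
    return 100 * (1 - system_tokens / baseline_tokens)


# Hypothetical per-query token costs, invented for illustration only.
FULL_CONTEXT = 50_000  # replay the whole conversation every time
OTHER_MEMORY = 1_500   # some other retrieval-based memory system
MEMORA = 1_000         # the memory system being evaluated

print(f"vs full context: {reduction_percent(FULL_CONTEXT, MEMORA):.0f}% fewer tokens")
print(f"vs other memory: {reduction_percent(OTHER_MEMORY, MEMORA):.0f}% fewer tokens")
```

The system's own cost is identical in both lines; only the choice of baseline changes the headline number.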

Primary source, verified: read the paper →