News · 2026-06-25

What does your AI actually remember about you?

AI assistants are increasingly given memory: the ability to remember you across sessions, so you don't have to reintroduce yourself every time and they can act like they actually know you. The usual way to check whether that memory is any good is indirect: see whether the assistant does a better job on tasks, and assume good performance means good memory. Two new studies argue that assumption is shaky, and they go looking at the memory itself.

The first, a survey on arXiv, takes a dozen different memory systems and pulls them apart into their working parts: how they store information, how they decide what is worth keeping, how they fetch the right thing at the right moment, and how they tidy up over time. Its central finding is refreshingly unromantic: there is no best memory system. Which design wins depends entirely on your bottleneck, meaning whatever is actually slowing you down. A system tuned for storing a lot cheaply may be terrible at fetching precisely, and vice versa. The team also found that small, local cleanups of memory are far cheaper than periodically reorganizing the whole thing, much as wiping the counter after each meal beats deep-cleaning the kitchen once a month. The lesson is to treat memory as an engineering problem with tradeoffs, not a feature you switch on. Our AI agents explainer covers why memory is becoming central to agents in the first place.

The second study, called MEMPROBE, also on arXiv, does something cleverer and a little unsettling. It sets up simulated users, each given a hidden profile of facts about themselves, lets them chat with a memory-equipped assistant, and then tries to reconstruct each user's hidden profile purely from what ended up in the assistant's memory afterward. In other words, it audits the memory like a detective examining a notebook, asking: how much of who this person is can be recovered from what the AI wrote down?

The result splits two things people usually conflate. The assistants were good at the tasks, so good that even a version with no memory at all often did fine, which means task success was a poor signal of whether anything was actually remembered. But when the researchers tried to rebuild the users' profiles from memory, they could only recover a middling fraction, and it got worse when the assistant could only look at a handful of its memories at a time, as real systems do for speed. The blunt conclusion: being helpful and actually remembering you are two different skills, and a system can have the first without much of the second.
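
To make the audit idea concrete, here is a toy sketch in Python. It is not MEMPROBE's actual method: the profile, the memory notes, the word-overlap retrieval, and the substring check are all stand-ins invented for illustration. It only shows the shape of the measurement, which is to count how many hidden facts can be found in what the assistant stored, first with access to everything and then with only the top few notes per lookup.

```python
from typing import Dict, List, Optional


def retrieve(memory: List[str], query: str, k: Optional[int]) -> List[str]:
    """Rank notes by word overlap with the query; keep the top k (all if k is None).

    Word overlap is a hypothetical stand-in for whatever retriever a real system uses.
    """
    query_words = set(query.lower().split())
    ranked = sorted(
        memory,
        key=lambda note: len(query_words & set(note.lower().split())),
        reverse=True,  # sorted() is stable, so ties keep their original order
    )
    return ranked if k is None else ranked[:k]


def recovery_rate(profile: Dict[str, str], memory: List[str], k: Optional[int] = None) -> float:
    """Fraction of hidden profile facts whose value appears in the retrieved notes."""
    if not profile:
        return 0.0
    hits = 0
    for attribute, value in profile.items():
        notes = retrieve(memory, attribute, k)
        # Assumption: a fact counts as recovered if its value appears verbatim.
        if any(value.lower() in note.lower() for note in notes):
            hits += 1
    return hits / len(profile)


# Hypothetical hidden profile for one simulated user.
profile = {"city": "Lisbon", "job": "nurse", "pet": "parrot", "diet": "vegetarian"}

# Hypothetical notes the assistant wrote to memory while being helpful.
memory = [
    "User lives in the city of Lisbon",
    "User asked about night shift schedules",  # hints at the job, never states it
    "User adopted a parrot last year",         # states the pet, but never says "pet"
    "User wanted a pasta recipe",
]

print(recovery_rate(profile, memory))       # 0.5  (city and pet recovered)
print(recovery_rate(profile, memory, k=1))  # 0.25 (only city survives top-1 retrieval)
```

In this toy setup the assistant could have answered every request well, yet only half the profile is recoverable from its notes, and that drops to a quarter once each lookup sees a single note. The numbers mean nothing in themselves. The point is that recovery is measured separately from task success.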

Why it matters: as memory becomes a default feature in assistants and agents, "does it work" is the wrong question. The right questions are which memory design fits your bottleneck, what it costs, and how much it genuinely retains. These studies give the field tools to ask them directly instead of guessing from downstream behavior.

And there is a privacy edge that is impossible to miss. MEMPROBE is, flipped around, a measurement of how much an AI silently retains about a person, a way to see what a system has quietly written down about you in the course of being helpful. The same technique that audits memory quality also reveals an exposure surface: the more faithfully an assistant remembers you, the more there is, sitting in its memory, to be recovered. The honest caveat on both papers is that they rely on simulated users and synthetic profiles for scale, so how well the findings transfer to messy, real, long-term use is still unproven. But the shift they push, from trusting memory to measuring it, is overdue. (Worth noting: one code link circulated for the survey did not resolve, so treat that repository reference with caution until an official one is confirmed.)

Primary source, verified: read the paper → (arXiv 2606.24595)