Your Agent's Memory Is an Asset, Not a Vendor Feature

Sergey Golubev 2026-06-20 15 min read

A builder holds a glowing box labeled "MY MEMORY .md", next to a shelf of swappable harness cartridges (Codex, Hermes, Claude Code). Wall signs read "OWN YOUR MEMORY, RENT THE ENGINE", "MODELS COME AND GO", "PORTABLE > LOCKED-IN". The metaphor: your memory is yours, you rent the engine.

Over the past six months I’ve watched practitioners bounce between wrappers for their personal AI agent: OpenClaw, then Hermes, then back to Codex or Claude Code. Each new setup means assembling everything from scratch again. And the real question keeps hanging there: what actually moves with you when you switch tools?

The answer is simple. What moves is what you own yourself: your memory and your skills. The model and the harness are consumables - you just rent them for the task.

Memory is the core artifact, and it has three dimensions

To keep memory simple yet solid, it helps to break it into three dimensions: where it lives, how it’s organized inside, and when it gets updated.

Storage - where it lives. The most common option is markdown. Cole Medin showed exactly this at the OME conference: his agent’s memory is just files on the computer, a local vault he opens in Obsidian. No vendor account. The word “memory” sounds impressive, but in practice it’s a set of text files in folders that the agent reads selectively each session. Markdown is easy to read for both the agent and a human, and it opens in any editor. It also lives in one place: when you switch tools, the files stay where they were, untouched, and the new agent simply reads them from there. Keep it under Git at all times: that gives you history, rollback, and backup in one mechanism. Memory isn’t a vector database, it’s a trail of work: what the agent actually did and what you chose to keep.

Structure - how it’s laid out. One pattern is at work here, and lately it’s mentioned more and more - progressive disclosure. The idea: only a thin index is always loaded into context, while the full data is pulled in just-in-time, when it’s actually needed. Anthropic builds all of Claude Code this way: CLAUDE.md goes into context right away, while the full project files are read by the agent only when it gets to them. The same trick applies to skills - at first only a short frontmatter sits in context (name, description, triggers), and the whole SKILL.md loads only if the skill is needed. It even reached MCP: tool definitions for all connected tools used to sit in context immediately - easily tens of thousands of tokens before any work began - but since late 2025 Claude loads them on demand via Tool Search: the index first, the full tool schema pulled in at call time. The same trick everywhere, and it’s worth applying to everything - to memory first of all.

For progressive disclosure to actually work for memory, the first thing to think through is the index itself - the file that loads at the start of every session. Everything else depends on it: it’s not a warehouse, it’s a table of contents, the agent’s workbench. It holds not content but pointers - what is where and when to go there: a short one-line description per section plus the file’s address. Mine looks like this, for example: “how we compute product metrics - in metrics.md”, “roadmap decisions and why - in roadmap-decisions.md”, “segment profiles and ICP - in icp.md”, “ticket conventions - in jira-conventions.md”. The topic files grow separately and get pulled in for the specific task. I keep the index thin - every line loads in every session - so everything bulky goes into separate files, and the index keeps only a map with a very brief summary, so the agent knows which full file to open.
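To make the idea concrete, here is a minimal sketch of a pointer index and how it could drive on-demand loading. The line format ("description - in file.md") simply mirrors the examples above; the names `parse_index`, `pick_files`, and `load_on_demand` are hypothetical, and the word-overlap matching stands in for the judgment a real agent applies when it reads the index.

```python
import re
from pathlib import Path

# Assumed index format: one pointer per line, "<description> - in <file>.md".
INDEX_TEXT = """\
how we compute product metrics - in metrics.md
roadmap decisions and why - in roadmap-decisions.md
segment profiles and ICP - in icp.md
ticket conventions - in jira-conventions.md
"""

POINTER = re.compile(r"^(?P<desc>.+?)\s+-\s+in\s+(?P<file>\S+\.md)\s*$")


def parse_index(text: str) -> dict[str, str]:
    """Map each file name to its one-line description."""
    pointers = {}
    for line in text.splitlines():
        match = POINTER.match(line.strip())
        if match:
            pointers[match["file"]] = match["desc"]
    return pointers


def pick_files(pointers: dict[str, str], task: str) -> list[str]:
    """Naive stand-in for the agent's choice: keep files whose description shares a word with the task."""
    task_words = set(re.findall(r"\w+", task.lower()))
    return [
        name for name, desc in pointers.items()
        if task_words & set(re.findall(r"\w+", desc.lower()))
    ]


def load_on_demand(vault: Path, names: list[str]) -> str:
    """Read only the selected topic files; everything else stays on disk."""
    return "\n\n".join((vault / name).read_text(encoding="utf-8") for name in names)
```

Only `INDEX_TEXT` would sit in context on every session. The topic files are read after a pointer has been chosen, which is the whole point of keeping the index thin.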

And the bigger the base, the more index discipline matters - otherwise it quietly degrades. A few things that genuinely help. Levels of abstraction: project overview → section → specific entry → detail, with a short summary at each level, built bottom-up. Clear file names and a single table of contents - without them the agent simply stops finding what it needs. And most important - don’t let the index get overgrown: collapse duplicates, mark stale items instead of hoarding them, and don’t let a junk-drawer section like “misc” balloon. That junk drawer is the most dangerous - it hijacks the agent’s attention and buries the precise entries under it.
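Part of this discipline can be checked mechanically. The sketch below is one possible lint pass over an index, assuming the same hypothetical "description - in file.md" line format; the line budget and the list of junk-drawer names are arbitrary choices for illustration, not a recommendation from any tool.

```python
import re

POINTER = re.compile(r"\s+-\s+in\s+(\S+\.md)\s*$")
JUNK_DRAWERS = {"misc.md", "other.md", "notes.md"}  # assumed names, adjust to your vault


def lint_index(index_text: str, existing_files: set[str], max_lines: int = 50) -> list[str]:
    """Return human-readable warnings; an empty list means no issue was detected."""
    warnings = []
    lines = [line for line in index_text.splitlines() if line.strip()]
    if len(lines) > max_lines:
        warnings.append(f"index has {len(lines)} lines, budget is {max_lines}")
    seen = set()
    for line in lines:
        match = POINTER.search(line)
        if not match:
            warnings.append(f"no file pointer: {line.strip()!r}")
            continue
        target = match.group(1)
        if target in seen:
            warnings.append(f"duplicate pointer to {target}")
        seen.add(target)
        if target not in existing_files:
            warnings.append(f"pointer to missing file {target}")
        if target.lower() in JUNK_DRAWERS:
            warnings.append(f"junk-drawer file {target}: split it by topic")
    return warnings
```

Run on a schedule or before a commit, a check like this catches duplicates and dead pointers early. It cannot judge whether a description is still accurate; that part stays manual.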

Manual discipline lasts a long time, but on large bases people bring in tools. It all started with Karpathy’s “LLM wiki” idea: the agent goes through the raw files itself, extracts topics, and assembles an index file that goes into context every time; from it the agent finds the right pages and forms an answer. Next to the index it keeps a chronological log of what’s been done, so it doesn’t redo the same work. The method works, but it has a cost: the index plus chunks of the retrieved files add up to a lot of tokens and a noticeable delay on every query. After Karpathy came a wave of tools that solve it differently. Graphify, for example, processes your base offline once, builds a graph of many JSON nodes, and drops a report alongside. After that, plain scripts find the relevant nodes without the model, so almost no tokens are spent assembling context. The authors of such tools claim token savings of an order of magnitude or more. It’s a whole class of solutions, not a single project, and for a large personal base it genuinely changes the economics.
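The "build once offline, query with plain scripts" idea can be illustrated with something much simpler than a graph. The sketch below is a keyword inverted index serialized to JSON. It is not Graphify's format or algorithm, only a toy instance of the same economics: the model is not involved at query time. All function names are hypothetical.

```python
import json
import re
from collections import defaultdict


def build_keyword_index(docs: dict[str, str], min_len: int = 4) -> dict[str, list[str]]:
    """Offline step: map each keyword to the sorted list of files that mention it."""
    index = defaultdict(set)
    for name, text in docs.items():
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            if len(word) >= min_len:
                index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


def dump_index(index: dict[str, list[str]]) -> str:
    """Serialize the index so it can be stored next to the vault."""
    return json.dumps(index, indent=2, sort_keys=True)


def lookup(index: dict[str, list[str]], query: str, top_k: int = 3) -> list[str]:
    """Query step: rank files by how many query words they contain. No model involved."""
    scores = defaultdict(int)
    for word in re.findall(r"[a-z0-9]+", query.lower()):
        for name in index.get(word, []):
            scores[name] += 1
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:top_k]
```

The agent then receives only the top-ranked files, so tokens are spent on the answer rather than on finding where the answer lives.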

One more thing about the index: moving it between harnesses is no problem either. It’s simply flagged as a file the agent must read at the start of a session - exactly like CLAUDE.md, AGENTS.md, or GEMINI.md. Wherever you move, you wire it into mandatory reading there.

These instruction files themselves are named differently - CLAUDE.md in Claude Code, AGENTS.md in Codex, GEMINI.md in Gemini - and they’re instructions, part of the system context, not the memory the agent maintains itself (more on that below). To avoid keeping three copies, people hold a single source: AGENTS.md has de facto become the cross-tool standard - Codex, Cursor, Copilot, and dozens of others read it - while Claude and Gemini point to the same file via a symlink. I do it the other way around: my AGENTS.md is a symlink to CLAUDE.md, and both harnesses read one source. The direction doesn’t matter as long as there is exactly one real file. If symlinks are inconvenient, there are generators like rulesync that produce a version for each tool from a single file.
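A minimal sketch of the single-source setup, assuming CLAUDE.md is the real file and the other names are symlinks to it. The function name is hypothetical, and the default alias list is an assumption. On Windows, creating symlinks may require extra privileges.

```python
import os
from pathlib import Path


def link_instruction_files(
    project: Path,
    source: str = "CLAUDE.md",
    aliases: tuple[str, ...] = ("AGENTS.md", "GEMINI.md"),
) -> list[Path]:
    """Create relative symlinks so every alias resolves to the single source file."""
    created = []
    for alias in aliases:
        link = project / alias
        if link.is_symlink() or link.exists():
            continue  # never overwrite an existing file or link
        os.symlink(source, link)  # relative target keeps the repo relocatable
        created.append(link)
    return created
```

Because the link target is relative, the links keep working after the repository is cloned or moved to another machine.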

Process - when to update. The most underrated dimension. Cole put it precisely at OME: the hardest part of memory isn’t retrieving what you need, it’s updating the stale stuff in time. The classic risk - the agent digs up last quarter’s goals and takes them for current. Then memory doesn’t help, it hurts.

So it’s important to understand what exactly you put into memory and how you handle updates. I split it into two types. Static - what changes rarely: rules, style, how the project is built, stable facts about you; think it through once and don’t touch it for a long time. Dynamic - what lives its own life: quarterly goals, current priorities, task status. That’s what you shouldn’t dump into shared memory. Changing things are better kept in separate files, updated by hand, and reviewed periodically: a stale goal is better rewritten or flagged than left for the agent as a live fact. The very split into types removes the main pain - memory stops quietly slipping the past in as the present.
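One way to make the periodic review hard to forget is to date the dynamic files and flag the ones that have gone unreviewed. The sketch below assumes a hypothetical convention: every dynamic file carries a line `reviewed: YYYY-MM-DD`. The 30-day threshold is arbitrary.

```python
import re
from datetime import date, timedelta

REVIEWED = re.compile(r"^reviewed:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def stale_files(files: dict[str, str], today: date, max_age_days: int = 30) -> list[str]:
    """Return names of dynamic files with no review date or one older than max_age_days."""
    stale = []
    for name, text in sorted(files.items()):
        match = REVIEWED.search(text)
        if match is None:
            stale.append(name)  # an undated file is treated as stale
            continue
        reviewed = date.fromisoformat(match.group(1))
        if today - reviewed > timedelta(days=max_age_days):
            stale.append(name)
    return stale
```

For example, with `today = date(2026, 6, 20)`, a goals file last reviewed on 2026-03-01 is flagged, while one reviewed on 2026-06-10 passes.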

A separate layer is the memory the agent maintains itself, not you, on top of your files rather than instead of them. In Claude Code this is auto-memory: on by default since version 2.1.59, it lives in ~/.claude/projects/<project>/memory/, and the agent decides for itself what’s worth recording. It doesn’t wait until the end of the session either - it can capture something important right in the middle of work, at key moments. At startup only the first ~200 lines of the index MEMORY.md are pulled into context; the rest is read on demand. Codex has its own store (~/.codex/memories/, optional), and Gemini has an experimental one with manual confirmation. This is worth keeping in mind when you move: every harness has its own memory that it maintains in parallel with your files. But it too is just files in a known folder, so if you want, you can take it and wire it into the new harness.

Starting in manual mode is fine here, and often even better. While you’re building memory up, top it off after every meaningful session or just ask the agent to record the takeaways, and turn the repeated action into an “update memory from the work” skill. That way you see for yourself what should change and when. And once the update logic settles, you put it on autopilot - a hook at the end of the session. Cole went further and built “dreaming”: on a schedule, a hook reads the daily log of all conversations and promotes the important bits into the main memory file - machine consolidation, the way the brain sorts thoughts in sleep. The industry moved the same way: Anthropic shipped its own dreaming for managed agents, and in academia the paper “Storage Is Not Memory” argues that meaning should be extracted not at ingestion but at retrieval, and events stored verbatim.
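As a rough illustration of the consolidation step, here is a deliberately simple sketch. It assumes a hypothetical convention where lines in the daily log that deserve promotion are tagged `#keep`. A real "dreaming" job would more likely ask a model to decide what is important; the tag only keeps the example deterministic.

```python
def consolidate(daily_log: str, memory: str, marker: str = "#keep") -> str:
    """Promote marked log lines into memory, skipping entries that are already there."""
    known = {line.strip() for line in memory.splitlines()}
    promoted = []
    for line in daily_log.splitlines():
        if marker not in line:
            continue
        entry = "- " + line.replace(marker, "").strip()
        if entry not in known:
            promoted.append(entry)
            known.add(entry)
    if not promoted:
        return memory  # nothing new: leave the file untouched
    return memory.rstrip("\n") + "\n" + "\n".join(promoted) + "\n"
```

The function is idempotent: running it twice over the same log adds nothing the second time, which matters once it runs unattended from a scheduled hook.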

Now about vector databases and graphs. In conversations about memory, almost the first question that comes up is which base to pick - vector or graph? A vector database is a fast and cheap default, but for memory it’s weak: it works with chunks and is blind to relationships. It finds what is “similar in meaning”, while memory lives on exact facts and on relationships between entities that aren’t even named in the query - so for personal memory vector search regularly misses and quietly under-delivers what’s needed. A graph (and LightRAG as its convenient packaging) is genuinely more powerful on relationships, multi-hop questions, and “how it changed over time” questions, and it’s indispensable in domains with rigid relationships - law, finance, medicine. But it’s a heavy tool: building the graph is slow and not cheap, maintenance is tedious, and for managing your own memory it’s most often overkill. Reach for it deliberately, for a specific task, not because “that’s what the videos say”. So the ladder is simple: start with markdown files and search over them (this lasts surprisingly long), add a vector index when files stop coping, and a graph last. Memory should be exactly as complex as it needs to be.
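The first rung of that ladder needs very little code. Below is a minimal sketch of search over markdown files: a case-insensitive substring match with a line of context on each side. The function name and return shape are hypothetical; in practice many agents simply call grep or ripgrep for this.

```python
from pathlib import Path


def search_vault(vault: Path, query: str, context: int = 1) -> list[tuple[str, int, str]]:
    """Case-insensitive substring search over *.md files; returns (file, line number, snippet)."""
    needle = query.lower()
    hits = []
    for path in sorted(vault.rglob("*.md")):
        lines = path.read_text(encoding="utf-8").splitlines()
        for i, line in enumerate(lines):
            if needle in line.lower():
                start = max(0, i - context)
                end = min(len(lines), i + context + 1)
                snippet = "\n".join(lines[start:end])
                hits.append((str(path.relative_to(vault)), i + 1, snippet))
    return hits
```

Exact matching is the strength here: a search for a metric name or a ticket key returns the line that states the fact, not a chunk that merely sounds similar.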

And a separate thought about backups: you need to back up not just data but configuration. There was a widely discussed case where a digital employee crashed and lost all its skills and its base, because only the “data” had been backed up. An agent’s config is a production artifact: skills, prompts, MCP config, sessions, memory. I keep all of it under Git for exactly this reason.
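Git is the approach described above. As a complementary illustration, the sketch below archives an explicit list of agent artifacts and reports anything on the list that is missing, so a "data only" backup cannot happen silently. The function name and the item names passed to it are hypothetical.

```python
import tarfile
from pathlib import Path


def backup_agent_state(root: Path, archive: Path, items: tuple[str, ...]) -> list[str]:
    """Archive the listed files and folders under root; return the items that were missing."""
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for item in items:
            source = root / item
            if source.exists():
                tar.add(source, arcname=item)  # folders are added recursively
            else:
                missing.append(item)  # surface gaps instead of silently skipping them
    return missing
```

Treating the item list as a manifest is the useful part: skills, prompts, MCP config, and memory are named explicitly, and a non-empty return value is a signal to investigate.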

Skills - the second asset that moves with you

Alongside memory lives the second asset - skills. It’s a separate entity, not part of memory. A good formula: memory is what to remember, a skill is how to repeat it. Memory holds facts and decisions, a skill holds a reproducible procedure.

A skill is experience distilled into an instruction for the agent: how I prepare a PRD, how I write release notes, how I sort user feedback into tasks. You pack your years of practice into an instruction, and it stays yours. You just can’t write a solid skill in two weeks: it iterates on real tasks over months - some things fall away, some mature. I’ve already written in detail about skills as a replacement for the SaaS stack; here the more interesting angle is creation, maintenance, and portability.

You store them the same way as memory: under Git, next to the project. And skills’ portability is even better than memory’s right now. The SKILL.md format became a de facto standard: Cursor, Codex, Gemini, and others picked it up within days - not fragmentation but unification. And progressive disclosure works here again: only the skill’s name, short description, and triggers load into context, while the full SKILL.md of 50-500 lines is read by the agent only when a trigger fires. Otherwise thirty skills loaded at once would blow out the context window. So a skill moves into Hermes, into Codex, into Antigravity almost painlessly - and often outlives the agent you wrote it for.
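The split between "what sits in context" and "what loads later" can be sketched as follows. It assumes a SKILL.md that opens with a frontmatter block delimited by `---` lines and containing flat `key: value` pairs. The parser is intentionally minimal and is not a full YAML reader; the function names are hypothetical.

```python
def split_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a SKILL.md into (frontmatter fields, body). Handles flat 'key: value' lines only."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    end = next((i for i, line in enumerate(lines[1:], 1) if line.strip() == "---"), None)
    if end is None:
        return {}, text  # no closing delimiter: treat the whole file as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta, "\n".join(lines[end + 1:]).lstrip("\n")


def context_stub(meta: dict[str, str]) -> str:
    """The only part that sits in context until the skill is actually needed."""
    return f"{meta.get('name', '?')}: {meta.get('description', '')}"
```

With thirty skills, the context holds thirty one-line stubs. The body returned by `split_skill` is read only for the skill that was actually selected.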

The harness is the new IDE, and people swap it

People used to swap IDEs. Now it’s the harness, the wrapper around the model itself. And it evolves faster than the models: these days the result is increasingly decided not by the model’s smarts but by its harness - tools, orchestration, feedback.

And here Claude Code is strongest precisely in its ecosystem. It’s not “a model in a terminal” but a whole runtime: tools, skills, subagents with isolated contexts, MCP, slash commands, and - crucially - hooks. A hook is a shell command that fires at a fixed point in the lifecycle: before a tool call, after one, at session end. And it fires outside the model’s context - the model doesn’t see it and can’t “negotiate” with it. Cursor is strong in something else: it’s AI right in the editor - instant inline completion on its own fast model, Composer with diffs across several files, repository indexing.

But not everything transfers equally easily. Memory, skills, and the index file itself move without pain, but deep automation has to be reconfigured for the new engine. The best example is those very hooks. Almost every harness has them now (Claude Code, Codex, Cursor, Gemini), but each engine has its own event names, its own format, and its own settings file, so on a move you have to rewrite them from scratch. It’s the same story with MCP configs and permissions: the content is similar, but it lives in a different place and is configured differently, and you have to watch for that.

Plus there’s a wave of harnesses like Hermes, OpenClaw, Pi - each with its own emphasis. You can use any of them: if your memory is portable, the harness becomes a swappable part.

From this follows something non-obvious about reliability. An agent’s limits are better kept in the harness, not in an instruction. An instruction in the prompt is a request to a probabilistic system: as the dialogue grows, rules in the middle lose weight, and outside text can override them. A hard guarantee comes from a check in the harness itself - code that simply won’t let the agent move on until a condition is met. I’ve seen a case where a DESIGN.md validation was built right into the pipeline: the agent won’t move to the next step until the document passes the check. Not by persuasion, but by code. The difference is like between “I asked it to be careful” and “the door won’t open without a key”.
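A check of this kind can be very small. The sketch below is a hypothetical DESIGN.md gate: the required headings are an invented checklist, and the returned code mimics the exit-code convention a pipeline step or hook would use. It does not reproduce the validation from the case mentioned above.

```python
REQUIRED_SECTIONS = ("## Goals", "## Non-goals", "## Interfaces")  # hypothetical checklist


def validate_design(text: str, required: tuple[str, ...] = REQUIRED_SECTIONS) -> list[str]:
    """Return a problem for each required heading that is missing or has no content beneath it."""
    lines = [line.rstrip() for line in text.splitlines()]
    problems = []
    for heading in required:
        if heading not in lines:
            problems.append(f"missing section: {heading}")
            continue
        start = lines.index(heading) + 1
        body = []
        for line in lines[start:]:
            if line.startswith("## "):
                break  # next section begins
            body.append(line)
        if not any(line.strip() for line in body):
            problems.append(f"empty section: {heading}")
    return problems


def gate(text: str) -> int:
    """Exit-code style result: 0 lets the pipeline continue, 1 stops it."""
    return 1 if validate_design(text) else 0
```

The agent can read the list of problems and fix the document, but it cannot argue its way past a non-zero result. That is the difference between a rule in the prompt and a rule in the harness.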

A single vendor is a risk

And this is where the reason to keep memory separate from the harness surfaces: lock-in. In April 2026, Anthropic for a while blocked requests containing specific phrases from OpenClaw’s system prompt: the match was lexical, so a simple rename didn’t help and it took a targeted rewording to get around it. By the end of April they dropped the policy. But the precedent is telling: a vendor’s feature and access that exist today can be gone tomorrow. If your whole agent is tied to one model from one company, you don’t fully own your workflow - you’re hostage to someone else’s updates, billing, and decisions. The cure is simple: a multi-model stack as insurance and memory in a portable format, so that on a harness switch you take the whole working directory - .md files, code, tests, settings - and move it.
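At the code level, "multi-model stack as insurance" can be as simple as an ordered list of providers with fallback. In the sketch below a provider is any callable that takes a prompt and returns text. That interface is an assumption for illustration; real vendor SDK calls would be wrapped to fit it.

```python
from typing import Callable

Provider = Callable[[str], str]  # hypothetical: takes a prompt, returns text, raises on failure


def ask_with_fallback(prompt: str, providers: list[tuple[str, Provider]]) -> tuple[str, str]:
    """Try providers in order; return (provider name, answer) from the first that succeeds."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # broad on purpose: any vendor failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

The fallback only helps if the prompt, memory, and skills passed in are vendor-neutral, which is why the portable format and the multi-model stack belong together.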

What I figured out

The model will get smarter in six months, the harness will update even sooner. You can replace both in an evening, or even an hour - if your whole setup is portable and versioned next to the project. Memory and skills accrue over years and belong to you personally. That’s the asset. Build for yourself: own your memory, rent the provider.

Other posts

Skills as Apps 2.0: Replacing the SaaS Stack