omer · May 02, 2026 · 15 min read

Giving AI Coding Agents Persistent Memory

Every AI coding agent starts every session the same way: knowing nothing. It doesn't know your architecture. It doesn't know the decision you made last week to avoid a certain pattern. It doesn't know that the test suite breaks if you mock the database a specific way. It doesn't know that you spent 45 minutes yesterday debugging an issue caused by a stale import.

Every session, you re-explain. Every session, the agent makes the same mistakes. Every session, you start from zero.

I call this agent dementia. After five weeks of building, testing, and iterating across two production repositories, I have a framework that fixes it.

What is AMF?

The Agent Memory Framework is a lightweight, file-based system that gives AI coding agents persistent memory across sessions. It works with any AI agent that can read files: GitHub Copilot, Claude, GPT, Cursor, or anything else. No extensions. No plugins. No configuration changes. Just plain markdown files that live locally on your machine and never touch your git history.

The core idea is simple. Before every session, the agent reads a small set of markdown files that contain everything it needs to know: what the codebase looks like, what decisions have been made, what mistakes to avoid, what you're currently working on, and what happened in previous sessions. Instead of starting from zero, it starts with the knowledge of a tenured team member.

In one sentence: run a one-time codebase review, bootstrap a few lightweight context files, and every AI agent session picks up where the last one left off.

The Difference It Makes

Here's what a typical agent interaction looks like without AMF:

You: Update the Button component to support a new "ghost" variant.

Agent: I'd be happy to help. Can you tell me where the Button component is located? What styling system are you using? Are there existing variants I should follow as a pattern? What's the file structure for your component library?

Four questions before any work happens. You answer them, the agent writes code, and the next session you'll answer them all again.

Here's the same prompt with AMF:

You: Update the Button component to support a new "ghost" variant.

Agent: Based on the codebase review, the Button component is at src/components/ui/Button.tsx using CVA for variant management. Existing variants (primary, secondary, outline) follow the pattern in the design system's token structure. I see from the decisions ledger that all new variants should use CSS custom properties rather than hardcoded values. Here's the implementation...

No questions. No re-explanation. The agent already knows.

Across a typical 3-hour working session, this eliminates roughly 30 minutes of context re-feeding, repeated mistakes, and rework from an agent that doesn't know your codebase. That's about 17% of the session recovered (30 of 180 minutes), and the gain compounds as the memory files grow richer over time.

Architecture

AMF uses a two-tier architecture:

Global layer sits outside any single repository. It holds your general coding preferences, patterns you follow across all projects, and cross-project lessons. Think of it as your developer profile that every agent session inherits.

Repository layer lives inside each repo (excluded from git via .git/info/exclude, so it never appears in diffs, commits, PRs, or the remote). It holds everything specific to that codebase: the architecture map, active tickets, project-specific decisions, and the running session log.

Files are excluded from version control by design. This isn't shared team state. It's your personal agent memory, customized to how you think about and work within each codebase.
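Here is a minimal sketch of how the two tiers could be stitched into one block of text for an agent to read at session start. The directory locations are assumptions: AMF only requires that the global layer sits outside the repo and the repository layer sits inside it, so `global_dir` and `repo_dir` are passed in rather than hardcoded, and `build_preamble` is a hypothetical helper name.

```python
from pathlib import Path


def load_layer(directory: Path) -> list[tuple[str, str]]:
    """Return (filename, text) for every markdown file in a layer, sorted by name."""
    if not directory.is_dir():
        return []  # a missing layer is simply skipped
    return [
        (path.name, path.read_text(encoding="utf-8"))
        for path in sorted(directory.glob("*.md"))
    ]


def build_preamble(global_dir: Path, repo_dir: Path) -> str:
    """Concatenate global files first, then repo files, each under a labeled header."""
    sections = []
    for label, directory in (("global", global_dir), ("repo", repo_dir)):
        for name, text in load_layer(directory):
            sections.append(f"--- {label}: {name} ---\n{text.strip()}")
    return "\n\n".join(sections)
```

Global files come first so that repo-specific files, read later, can narrow or override general preferences. That ordering is a design choice of this sketch, not something the framework mandates.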

Core Components

The Codebase Review is the highest-ROI component in the entire framework. It's a one-time, thorough audit where an agent explores the entire codebase and documents the architecture, patterns, conventions, file structure, and gotchas. For a medium-sized frontend app (100-200 source files), expect 20-40 minutes. It runs once, and the output stays in use until you choose to refresh it (typically quarterly, or after major architectural changes). Every subsequent session benefits from this investment.

The Bootstrap File (copilot-instructions.md) is the agent's first read. It defines behavior rules, project context, active tickets, known issues, and pointers to other AMF files. This is the single file that transforms a generic agent into one that understands your project.

The Context File (copilot-context.md) holds durable project state: tech stack, environment setup, key architectural decisions, and the current state of ongoing work. It stays high-signal by design. Session-level details go in the running log, not here.

The Running Log (running-log.md) captures what happened each session: what changed, what broke, what was learned. When it hits 7-10 entries, you summarize it into a weekly rollup and flush old entries. This keeps the log scannable while preserving history.
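The flush step can be scripted. The sketch below is one possible approach under stated assumptions: each session entry starts with a `## ` heading, entries are appended oldest first, and old entries are moved to a separate rollup file (the name `weekly-rollup.md` in the usage note is hypothetical). It only archives the old entries; writing the actual summary is still up to you or the agent.

```python
from pathlib import Path

ENTRY_MARKER = "## "  # assumption: each session entry starts with a level-2 heading


def split_entries(log_text: str) -> tuple[str, list[str]]:
    """Split a log into its preamble and a list of entries, oldest first."""
    preamble: list[str] = []
    entries: list[list[str]] = []
    for line in log_text.splitlines():
        if line.startswith(ENTRY_MARKER):
            entries.append([line])
        elif entries:
            entries[-1].append(line)
        else:
            preamble.append(line)
    return "\n".join(preamble), ["\n".join(entry).rstrip() for entry in entries]


def roll_up(log_path: Path, rollup_path: Path, threshold: int = 10, keep: int = 3) -> int:
    """Once the log reaches `threshold` entries, move all but the newest `keep`
    entries to the rollup file. Returns the number of entries moved."""
    preamble, entries = split_entries(log_path.read_text(encoding="utf-8"))
    if len(entries) < threshold:
        return 0
    split = len(entries) - keep
    old, recent = entries[:split], entries[split:]
    with rollup_path.open("a", encoding="utf-8") as rollup:
        rollup.write("\n\n".join(old) + "\n\n")
    parts = ([preamble.rstrip()] if preamble.strip() else []) + recent
    log_path.write_text("\n\n".join(parts) + "\n", encoding="utf-8")
    return len(old)
```

Calling `roll_up(Path("running-log.md"), Path("weekly-rollup.md"))` on a log with ten entries would leave the three newest in place and append the other seven to the rollup file, ready to be condensed into a summary.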

The Lessons Learned File (lessons-learned.md) is where institutional knowledge accumulates. Every time the agent makes a mistake, you record it with a prevention strategy. Not just "what went wrong" but "how to never repeat it." After five weeks across two repos, I've captured 16+ lessons. Each one saves time in every future session.
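To make the "prevention strategy" requirement concrete, here is a small sketch that appends a lesson and refuses entries that only describe the failure. The entry layout (heading, "What went wrong", "Prevention") and the names `Lesson` and `record_lesson` are my own illustration, not a fixed AMF format.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import Path


@dataclass
class Lesson:
    title: str
    what_went_wrong: str
    prevention: str
    recorded_on: date

    def to_markdown(self) -> str:
        """Render the lesson as a short markdown entry (assumed layout)."""
        return (
            f"## {self.title} ({self.recorded_on.isoformat()})\n"
            f"- What went wrong: {self.what_went_wrong}\n"
            f"- Prevention: {self.prevention}\n"
        )


def record_lesson(path: Path, lesson: Lesson) -> None:
    """Append a lesson; reject entries that lack a prevention strategy."""
    if not lesson.prevention.strip():
        raise ValueError("a lesson needs a prevention strategy, not just a description")
    with path.open("a", encoding="utf-8") as handle:
        handle.write("\n" + lesson.to_markdown())
```

A stale-import bug like the one from the introduction would become an entry whose prevention line tells the agent what to check after moving a module, which is the part future sessions actually act on.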

The Decisions Ledger (decisions-ledger.md) records technical choices that future sessions need to respect. "We use server components for data fetching." "All API routes validate with Zod." "Date formatting uses date-fns, not Intl." These prevent agents from relitigating settled questions.

The Dual-Model Code Review

This is the part of AMF that I think has the most untapped potential, and it's a workflow pattern I haven't seen anyone else implement.

The traditional approach: you write code with an AI agent, then either review it yourself or let the same agent review its own work. The problem is obvious. An agent reviewing its own code carries the same biases and blind spots that produced the code in the first place.

AMF introduces a two-phase review:

Phase 1 (Generation): During your dev session, the agent generates structured review notes as it works. What it changed, why, what tradeoffs it made, what it's unsure about. These get saved to a code-review-notes/ directory.

Phase 2 (Independent Review): You open a separate agent session, a completely fresh instance with no memory of having written the code, and point it at the review notes plus the actual diff. This agent reviews with the independence of a human reviewer. It hasn't written the code, so it has no attachment to defending it.

The independence is the value. The review agent catches things the dev agent rationalized away. It questions decisions the dev agent made on autopilot. It finds edge cases the dev agent didn't consider because it was too close to the implementation.
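The two phases can be sketched in a few lines. This is an illustration under assumptions, not the exact tooling I use: the note sections mirror the four things listed in Phase 1, the file naming (`slug`) and the prompt wording are hypothetical, and the diff is passed in as a string (for example, the output of `git diff`) so the sketch stays independent of any particular git setup.

```python
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ReviewNote:
    change: str
    rationale: str
    tradeoffs: list[str] = field(default_factory=list)
    uncertainties: list[str] = field(default_factory=list)

    def to_markdown(self) -> str:
        def bullets(items: list[str]) -> str:
            return "\n".join(f"- {item}" for item in items) or "- (none recorded)"

        return (
            f"## Change\n{self.change}\n\n"
            f"## Why\n{self.rationale}\n\n"
            f"## Tradeoffs\n{bullets(self.tradeoffs)}\n\n"
            f"## Unsure about\n{bullets(self.uncertainties)}\n"
        )


def save_note(notes_dir: Path, slug: str, note: ReviewNote) -> Path:
    """Phase 1: the dev session writes one note file per change."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / f"{slug}.md"
    path.write_text(note.to_markdown(), encoding="utf-8")
    return path


def build_review_prompt(notes_dir: Path, diff_text: str) -> str:
    """Phase 2: assemble what the fresh review session gets to see."""
    notes = "\n\n".join(
        path.read_text(encoding="utf-8") for path in sorted(notes_dir.glob("*.md"))
    )
    return (
        "You did not write this code. Review it independently.\n\n"
        f"# Author's review notes\n{notes}\n\n"
        f"# Diff\n{diff_text}\n"
    )
```

The review session receives only the notes and the diff, never the dev session's conversation. That separation is what keeps the second agent from inheriting the first one's rationalizations.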

Real-World Results

This isn't theoretical. Here's what five weeks of active use produced:

L3 Design System Migration: A multi-sprint initiative to migrate a legacy design system to a modern component library. The codebase review mapped the entire existing component structure before any migration work began. The agent knew which components had dependencies, which patterns were deprecated, and which areas had known issues. Migration sessions were productive from the first prompt instead of spending the first 20 minutes re-learning the component tree.

Label Rename Across 40+ Files: A sweeping rename operation that touched files across the entire codebase. The decisions ledger recorded the naming convention. The running log tracked which files had been updated across sessions. The agent picked up exactly where the last session left off, with zero files missed or double-processed.

Cumulative Learning: 16+ entries in lessons learned. 17+ entries in the decisions ledger. Each one represents a mistake that will never be repeated or a decision that will never be relitigated. The compound effect is significant: session 20 is dramatically more productive than session 1, not because the agent got smarter, but because the memory got richer.

Setting It Up

Setup takes one session:

  1. Drop three starter files into your repo: copilot-instructions.md, copilot-context.md, and running-log.md
  2. Add the file paths to .git/info/exclude
  3. Issue the bootstrap prompt to your agent: it reads the instructions, orients to the project, and you're live
  4. Run the codebase review (strongly recommended, this is where the real ROI lives)
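Steps 1 and 2 above can be automated. The following is a minimal sketch that assumes the three starter files live at the repository root and that `.git` is a regular directory (it would need adjusting for worktrees or submodules, where `.git` is a file). The `bootstrap` function name and the placeholder file contents are hypothetical; `.git/info/exclude` itself is standard git and uses the same pattern syntax as `.gitignore`.

```python
from pathlib import Path

STARTER_FILES = ("copilot-instructions.md", "copilot-context.md", "running-log.md")


def bootstrap(repo_root: Path, files: tuple[str, ...] = STARTER_FILES) -> list[str]:
    """Create missing starter files and exclude them locally.

    Returns the patterns newly added to .git/info/exclude.
    """
    if not (repo_root / ".git").is_dir():
        raise FileNotFoundError(f"{repo_root} does not look like a git repository root")
    exclude = repo_root / ".git" / "info" / "exclude"
    exclude.parent.mkdir(parents=True, exist_ok=True)
    current = exclude.read_text(encoding="utf-8") if exclude.exists() else ""
    existing = set(current.splitlines())

    added = []
    for name in files:
        target = repo_root / name
        if not target.exists():
            target.write_text(f"# {name}\n", encoding="utf-8")  # placeholder content
        pattern = f"/{name}"  # leading slash anchors the pattern to the repo root
        if pattern not in existing:
            added.append(pattern)

    if added:
        # Avoid gluing the first new pattern onto an unterminated last line.
        separator = "" if current == "" or current.endswith("\n") else "\n"
        exclude.write_text(current + separator + "\n".join(added) + "\n", encoding="utf-8")
    return added
```

Running it twice is safe: existing files are left untouched and patterns already present in the exclude file are not duplicated.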

Your first 15 minutes: After bootstrap, open a Copilot/Claude/GPT session and start with a feature task you'd normally do. Notice the difference: the agent references architecture from the codebase review, respects constraints from the context file, and doesn't ask questions it already has answers to.

Maintenance Cadence

When | What | Time
Every session | Update active tickets if scope changes | 30 seconds
When something breaks | Add to lessons learned with prevention strategy | 1 minute
Weekly | Roll up running log, prune known issues | 5 minutes
Quarterly | Refresh codebase review if architecture shifted | 20-40 minutes

Total ongoing cost: roughly 10 minutes per week. The framework largely maintains itself, since agents update the running log and flag lessons as part of their normal workflow.

What AMF Is Not

Not shared team state. The files are local to each developer. Your lessons, your decisions, your agent's behavior. This is intentional. Two developers will have different perspectives on the same codebase, and that's a feature. Shared team knowledge belongs in team docs, PR reviews, and architecture decision records. AMF captures the stuff that lives in your head and currently vanishes between sessions.

Not tied to one AI model. It's plain markdown. Switch from Copilot to Claude to GPT mid-project and nothing breaks. The running log, lessons learned, and decisions transfer seamlessly.

Not a heavyweight process. Skip what you don't need. Add sections that help your workflow. Use the lightweight template for small changes (just: what changed, what files, validated how). The framework adapts to you, not the other way around.

When to Use It (and When Not To)

Use it on any repo you'll work in across multiple sessions. The more sessions, the more value compounds. Feature work, migrations, refactors, bug fix campaigns: all benefit from persistent memory.

Skip it for throwaway scripts, one-off prototypes, repos you'll touch exactly once, or quick fixes that don't need agent context. Not everything needs a framework.

The Bigger Picture

AI coding agents are getting more capable every month. But capability without memory is a ceiling. A brilliant agent that forgets everything between sessions is like a senior engineer with amnesia: technically skilled but unable to build on yesterday's work.

The Agent Memory Framework is a practical answer to that ceiling. It doesn't require better models or new tooling. It works today, with whatever agent you're already using, by giving it the one thing it's missing: the ability to remember.

Run one codebase review. Bootstrap three files. And every agent session after that starts with the knowledge of every session before it.
