A persistent context architecture that turns Claude Code from a clean slate every session into a teammate who knows your projects, remembers your preferences, and gets smarter every session — without being prompted to.
Out of the box, Claude Code has no persistent memory. No matter how many hours you've spent together, tomorrow it won't remember a thing.
A monolithic CLAUDE.md that dumps 15,000 tokens of instructions every session. Deployment details when you're writing emails. Slide rendering rules when you're debugging Apex. 30% of your context window gone before you type a word.
"I'm a Solution Engineer working on Salesforce. Here's my project structure. Don't use these words. Remember the 70/30 rule..." Every. Single. Session. The first 10 minutes are always re-orientation.
Claude makes the same mistake twice. Uses a deprecated product name you corrected last week. Suggests a pattern you've already rejected. Knowledge from yesterday's session vanishes overnight.
The real cost isn't time. It's that Claude can never get better at working with you specifically. Without persistent context, every session is a first date with a genius who has amnesia.
Instead of one massive instruction file, split context into five tiers. Load what you need, when you need it. The right information at the right time.
Who you are, essential rules (anti-slop, anti-hallucination), wiki routing table. The irreducible minimum.
Python script analyzes your directory, git history, and branch name → suggests the right wiki pages to load.
51 pages of deep reference: architecture patterns, deployment runbooks, design systems. Loaded only when the task needs them.
128 topic files: decisions made, mistakes corrected, project status, feedback. MEMORY.md index auto-loaded, details read on demand.
Security governance, architecture standards, testing quality, communication style, context-mode policy, data protection. Loaded from ~/.claude/rules/ every session.
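The routing tier above can be sketched as a small keyword match. This is a minimal illustration, not the actual hook: the `ROUTES` table and the `suggest_pages` helper are hypothetical, the page names are borrowed from the walkthrough later in this article, and the keywords are assumptions.

```python
# Hypothetical routing table: keyword -> wiki pages to suggest.
# Page names come from the article's walkthrough; keywords are assumed.
ROUTES = {
    "slide": ["pitch-craft.md", "slide-visual-design.md"],
    "deck": ["pitch-craft.md", "golden-narrative-framework.md"],
}


def suggest_pages(cwd, commit_subjects, branch):
    """Return wiki pages whose keyword appears in the cwd, commits, or branch."""
    haystack = " ".join([str(cwd), branch, *commit_subjects]).lower()
    pages = []
    for keyword, targets in ROUTES.items():
        if keyword in haystack:
            for page in targets:
                if page not in pages:  # keep first-seen order, drop duplicates
                    pages.append(page)
    return pages


print(suggest_pages("/home/me/slides-gen", ["Fix deck export"], "main"))
```

A real hook would gather the inputs itself (for example from the working directory and recent commit subjects) and print the suggestion so it lands in the session context. Keeping the matching logic in a pure function makes it easy to test without a repository.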
Less overhead at session start: the remaining 97.5% of your context window stays available for actual work (code, conversation, and output).
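The 97.5% figure can be reproduced from the per-tier costs quoted in the walkthrough (2,800 + 200 + roughly 2,000 tokens). The 200,000-token window below is an assumption; the article does not state the window size, and the share changes with it.

```python
# Per-tier startup costs quoted in the article's walkthrough.
TIER_COSTS = {
    "CLAUDE.md": 2_800,
    "session-init hook": 200,
    "MEMORY.md index": 2_000,  # the article gives this as approximate
}
CONTEXT_WINDOW = 200_000  # assumed window size, not stated in the article


def startup_overhead(costs, window):
    """Return (total startup tokens, share of the window they occupy)."""
    total = sum(costs.values())
    return total, total / window


total, share = startup_overhead(TIER_COSTS, CONTEXT_WINDOW)
print(f"{total} tokens, {share:.1%} of the window, {1 - share:.1%} left")
# prints "5000 tokens, 2.5% of the window, 97.5% left"
```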
The same person, the same projects, the same Claude Code. One has persistent context architecture. The other doesn't.
Each piece is simple on its own. Together they create a system that's genuinely smarter than the sum of its parts.
You open Claude Code in your slide-generation project directory. Here's what happens before you type a single word.
Claude learns your identity, role, and essential rules. Sees the routing table: "If working on slides → load pitch-craft.md and slide-visual-design.md." Cost: 2,800 tokens.
Hook detects cwd contains "slides". Checks git log: recent commits mention "deck" and "slide." Outputs: "Relevant wiki pages: pitch-craft.md, slide-visual-design.md, golden-narrative-framework.md." Cost: 200 tokens.
Claude sees 128 memory entries. Relevant ones jump out: "feedback-imagen-cheesy.md — AI backgrounds look cheesy; solid = premium" and "feedback-seller-urgency-ban.md — Never use SF contract timelines as Why Now drivers." Cost: ~2,000 tokens.
Claude reads pitch-craft.md (golden narrative framework), slide-visual-design.md (design tokens, layout library), and the Starbucks memory file. It already knows: solid backgrounds, 70/30 customer-to-Salesforce ratio, no banned words, brand color resolution. No setup. Just builds.
After Claude writes code, the validator hook checks naming conventions and security patterns automatically. No manual review of basics — the system catches the easy stuff.
During the session, Claude learns that Starbucks uses a green-on-dark palette and appends a note to wiki/inbox.md: "Starbucks brand colors resolved to #00704A." The next session processes that note into a proper wiki entry. The system just got a little smarter.
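The last step, capturing a new fact in the inbox, needs very little machinery. Here is a minimal sketch, assuming the inbox is a plain markdown file of dated bullets; the `append_to_inbox` helper and the bullet format are hypothetical, not the system's actual convention.

```python
from datetime import date
from pathlib import Path


def append_to_inbox(inbox, note, today=None):
    """Append one dated bullet to the inbox file, creating it if needed."""
    inbox = Path(inbox)
    inbox.parent.mkdir(parents=True, exist_ok=True)
    stamp = (today or date.today()).isoformat()
    line = f"- {stamp}: {note}\n"
    # Append mode keeps earlier notes intact for the next session to process.
    with inbox.open("a", encoding="utf-8") as fh:
        fh.write(line)
    return line
```

Calling `append_to_inbox("wiki/inbox.md", "Starbucks brand colors resolved to #00704A")` would add one line. Append-only writes keep the capture step cheap and leave the organizing for later.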
This system implements the LLM Wiki pattern that Andrej Karpathy described in early 2026: instead of re-deriving knowledge every query, have the LLM build and maintain a persistent wiki that compounds over time.
Immutable input. Human curates what goes in.
Synthesized, interlinked. Claude maintains it.
Structure and conventions. Human + Claude co-evolve.
The core Karpathy insight: the maintenance burden is what kills every knowledge system. Claude handles all the bookkeeping — summarizing, cross-referencing, filing, consistency checking. Your job is sourcing, directing, and thinking. The wiki grows while you work. You never have to organize it.
RAG (retrieval-augmented generation) retrieves by meaning — great for searching large datasets you didn't write. But for personal context — your preferences, your project state, your corrections — structured markdown files that Claude can directly read and update are simpler, faster, and don't lose fidelity. We use both: flat-file wiki for personal knowledge, pgvector RAG for corporate data (product names, customer stories, competitive intel).
Karpathy described the idea. We validated it, then extended it in five directions he hasn't addressed.
Karpathy doesn't address context window costs. We engineered tiered loading: 2,800 tokens always-on, everything else on demand. 82% reduction vs. loading the full schema every session.
Karpathy's workflow is manual — you tell the LLM what to do. Our session-init hook analyzes your git state and auto-suggests which wiki pages to load. No human in the loop.
Karpathy has one artifact: the wiki. We separate four types — wiki (reference), memory (decisions), rules (standards), schema (routing) — each loaded at different times for different reasons.
Karpathy describes Lint. We go further: a /curate command that scans memory for staleness, processes the wiki inbox, and flags what needs attention. Memory files have decay rates. His system grows but doesn’t prune. Ours does.
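To make the decay idea concrete, here is a minimal sketch of a staleness scan. The article says memory files have decay rates but does not give their values, so the `DECAY_DAYS` table, the filename prefixes other than `feedback-`, and the `stale_memories` helper are all assumptions for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical decay rates: days before a memory file is flagged for review,
# keyed by filename prefix. The values are illustrative, not the real ones.
DECAY_DAYS = {"status-": 14, "feedback-": 180, "decision-": 365}
DEFAULT_DAYS = 90


def stale_memories(files, now):
    """files: iterable of (name, last_modified). Return sorted stale names."""
    stale = []
    for name, modified in files:
        limit = next(
            (days for prefix, days in DECAY_DAYS.items() if name.startswith(prefix)),
            DEFAULT_DAYS,
        )
        if now - modified > timedelta(days=limit):
            stale.append(name)
    return sorted(stale)


now = datetime(2026, 6, 1)
files = [
    ("status-deck.md", datetime(2026, 5, 1)),             # 31 days old, limit 14
    ("feedback-imagen-cheesy.md", datetime(2026, 5, 1)),  # 31 days old, limit 180
    ("notes.md", datetime(2026, 1, 1)),                   # 151 days old, limit 90
]
print(stale_memories(files, now))  # prints ['notes.md', 'status-deck.md']
```

The scan only flags candidates. The keep, update, or delete decision stays with the human.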
Karpathy’s system relies on the LLM voluntarily looking things up. We applied DDR-001 from the Agentforce Agent Harness: “agents forget to remember.” Infrastructure constraints load as rules (always). PreToolUse hooks surface ADRs before editing constrained files. The information is in your face when it matters.
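A hook that surfaces ADRs before an edit could look roughly like this. It is a sketch under stated assumptions: the payload shape (`tool_input.file_path` in JSON on stdin) reflects our understanding of Claude Code's hook input and should be checked against the current hooks documentation, and the `CONSTRAINED` mapping, its ADR numbers, and the `adrs_for` helper are hypothetical.

```python
import fnmatch
import json

# Hypothetical mapping from file patterns to the ADRs that constrain them.
CONSTRAINED = {
    "infra/*.tf": "ADR-003: hypothetical infrastructure decision",
    "*/auth/*.py": "ADR-007: hypothetical authentication decision",
}


def adrs_for(payload_text):
    """Given the hook's JSON payload, return ADR notes for the target file."""
    payload = json.loads(payload_text)
    # Assumed payload shape: {"tool_name": ..., "tool_input": {"file_path": ...}}
    path = payload.get("tool_input", {}).get("file_path", "")
    return [
        note
        for pattern, note in CONSTRAINED.items()
        if fnmatch.fnmatchcase(path, pattern)
    ]


example = '{"tool_name": "Edit", "tool_input": {"file_path": "infra/main.tf"}}'
print(adrs_for(example))
```

A real hook script would read the payload from standard input and print the matching notes. How that output reaches the model depends on the hook configuration.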
“I think there is room here for an incredible new product instead of a hacky collection of scripts.” (Andrej Karpathy)
We took that seriously. Five named tiers, 19 commands, 10 MCP servers, automated context routing, tiered token loading, architecture guardrail hooks, and ADRs. Not a product — but not a hacky collection of scripts either.
You don't need to build the full system on day one. Start with CLAUDE.md, add layers as you go. Each stage delivers value on its own.
Create CLAUDE.md in your project root with:
Add rules files to ~/.claude/rules/ for cross-project standards.
Enable auto-memory in CLAUDE.md instructions. Start a wiki/ directory with:
Add a routing table to CLAUDE.md mapping work contexts to wiki pages.
Add hooks to settings.json for automated behaviors:
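One possible shape for that configuration, built in Python so it can be inspected before it is written to disk. The event names (`SessionStart`, `PostToolUse`), the `matcher` field, and the nesting follow our understanding of Claude Code's hook settings and should be verified against the current documentation. The script paths and the `hook_settings` helper are hypothetical.

```python
import json


def hook_settings(session_init_cmd, validator_cmd):
    """Build a settings fragment that registers two command hooks."""
    return {
        "hooks": {
            # Runs once when a session starts (context routing).
            "SessionStart": [
                {"hooks": [{"type": "command", "command": session_init_cmd}]}
            ],
            # Runs after file edits or writes (naming and security checks).
            "PostToolUse": [
                {
                    "matcher": "Edit|Write",
                    "hooks": [{"type": "command", "command": validator_cmd}],
                }
            ],
        }
    }


settings = hook_settings(
    "python3 ~/.claude/hooks/session_init.py",  # hypothetical path
    "python3 ~/.claude/hooks/validate.py",      # hypothetical path
)
print(json.dumps(settings, indent=2))
```

Merge the printed fragment into your existing settings.json by hand rather than overwriting the file, so other settings survive.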
Slim down CLAUDE.md as your wiki grows. Move detail out, keep routing in.
Knowledge bases rot without maintenance. Run a weekly /curate command:
Claude does the legwork. You just say keep, update, or delete.
# Project Context

## Owner
- Name: [Your name]
- Role: [Your role and team]
- Context: [How you use Claude Code — what are you building?]

## Environment
- OS: macOS / Windows / Linux
- Editor: VS Code / JetBrains / Terminal
- Key tools: [sf CLI, npm, docker, etc.]

## Rules (Always Follow)
1. [Your most important rule — e.g., "Never use deprecated APIs"]
2. [Quality standard — e.g., "All Apex must be bulk-safe"]
3. [Voice rule — e.g., "Write like an SE, not a marketing team"]
4. [Safety rule — e.g., "Never hardcode IDs or credentials"]

## Current Projects
- [Project 1]: [One-line description + status]
- [Project 2]: [One-line description + status]

## Preferences
- [How you like to work with Claude]
- [What kind of output you expect]
Copy-paste templates, code blocks, and a completion checklist. 30 minutes from zero to done.
15,000 tokens → 2,800 tokens loaded at session start. The rest comes in on demand. Your context window stays clear for actual work.
No more "here's my project, here are my rules, remember that thing from yesterday." Claude knows who you are, what you're working on, and what you've decided.
Every correction, every preference, every architectural decision — saved once, applied forever. Claude stops making the same mistake twice.
The wiki grows every session. Memory accumulates. The system gets measurably better the more you use it. Session 100 is dramatically better than session 1.
60 days later: 220+ memory files, 186 skills, 14 hooks, 13 MCP servers, 2.1 GB of local intelligence. What happens when compound learning doesn't stop.
Built by John Tehrani, Principal Solutions Engineer, Salesforce
This document was written collaboratively with Claude Code using the exact system it describes.