Context Engineering — Part 1 Follow-Up

From Memory to Operating System

60 days ago I taught Claude Code to remember. Then compound learning took over. What started as a filing system became an autonomous development platform.

220+ Memory Files | 180+ Skills | 14 Active Hooks | 13 MCP Servers | 2.1 GB Local Intelligence

The genius with amnesia

Install Claude Code today. Open a terminal. Here's what you get:

$ claude

Welcome to Claude Code v2.1.136
Model: claude-opus-4-6
Context: 1,000,000 tokens

What can I help you with?

// No memory of yesterday's session
// No knowledge of your project
// No rules about your constraints
// No enforcement of your decisions
// Nothing compounds. Nothing persists.

Every session starts from scratch. You explain who you are. You explain your project. You re-state constraints Claude violated yesterday. At minute 45, the context fills. You start over. The genius has total amnesia.

Day 0 — Fresh Install

  • You explain yourself every session
  • Corrections lost overnight
  • No enforcement — suggestions only
  • Generic model capability
  • Can only read/write local files
  • Session dies at context limit
  • Asks permission for everything
  • Starts from zero each time

Day 60 — After Compound Learning

  • Knows your role, style, constraints instantly
  • 220+ files of persisted decisions
  • 14 hooks intercept and block mistakes
  • 180+ domain-expert skills on demand
  • 13 MCP servers — CRM, Slack, browser, search, email
  • FTS5 compression — multi-hour sessions
  • Runs production builds unsupervised
  • Compound learning across every session

Day 0: You work with Claude.
Day 60: Claude works for you.

How corrections become autonomy

Nobody planned a 2.1 GB local intelligence system. It grew from three forces that compound on each other:

1. Friction became architecture

Every mistake Claude made, I corrected. Every correction became a memory file. Every repeated mistake became a rule. Every ignored rule became a hook. Every hook that needed data became an MCP server. Every MCP server that overflowed context became context-mode. Each layer exists because the previous layer wasn't enough.

2. The non-developer advantage

I never wrote a line of Python. Never authored a YAML skill file. Never debugged a hook script. Every piece of infrastructure was built by Claude based on my requirements. The hooks, the graph indexer, the skills — all AI-authored. The barrier isn't coding ability. It's knowing what you want and being specific about constraints.

3. Compound learning is exponential

Day 1–20: memory + wiki (the filing system). Day 20–40: hooks + MCP (the enforcement layer). Day 40–60: skills + graph + context-mode (the expertise layer). Each layer made the next faster to build. By day 60, Claude creates new skills in 5 minutes because it has the patterns from the previous 179.

The compound loop

Mistake → Correction → Memory → Guardrail → Prevention

Example: --set-env-vars wipes all Cloud Run vars → correction → memory → infrastructure rule → PreToolUse hook blocks it forever
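
The last step of that example can be sketched as a small guardrail script. This is a minimal sketch under stated assumptions, not the author's actual hook: it assumes Claude Code hands a PreToolUse hook the pending tool call as JSON on stdin (with `tool_name` and, for Bash calls, `tool_input.command`) and that exit code 2 blocks the call and returns stderr to the model. The names `evaluate` and `main` and the message text are illustrative.

```python
"""PreToolUse guardrail sketch: refuse Cloud Run deploys that use --set-env-vars."""
import json
import re
import sys

# --set-env-vars replaces the whole environment; --update-env-vars merges into it.
DANGEROUS_FLAG = re.compile(r"(?<![\w-])--set-env-vars\b")

ALLOW, BLOCK = 0, 2  # assumed hook convention: exit code 2 blocks the tool call


def evaluate(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one hook payload."""
    if payload.get("tool_name") != "Bash":
        return ALLOW, ""
    command = payload.get("tool_input", {}).get("command", "")
    if "gcloud run" in command and DANGEROUS_FLAG.search(command):
        return BLOCK, (
            "Blocked: --set-env-vars wipes every existing Cloud Run variable. "
            "Use --update-env-vars instead."
        )
    return ALLOW, ""


def main() -> int:
    """Read the hook payload from stdin and report the decision on stderr."""
    code, message = evaluate(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code

# When installed as a hook script, finish with: sys.exit(main())
```

Keeping the decision in a pure function (`evaluate`) makes the rule testable without a live session, which matters for a guardrail that is supposed to hold in every new session.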

60 days of compound growth

Layer | Part 1 | Today | Change
Memory | 128 | 220+ | +72%
Wiki | 59 | 85 | +44%
Hooks | 4 | 14 | 3.5x
Skills | 20 | 180+ | 9x
MCP Servers | - | 10 | new

How it grew

February 2026
Part 1 — Five-tier architecture
128 memory files, 59 wiki pages, SessionStart hook, ~20 commands. Claude stopped forgetting overnight.
March 2026
Enforcement + MCP servers
GitHub, Slack, Salesforce CRM, Playwright. PreToolUse hooks started blocking instead of suggesting. Born from three bad deploys.
April 2026
Skills explosion + knowledge graph
Agentforce skills, GSD project management, entity graph. Skills went from 20 to 100+. Each one a compressed domain expert.
May 2026
Full autonomy
Context-mode overflow, 180+ skills, plugin system. Multi-hour production builds without supervision.

The eleven-layer stack

Part 1 had five tiers. The current system has eleven layers. Five evolved from the original. Six are entirely new.

Layer | What it holds | Size
Identity | CLAUDE.md: who, what, routing table | 194 lines
Memory | Decisions, corrections, state, feedback | 220+ files
Knowledge | Wiki: architecture, patterns, runbooks | 95+ pages
Standards | Rules: security, architecture, quality, anti-slop | 11 files
Decisions | ADRs: reasoning cache for architecture choices | 5 ADRs
Enforcement | Hooks: intercept, validate, gate actions | 14 scripts
Expertise | Skills: compressed domain experts | 180+ skills
Action | MCP: CRM, Slack, GitHub, browser, search | 13 servers
Overflow | Context-mode: 98% compression sandbox | FTS5
Graph | Entity relationships: semantic linking | auto-indexed
Entities | People pages: org trees, timelines, relationships | 150+ people

From passive to active

Part 1 described a passive system — store things, load them when relevant. What evolved is active — intercepts, validates, enforces without human intervention.

Part 1 (Passive) | Now (Active)
"Hooks auto-suggest wiki pages" | Hooks block unsafe operations, validate output, enrich graphs
"Memory persists corrections" | Memory is entity-linked, graph-indexed, auto-curated
"Rules load as standards" | Rules enforced by hook chains: gated, not suggested
"Wiki pages load on demand" | Wiki has inbox, lint, auto-capture, entity extraction
"CLAUDE.md routes context" | CLAUDE.md is a full orchestration manifest
"Contacts live in CRM" | Entity pages capture working style, decision patterns, relationship history

The pattern

Every passive component attracted an active counterpart. Storage attracted enforcement. Reference attracted validation. Memory attracted curation. The system developed an immune response to its own failure modes.

The relationship layer

Memory stores corrections. Wiki stores knowledge. But neither stores people. Entity pages close that gap — a structured page per person that compounds across every interaction.

Two templates (internal + external). Structured top for instant prep — role, org position, working style, territory. Append-only timeline below that grows with every meeting, decision, and collaboration. Seeded programmatically from Slack profiles, then enriched by overnight intelligence crons.

Without | With entity pages
"Who's the AE on that account?" | Full org tree with SE/AE assignments in one Read
"What did we discuss last time?" | Timeline shows every interaction chronologically
Meeting prep = 20 min searching | One page, auto-routed by SessionStart hook

Result: 150+ people mapped across sales and solutions org trees. Every leader with their reports, Slack IDs, and coverage areas. The CRM knows accounts — entity pages know the humans working them.
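
The page shape described above (structured header on top, append-only timeline below) can be sketched as a small data structure. This is an illustration only: the class names, field names, and rendered layout are assumptions, not the author's actual templates.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class TimelineEntry:
    """One immutable interaction record."""
    when: date
    note: str


@dataclass
class EntityPage:
    """One page per person: structured header on top, append-only timeline below."""
    name: str
    role: str
    org_position: str
    working_style: str = ""
    _timeline: list[TimelineEntry] = field(default_factory=list)

    def log(self, when: date, note: str) -> None:
        """Append an interaction; existing entries are never edited or removed."""
        self._timeline.append(TimelineEntry(when, note))

    @property
    def timeline(self) -> tuple[TimelineEntry, ...]:
        """Read-only, chronological view of the timeline."""
        return tuple(sorted(self._timeline, key=lambda entry: entry.when))

    def render(self) -> str:
        """Render the page as plain text, header first, then the timeline."""
        header = [
            self.name,
            f"Role: {self.role}",
            f"Org position: {self.org_position}",
            f"Working style: {self.working_style or 'unknown'}",
            "",
            "Timeline",
        ]
        entries = [f"{e.when.isoformat()}  {e.note}" for e in self.timeline]
        return "\n".join(header + entries)
```

Exposing the timeline as a tuple of frozen entries is one way to make "append-only" a property of the structure and not just a convention.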

Where the real payoff lives

Individual layers are useful. The interactions between them are where compound growth happens.

Loop 1: Error → Memory → Guardrail → Prevention

Deploy with --set-env-vars instead of --update-env-vars. All env vars wiped. Correct Claude. Correction becomes memory. Memory cited in infrastructure rule. Hook now blocks any deploy using --set-env-vars. That error class can never happen again — even in a brand new session.

Loop 2: Research → Wiki → Graph → Routing

Research Agentforce Agent Script. Claude writes a wiki page. Graph extracts entities. Next session, mention "agent routing" — SessionStart hook surfaces that wiki page + related memories + architecture doc. Context arrives before you ask.
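
A rough sketch of the routing step in this loop follows. It assumes the graph extractor leaves behind a simple index from entity phrases to the files that mention them; the index contents, file paths, and the `route` function are hypothetical, and the real system may rank or match quite differently.

```python
import re

# Hypothetical entity index, as a graph extractor might produce it:
# entity phrase -> files that mention it.
INDEX = {
    "agent routing": ["wiki/agentforce-agent-script.md", "memory/routing-decision.md"],
    "cloud run": ["rules/infrastructure.md"],
}


def route(text: str, index: dict[str, list[str]], limit: int = 5) -> list[str]:
    """Return files linked to any entity mentioned in text, most-referenced first."""
    normalized = re.sub(r"\s+", " ", text.lower())
    hits: dict[str, int] = {}
    for entity, paths in index.items():
        # Whole-phrase match so "cloud run" does not fire on "cloud runtime".
        if re.search(rf"\b{re.escape(entity)}\b", normalized):
            for path in paths:
                hits[path] = hits.get(path, 0) + 1
    # Sort by hit count (descending), then by path for a stable order.
    ranked = sorted(hits, key=lambda path: (-hits[path], path))
    return ranked[:limit]
```

A hook built on this would print the matched paths (or their contents) so they land in context before the first question is asked.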

Loop 3: Skill → MCP → Context-Mode → Output

Run /account-prep. Skill knows the workflow. Calls MCP servers (CRM, news, web). Each response compresses through context-mode. Skill assembles the prep doc using anti-slop rules and corporate grounding. One command, five systems, zero manual context management.
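
One way to picture the compression step: index each large tool response outside the context window and hand back only the chunks that match the current question. The sketch below uses SQLite's FTS5 extension through Python's standard `sqlite3` module. It is an assumption about how such a layer could work, not a description of context-mode's internals, and the function names are illustrative. It also assumes the local SQLite build includes FTS5.

```python
import re
import sqlite3


def build_index(raw_output: str) -> sqlite3.Connection:
    """Split a large tool response on blank lines and index each chunk with FTS5."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(body)")
    chunks = [c.strip() for c in re.split(r"\n\s*\n", raw_output) if c.strip()]
    conn.executemany("INSERT INTO chunks(body) VALUES (?)", [(c,) for c in chunks])
    return conn


def recall(conn: sqlite3.Connection, question: str, limit: int = 3) -> list[str]:
    """Return only the best-matching chunks instead of the full response."""
    terms = re.findall(r"\w+", question.lower())
    if not terms:
        return []
    # Quote each term so user text is never parsed as FTS5 query syntax.
    match = " OR ".join(f'"{term}"' for term in terms)
    rows = conn.execute(
        "SELECT body FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (match, limit),
    )
    return [body for (body,) in rows]
```

The model then sees a few relevant chunks per MCP call while the full response stays queryable on disk or in memory.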

This is the compound effect. No single layer produced these outcomes. Memory alone doesn't prevent errors. Hooks alone don't know what's dangerous. Skills alone can't access external data. The value is in the connections.

Agents forget to remember — at every layer

In Part 1, I cited DDR-001 from the Agentforce Agent Harness: "agents forget to remember." The fix was intrinsic constraint awareness. Two months later, DDR-001 applies at every layer:

Layer | "Forgetting" Failure Mode | Fix
Memory | Doesn't check before claiming | Graph recalls relevant memories proactively
Rules | Reads but ignores under pressure | PreToolUse hook intercepts before execution
Skills | Reinvents instead of loading | Skill routing matches task automatically
Wiki | Pattern-matches from training data | ADR hook injects decisions when files in scope
MCP | Hallucinates instead of querying | Grounding rules require real data queries

Meta-lesson: Every knowledge source needs a corresponding enforcement mechanism. Knowledge without enforcement is a suggestion. Knowledge with enforcement is a constraint. Constraints compound. Suggestions decay.

The world caught up

When Part 1 went out, this felt like a personal hack. Four weeks later:

Karpathy's "LLM Wiki" hit 5,000+ GitHub stars

The pattern we built on (ingest-synthesize-evolve) went viral. Spawned persistent memory tools. We'd been running it in production for months.

Salesforce published "Not All Agentic Harnesses Are Created Equal"

The VP-level Futures team published on the exact framework: the scaffolding that gives a model tools, data, and constraints. Direct validation at the highest level.

Anthropic shipped Managed Agents with memory consolidation

"Dreaming" — agents consolidating learnings between sessions. The same pattern as our memory + /curate system, productized by Anthropic themselves.

MCP token overhead became a known problem

58 tools cost about 55K tokens per turn. We run 200+ with deferred loading. Others were just discovering a problem we had already solved.
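
To make the deferred-loading idea concrete, here is a minimal sketch: only a one-line stub per tool stays in context, and the full schema is fetched the first time a tool is actually used. The registry class, its method names, and the even per-tool cost in `eager_tokens` are assumptions for illustration, not how Claude Code or any MCP client implements it.

```python
from dataclasses import dataclass, field
from typing import Callable


def eager_tokens(tool_count: int, tokens_per_tool: float = 55_000 / 58) -> float:
    """Estimate per-turn cost if every schema is loaded, assuming an even per-tool cost."""
    return tool_count * tokens_per_tool


@dataclass
class DeferredToolRegistry:
    """Keep one-line stubs in context; load a full schema only on first use."""
    _summaries: dict[str, str] = field(default_factory=dict)
    _loaders: dict[str, Callable[[], dict]] = field(default_factory=dict)
    _loaded: dict[str, dict] = field(default_factory=dict)

    def register(self, name: str, summary: str, loader: Callable[[], dict]) -> None:
        """Record a tool without paying for its schema yet."""
        self._summaries[name] = summary
        self._loaders[name] = loader

    def stubs(self) -> list[str]:
        """The cheap listing that is always sent to the model."""
        return [f"{name}: {summary}" for name, summary in sorted(self._summaries.items())]

    def schema(self, name: str) -> dict:
        """Load and cache the full schema the first time the tool is needed."""
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded(self) -> list[str]:
        """Names of tools whose full schema has been pulled into context."""
        return sorted(self._loaded)
```

Using the article's figure, 55K tokens across 58 tools is roughly 950 tokens per tool, so 200 eagerly loaded tools would cost on the order of 190K tokens per turn. That is the overhead a stub-first registry is meant to avoid.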

What 60 days taught me

1. The system builds itself

I didn't plan most of this. Each correction, wiki page, and skill grew from a specific session's need. The 5-tier framework just gave those decisions somewhere to land.

2. Enforcement matters more than knowledge

128 memory files didn't stop infrastructure mistakes. 3 PreToolUse hooks did. If you only build one new thing after Part 1, build a guardrail hook for your most expensive mistake.

3. Context overflow is the real scaling limit

Part 1 optimized startup (15K to 5K tokens). The real limit hit at minute 45 when research filled the window. Context-mode was the breakthrough that enabled multi-hour sessions.

4. Skills are the highest-return investment

One skill replaces 30 minutes of manual instruction per session. I have 186 skills assembled from three suites — project management, Salesforce platform, and developer tooling — plus 28 custom commands I directed Claude to build for my specific SE workflow. You don't build 186 skills from scratch. You install the right systems and customize the last mile.

5. You don't need to be a developer

I've never opened a Python file to write the hooks. Never manually created a skill. Claude builds its own infrastructure from my direction. Domain expertise + clear requirements + an AI that builds its own tooling = this.

What to build after Part 1

Priority order based on what gave me the most return:

# | What | Why | Effort
1 | One PreToolUse guardrail hook | Prevents your most expensive recurring mistake | 30 min
2 | 3 domain skills | Replaces 30 min of instruction per session each | 2 hrs
3 | One MCP server | Closes the gap between "understands" and "does" | 1 hr
4 | Context overflow management | Removes the 45-minute session ceiling | 1 hr
5 | ADR directory | Reasoning cache that stops re-debating settled decisions | 15 min

You don't need 220 memory files or 186 skills. You need the feedback loops. One guardrail that fires when it matters. One skill that loads expertise automatically. One MCP server that fetches real data. Start the loops, then let compound growth do the rest.

Where this goes

Part 1 solved a specific problem: Claude forgetting between sessions. The system that grew from that fix solves a different problem: making a non-developer as productive as a senior engineering team.

199 commits in 28 days built an AI platform with 6 subagents, 25 data connectors, and production infrastructure. That was with the early five-tier system. With the current eleven-layer stack, the same scope takes less than a week.

The gap between "domain expertise + clear requirements" and "production software" is closing. Not because models got smarter. Because the local infrastructure around the model compounds. Memory remembers. Hooks enforce. Skills teach. MCP servers act. The graph connects. And none of it requires you to be a developer.

“I think there is room here for an incredible new product instead of a hacky collection of scripts.”

— Andrej Karpathy, on the LLM Wiki pattern

60 days later, the hacky scripts became a platform. Not because I designed one — because compound learning doesn't stop at memory.

What context actually buys you

Two proof points. One about intelligence. One about autonomy.

The Intelligence Gap
The Autonomy Gap

A frontier model audits my setup. 9 of 10 recommendations were already solved.

I shared this article with Gemini Pro and asked it to audit my setup. It gave 10 confident recommendations:

Gemini's Recommendation | Reality
"You need a deprecation/garbage collection layer" | Already built. /curate runs weekly with decay rates and prune cycles.
"Risk of over-constraint paralysis from hooks" | Claude Code's permission system already has bypass. Not a rigid rule engine.
"Graph indexing will cause startup latency" | Already measured: 61ms per file, 151ms session-init. Imperceptible.
"Users will have dependency conflicts" | Starter kit is a clean template. No machine-specific paths.
"You need an auth onboarding wizard" | Recipients are Salesforce employees with sf CLI already authenticated.
"Risk of identity/state bleed in the repo" | Starter kit is already scrubbed. Personal data never ships.

Confident, well-structured advice that would have been correct for a generic setup — but wrong for mine. It couldn't know because it had no context.

The gap isn't intelligence, it's context. A frontier model with no context gives generic advice. The difference is what the system remembers about YOUR specific situation.

I approved a plan at 10 PM. By 7 AM, it was deployed to production.

The first case study showed context makes AI smarter. This one shows it makes AI autonomous. It doesn't just answer better. It operates independently.

I had a 7-pillar architecture plan for transforming a production intelligence platform. Three rounds of architectural audit. 14 accepted fixes. 6 rejected findings with documented rationale. Approved at 10 PM. I said: "Build overnight. Don't stop." Then I went to sleep.

19 Files Created | 375 Tests Passing | 22 Components Deployed | 25x Compression

Phase | Original Estimate | Actual
Salesforce metadata (object + 12 fields + event + service + tests) | 3-4 days | ~2 hours
Pipeline infrastructure (types, extractor, evaluator, connector) | 3-4 days | ~1.5 hours
Three API endpoints (competitive, risk, temporal) | 2-3 days | ~45 minutes
Brand design system provider | 3-4 days | ~30 minutes
Conversational Agent Router | 4-5 days | ~30 minutes
Total (5 phases) | 15-20 days | ~5 hours
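
The totals and the compression figure can be checked from the rows above. The sketch below assumes an 8-hour working day, which the article does not state, so treat the resulting ratio as an estimate and not as the author's own calculation.

```python
# Rows from the estimate table: phase -> (low_days, high_days, actual_hours).
PHASES = {
    "Salesforce metadata": (3, 4, 2.0),
    "Pipeline infrastructure": (3, 4, 1.5),
    "Three API endpoints": (2, 3, 0.75),
    "Brand design system provider": (3, 4, 0.5),
    "Conversational Agent Router": (4, 5, 0.5),
}
HOURS_PER_WORKDAY = 8  # assumption: not given in the article


def summarize(phases=PHASES, hours_per_day=HOURS_PER_WORKDAY) -> dict:
    """Sum the estimate range and actual hours, then compute the speedup range."""
    low_days = sum(low for low, _, _ in phases.values())
    high_days = sum(high for _, high, _ in phases.values())
    actual_hours = sum(hours for _, _, hours in phases.values())
    return {
        "estimate_days": (low_days, high_days),
        "actual_hours": actual_hours,
        "speedup": (
            low_days * hours_per_day / actual_hours,
            high_days * hours_per_day / actual_hours,
        ),
    }
```

Under that assumption the rows sum to 15-20 estimated days against 5.25 actual hours, a ratio of roughly 23x to 30x, which brackets the 25x figure.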

Why this was possible

This wasn't just "AI writes code fast." Any frontier model can generate files. The reason it shipped as a coherent system — not disconnected fragments — is the operating system underneath:

OS Layer | What It Contributed
Memory | Knew the entire architecture, 196 cached accounts, data model history, deployment patterns. No re-briefing needed
Rules | Security governance, architecture patterns, voice standards, all enforced automatically on every file written
Plan | 3-round audited plan with explicit file paths, interface contracts, dependency graph. Zero ambiguity
Wiki | 60+ pages of project knowledge, integration topology, platform constraints. Answered questions without asking
Hooks | Metadata validators scored every file (90-120/120). Caught issues at write-time, not deploy-time

What the human actually did

Role | Time
Approve the plan (after 3 audit rounds) | 30 min
Say "build overnight" | 5 seconds
Sleep | 8 hours
Verify build + tests + deploy next morning | 10 min
Total human involvement | ~45 minutes

The gap isn't speed — it's autonomy. Context architecture doesn't just make AI faster at answering questions. It makes AI capable of independent execution. The human role shifts from "writing code" to "approving plans and verifying outcomes."

Copy this setup in one shot

I built this over 60 days through trial and error. You don't have to. Here's the architecture as a deployable kit — clone the repo, run setup, start from Day 30 instead of Day 0.

What transfers vs. what doesn't: The architecture, templates, and feedback loops transfer. The 220+ specific memories don't — those are YOUR corrections, YOUR project state. The kit gives you the scaffolding. Compound learning fills it in.

The Kit (5 layers, 45 minutes to deploy)

Layer 1: Identity Skeleton

Template CLAUDE.md with routing table structure, role definition, project registry, and essential standards (anti-slop, anti-hallucination). Pre-wired wiki index. You fill in YOUR role, YOUR projects, YOUR constraints. Takes 10 minutes.

Layer 2: Rules & Guardrails

3 starter rules files: communication.md (voice standards, banned patterns), security-governance.md (CRUD/FLS, sharing, secrets), architecture.md (decision framework, preferred patterns). Plus 2 hook scripts: session-init (context routing on start) and one PreToolUse guardrail (blocks your most expensive recurring mistake).

Layer 3: SE Skill Pack

5 production-ready skills: /account-prep (pre-meeting intelligence), /deal-strategy (competitive + talk track), /email-draft (anti-slop customer email), /post-meeting (capture + CRM + follow-up), /demo-prep (script from brief). Each one replaces 30 minutes of manual instruction per use.

Layer 4: Intelligence Automation

Slack channel roster with 25 pre-mapped internal channels (product, competitive, enablement, win/loss, leadership). Overnight gather scripts that scan Exa + HN + GitHub + X + Slack and synthesize a morning brief. One MCP server (GitHub) to prove the action layer. Swap in your OU channels and go.

Layer 5: Entity Pages

wiki/people/ directory with two templates (internal + external). Seed your org tree from Slack profiles — Claude pulls name, title, email, timezone programmatically. Structure: org position at top, append-only timeline below. After one session, every leader and their reports are mapped. After a month, you have a relationship graph no CRM captures.

Channel Roster (pre-mapped for SE org)

Category | Channels | What You Get
Competitive Intel | #tmt-solutions, #analyst-coverage | CI drops, IDC/ISG/Gartner reports
Product & Roadmap | #agentforce-updates, #platform-releases | What shipped, what's GA, deprecations
AI & Tooling | #ai-club, #ai-engineering-productivity, #solutions-ai-tooling | Internal AI tools, techniques, launches
SE Enablement | #se-enablement, #demo-sharing | New assets, techniques that work
Territory | #tmt-commercial, #tmt-solutions-broadcast | OU updates, leadership priorities
Industry | #industries-communications, #industries-media, #industries-tech | Vertical trends, reference stories

Growth timeline (what to expect)

Week 1
Foundation working
CLAUDE.md routes context. Rules enforce standards. Session-init suggests relevant pages. Claude stops making your most common mistake.
Week 2–3
Memory accumulating
20–30 memory files from corrections and decisions. Wiki growing organically. Skills saving 30+ min/day. Morning brief arriving automatically.
Week 4+
Compound effects kick in
Loops connecting. Memory feeding guardrails. Skills calling MCP servers. Context-mode extending sessions. The system builds itself from here.

The non-developer advantage

You don't need to code any of this. Tell Claude what you want enforced, what workflow you need automated, what mistake to never make again. Claude builds the hooks, writes the skills, configures the MCP servers. Your job is direction and domain expertise — the same skills that make you good at your actual job.

Get the kit: git clone https://github.com/jtehrani84/claude-code-se-starter-kit.git && cd claude-code-se-starter-kit && ./setup.sh — or contact John Tehrani (jtehrani@salesforce.com) for access.