From Memory to Operating System

Day Zero: The genius with amnesia

Install Claude Code today. Open a terminal. Here's what you get:

$ claude

Welcome to Claude Code v2.1.136
Model: claude-opus-4-6
Context: 1,000,000 tokens

What can I help you with?

// No memory of yesterday's session
// No knowledge of your project
// No rules about your constraints
// No enforcement of your decisions
// Nothing compounds. Nothing persists.

Every session starts from scratch. You explain who you are. You explain your project. You re-state constraints Claude violated yesterday. At minute 45, the context fills. You start over. The genius has total amnesia.

Day 0 — Fresh Install

You explain yourself every session
Corrections lost overnight
No enforcement — suggestions only
Generic model capability
Can only read/write local files
Session dies at context limit
Asks permission for everything
Starts from zero each time

Day 60 — After Compound Learning

Knows your role, style, constraints instantly
220+ files of persisted decisions
14 hooks intercept and block mistakes
180+ domain-expert skills on demand
13 MCP servers — CRM, Slack, browser, search, email
FTS5 compression — multi-hour sessions
Runs production builds unsupervised
Compound learning across every session

Day 0: You work with Claude.
Day 60: Claude works for you.

The Mechanism: How corrections become autonomy

Nobody planned a 2.1 GB local intelligence system. It grew from three forces that compound on each other:

1. Friction became architecture

Every mistake Claude made, I corrected. Every correction became a memory file. Every repeated mistake became a rule. Every ignored rule became a hook. Every hook that needed data became an MCP server. Every MCP server that overflowed context became context-mode. Each layer exists because the previous layer wasn't enough.

2. The non-developer advantage

I never wrote a line of Python. Never authored a YAML skill file. Never debugged a hook script. Every piece of infrastructure was built by Claude based on my requirements. The hooks, the graph indexer, the skills — all AI-authored. The barrier isn't coding ability. It's knowing what you want and being specific about constraints.

3. Compound learning is exponential

Day 1–20: memory + wiki (the filing system). Day 20–40: hooks + MCP (the enforcement layer). Day 40–60: skills + graph + context-mode (the expertise layer). Each layer made the next faster to build. By day 60, Claude creates new skills in 5 minutes because it has the patterns from the previous 179.

The compound loop

⚠ Mistake → 📝 Correction → 📋 Memory → 🛡 Guardrail → ✅ Prevention

Example: --set-env-vars wipes all Cloud Run vars → correction → memory → infrastructure rule → PreToolUse hook blocks it forever

Architecture: The eleven-layer stack

Part 1 had five tiers. The current system has eleven layers. Five evolved from the original. Six are entirely new.

Layer	What it holds	Size
🎯 Identity	CLAUDE.md: who, what, routing table	194 lines
🧠 Memory	Decisions, corrections, state, feedback	220+ files
📚 Knowledge	Wiki: architecture, patterns, runbooks	95+ pages
📏 Standards	Rules: security, architecture, quality, anti-slop	11 files
⚖ Decisions	ADRs: reasoning cache for architecture choices	5 ADRs
🛡 Enforcement	Hooks: intercept, validate, gate actions	14 scripts
🎓 Expertise	Skills: compressed domain experts	180+ skills
⚡ Action	MCP: CRM, Slack, GitHub, browser, search	13 servers
📦 Overflow	Context-mode: 98% compression sandbox	FTS5
🕸 Graph	Entity relationships: semantic linking	auto-indexed
👥 Entities	People pages: org trees, timelines, relationships	150+ people

The Shift: From passive to active

Part 1 described a passive system: store things, load them when relevant. What evolved is active: it intercepts, validates, and enforces without human intervention.

Part 1 (Passive)		Now (Active)
"Hooks auto-suggest wiki pages"	→	Hooks block unsafe operations, validate output, enrich graphs
"Memory persists corrections"	→	Memory is entity-linked, graph-indexed, auto-curated
"Rules load as standards"	→	Rules enforced by hook chains — gated, not suggested
"Wiki pages load on demand"	→	Wiki has inbox, lint, auto-capture, entity extraction
"CLAUDE.md routes context"	→	CLAUDE.md is a full orchestration manifest
"Contacts live in CRM"	→	Entity pages capture working style, decision patterns, relationship history

The pattern

Every passive component attracted an active counterpart. Storage attracted enforcement. Reference attracted validation. Memory attracted curation. The system developed an immune response to its own failure modes.

Entity Pages: The relationship layer

Memory stores corrections. Wiki stores knowledge. But neither stores people. Entity pages close that gap — a structured page per person that compounds across every interaction.

Two templates (internal + external). Structured top for instant prep — role, org position, working style, territory. Append-only timeline below that grows with every meeting, decision, and collaboration. Seeded programmatically from Slack profiles, then enriched by overnight intelligence crons.
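
As a rough illustration of that layout, the sketch below models an entity page as a structured header plus an append-only timeline and renders it to Markdown. The `EntityPage` class, its field names, and the sample person are hypothetical; the article does not show the real templates.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class EntityPage:
    """Hypothetical model of one person page: fixed header, growing timeline."""
    name: str
    role: str
    org_position: str
    working_style: str
    territory: str
    timeline: list[tuple[date, str]] = field(default_factory=list)

    def log(self, when: date, note: str) -> None:
        # Append-only: entries are added, never edited or removed.
        self.timeline.append((when, note))

    def render(self) -> str:
        # Structured header first, for instant prep.
        header = [
            f"# {self.name}",
            f"- Role: {self.role}",
            f"- Org position: {self.org_position}",
            f"- Working style: {self.working_style}",
            f"- Territory: {self.territory}",
            "",
            "## Timeline",
        ]
        # Timeline below, rendered in chronological order.
        entries = [f"- {when.isoformat()}: {note}"
                   for when, note in sorted(self.timeline)]
        return "\n".join(header + entries)


page = EntityPage("Jordan Example", "Account Executive",
                  "Reports to: Regional VP",
                  "Prefers short written briefs", "TMT West")
page.log(date(2025, 1, 14), "Kickoff call, agreed on demo scope")
page.log(date(2025, 1, 9), "Intro over Slack")
print(page.render())
```

Keeping the header and the timeline as separate parts of one record mirrors the split described above: the top is overwritten as roles change, while the bottom only ever grows.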

Without		With entity pages
"Who's the AE on that account?"	→	Full org tree with SE/AE assignments in one Read
"What did we discuss last time?"	→	Timeline shows every interaction chronologically
Meeting prep = 20 min searching	→	One page, auto-routed by SessionStart hook

Result: 150+ people mapped across sales and solutions org trees. Every leader with their reports, Slack IDs, and coverage areas. The CRM knows accounts — entity pages know the humans working them.

Interactions: Where the real payoff lives

Individual layers are useful. The interactions between them are where compound growth happens.

Loop 1: Error → Memory → Guardrail → Prevention

Deploy with --set-env-vars instead of --update-env-vars. All env vars wiped. Correct Claude. Correction becomes memory. Memory cited in infrastructure rule. Hook now blocks any deploy using --set-env-vars. That error class can never happen again — even in a brand new session.
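
A minimal sketch of such a guardrail follows. It assumes the hook receives the pending tool call as JSON with `tool_name` and `tool_input.command` fields and that exit code 2 blocks the call; that matches my understanding of Claude Code's hook behavior but should be checked against your version. The function name and message text are illustrative, not the author's actual script.

```python
import re

# Matches the destructive flag as a standalone token, with or without "=value".
BLOCKED_FLAG = re.compile(r"(?<!\S)--set-env-vars(?=[\s=]|$)")

ALLOW, BLOCK = 0, 2  # assumed convention: exit code 2 blocks the tool call


def evaluate(payload: dict) -> tuple[int, str]:
    """Return (exit_code, message) for one pending tool call."""
    if payload.get("tool_name") != "Bash":
        return ALLOW, ""
    command = payload.get("tool_input", {}).get("command", "")
    if "gcloud run" in command and BLOCKED_FLAG.search(command):
        return BLOCK, ("Blocked: --set-env-vars replaces every existing "
                       "variable. Use --update-env-vars instead.")
    return ALLOW, ""


# Demo with hand-written payloads. A real hook script would read the JSON
# from stdin, write the message to stderr, and exit with the returned code.
risky = {"tool_name": "Bash",
         "tool_input": {"command": "gcloud run deploy api --set-env-vars KEY=1"}}
safe = {"tool_name": "Bash",
        "tool_input": {"command": "gcloud run deploy api --update-env-vars KEY=1"}}
print(evaluate(risky))
print(evaluate(safe))
```

The check lives in code rather than in a rule file because a rule can be read and ignored, while a non-zero exit stops the command before it runs.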

Loop 2: Research → Wiki → Graph → Routing

Research Agentforce Agent Script. Claude writes a wiki page. Graph extracts entities. Next session, mention "agent routing" — SessionStart hook surfaces that wiki page + related memories + architecture doc. Context arrives before you ask.
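
One way to picture this routing is a small inverted index from extracted entities to the pages that mention them, consulted when a session opens. The sketch below is an assumption about the shape of that lookup, not the author's indexer; the page paths and entity terms are made up for illustration.

```python
from collections import defaultdict


def build_index(pages: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each lowercase entity term to the set of pages that mention it."""
    index: dict[str, set[str]] = defaultdict(set)
    for path, entities in pages.items():
        for entity in entities:
            index[entity.lower()].add(path)
    return index


def route(prompt: str, index: dict[str, set[str]]) -> list[str]:
    """Return pages whose entity terms appear in the prompt, most hits first."""
    text = prompt.lower()
    hits: dict[str, int] = defaultdict(int)
    for term, paths in index.items():
        if term in text:
            for path in paths:
                hits[path] += 1
    # Ties are broken alphabetically so the result is deterministic.
    return sorted(hits, key=lambda p: (-hits[p], p))


# Hypothetical pages and the entities extracted from each.
pages = {
    "wiki/agent-script.md": ["agent routing", "Agent Script"],
    "memory/routing-decision.md": ["agent routing"],
    "wiki/deploy-runbook.md": ["Cloud Run"],
}
index = build_index(pages)
print(route("Let's revisit agent routing for the demo", index))
```

A plain substring match is the simplest possible stand-in here; the "semantic linking" the article mentions would presumably match more loosely than this.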

Loop 3: Skill → MCP → Context-Mode → Output

Run /account-prep. Skill knows the workflow. Calls MCP servers (CRM, news, web). Each response compresses through context-mode. Skill assembles the prep doc using anti-slop rules and corporate grounding. One command, five systems, zero manual context management.

This is the compound effect. No single layer produced these outcomes. Memory alone doesn't prevent errors. Hooks alone don't know what's dangerous. Skills alone can't access external data. The value is in the connections.

Lesson: Agents forget to remember, at every layer

In Part 1, I cited DDR-001 from the Agentforce Agent Harness: "agents forget to remember." The fix was intrinsic constraint awareness. Two months later, DDR-001 applies at every layer:

Layer	"Forgetting" Failure Mode	Fix
Memory	Doesn't check before claiming	Graph recalls relevant memories proactively
Rules	Reads but ignores under pressure	PreToolUse hook intercepts before execution
Skills	Reinvents instead of loading	Skill routing matches task automatically
Wiki	Pattern-matches from training data	ADR hook injects decisions when files in scope
MCP	Hallucinates instead of querying	Grounding rules require real data queries

Meta-lesson: Every knowledge source needs a corresponding enforcement mechanism. Knowledge without enforcement is a suggestion. Knowledge with enforcement is a constraint. Constraints compound. Suggestions decay.

Validation: The world caught up

When Part 1 went out, this felt like a personal hack. Four weeks later:

Karpathy's "LLM Wiki" hit 5,000+ GitHub stars

The pattern we built on (ingest-synthesize-evolve) went viral. Spawned persistent memory tools. We'd been running it in production for months.

Salesforce published "Not All Agentic Harnesses Are Created Equal"

The VP-level Futures team published on this exact framework: the scaffolding that gives a model tools, data, and constraints. Direct validation at the highest level.

Anthropic shipped Managed Agents with memory consolidation

"Dreaming" — agents consolidating learnings between sessions. The same pattern as our memory + /curate system, productized by Anthropic themselves.

MCP token overhead became a known problem

58 tools = 55K tokens per turn. We run 200+ with deferred loading. Others are only now discovering a problem we had already solved.
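
The arithmetic behind that claim, as a quick sketch. It takes the article's figures (58 tools, 55K tokens) at face value and assumes cost scales linearly with tool count. The 20-token stub size for a deferred tool is a made-up placeholder, not a measurement.

```python
TOOLS_MEASURED = 58
TOKENS_MEASURED = 55_000

per_tool = TOKENS_MEASURED / TOOLS_MEASURED  # average schema cost per tool


def eager_cost(tool_count: int) -> int:
    """Tokens per turn if every tool schema is loaded up front."""
    return round(tool_count * per_tool)


def deferred_cost(tool_count: int, loaded: int, stub_tokens: int = 20) -> int:
    """Tokens per turn if only `loaded` schemas are expanded.

    The remaining tools are represented by short stubs; stub_tokens is a
    hypothetical size for a name plus a one-line description.
    """
    return round(loaded * per_tool + (tool_count - loaded) * stub_tokens)


print(f"{per_tool:.0f} tokens per tool on average")
print(f"200 tools, all loaded: {eager_cost(200):,} tokens")
print(f"200 tools, 5 loaded:   {deferred_cost(200, 5):,} tokens")
```

Under these assumptions, loading 200 tools eagerly would cost several times the 58-tool figure on every turn, which is why deferring the schemas matters more as the tool count grows.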

Lessons: What 60 days taught me

1. The system builds itself

I didn't plan most of this. Each correction, wiki page, and skill grew from a specific session's need. The 5-tier framework just gave those decisions somewhere to land.

2. Enforcement matters more than knowledge

128 memory files didn't stop infrastructure mistakes. 3 PreToolUse hooks did. If you only build one new thing after Part 1, build a guardrail hook for your most expensive mistake.

3. Context overflow is the real scaling limit

Part 1 optimized startup (15K to 5K tokens). The real limit hit at minute 45 when research filled the window. Context-mode was the breakthrough that enabled multi-hour sessions.
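
The article names FTS5 but does not show how context-mode uses it. One plausible reading, sketched below under that assumption, is that large tool outputs are written to a SQLite full-text index outside the context window and only short matching snippets come back to the model. The table and function names are hypothetical, and the sketch needs a Python build whose SQLite includes FTS5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE chunks USING fts5(source, body)")


def stash(source: str, text: str, chunk_chars: int = 400) -> int:
    """Index a large output in fixed-size chunks; return how many were stored."""
    pieces = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    conn.executemany("INSERT INTO chunks (source, body) VALUES (?, ?)",
                     [(source, piece) for piece in pieces])
    return len(pieces)


def recall(query: str, limit: int = 3) -> list[tuple[str, str]]:
    """Return (source, snippet) pairs for the best-matching chunks only."""
    rows = conn.execute(
        "SELECT source, snippet(chunks, 1, '[', ']', '...', 12) "
        "FROM chunks WHERE chunks MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()


# Simulated oversized tool response with one relevant sentence buried inside.
big_output = ("filler line about nothing in particular. " * 200
              + "Quarterly renewal risk flagged for the media account. "
              + "more filler text follows here. " * 200)
stored = stash("crm-export", big_output)
results = recall("renewal risk")
print(stored, "chunks stored;", len(results), "snippet(s) returned")
print(results[0][1])
```

The full response stays in the index, so a later question can query it again without the original text ever occupying the context window.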

4. Skills are the highest-return investment

One skill replaces 30 minutes of manual instruction per session. I have 186 skills assembled from three suites — project management, Salesforce platform, and developer tooling — plus 28 custom commands I directed Claude to build for my specific SE workflow. You don't build 186 skills from scratch. You install the right systems and customize the last mile.

5. You don't need to be a developer

I've never opened a Python file to write the hooks. Never manually created a skill. Claude builds its own infrastructure from my direction. Domain expertise + clear requirements + an AI that builds its own tooling = this.

For You: What to build after Part 1

#	What	Why	Effort
1	One PreToolUse guardrail hook	Prevents your most expensive recurring mistake	30 min
2	3 domain skills	Replaces 30 min of instruction per session each	2 hrs
3	One MCP server	Closes the gap between "understands" and "does"	1 hr
4	Context overflow management	Removes the 45-minute session ceiling	1 hr
5	ADR directory	Reasoning cache — stops re-debating settled decisions	15 min

The Trajectory: Where this goes

Part 1 solved a specific problem: Claude forgetting between sessions. The system that grew from that fix solves a different problem: making a non-developer as productive as a senior engineering team.

199 commits in 28 days built an AI platform with 6 subagents, 25 data connectors, and production infrastructure. That was with the early five-tier system. With the current eleven-layer stack, the same scope takes less than a week.

The gap between "domain expertise + clear requirements" and "production software" is closing. Not because models got smarter. Because the local infrastructure around the model compounds. Memory remembers. Hooks enforce. Skills teach. MCP servers act. The graph connects. And none of it requires you to be a developer.

“I think there is room here for an incredible new product instead of a hacky collection of scripts.”

— Andrej Karpathy, on the LLM Wiki pattern

60 days later, the hacky scripts became a platform. Not because I designed one — because compound learning doesn't stop at memory.

Case Studies: What context actually buys you

Two proof points. One about intelligence. One about autonomy.

The Intelligence Gap

The Autonomy Gap

A frontier model audits my setup. 9 of 10 recommendations were already solved.

I shared this article with Gemini Pro and asked it to audit my setup. It gave 10 confident recommendations:

Gemini's Recommendation	Reality
"You need a deprecation/garbage collection layer"	Already built. /curate runs weekly with decay rates and prune cycles.
"Risk of over-constraint paralysis from hooks"	Claude Code's permission system already has bypass. Not a rigid rule engine.
"Graph indexing will cause startup latency"	Already measured: 61ms per file, 151ms session-init. Imperceptible.
"Users will have dependency conflicts"	Starter kit is a clean template. No machine-specific paths.
"You need an auth onboarding wizard"	Recipients are Salesforce employees with sf CLI already authenticated.
"Risk of identity/state bleed in the repo"	Starter kit is already scrubbed. Personal data never ships.

Confident, well-structured advice that would have been correct for a generic setup — but wrong for mine. It couldn't know because it had no context.

The gap isn't intelligence — it's context. Same model family, same parameters. The difference is what the system remembers about YOUR specific situation.

I approved a plan at 10 PM. By 7 AM, it was deployed to production.

The first case study showed context makes AI smarter. This one shows it makes AI autonomous. It doesn't just answer better. It operates independently.

I had a 7-pillar architecture plan for transforming a production intelligence platform. Three rounds of architectural audit. 14 accepted fixes. 6 rejected findings with documented rationale. Approved at 10 PM. I said: "Build overnight. Don't stop." Then I went to sleep.

19 Files Created

375 Tests Passing

22 Components Deployed

25x Compression

Phase	Original Estimate	Actual
Salesforce metadata (object + 12 fields + event + service + tests)	3-4 days	~2 hours
Pipeline infrastructure (types, extractor, evaluator, connector)	3-4 days	~1.5 hours
Three API endpoints (competitive, risk, temporal)	2-3 days	~45 minutes
Brand design system provider	3-4 days	~30 minutes
Conversational Agent Router	4-5 days	~30 minutes
Total (5 phases)	15-20 days	~5 hours
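
A quick check of how the table's totals relate to the 25x figure above. The 8-hour working day used to convert days to hours is my assumption; the article does not state its conversion.

```python
# (phase, estimate_days_low, estimate_days_high, actual_hours) from the table
phases = [
    ("Salesforce metadata", 3, 4, 2.0),
    ("Pipeline infrastructure", 3, 4, 1.5),
    ("Three API endpoints", 2, 3, 0.75),
    ("Brand design system provider", 3, 4, 0.5),
    ("Conversational Agent Router", 4, 5, 0.5),
]
HOURS_PER_DAY = 8  # assumption, not stated in the article

low_days = sum(p[1] for p in phases)
high_days = sum(p[2] for p in phases)
actual_hours = sum(p[3] for p in phases)

# Ratio of estimated working hours to actual hours, at both ends of the range.
low_ratio = low_days * HOURS_PER_DAY / actual_hours
high_ratio = high_days * HOURS_PER_DAY / actual_hours
print(f"Estimate: {low_days}-{high_days} days, actual: {actual_hours} hours")
print(f"Speed-up: {low_ratio:.0f}x to {high_ratio:.0f}x")
```

Under that assumption the 25x headline falls inside the computed range. A different hours-per-day figure would shift both ends proportionally.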

Why this was possible

This wasn't just "AI writes code fast." Any frontier model can generate files. The reason it shipped as a coherent system — not disconnected fragments — is the operating system underneath:

OS Layer	What It Contributed
Memory	Knew the entire architecture, 196 cached accounts, data model history, deployment patterns — no re-briefing needed
Rules	Security governance, architecture patterns, voice standards — all enforced automatically on every file written
Plan	3-round audited plan with explicit file paths, interface contracts, dependency graph — zero ambiguity
Wiki	60+ pages of project knowledge, integration topology, platform constraints — answered questions without asking
Hooks	Metadata validators scored every file (90-120/120). Caught issues at write-time, not deploy-time

What the human actually did

Role	Time
Approve the plan (after 3 audit rounds)	30 min
Say "build overnight"	5 seconds
Sleep	8 hours
Verify build + tests + deploy next morning	10 min
Total human involvement	~45 minutes

The gap isn't speed — it's autonomy. Context architecture doesn't just make AI faster at answering questions. It makes AI capable of independent execution. The human role shifts from "writing code" to "approving plans and verifying outcomes."

Recipe Card: Copy this setup in one shot

I built this over 60 days through trial and error. You don't have to. Here's the architecture as a deployable kit — clone the repo, run setup, start from Day 30 instead of Day 0.

What transfers vs. what doesn't: The architecture, templates, and feedback loops transfer. The 220+ specific memories don't — those are YOUR corrections, YOUR project state. The kit gives you the scaffolding. Compound learning fills it in.

The Kit (5 layers, 45 minutes to deploy)

Layer 1: Identity Skeleton

Template CLAUDE.md with routing table structure, role definition, project registry, and essential standards (anti-slop, anti-hallucination). Pre-wired wiki index. You fill in YOUR role, YOUR projects, YOUR constraints. Takes 10 minutes.

Layer 2: Rules & Guardrails

3 starter rules files: communication.md (voice standards, banned patterns), security-governance.md (CRUD/FLS, sharing, secrets), architecture.md (decision framework, preferred patterns). Plus 2 hook scripts: session-init (context routing on start) and one PreToolUse guardrail (blocks your most expensive recurring mistake).
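
For orientation, this is roughly how two such scripts get registered. The sketch builds the `hooks` section of a Claude Code settings file as I understand the documented schema (event name, optional `matcher`, list of `command` hooks); treat the structure as an assumption to verify against current documentation. Both script paths are hypothetical.

```python
import json

settings = {
    "hooks": {
        # Runs once when a session opens; used here for context routing.
        "SessionStart": [
            {"hooks": [{"type": "command",
                        "command": "python3 .claude/hooks/session_init.py"}]}
        ],
        # Runs before each Bash tool call; the script decides allow or block.
        "PreToolUse": [
            {"matcher": "Bash",
             "hooks": [{"type": "command",
                        "command": "python3 .claude/hooks/guard_deploy.py"}]}
        ],
    }
}

# Print the fragment so it can be reviewed before merging into a settings file.
print(json.dumps(settings, indent=2))
```

Scoping the guardrail to the Bash tool keeps it out of the way of file reads and edits, which is one answer to the over-constraint concern raised later in the article.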

Layer 3: SE Skill Pack

5 production-ready skills: /account-prep (pre-meeting intelligence), /deal-strategy (competitive + talk track), /email-draft (anti-slop customer email), /post-meeting (capture + CRM + follow-up), /demo-prep (script from brief). Each one replaces 30 minutes of manual instruction per use.

Layer 4: Intelligence Automation

Slack channel roster with 25 pre-mapped internal channels (product, competitive, enablement, win/loss, leadership). Overnight gather scripts that scan Exa + HN + GitHub + X + Slack and synthesize a morning brief. One MCP server (GitHub) to prove the action layer. Swap in your OU channels and go.

Layer 5: Entity Pages

wiki/people/ directory with two templates (internal + external). Seed your org tree from Slack profiles — Claude pulls name, title, email, timezone programmatically. Structure: org position at top, append-only timeline below. After one session, every leader and their reports are mapped. After a month, you have a relationship graph no CRM captures.

Channel Roster (pre-mapped for SE org)

Category	Channels	What You Get
Competitive Intel	#tmt-solutions, #analyst-coverage	CI drops, IDC/ISG/Gartner reports
Product & Roadmap	#agentforce-updates, #platform-releases	What shipped, what's GA, deprecations
AI & Tooling	#ai-club, #ai-engineering-productivity, #solutions-ai-tooling	Internal AI tools, techniques, launches
SE Enablement	#se-enablement, #demo-sharing	New assets, techniques that work
Territory	#tmt-commercial, #tmt-solutions-broadcast	OU updates, leadership priorities
Industry	#industries-communications, #industries-media, #industries-tech	Vertical trends, reference stories

Growth timeline (what to expect)

Week 1

Foundation working

CLAUDE.md routes context. Rules enforce standards. Session-init suggests relevant pages. Claude stops making your most common mistake.

Week 2–3

Memory accumulating

20–30 memory files from corrections and decisions. Wiki growing organically. Skills saving 30+ min/day. Morning brief arriving automatically.

Week 4+

Compound effects kick in

Loops connecting. Memory feeding guardrails. Skills calling MCP servers. Context-mode extending sessions. The system builds itself from here.

The non-developer advantage

You don't need to code any of this. Tell Claude what you want enforced, what workflow you need automated, what mistake to never make again. Claude builds the hooks, writes the skills, configures the MCP servers. Your job is direction and domain expertise — the same skills that make you good at your actual job.

Get the kit: git clone https://github.com/jtehrani84/claude-code-se-starter-kit.git && cd claude-code-se-starter-kit && ./setup.sh — or contact John Tehrani (jtehrani@salesforce.com) for access.
