
Extended Working Memory

Model Context Protocol as External Cognition: Bridging Ephemeral and Persistent State

Ibrahim AbuAlhaol, PhD, P.Eng., SMIEEE

AI Technical Lead

Published: February 14, 2026 | Reading Time: ~11 min

The Token Budget Bottleneck

A Claude model's context window is finite—200K tokens for Claude 3.5 Sonnet and Claude 3 Opus alike. Within that window, the model must hold:

  • System instructions and task context
  • Conversation history
  • Current codebase context (read files, grep results)
  • Project documentation
  • Test outputs and error messages

As a session grows, context fills up. The model must choose: either forget old context (losing continuity) or compress current context (losing fidelity). This is the fundamental tradeoff of attention-based models: depth of knowledge vs. token budget.

"The solution is not a bigger window—it's offloading knowledge to external systems where the model can fetch only what it needs, when it needs it."

Model Context Protocol: The External Cognition Layer

The Model Context Protocol (MCP) is an open-source standard, created by Anthropic and adopted broadly across the AI ecosystem, for connecting language models to external data sources and tools. Rather than embedding all context into the token budget, MCP lets the model:

  • Query external services: Ask a database for the schema of a specific table, not store the entire schema in context
  • Invoke specialized tools: Call a vector search service to retrieve semantically similar code examples
  • Access files on-demand: Read specific files from a knowledge base without pre-loading the entire base
  • Maintain persistent state: Store session context, decisions, and patterns in external memory without burning tokens

This transforms the model from a single-shot reasoner to a multi-step agent. It doesn't "think harder"—it thinks smarter by delegating knowledge retrieval to systems optimized for that task.
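
To make that loop concrete, here is a minimal client-side sketch using the official MCP Python SDK (the mcp package). It mirrors what a host such as Claude Code does under the hood; the server script name and the search_architecture_docs tool are illustrative assumptions, not part of the protocol.

import asyncio
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

# Assumption: a local MCP server script that indexes architecture docs
server_params = StdioServerParameters(command="python", args=["arch_kb_server.py"])

async def fetch_on_demand(question: str):
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            # Discover what the server offers instead of pre-loading its contents
            tools = await session.list_tools()
            print("Available tools:", [t.name for t in tools.tools])
            # Fetch only the knowledge needed for this step of reasoning
            result = await session.call_tool(
                "search_architecture_docs",  # hypothetical tool name
                arguments={"query": question},
            )
            return result.content

asyncio.run(fetch_on_demand("examples of database migration patterns in our codebase"))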

Real-World MCP Architecture

A practical example: an engineering team using Claude Code for architecture design.

Without MCP (Token Bottleneck):

  • Load entire architecture documentation into context: 5K tokens
  • Load all service schemas: 8K tokens
  • Load deployment configs: 3K tokens
  • Total: 16K tokens just for context, leaving 184K for reasoning and response

With MCP (Efficient Knowledge Access):

  • Register an MCP service for architecture docs (indexed, searchable)
  • Register an MCP service for service discovery (queries live state)
  • Register an MCP service for deployment metadata
  • The model fetches only relevant information on-demand: "What services depend on the auth service?" queries the knowledge base instead of loading it all (see the sketch after this list)
  • Token savings: 10K+ tokens freed for deeper reasoning
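
As one sketch of the service-discovery piece, the dependency query above could be exposed as an MCP tool with the Python SDK's FastMCP helper; the in-memory dependency graph is a stand-in assumption for whatever registry (Consul, Kubernetes, an internal catalog) the team actually runs.

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("service-discovery")

# Stand-in for a real service registry (Consul, Kubernetes, internal catalog)
DEPENDENCY_GRAPH = {
    "auth": ["checkout", "orders", "notifications"],
}

@mcp.tool()
def list_dependents(service: str) -> list[str]:
    """Return the services that depend on the given service."""
    return DEPENDENCY_GRAPH.get(service, [])

if __name__ == "__main__":
    mcp.run()  # serves over stdio by default

The model calls list_dependents("auth") only when the question arises, instead of carrying every service schema in its context.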

MCP in Practice: Integration Patterns

Three patterns emerge in production deployments:

Pattern 1: Knowledge Base as MCP Server

A vector database (Pinecone, Weaviate) indexed with company architecture, patterns, and case studies. The MCP server exposes semantic search: the model queries "examples of database migration patterns in our codebase" and gets back relevant code snippets, ranked by relevance.

Pattern 2: Live System State via MCP

An MCP server wrapping your observability stack (Prometheus, Datadog, New Relic). The model can ask: "What's the error rate in the checkout service over the last hour?" and get live data without pre-loading metrics into context.
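
A minimal sketch of such a wrapper for Prometheus, again using FastMCP; the PROM_URL endpoint and the http_requests_total metric with service and status labels are assumptions about the team's instrumentation, not a prescribed setup.

import requests
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("observability")

PROM_URL = "http://prometheus:9090"  # assumption: reachable Prometheus endpoint

@mcp.tool()
def error_rate(service: str, window: str = "1h") -> float:
    """Fraction of requests to `service` that returned 5xx over the given window."""
    # PromQL ratio of 5xx request rate to total request rate;
    # metric and label names are assumptions about the instrumentation
    query = (
        f'sum(rate(http_requests_total{{service="{service}",status=~"5.."}}[{window}]))'
        f' / sum(rate(http_requests_total{{service="{service}"}}[{window}]))'
    )
    resp = requests.get(f"{PROM_URL}/api/v1/query", params={"query": query}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else 0.0

if __name__ == "__main__":
    mcp.run()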

Pattern 3: Persistent Session Memory via MCP

An MCP server wrapping a document store (Firebase, Postgres) where the model can persist decisions and patterns. Early sessions store architectural decisions; later sessions retrieve and reference them. This creates a form of institutional memory that survives session boundaries.
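
A minimal sketch of that memory server, using SQLite as the document store for brevity (Firebase or Postgres would slot in the same way); the store_decision and recall_decisions tool names and the schema are illustrative assumptions.

import sqlite3
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("session-memory")

# SQLite stands in for Firebase/Postgres; schema is an illustrative assumption
db = sqlite3.connect("decisions.db", check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS decisions (topic TEXT, decision TEXT, rationale TEXT)")

@mcp.tool()
def store_decision(topic: str, decision: str, rationale: str) -> str:
    """Persist an architectural decision so later sessions can retrieve it."""
    db.execute("INSERT INTO decisions VALUES (?, ?, ?)", (topic, decision, rationale))
    db.commit()
    return f"Stored decision on '{topic}'"

@mcp.tool()
def recall_decisions(topic: str) -> list[str]:
    """Retrieve past decisions whose topic matches the query."""
    rows = db.execute(
        "SELECT decision, rationale FROM decisions WHERE topic LIKE ?", (f"%{topic}%",)
    ).fetchall()
    return [f"{decision} (rationale: {rationale})" for decision, rationale in rows]

if __name__ == "__main__":
    mcp.run()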

The Cognitive Science Angle: Situated Cognition

Vygotsky's concept of "scaffolding" in learning applies here: people learn better when knowledge is externalized and accessible, not internalized and hidden. MCP is scaffolding for AI: the model's reasoning is enhanced by externalizing knowledge to specialized systems, then accessing it on-demand.

The model doesn't need to memorize your entire architecture—it needs to know how to ask good questions and interpret answers. MCP handles the retrieval; the model handles the reasoning.

Implementation: Building an MCP Service

A minimal MCP service (pseudocode):

from typing import List

# MCPServer, VectorDatabase, and embed are placeholders for an MCP server
# framework, a vector store client, and an embedding function respectively
class ArchitectureKnowledgeBase(MCPServer):
    def __init__(self):
        # Vector index over company architecture docs, patterns, and case studies
        self.db = VectorDatabase("company-architecture")

    def query(self, prompt: str) -> List[str]:
        # Convert the natural-language prompt to an embedding
        embedding = embed(prompt)
        # Retrieve the top-k most relevant documents by semantic similarity
        results = self.db.search(embedding, k=5)
        return [doc.text for doc in results]

    # Claude Code can now call:
    #   "Search architecture KB for: database scaling patterns"
    # and get back relevant code, docs, and case studies

Conclusion: Knowledge Density vs. Token Budget

The future of AI-augmented engineering isn't bigger models or bigger context windows—it's smarter knowledge architectures. MCP servers separate the model's reasoning process from the knowledge it reasons over, so the token budget goes to reasoning while retrieval reaches far more knowledge than any context window could hold.

Teams that implement MCP early—treating their documentation, code, and operational state as queryable systems—will see compounding returns: their AI agents get faster, smarter, and more contextually aware with each session.

References & Extended Literature

  1. Anthropic. (2024). "Model Context Protocol Specification." modelcontextprotocol.io
  2. Vygotsky, L. S. (1978). "Mind in Society: The Development of Higher Psychological Processes." Harvard University Press.
  3. Bahdanau, D., Cho, K., & Bengio, Y. (2014). "Neural Machine Translation by Jointly Learning to Align and Translate." arXiv:1409.0473.
  4. Vaswani, A., et al. (2017). "Attention Is All You Need." Proceedings of NIPS 2017. arXiv:1706.03762
  5. Weaviate. (2024). "Vector Search Best Practices for Production." weaviate.io
  6. Pinecone. (2024). "Building Knowledge Bases with Vector Search." pinecone.io/learn