Three Context Management Patterns for Agentic CLI Skills
Context forking, dynamic injection, and sub-agent delegation — the practices that separate production-ready skill builders from token-hungry workflows.
Ibrahim AbuAlhaol, PhD, P.Eng., SMIEEE
AI Technical Lead
The Hidden Cost of Context Bloat
Building custom skills for agentic CLI tools like Claude Code isn't just about getting the task done — it's about doing it efficiently. Most practitioners focus on whether their skill produces the right output. Far fewer ask a harder question: how much context is being consumed along the way, and how much of that consumption was truly necessary?
Context is finite. Every token spent loading irrelevant documentation, re-fetching project data the agent could have received upfront, or blocking the main thread on a long-running review is a token not spent on the actual task. At small scale, this is friction. At production scale — hundreds of daily skill invocations across a team — it becomes a compounding tax on both cost and throughput.
Three architectural patterns address this directly: context forking, dynamic context injection, and sub-agent delegation. None of them are difficult to implement. Together, they form the foundation of a well-designed agentic skill.
"The fastest skill isn't the one that runs the fewest steps — it's the one that pollutes the fewest tokens."
Pattern 1: Context Forking — Isolate the Noise
Some skills are inherently context-heavy. A skill that generates documentation, for instance, might need to pull in hundreds of lines of API reference material, schema definitions, or style guides before it can do anything useful. If that skill runs in your main conversation window, all of that reference content gets loaded into the shared context — where it will persist, accumulate, and interfere with everything that follows.
Context forking solves this by running the skill in a fully isolated session. The forked agent reads its full reference corpus, does its work, and then passes only the final output back to the main window. Your primary context sees one clean result, not the entire working trace.
In Claude Code, this is configured by adding context: fork to the skill invocation. The mechanics are straightforward, but the design implications are significant. Any skill that consumes large reference documents — technical style guides, full API specs, lengthy runbooks — is a candidate for forking. The rule of thumb: if the skill's internal context would embarrass you if it showed up in the main thread, fork it.
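As a concrete sketch, a documentation-generation skill might declare forking in its frontmatter. The context: fork field follows the convention described above; the skill name, description, and file paths are illustrative, and the exact frontmatter schema may vary across Claude Code versions.

```markdown
---
name: generate-docs
description: Generate API documentation from source and reference material
context: fork   # run in an isolated session; only the final output returns
---

Read the style guide in docs/style-guide.md and the API reference in
docs/api-reference.md, then write documentation for the requested module.
Return only the finished documentation, not your working notes.
```

The forked agent can read every reference file it needs; the main conversation sees only the documentation it returns.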
When to fork
- Skills that load more than a few hundred lines of reference documentation
- Skills with multi-step reasoning chains that produce intermediate artifacts the user doesn't need to see
- Skills that might produce verbose diagnostic output before reaching the final answer
- Any skill where context bleed from one invocation to the next would meaningfully degrade quality
What you give up with forking is visibility into the intermediate reasoning. What you gain is a clean, focused main context that doesn't accumulate clutter. For skills that run frequently, that trade-off almost always favors forking.
Pattern 2: Dynamic Context Injection — Pre-load What Matters
The second failure mode is the opposite of the first: agents that spend tokens searching for project data that could have been handed to them upfront. A skill invoked to review a pull request shouldn't need to discover the repo structure, read the package manifest, and identify the primary language through a sequence of exploratory tool calls. That information exists at invocation time. The only question is whether your skill architecture delivers it automatically.
Dynamic context injection means pre-processing live project data before the agent reads the prompt. In Claude Code, shell commands prefixed with an exclamation mark and wrapped in backticks execute at prompt-parse time, injecting their output directly into the agent's initial context. A command like !`pwd` resolves to the current working directory before the agent sees a single token. More usefully, a command like !`find . -name "*.ts" | head -20` gives the agent an immediate listing of TypeScript sources — no search required.
Practical injection patterns
- File tree injection: Pre-inject a filtered directory listing so the agent knows what exists before it starts exploring
- Manifest injection: Load package.json, pyproject.toml, or the equivalent manifest — the agent immediately knows the project's dependencies, scripts, and metadata
- Git context injection: Run !git log --oneline -10 to give the agent recent commit history before it tries to understand what changed
- Environment injection: Surface relevant environment variables so the agent doesn't probe for configuration it already needs
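Put together, a skill prompt might pre-inject several of these sources at once. A sketch, using the injection syntax described above; the headings and the particular commands are illustrative choices:

```markdown
## Project context (injected at invocation time)

Working directory: !`pwd`

Source files:
!`find . -name "*.ts" | head -20`

Manifest:
!`cat package.json`

Recent history:
!`git log --oneline -10`
```

By the time the agent reads the task description that follows this preamble, every one of these lookups has already happened.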
The productivity gain here compounds in two ways. First, it eliminates the round-trips that come from exploratory tool calls — those calls take time and consume context with their outputs. Second, it front-loads the relevant signal, which tends to improve the quality of everything the agent does downstream. An agent that starts with a clear picture of the project makes better decisions throughout.
The design discipline this requires is intentional: when writing a skill, ask yourself what the agent will definitely need to know before it can begin. Then inject that information at invocation time, not after the agent asks for it.
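The same discipline can be emulated outside any particular tool with a plain shell wrapper that assembles the preamble before the prompt is sent. A minimal sketch; the section headers and the choice of commands are assumptions, not part of Claude Code:

```shell
#!/bin/sh
# Assemble an injection preamble the agent will see before the task prompt.
set -eu

preamble=$(
  printf '## Working directory\n%s\n\n' "$(pwd)"
  printf '## Files (first 20)\n%s\n' "$(find . -type f | head -20)"
)

# In a real wrapper, this would be prepended to the prompt sent to the agent.
printf '%s\n' "$preamble"
```

The point is not the specific commands but the ordering: the data gathering happens once, before the agent runs, instead of through exploratory tool calls afterward.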
Pattern 3: Sub-agent Delegation — Background the Long Tail
Some tasks are slow. A skill that reviews an entire pull request, audits a codebase for security vulnerabilities, or generates a comprehensive test suite might take minutes to complete. Running that work in the foreground blocks your main agent — and blocks you — for the entire duration.
Sub-agent delegation addresses this by offloading long-running tasks to a background agent. In Claude Code skills, this is configured by setting background: true on the sub-agent invocation. The main agent dispatches the task and immediately becomes available for other work. When the background agent completes, it surfaces its results without having held the main thread.
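A sketch of a long-running skill marked for background execution, again with illustrative names; the background: true flag follows the convention described above:

```markdown
---
name: security-audit
description: Scan the codebase for common vulnerability patterns
background: true   # dispatch to a background agent; the main thread stays free
---

Audit the repository for injection, path-traversal, and secrets-handling
issues. When finished, report findings as a prioritized list.
```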
This pattern transforms the economics of long-running skills. Instead of a workflow that serially blocks on each expensive operation, you get a workflow that can run multiple expensive operations concurrently — or simply stay responsive while the heavy lifting happens out of band.
Tasks suited for background delegation
- Full pull request reviews that need to read many files across a repository
- Security audits that scan large codebases for vulnerability patterns
- Documentation generation that requires reading and synthesizing many source files
- Test generation that needs to understand the full project before proposing coverage
- Any analysis that would take more than 30 seconds in the foreground
Background delegation also pairs naturally with context forking. A forked, backgrounded sub-agent is the cleanest possible unit of work: it has its own isolated context, runs asynchronously, and delivers only its final output to the main window when complete. For skills with genuinely long execution times, this combination is the gold standard.
Composing the Patterns: A Real-World Skill Architecture
These three patterns are most powerful in combination. Consider a code review skill. A naive implementation invokes the agent directly, lets it explore the repository, loads reference documentation into the shared context, and blocks the user until the review completes. A well-designed implementation does the following:
- Inject context dynamically: At invocation time, inject the diff, the file tree, and recent commit history. The agent starts with full situational awareness.
- Fork the session: The review skill runs in an isolated context so that extensive code reading doesn't pollute the main window.
- Delegate to a background agent: The review itself runs asynchronously. The user is free to continue other work while the analysis proceeds.
The result is a skill that is faster to start (no exploratory tool calls), cleaner to maintain (isolated context), and non-blocking to the user (background execution). None of these changes alter what the skill produces — only how efficiently it produces it.
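The three steps above can be sketched as one skill definition. Everything here is illustrative: the frontmatter fields follow the conventions described in this article, and the injected commands assume a git repository with a branch to diff against main.

```markdown
---
name: review-pr
description: Review the current diff against project conventions
context: fork       # isolated session: code reading stays out of the main window
background: true    # asynchronous: the user keeps working while the review runs
---

## Injected context

Diff under review:
!`git diff main...HEAD`

Tracked files:
!`git ls-files | head -50`

Recent history:
!`git log --oneline -10`

## Task

Review the diff for correctness, style, and test coverage.
Return only the final review: findings, severity, and suggested fixes.
```

Each pattern occupies a distinct layer: injection fills the initial context, forking isolates the working context, and backgrounding decouples the execution timeline.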
The Efficiency Mindset
The deeper principle behind all three patterns is the same: treat context as a scarce resource, not a free one. Every token has a cost — in latency, in API spend, and in the quality degradation that comes from a crowded context window. A skill that manages its context budget carefully will outperform a functionally identical skill that doesn't, every time.
This isn't a niche optimization concern. As agentic tooling becomes the standard for engineering work, the difference between a team with context-efficient skills and one without will be measurable in hours per week per engineer. The practices described here are not advanced techniques. They are the baseline for responsible skill authorship.
Build skills that get the task done. Then build skills that get the task done without wasting the context that surrounds them. That second step is where production-grade agentic engineering actually lives.