2026-03-15

Fast Mode, Compaction API & Claude Code Updates


Fast Mode for Opus 4.6 — Up to 2.5× Faster Output

Launched on 7 February 2026, Fast Mode for Claude Opus 4.6 delivers up to 2.5× faster output token generation via a new speed parameter in the Messages API. Previously you had to choose between Opus-level intelligence and low-latency responses. Fast Mode narrows that trade-off, at premium pricing, for latency-sensitive agentic applications. Note that the speed-up applies to output generation, so the gain is largest on responses with many output tokens.

How to enable Fast Mode

import anthropic

client = anthropic.Anthropic()

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=1024,
    speed="fast",          # ← enable Fast Mode
    messages=[{
        "role": "user",
        "content": "Summarise this report in bullet points."
    }]
)
print(response.content[0].text)

When to use it

Tip: Fast Mode is currently available via a waitlist at claude.com/fast-mode. It is billed at standard Opus 4.6 rates plus a premium multiplier, so benchmark your actual workload to confirm that the speed gain justifies the extra cost.
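One way to run the benchmark suggested above is to time the same prompt in both modes and compare output throughput. The sketch below is a minimal, offline helper: `RunStats`, `speedup`, and `speedup_per_cost` are hypothetical names, and the price multiplier is an input you supply, since this entry does not state its value.

```python
from dataclasses import dataclass


@dataclass
class RunStats:
    """One timed request: output tokens produced and wall-clock seconds."""
    output_tokens: int
    seconds: float

    @property
    def tokens_per_second(self) -> float:
        if self.seconds <= 0:
            raise ValueError("seconds must be positive")
        return self.output_tokens / self.seconds


def speedup(standard: RunStats, fast: RunStats) -> float:
    """Ratio of Fast Mode throughput to standard throughput."""
    return fast.tokens_per_second / standard.tokens_per_second


def speedup_per_cost(standard: RunStats, fast: RunStats,
                     price_multiplier: float) -> float:
    """Speed-up divided by the price multiplier (hypothetical input).

    Values above 1.0 mean throughput grew faster than cost.
    """
    return speedup(standard, fast) / price_multiplier
```

To fill in a `RunStats`, wrap each request in `time.perf_counter()` calls and read the token count from `response.usage.output_tokens`. Average several runs per mode, because single measurements are noisy.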

fast-mode Opus-4.6 latency API

Compaction API — Effectively Infinite Conversations

The Compaction API (launched in beta on 5 February 2026) provides server-side context summarisation for Claude Opus 4.6 conversations. When your conversation approaches the model's practical context limit, the API can automatically distil the older history into a compact summary and continue the conversation from that summary. Long-running agents, multi-session workflows, and persistent assistants can now operate without manual context management.

How compaction works

Minimal usage

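import anthropic

client = anthropic.Anthropic()

# Placeholder: in practice this is the full message history of a long session.
long_conversation_history = [
    {"role": "user", "content": "Continue the migration plan from earlier."}
]

response = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=2048,
    compaction={
        "enabled": True,
        "threshold": 0.85   # compact when 85% of the context window is used
    },
    messages=long_conversation_history
)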

Key insight: Previously, long-running agents had to either truncate history (losing context) or implement bespoke summarisation logic (engineering overhead). The Compaction API moves this complexity to the platform layer so your agent code stays simple.
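For comparison, the bespoke client-side logic described above might look roughly like the following. This is an illustrative sketch, not how the Compaction API works internally: `compact_history`, the four-characters-per-token estimate, and the `summarise` callback (which in practice would call a model) are all assumptions made for the example.

```python
from typing import Callable

Message = dict[str, str]


def estimate_tokens(messages: list[Message]) -> int:
    """Very rough estimate: about four characters per token."""
    return sum(len(m["content"]) for m in messages) // 4


def compact_history(
    messages: list[Message],
    context_limit: int,
    summarise: Callable[[list[Message]], str],
    threshold: float = 0.85,
    keep_recent: int = 4,
) -> list[Message]:
    """Replace older turns with one summary once usage crosses the threshold."""
    if estimate_tokens(messages) < threshold * context_limit:
        return messages  # still under budget: leave the history untouched

    split = max(len(messages) - keep_recent, 0)
    older, recent = messages[:split], messages[split:]
    if not older:
        return messages  # nothing old enough to summarise

    summary = {
        "role": "user",
        "content": "Summary of earlier conversation: " + summarise(older),
    }
    return [summary] + recent
```

Every agent that manages its own history has to carry some version of this, plus the prompt and error handling behind `summarise`, which is the overhead the platform-level feature is meant to remove.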

Beta caveat: Compaction is currently available only for Opus 4.6. Summarisation is non-deterministic, so run evals on your own use case to verify that no business-critical information is lost across compaction boundaries.
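A very simple form of such an eval is to list the facts that must survive and check whether each still appears in the compacted summary. The sketch below assumes you can obtain the summary text as a string. `missing_facts` and `retention_rate` are hypothetical helpers, and plain substring matching is a crude proxy that will miss paraphrased facts.

```python
def missing_facts(summary: str, required_facts: list[str]) -> list[str]:
    """Return required facts that do not appear (case-insensitively) in the summary."""
    lowered = summary.lower()
    return [fact for fact in required_facts if fact.lower() not in lowered]


def retention_rate(summary: str, required_facts: list[str]) -> float:
    """Fraction of required facts found in the summary (1.0 if none are required)."""
    if not required_facts:
        return 1.0
    lost = len(missing_facts(summary, required_facts))
    return 1 - lost / len(required_facts)
```

Because summarisation is non-deterministic, run the check over many compactions of the same conversation and track the retention rate rather than relying on a single pass.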

compaction Opus-4.6 agents context-management beta

Claude Code: Three New Productivity Features

Recent Claude Code releases (v2.1.74 – v2.1.76) have shipped several quality-of-life improvements aimed at reducing wasted context and giving developers finer control over effort and speed. Here are the three most impactful additions.

1. /context — Actionable Optimisation Suggestions

Running /context in any Claude Code session now shows not just how many tokens are in use, but actionable suggestions identifying the biggest optimisation opportunities in your current context window — for example, large tool results that could be summarised, or conversation branches that are no longer relevant. Use it before a long coding task to ensure you're not burning context on stale scaffolding.

2. /effort — On-the-Fly Effort Control

The new /effort slash command lets you adjust Claude's effort level mid-session without restarting. Higher effort means deeper reasoning (and more tokens); lower effort is faster and cheaper for mechanical tasks like simple edits or reformatting. This maps to the API's effort parameter (which replaced budget_tokens for Opus 4.6).

/effort high    # deep reasoning for architecture decisions
/effort low     # quicker, cheaper responses for search-and-replace style edits

3. worktree.sparsePaths — Sparse Worktree Checkouts

In large monorepos, running Claude Code in an isolated git worktree previously checked out the entire repo — slow and token-expensive. The new worktree.sparsePaths setting in ~/.claude/settings.json lets you specify which directories to include, giving Claude a lean, relevant slice of the repository.

In ~/.claude/settings.json (plain JSON, so the file itself cannot contain comments):
{
  "worktree": {
    "sparsePaths": ["src/api", "src/shared", "tests/integration"]
  }
}
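If the settings file already holds other keys, it is safer to merge the new setting in than to overwrite the file. The helper below is a small sketch of that idea. `set_sparse_paths` is a hypothetical function, and the only thing it assumes about the file is the `worktree.sparsePaths` shape shown above.

```python
import json
from pathlib import Path


def set_sparse_paths(settings_file: Path, paths: list[str]) -> dict:
    """Merge worktree.sparsePaths into a settings file, preserving other keys."""
    if settings_file.exists():
        settings = json.loads(settings_file.read_text())
    else:
        settings = {}

    # Reuse an existing "worktree" section if there is one.
    worktree = settings.setdefault("worktree", {})
    worktree["sparsePaths"] = sorted(set(paths))  # de-duplicate, stable order

    settings_file.parent.mkdir(parents=True, exist_ok=True)
    settings_file.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

For example, `set_sparse_paths(Path.home() / ".claude" / "settings.json", ["src/api", "src/shared"])` would update the file in place. Back it up first if it contains settings you care about.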

Tip: Combine worktree.sparsePaths with the /context command at session start. Check the optimisation suggestions, trim irrelevant files from context, then reserve /effort high for the reasoning-intensive parts of your task.

claude-code context-optimisation effort worktree productivity

Custom Subagents & Agent Teams in Claude Code

Claude Code now supports custom subagents — specialised Claude instances defined in your project's .claude/agents/ directory — and agent teams where multiple parallel sessions coordinate through a shared task list. These two features let you decompose complex, long-horizon work in a way that keeps each agent's context window clean and focused.

Custom Subagents

A subagent is a Markdown file in .claude/agents/ with a YAML front-matter block. Claude can delegate to it automatically (based on context) or on demand via /agent <name>. Each subagent runs in its own isolated context window, so tool results and file reads don't pollute your main session.

File: .claude/agents/test-runner.md (the front matter must start on the first line of the file)

---
name: test-runner
description: Runs the test suite and reports failures with root-cause analysis
model: claude-haiku-4-5         # fast, cheap for mechanical work
tools: Bash, Read, Glob
---

You are a test specialist. When invoked, run `npm test`, capture failures,
read the relevant source files, and return a structured failure report.
Do not fix the code — only diagnose.

Agent Teams

For large parallel workloads, assemble a team: one lead agent breaks the problem into tasks, spawns worker agents to handle them concurrently, and synthesises the results. Workers communicate through a shared task list (created via TaskCreate / TaskUpdate) and message each other through the lead's context.
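The lead-and-workers pattern can be pictured with a small local model. The sketch below is a conceptual illustration only, not Claude Code's implementation: `TaskList`, `worker`, and `run_team` are hypothetical names, threads stand in for parallel agent sessions, and each worker simply marks its task done.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from threading import Lock
from typing import Optional


@dataclass
class Task:
    task_id: int
    description: str
    status: str = "pending"
    result: Optional[str] = None


class TaskList:
    """Shared task list; a lock keeps concurrent updates consistent."""

    def __init__(self) -> None:
        self._tasks: dict[int, Task] = {}
        self._lock = Lock()

    def create(self, description: str) -> Task:
        with self._lock:
            task = Task(len(self._tasks) + 1, description)
            self._tasks[task.task_id] = task
            return task

    def update(self, task_id: int, status: str,
               result: Optional[str] = None) -> None:
        with self._lock:
            task = self._tasks[task_id]
            task.status, task.result = status, result

    def all(self) -> list[Task]:
        with self._lock:
            return list(self._tasks.values())


def worker(tasks: TaskList, task: Task) -> None:
    """Stand-in for a worker agent: claim the task, then report a result."""
    tasks.update(task.task_id, "in_progress")
    tasks.update(task.task_id, "done", f"finished: {task.description}")


def run_team(subtasks: list[str], max_workers: int = 3) -> list[Optional[str]]:
    """Lead: create tasks, fan them out to workers, then collect the results."""
    tasks = TaskList()
    created = [tasks.create(description) for description in subtasks]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(lambda task: worker(tasks, task), created))
    return [task.result for task in tasks.all()]
```

The point of the shared list is that the lead never needs each worker's full transcript, only the status and result recorded per task, which is what keeps every context window small.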

Tip: Store subagent definitions in a shared repo so every developer on the project inherits the same specialised helpers. Treat the description field as documentation, because Claude reads it to decide when to delegate automatically.

claude-code subagents agent-teams multi-agent best-practices