✦ 1 Million Token Context Window — Now Generally Available
As of 12 March 2026, the 1 million token context window is generally
available for both Claude Opus 4.6 and Claude Sonnet 4.6 —
no beta header required. This removes the single biggest practical barrier to processing
entire codebases, lengthy documents, or long conversation histories in a single pass.
What fits in 1M tokens?
- ~750 000 words of plain text — roughly 10 full-length novels
- A medium-to-large codebase (e.g. all source files in a typical SaaS monorepo)
- Hours of meeting transcripts, call logs, or support tickets
- Large PDF documents such as legal contracts or technical specifications
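A quick way to sanity-check whether a corpus fits is the common rule of thumb of roughly 4 characters (or ~0.75 words) per token for English text. The sketch below uses that heuristic; it is an approximation only, and real token counts vary by language and content, so rely on the API's token counting for anything billing-sensitive.

```python
# Rough fit check for the 1M-token window. The ~4 chars/token ratio is a
# heuristic for English prose, not an exact tokenizer.
CONTEXT_WINDOW = 1_000_000

def estimate_tokens(text: str) -> int:
    """Crude token estimate: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """True if the text plausibly fits, leaving headroom for the response."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_WINDOW

corpus = "word " * 750_000        # ~750k words of filler text
print(estimate_tokens(corpus), fits_in_context(corpus))
```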
Practical tips for large contexts
- Place the most critical instructions at the very start and
very end of the prompt — Claude weighs these positions most heavily
in long contexts.
- Combine with prompt caching (see entry below) to avoid re-processing
the same large corpus on every turn.
- For structured retrieval over huge corpora, still prefer embedding-based search —
not every use case needs to dump everything into context.
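The first tip above can be sketched as a simple prompt-assembly helper: state the critical instructions once at the top, then restate them after the long document body. `build_long_prompt` is an illustrative helper of our own, not part of any SDK.

```python
# Sketch: repeat the critical instructions at both ends of a long prompt,
# since the start and end positions carry the most weight in long contexts.
def build_long_prompt(instructions: str, corpus: str) -> str:
    return "\n\n".join([
        instructions,              # start: stated up front
        "<documents>",
        corpus,                    # the large middle section
        "</documents>",
        "Reminder of the task:",   # end: restated just before the answer
        instructions,
    ])

prompt = build_long_prompt(
    "Summarise every breaking API change.",
    "...hundreds of thousands of tokens of source files...",
)
```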
Tip Measure latency before committing to 1M-token prompts in production.
Time-to-first-token grows with context size; for latency-sensitive apps consider chunked
retrieval or caching strategies first.
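For the chunked-retrieval alternative mentioned in the tip, a minimal sketch looks like the following. The keyword-overlap scoring here is a deliberately naive stand-in for real embedding-based search, and the chunk size is an arbitrary illustration.

```python
# Chunked retrieval sketch: send only the top-scoring chunks instead of
# the whole corpus. Scoring by keyword overlap is a toy stand-in for
# embedding search.
def chunk(text: str, size: int = 2_000) -> list[str]:
    """Split a corpus into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def top_chunks(query: str, chunks: list[str], k: int = 3) -> list[str]:
    """Rank chunks by how many query terms they contain."""
    terms = set(query.lower().split())
    return sorted(chunks, key=lambda c: sum(t in c.lower() for t in terms),
                  reverse=True)[:k]

corpus = "alpha beta " * 3_000 + "refund policy: 30 days. " + "gamma " * 3_000
best = top_chunks("what is the refund policy", chunk(corpus), k=1)
```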
context-window
Opus-4.6
Sonnet-4.6
GA
✦ Automatic Prompt Caching
Anthropic has launched automatic prompt caching for the Messages API.
Instead of manually placing cache_control breakpoints throughout your prompt,
you now add a single cache_control field to your request and Claude
automatically moves the cache point forward as the conversation grows. This makes
multi-turn conversations with large system prompts or tool definitions dramatically
cheaper and faster — without any extra engineering effort.
How it works
- Cached tokens are stored server-side for up to 5 minutes by default
(configurable via the
ttl parameter).
- Cache hits are billed at a reduced rate — typically
90 % cheaper than full input token pricing.
- The cache auto-advances: each turn only pays for the new tokens added
since the last cache point.
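The billing behaviour described above can be modelled with a few lines of arithmetic. This is a toy model: it assumes cache reads bill at 10% of the normal input rate (the "90% cheaper" figure) and ignores any cache-write surcharge, which is model-specific.

```python
from itertools import accumulate

# Toy model of auto-advancing cache billing: each turn pays 10% for the
# cached prefix plus full price for the new tokens, then the cache point
# advances past them. Cache-write surcharges are ignored here.
def billed_input_units(turn_sizes: list[int]) -> float:
    cached, total = 0, 0.0
    for new_tokens in turn_sizes:
        total += cached * 0.1 + new_tokens  # cache hit + fresh tokens
        cached += new_tokens                # cache point advances
    return total

turns = [50_000, 500, 500, 500]             # big first turn, small follow-ups
with_cache = billed_input_units(turns)
without_cache = sum(accumulate(turns))      # every turn resends the full prefix
print(with_cache, without_cache)
```

With a 50k-token opening prompt and three short follow-ups, the cached conversation bills roughly a third of the uncached one in this model.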
Minimal API usage
# Python — enable automatic caching
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    system=[{
        "type": "text",
        "text": very_long_system_prompt,
        "cache_control": {"type": "ephemeral"}  # ← one field
    }],
    messages=conversation_history
)
Tip Prompt caching pays off most when you have a large, stable prefix
(system prompt, tool definitions, RAG context) that is reused across many turns or
many users. Even a 10 000-token system prompt cached across 100 calls saves significant
cost.
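The tip's numbers are easy to check with back-of-envelope arithmetic. The sketch below assumes cache reads bill at 10% of the input rate and uses an illustrative input price of $3 per million tokens; actual per-model pricing varies, and the cache-write surcharge on the first call is ignored.

```python
# Back-of-envelope savings: a 10,000-token system prompt reused across
# 100 calls. Assumes cache reads at 10% of an illustrative $3/MTok input
# rate; cache-write surcharges are ignored.
PRICE_PER_MTOK = 3.00
prompt_tokens, calls = 10_000, 100

uncached = prompt_tokens * calls * PRICE_PER_MTOK / 1_000_000
cached = (prompt_tokens                                  # first call, full price
          + prompt_tokens * 0.1 * (calls - 1)            # 99 cache hits
          ) * PRICE_PER_MTOK / 1_000_000
print(f"uncached ${uncached:.2f} vs cached ${cached:.3f}")
```

Under these assumptions the cached runs cost roughly a tenth of the uncached ones, consistent with the "90% cheaper" cache-hit rate.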
prompt-caching
API
cost-optimisation
latency
✦ Claude Code: Plan Before You Code
One of the highest-leverage Claude Code best practices in 2026 is explicitly separating
the planning phase from the coding phase. Asking Claude
to produce a detailed implementation plan — and explicitly telling it not to
write any code yet — dramatically reduces wasted edits and mid-task course corrections.
The two-phase workflow
- Phase 1 — Plan: Ask Claude to think through the task, list files
it will touch, describe the approach, and flag risks. Use the phrase
"think hard" or "think carefully" to allocate more reasoning
budget. End with "Do not write any code yet. Just show me the plan."
- Phase 2 — Execute: Review the plan, correct misunderstandings, then
say "Go ahead and implement the plan." Claude now codes with a shared,
reviewed understanding of the goal.
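The two phases above can be captured as reusable prompt templates. The phrasing mirrors the guidance in this entry; `plan_prompt` and `execute_prompt` are illustrative helpers of our own, not Claude Code commands.

```python
# Sketch of the two-phase workflow as prompt templates.
def plan_prompt(task: str) -> str:
    """Phase 1: ask for a plan and explicitly forbid code."""
    return (
        f"{task}\n\n"
        "Think hard and produce a detailed implementation plan: list every "
        "file you will change and explain why, and flag any risks. "
        "Do not write any code yet. Just show me the plan."
    )

def execute_prompt() -> str:
    """Phase 2: sent only after the plan has been reviewed and corrected."""
    return "The plan looks good. Go ahead and implement the plan."

print(plan_prompt("Add rate limiting to the /login endpoint."))
```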
Combining with Claude Code's built-in Plan Mode
Claude Code ships a dedicated Plan Mode (toggle with
Shift+Tab or the /plan command). In Plan Mode the model
is permitted to read files and think, but all write/edit tools
are disabled. This gives you a safe sandbox to review the full plan before any
filesystem changes are made.
Key insight Most costly Claude Code mistakes happen when coding
starts before the requirements are fully understood. A 2-minute planning step
routinely saves 20+ minutes of reverting incorrect edits.
Useful phrases
"Think hard and produce a plan. Do not write code yet."
"List every file you will change and explain why."
"What edge cases should we handle before we start?"
claude-code
plan-mode
workflow
best-practices