2026-06-06 🧭 Daily News

AI Self-Authoring Milestone, Claude the Chemist & Opus 4.8 Default

🧭 When AI Builds Itself: Claude Now Authors Over 80% of Anthropic's Own Production Code

Anthropic Institute researchers published When AI Builds Itself, a paper disclosing a milestone that crystallises the recursive nature of AI development: Claude is now the primary author of more than 80% of code merged into Anthropic's production codebase — up from low single digits in early 2025. Engineers are shipping approximately 8× as much code per quarter as they were 18 months ago, with Claude succeeding on the hardest engineering tasks 76% of the time, a 50 percentage-point jump in six months. The paper is candid about what this trajectory implies: if an AI system is designing the training pipelines and evaluation frameworks that shape its own successors, the line between AI-assisted engineering and recursive self-improvement is already blurring.

Key findings

Production code authorship: Claude's share of merged code at Anthropic rose from <5% (January 2025) → 38% (October 2025) → 61% (February 2026) → 80%+ (June 2026). The trajectory is not slowing.
Engineer productivity: Human engineers are no longer bottlenecks on implementation; they spend the majority of their time on architecture decisions, evaluation design, and reviewing Claude's PRs — work that itself increasingly involves Claude tools.
Evaluation circularity risk: The paper flags that Claude now contributes to the benchmark suites used to measure Claude's own progress. The authors treat this as a governance risk requiring external verification — they cannot rule out that benchmarks have subtly shifted toward tasks Claude is already good at.
Global pause proposal: The paper's policy section proposes a verifiable mechanism — modelled on nuclear arms-control treaties — by which frontier AI labs could simultaneously commit to slowing or pausing capability development if any single lab crosses a defined threshold. The proposal acknowledges the competitive prisoner's dilemma but argues that an independently verified "dead man's switch" architecture could make coordination credible.

Practical implication for teams building on Claude

If Anthropic's own engineering org is operating at 80% AI-authored code, your team's AI adoption rate is almost certainly the binding constraint on output, not model capability. The paper's supplementary data shows the productivity inflection point came not from a model upgrade but from three process changes: (1) engineers stopped treating Claude PRs as needing line-by-line review and switched to intent-level review; (2) test coverage requirements were raised to compensate; (3) all Claude-authored commits are tagged in the repo so anomalies in AI-generated code can be tracked independently. All three are actionable for any team today.

⭐⭐⭐ anthropic.com

🧭 Making Claude a Chemist: Opus 4.7 Outperforms ChemDraw and MestReNova on NMR Spectral Analysis

Anthropic published Making Claude a Chemist, documenting a result that surprised even the research team: Claude Opus 4.7, given only the raw text of a research paper or a hand-drawn molecular sketch, performs forward and inverse NMR (Nuclear Magnetic Resonance) spectral analysis more accurately than ChemDraw and MestReNova — the dedicated software tools that most chemistry labs use as their primary instruments for structure elucidation. No fine-tuning was applied; the capability emerged from Opus 4.7's general reasoning and multimodal understanding. This matters because the CAS registry currently catalogs over 290 million chemical substances, growing by roughly 15,000 new compounds per day — and NMR analysis is one of the rate-limiting steps in characterising each one.

Benchmark results

Hydrogen peak prediction (forward NMR): Opus 4.7 achieved ±0.079 ppm average error across the test set, versus ±0.18–0.23 ppm for ChemDraw 23 and MestReNova 14. For medicinal chemists, 0.079 ppm is within the experimental noise floor — the model is effectively solving this subproblem.
Sub-peak spacing: All Claude models predicted J-coupling spacing within 0.5 Hz approximately 80% of the time, versus 26–35% for specialist software.
Inverse structure elucidation (deriving a molecule from its spectra): Opus 4.7 recovered all 8 simpler structures in the benchmark perfectly. On complex multi-ring structures it reached 71% accuracy — comparable to a trained postdoc, not yet to a specialist expert.
Sketch input: The model accepted hand-drawn structural sketches as input with only a 4-point accuracy penalty versus clean digital structures, making it usable at a lab bench without specialised digitisation equipment.

Limitations the paper is explicit about

Performance degrades on molecules above ~50 heavy atoms — the context of the full 2D structure approaches the model's reliable reasoning window.
3D NMR (NOESY, ROESY) is not evaluated; the paper covers only 1H and 13C experiments.
The model occasionally hallucinates structurally plausible but incorrect sub-structures — researchers must treat Claude's output as a hypothesis, not a confirmed answer.

How to use this today

Upload a JCAMP-DX spectrum file or a JPG of a hand-drawn structure into Claude.ai (Opus model, with the Files tool enabled). Prompt: "This is a ¹H NMR spectrum of an unknown compound. Predict the most likely molecular structure and justify each peak assignment." Cross-check the output against MestReNova or SDBS for any novel compound before publication — Claude's output is a strong first-pass hypothesis, not a lab instrument readout.

⭐⭐⭐ anthropic.com

🧭 Claude Opus 4.8 Is Now the Default Model Across All Paid Tiers — What Changes for You

Anthropic confirmed that Claude Opus 4.8 (launched May 28) has completed its default rollout across every paid plan: Claude.ai Pro, Max, Team, and Enterprise, plus the API when callers omit a model parameter. The switchover is not just a version bump — it carries billing, latency, and capability implications that any team running production integrations needs to review before June 15, when the Claude Sonnet 4 and Opus 4 base models also retire.

What changed and why it matters operationally

API default model is now claude-opus-4-8: Any API call that passes "model": "claude-opus-latest" or omits the model field will route to Opus 4.8. If you were relying on the previous default (Opus 4.7), add "model": "claude-opus-4-7-20260315" to pin it — but note Opus 4.7 enters limited-availability status on June 30.
Token pricing unchanged for standard mode; Fast Mode (the 2.5× throughput preview shipping with 4.8) is billed at a premium rate. If your integration doesn't explicitly request Fast Mode via the fast_mode: true parameter, you will not be charged at the premium rate and will not see the throughput uplift.
Tool-call reliability improvements are on by default: Opus 4.8 includes a revised tool-call parser that handles malformed JSON arguments more gracefully than 4.7 — previously a common source of 400-class errors in complex multi-tool chains. No configuration change is required to benefit.
Thinking-token efficiency: Extended thinking in Opus 4.8 uses 20–35% fewer thinking tokens for equivalent reasoning quality compared to 4.7, measured across Anthropic's internal eval suite. For budgeted-thinking deployments, review your budget_tokens ceiling — you may be over-allocating.

# Pin to Opus 4.7 if you need to delay migration:
response = client.messages.create(
    model="claude-opus-4-7-20260315",   # explicit pin
    max_tokens=4096,
    messages=[{"role": "user", "content": "..."}]
)

# Opt in to Fast Mode (Opus 4.8 only):
response = client.messages.create(
    model="claude-opus-4-8-20260528",
    max_tokens=4096,
    fast_mode=True,                      # 2.5× throughput, premium billing
    messages=[{"role": "user", "content": "..."}]
)

Checklist before June 15

1. Audit any integrations using "model": "claude-opus-latest" or no model parameter — they are now running Opus 4.8. 2. Review budget_tokens in extended-thinking deployments — you may reclaim 20–35% of token budget. 3. If migrating from Sonnet 4 or Opus 4 base, test your tool-call schemas against Opus 4.8's stricter parameter validation before the June 15 retirement cutoff.

⭐⭐⭐ anthropic.com

Source trust ratings ⭐⭐⭐ Official Anthropic · ⭐⭐ Established press · ⭐ Community / research