AI TL;DR
Anthropic releases Claude Opus 4.6 with revolutionary agent teams feature, 1M token context window, and state-of-the-art performance on coding and reasoning benchmarks. Here's everything you need to know.
Claude Opus 4.6 Review: Agent Teams, 1M Context, and Industry-Leading Performance
On February 5, 2026, Anthropic released Claude Opus 4.6—the most significant upgrade to their flagship model since Opus 4.5 launched in November 2025. This isn't just an incremental improvement. Opus 4.6 introduces agent teams, a 1-million token context window, and performance that outpaces GPT-5.2 across multiple benchmarks.
What's New in Claude Opus 4.6
Agent Teams: AI Collaboration at Scale
The headline feature is Agent Teams—a research preview that lets you spin up multiple Claude agents working in parallel as a coordinated team.
Agent Teams Architecture:
├── Main orchestrating agent
├── Subagent 1: Code review
├── Subagent 2: Documentation
├── Subagent 3: Testing
└── Autonomous coordination between agents
According to Anthropic's Head of Product Scott White:
"Instead of one agent working through tasks sequentially, you can split the work across multiple agents—each owning its piece and coordinating directly with the others."
White compared it to having a talented team of humans working for you, noting that agents "coordinate in parallel [and work] faster."
Best Use Cases for Agent Teams:
- Tasks that split into independent, read-heavy work
- Codebase reviews across multiple repositories
- Large documentation projects
- Complex research requiring parallel investigation
You can take over any subagent directly using Shift+Up/Down or tmux integration.
1M Token Context Window (Beta)
Opus 4.6 is the first Opus-class model with a 1-million token context window. This is comparable to what Sonnet 4 and 4.5 offer, but now available in Anthropic's most powerful model.
Context Window Pricing:
- Standard (≤200k input tokens): $5 input / $25 output per million tokens
- Premium (>200k input tokens): $10 input / $37.50 output per million tokens
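A small helper makes the tiered billing concrete. One caveat: whether premium rates apply to the entire request or only to the tokens beyond 200k isn't stated here, so this sketch assumes the whole request is billed at the premium rate once input exceeds 200k. The optional 1.1× factor reflects the US-only inference multiplier mentioned later in the article.

```python
def opus_46_cost(input_tokens: int, output_tokens: int, us_only: bool = False) -> float:
    """Estimate a request's cost in USD from the published per-million-token rates.

    Assumption: a request whose input exceeds 200k tokens is billed entirely
    at the premium rates (actual tiering details may differ).
    """
    if input_tokens > 200_000:
        in_rate, out_rate = 10.00, 37.50   # premium tier (>200k input)
    else:
        in_rate, out_rate = 5.00, 25.00    # standard tier
    cost = input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    if us_only:
        cost *= 1.1                        # US-only inference multiplier
    return round(cost, 4)

# 500k tokens in, 10k out: premium rates apply to the whole request
print(opus_46_cost(500_000, 10_000))  # 5.375
```

Under this assumption, crossing the 200k boundary doubles the effective input rate, so splitting very large jobs into sub-200k requests can be cheaper when the task allows it.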
Why 1M Context Matters:
- Work with larger codebases without splitting
- Process massive documents in a single session
- Maintain coherence over extremely long conversations
128K Output Tokens
Opus 4.6 supports outputs of up to 128k tokens—allowing Claude to complete larger tasks without breaking them into multiple requests.
Benchmark Performance: State of the Art
Anthropic has positioned Opus 4.6 as an industry leader across multiple categories:
Knowledge Work (GDPval-AA)
On GDPval-AA—an evaluation of economically valuable knowledge work in finance, legal, and other domains:
| Model | Elo (relative to Opus 4.6) |
|---|---|
| Claude Opus 4.6 | 0 (highest) |
| GPT-5.2 | −144 |
| Claude Opus 4.5 | −190 |
Opus 4.6 outperforms GPT-5.2 by approximately 144 Elo points; under the standard Elo model, that corresponds to winning roughly 70% of head-to-head comparisons.
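The ~70% figure follows directly from the standard Elo expected-score formula, which you can verify in a couple of lines:

```python
def elo_win_probability(elo_diff: float) -> float:
    """Expected score of the higher-rated side under the standard Elo model."""
    return 1 / (1 + 10 ** (-elo_diff / 400))

# A 144-point Elo gap implies winning about 70% of direct comparisons
print(round(elo_win_probability(144), 3))  # 0.696
```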
Agentic Coding (Terminal-Bench 2.0)
Opus 4.6 achieves the highest score on Terminal-Bench 2.0, the leading agentic coding evaluation.
Reasoning (Humanity's Last Exam)
On Humanity's Last Exam—a complex multidisciplinary reasoning test—Opus 4.6 leads all other frontier models.
Agentic Search (BrowseComp)
Opus 4.6 outperforms every other model on BrowseComp, which measures ability to locate hard-to-find information online. With a multi-agent harness, scores increased to 86.8%.
Long-Context Performance
One of the most significant improvements is in long-context handling:
MRCR v2 (8-needle 1M variant):
| Model | Score |
|---|---|
| Claude Opus 4.6 | 76% |
| Claude Sonnet 4.5 | 18.5% |
This is roughly a 4x improvement in the model's ability to retrieve information "hidden" in vast amounts of text.
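As a sanity check, the improvement factor implied by the table works out to about 4.1:

```python
# Improvement ratio on MRCR v2 (8-needle, 1M variant), from the scores above
opus_46, sonnet_45 = 0.76, 0.185
ratio = opus_46 / sonnet_45
print(round(ratio, 1))  # 4.1
```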
Specialized Domain Performance
- Harvey Legal (BigLaw Bench): 90.2% score—highest of any Claude model
- NBIM Cybersecurity: Best results in 38 out of 40 investigations in blind ranking
- Box Multi-Source Analysis: 68% vs 58% baseline (a 10-percentage-point lift)
Early Access Partner Testimonials
Anthropic shared feedback from major tech companies using Opus 4.6:
Notion
"Claude Opus 4.6 is the strongest model Anthropic has shipped. It takes complicated requests and actually follows through, breaking them into concrete steps, executing, and producing polished work." — Sarah Sachs, AI Lead
GitHub
"Early testing shows Claude Opus 4.6 delivering on the complex, multi-step coding work developers face every day—especially agentic workflows that demand planning and tool calling." — Mario Rodriguez, Chief Product Officer
Replit
"Claude Opus 4.6 is a huge leap for agentic planning. It breaks complex tasks into independent subtasks, runs tools and subagents in parallel, and identifies blockers with real precision." — Michele Catasta, President
Cursor
"Claude Opus 4.6 is the new frontier on long-running tasks from our internal benchmarks and testing. It's also been highly effective at reviewing code." — Michael Truell, Co-founder & CEO
Cognition (Devin)
"Claude Opus 4.6 reasons through complex problems at a level we haven't seen before. It considers edge cases that other models miss." — Scott Wu, CEO
SentinelOne
"Claude Opus 4.6 handled a multi-million-line codebase migration like a senior engineer. It planned up front, adapted its strategy as it learned, and finished in half the time." — Gregor Stewart, Chief AI Officer
Rakuten
"Claude Opus 4.6 autonomously closed 13 issues and assigned 12 issues to the right team members in a single day, managing a ~50-person organization across 6 repositories." — Yusuke Kaji, General Manager, AI
New API Features
Adaptive Thinking
Previously, developers only had a binary choice between enabling or disabling extended thinking. Now with adaptive thinking, Claude can decide when deeper reasoning would be helpful.
At the default effort level (high), the model uses extended thinking when useful, but developers can adjust this behavior.
Effort Levels
Four new effort levels give developers control over intelligence, speed, and cost:
| Level | Description | Best For |
|---|---|---|
| Low | Minimal thinking | Simple queries, high speed |
| Medium | Balanced | General tasks |
| High (default) | Extended when useful | Complex reasoning |
| Max | Maximum reasoning | Hardest problems |
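The article doesn't show the request shape, so here is a hypothetical sketch of how an effort level might be passed alongside a model ID. The `effort` field name and its string values are assumptions for illustration only; consult the official API documentation for the real syntax.

```python
# Hypothetical request payload -- the "effort" field name and values are
# assumed for illustration, not confirmed API syntax.
def build_request(prompt: str, effort: str = "high") -> dict:
    """Build a request body selecting one of the four documented effort levels."""
    allowed = {"low", "medium", "high", "max"}
    if effort not in allowed:
        raise ValueError(f"effort must be one of {sorted(allowed)}")
    return {
        "model": "claude-opus-4-6",
        "max_tokens": 4096,
        "effort": effort,  # assumed parameter name
        "messages": [{"role": "user", "content": prompt}],
    }

req = build_request("Summarize this design doc.", effort="max")
print(req["effort"])  # max
```

The point of the sketch is the trade-off itself: defaulting to `high` and dropping to `low` for simple, latency-sensitive queries is the pattern the four levels are designed for.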
Context Compaction (Beta)
Long-running conversations and agentic tasks often hit the context window. Context compaction automatically summarizes and replaces older context when the conversation approaches a configurable threshold.
This lets Claude perform longer tasks without hitting limits—essential for autonomous agent workflows.
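A conceptual sketch of the compaction idea, to make the mechanism concrete. This is not Anthropic's implementation: token counting and summarization are crude stand-ins (character counts and a placeholder string) for what would really be model calls.

```python
# Conceptual sketch of context compaction -- not Anthropic's implementation.
def compact(messages: list[str], threshold: int, keep_recent: int = 2) -> list[str]:
    """Replace older messages with a summary once total size passes the threshold."""
    size = sum(len(m) for m in messages)  # character count as a crude token proxy
    if size <= threshold or len(messages) <= keep_recent:
        return messages
    old, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = f"[summary of {len(old)} earlier messages]"  # would be a model call
    return [summary] + recent

history = ["a" * 400, "b" * 400, "recent reply", "latest question"]
compacted = compact(history, threshold=300)
print(len(compacted))  # 3: one summary plus the two most recent messages
```

The essential design choice is visible even in the stub: recent turns are preserved verbatim while older context is traded for a compressed summary, so the conversation can keep growing without ever exceeding the window.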
US-Only Inference
For workloads requiring US data residency, US-only inference is available at 1.1× token pricing.
Product Updates
Claude in PowerPoint (Research Preview)
Claude now integrates directly into PowerPoint as an accessible side panel. Previously, you had to export presentations from Claude and import them separately. Now presentations can be crafted directly within PowerPoint.
Available for Max, Team, and Enterprise plans.
Claude in Excel Upgrades
Claude in Excel now handles:
- Long-running and harder tasks with improved performance
- Pre-planning before acting
- Ingesting unstructured data and inferring correct structure
- Multi-step changes in one pass
Safety and Alignment
These intelligence gains do not come at the cost of safety. According to Anthropic's automated behavioral audit:
Misaligned Behavior Rates:
- Claude Opus 4.6: Low rates of deception, sycophancy, user delusion encouragement
- Overall alignment: As good as or better than Opus 4.5 (most-aligned frontier model to date)
Over-Refusals: Opus 4.6 shows the lowest rate of over-refusals—where the model fails to answer benign queries—of any recent Claude model.
Enhanced Safety Testing
For Opus 4.6, Anthropic ran their most comprehensive safety evaluations ever:
- New evaluations for user wellbeing
- More complex tests of dangerous request refusal
- Updated evaluations for surreptitious harmful actions
- Interpretability methods to understand model behavior
Cybersecurity Safeguards
Since Opus 4.6 shows enhanced cybersecurity abilities, Anthropic developed six new cybersecurity probes to detect harmful responses. They're also using the model for cyberdefense—finding and patching vulnerabilities in open-source software.
Pricing
Pricing remains the same as Opus 4.5:
| Type | Price |
|---|---|
| Input tokens | $5 per million |
| Output tokens | $25 per million |
| Premium context (>200k input) | $10 input / $37.50 output per million |
| US-only inference | 1.1× multiplier |
How to Access
Claude.ai: Available now at claude.ai
API: Use claude-opus-4-6 via the Claude API
Cloud Platforms: Available on Amazon Bedrock and Google Cloud Vertex AI
What This Means for Developers
The Agent Teams Shift
Agent Teams represents a fundamental shift in how AI coding assistants work. Instead of a single agent working sequentially, you now have:
- Parallel execution - Multiple agents work simultaneously
- Autonomous coordination - Agents communicate without human intervention
- Specialization - Each agent can focus on its piece
- Scalability - Add more agents for larger tasks
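The parallel-execution pattern behind these points can be sketched with standard-library concurrency. This is an illustration of the fan-out/fan-in shape only, not the actual Agent Teams API, which is a Claude Code feature; the "agents" here are plain functions standing in for model-backed subagents.

```python
# Sketch of the fan-out pattern behind agent teams -- the "agents" are
# stand-in functions, not the real Agent Teams feature.
from concurrent.futures import ThreadPoolExecutor

def run_agent(role: str, task: str) -> str:
    # A real subagent would call the model here; we just label the work.
    return f"{role}: completed '{task}'"

assignments = [
    ("code-review", "audit auth module"),
    ("documentation", "update API reference"),
    ("testing", "extend integration suite"),
]

with ThreadPoolExecutor(max_workers=len(assignments)) as pool:
    results = list(pool.map(lambda a: run_agent(*a), assignments))

for line in results:
    print(line)
```

Each role owns an independent slice of work and runs concurrently; the orchestrator only fans the tasks out and collects results, which mirrors the "independent, read-heavy work" guidance Anthropic gives for when agent teams help most.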
Practical Applications
For Software Teams:
- Assign one agent to code review, another to testing, another to documentation
- Complete multi-hour tasks in parallel
- Handle codebase-wide refactoring across multiple repositories
For Enterprise:
- Process massive document sets in single sessions
- Run complex analysis with longer coherence
- Build agent orchestration systems
The Competitive Landscape
The release came just 15 minutes before OpenAI launched GPT-5.3 Codex—a clear signal that the AI coding war is intensifying.
Opus 4.6 vs GPT-5.2 (per Anthropic benchmarks):
- Knowledge work: Opus 4.6 wins by 144 Elo
- Coding: Opus 4.6 leads Terminal-Bench 2.0
- Search: Opus 4.6 leads BrowseComp
- Reasoning: Opus 4.6 leads Humanity's Last Exam
We'll need to wait for independent benchmarks comparing Opus 4.6 to the newly released GPT-5.3 Codex.
The Bottom Line
Claude Opus 4.6 is a substantial upgrade that delivers on Anthropic's promise of "smarter models that work harder, longer, and more autonomously."
Key Takeaways:
- Agent Teams enables parallel AI collaboration
- 1M context window opens new use cases
- State-of-the-art on coding, reasoning, and search
- Lowest over-refusal rate of any Claude model
- Same pricing as Opus 4.5
For developers building agentic applications, Opus 4.6's combination of agent teams, long context, and context compaction creates a compelling platform. For enterprise users, the PowerPoint and Excel integrations make Claude increasingly useful for everyday knowledge work.
The AI coding war just got more interesting.
Have you tried Claude Opus 4.6? Share your experience in the comments.
