12/01 2025
Today marks the official launch of Claude Opus 4.5.
The new model excels at coding, agentic tasks, and computer use. It is also noticeably better at everyday work such as deep research, slide presentations, and spreadsheets.
Claude Opus 4.5 currently holds the top spot in real-world software engineering evaluations.
Testers have observed that Claude Opus 4.5 handles ambiguity well and weighs tradeoffs without hand-holding. When pointed at a complex bug spanning multiple systems, Opus 4.5 works out the fix on its own.
Furthermore, tasks that were nearly impossible for Sonnet 4.5 just weeks ago are now within reach for Opus 4.5.
On a demanding 2-hour test that Anthropic uses to assess engineering candidates' technical skill and judgment, Claude Opus 4.5 scored higher than any human candidate.
Its vision, reasoning, and mathematics capabilities also surpass those of its predecessors.
Opus 4.5 also writes higher-quality code, leading in 7 of the 8 programming languages covered by the SWE-bench Multilingual benchmark.
Opus 4.5 also finds tailored, well-reasoned solutions across diverse scenarios. In one benchmark task, the model played an airline customer service agent and was expected to refuse a passenger's request to change an economy class ticket. Instead, Opus 4.5 found a workaround: upgrade the cabin class first, then change the flight.
The benchmark technically scored this as a failure because it did not anticipate Claude's approach, but the answer was a creative alternative that still helped the customer.
At a medium effort setting, Opus 4.5 matches Sonnet 4.5's best score on SWE-bench Verified while using 76% fewer tokens. At its highest effort setting, Opus 4.5 outperforms Sonnet 4.5 by 4.3 percentage points while using 48% fewer tokens.
With effort control, context compaction, and advanced tool use, Claude Opus 4.5 runs for longer, accomplishes more, and requires less intervention.
Opus 4.5 is also adept at building complex, well-coordinated multi-agent systems. Combined, these techniques improved its score on a deep research evaluation by nearly 15 percentage points.
Claude Opus 4.5 is Anthropic's most robustly aligned model to date.
Opus 4.5 has also become considerably harder to trick with prompt injection attacks, in which malicious instructions are covertly planted in content to coerce the model into harmful actions.
Opus 4.5 is now available in Anthropic's apps, through its API, and on all three major cloud platforms, priced at $5 per million input tokens and $25 per million output tokens.
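To make the pricing concrete, here is a minimal sketch of a per-request cost estimate. It assumes the $5/$25 figures are the input and output prices per million tokens, and it ignores any discounts or other billing details. The helper name `estimate_cost_usd` and the sample token counts are hypothetical, chosen only for illustration.

```python
# Prices taken from the announcement: $5 / $25 per million tokens,
# assumed here to mean input / output respectively.
INPUT_USD_PER_MTOK = 5.00
OUTPUT_USD_PER_MTOK = 25.00


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in USD (hypothetical helper)."""
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")  # prints $1.50
```

Because output tokens cost five times as much as input tokens under these prices, a reduction in output tokens lowers the bill more than an equal reduction in input tokens.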
References:
https://www.anthropic.com/news/claude-opus-4-5