Agnes AI Releases Three Core Multimodal Models: Text, Image, and Video

05/27/2026

Agnes AI (https://agnes-ai.com) has recently seen its text model Agnes-2.0-Flash, image model Agnes-Image-2.0-Flash, and video model Agnes-Video-2.0 rank highly, one after another, on several international evaluation benchmarks. At a time when the industry is re-evaluating "real AI capabilities," the company is taking a different route into the market: rather than emphasizing model parameters alone, it is advancing benchmark performance, low-cost APIs, and real-world Agent deployments at the same time.

Among them, the text model Agnes-2.0-Flash has ranked highly on the Agent evaluation system Claw-Eval. Unlike traditional math, knowledge Q&A, or coding benchmarks, Claw-Eval focuses on how well a model executes in real-world Agent scenarios, covering tool invocation, multi-step planning, complex task decomposition, and automation workflow completion rates. Many developers regard this type of evaluation as the benchmark closest to real AI Agent capabilities.

Meanwhile, the image model Agnes-Image-2.0-Flash has secured a leading position on the Artificial Analysis Image Editing Leaderboard. What sets this benchmark apart is its blind evaluation by real users: participants do not know which model produced each image and make subjective choices and ratings based solely on the quality of the final output. Compared with traditional automated scoring systems, such benchmarks are therefore generally considered closer to real user experience.

In addition to text and image models, the video model Agnes-Video-2.0 has also entered the Artificial Analysis Image-to-Video (With Audio) benchmark.

In the current industry landscape, few AI labs cover text, image, and video modalities at once while consistently ranking highly in international evaluations. Agnes's strategy is gradually becoming clear: establish recognition through multimodal model capabilities, then expand usage through low-cost APIs and a developer ecosystem.

Beyond benchmarks, Agnes's currently announced API pricing is even more noteworthy.

According to official information, Agnes-2.0-Flash is priced at $0.03 per 1M input tokens and $0.15 per 1M output tokens. This is significantly lower than the pricing of many mainstream models on the market (just 0.6% of Claude Opus 4.6's price).
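To make the per-token figures concrete, the following minimal Python sketch turns the quoted prices into a per-request cost and works backward from the "0.6%" comparison to the reference price it implies. Only the two Agnes prices and the 0.6% ratio come from the article; the function names and the sample workload are illustrative assumptions, and the implied reference price is derived from the ratio rather than taken from any published price list.

```python
# Prices quoted in the article, in USD per 1M tokens.
AGNES_INPUT_PER_M = 0.03
AGNES_OUTPUT_PER_M = 0.15
QUOTED_RATIO = 0.006  # the "0.6%" comparison quoted in the article


def request_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = AGNES_INPUT_PER_M,
                 output_per_m: float = AGNES_OUTPUT_PER_M) -> float:
    """Return the USD cost of one request at per-million-token prices."""
    return (input_tokens * input_per_m + output_tokens * output_per_m) / 1_000_000


def implied_reference_price(price_per_m: float,
                            ratio: float = QUOTED_RATIO) -> float:
    """Return the reference price for which price_per_m equals ratio of it."""
    return price_per_m / ratio


if __name__ == "__main__":
    # Hypothetical workload: 200,000 input tokens and 20,000 output tokens.
    print(f"cost: ${request_cost(200_000, 20_000):.4f}")
    print(f"implied input reference:  ${implied_reference_price(AGNES_INPUT_PER_M):.2f} per 1M")
    print(f"implied output reference: ${implied_reference_price(AGNES_OUTPUT_PER_M):.2f} per 1M")
```

Under these assumptions, the hypothetical request costs about $0.009, and the 0.6% ratio corresponds to a reference price of roughly $5 per 1M input tokens and $25 per 1M output tokens.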

Over the past year, more and more developers have discovered that token consumption does not primarily come from simple chatting but from Agent workflows. Especially in scenarios involving Browser Agents, Coding Agents with multi-tool invocations, and long-chain task execution, a single task often generates substantial context, search requests, and tool invocations. As model capabilities improve, token costs have also begun to rise rapidly.
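Why Agent workflows consume so many tokens can be illustrated with a rough model. The sketch below assumes that every step re-sends the full history (system prompt plus all earlier model outputs and tool results) as input, which is common for stateless chat-style APIs without caching; the function name and all token counts are hypothetical and not taken from any Agnes documentation.

```python
from typing import Tuple


def agent_task_tokens(steps: int, system_tokens: int,
                      tool_result_tokens: int,
                      output_tokens_per_step: int) -> Tuple[int, int]:
    """Return total (input, output) tokens for a multi-step Agent task.

    Assumption: each step is billed for the whole accumulated history,
    so input tokens grow roughly quadratically with the number of steps.
    """
    total_input = 0
    total_output = 0
    context = system_tokens
    for _ in range(steps):
        total_input += context  # the full history is sent as input again
        total_output += output_tokens_per_step
        # The model output and the tool result join the history for the next step.
        context += output_tokens_per_step + tool_result_tokens
    return total_input, total_output


if __name__ == "__main__":
    # Hypothetical task: 20 steps, 2,000-token system prompt,
    # 1,500-token tool results, 300 output tokens per step.
    print(agent_task_tokens(20, 2_000, 1_500, 300))
    # The same settings with a single step, for comparison.
    print(agent_task_tokens(1, 2_000, 1_500, 300))
```

With these made-up numbers, a 20-step task bills 382,000 input tokens against 2,000 for a single step, which is why per-token input pricing tends to dominate the cost of long-chain tasks.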

Many AI startups now face not only the question of "whether the model is powerful enough" but also "whether the product can afford invocation costs over the long term."

Agnes's direction this time is clear: further lowering the barrier for developers to use AI.

The image model Agnes-Image-2.0-Flash is also priced very low: the official price is currently $3 per 1,000 images. For scenarios such as e-commerce image generation, marketing materials, multi-version ad images, and bulk image editing, this pricing makes large-scale deployment practical.

The video model Agnes-Video-2.0 is priced at $0.30/minute, also significantly lower than the current industry average. With AI video generation still generally facing high inference costs, this pricing begins to make large-scale content production and commercial deployment feasible.
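As a rough budgeting aid, the image and video prices above can be turned into batch estimates. The sketch below uses only the two prices quoted in the article; the function names are illustrative, and the linear per-second billing for video (no minimum charge, no rounding up to whole minutes) is an assumption, since the article does not describe how partial minutes are billed.

```python
IMAGE_PRICE_PER_1000 = 3.00    # USD per 1,000 images, as quoted in the article
VIDEO_PRICE_PER_MINUTE = 0.30  # USD per minute of video, as quoted in the article


def image_batch_cost(num_images: int) -> float:
    """Return the USD cost of generating or editing num_images images."""
    return num_images * IMAGE_PRICE_PER_1000 / 1000


def video_cost(total_seconds: float) -> float:
    """Return the USD cost of total_seconds of generated video.

    Assumption: billing is linear in duration, with no minimum or rounding.
    """
    return total_seconds / 60 * VIDEO_PRICE_PER_MINUTE


if __name__ == "__main__":
    # Hypothetical e-commerce batch of 50,000 product images.
    print(f"images: ${image_batch_cost(50_000):.2f}")
    # Hypothetical 90-second clip.
    print(f"video:  ${video_cost(90):.2f}")
```

At the quoted prices this works out to $0.003 per image, so the hypothetical 50,000-image batch comes to about $150, and the 90-second clip to about $0.45 under the linear-billing assumption.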

Meanwhile, Agnes's promotional focus has also noticeably shifted from "model benchmarking" to "real-world cases."

Developers have already begun building game Agents, web operation Agents, and automated workflow systems based on Agnes. Examples include task execution, UI operations, and multi-step strategy execution in gaming scenarios, as well as web navigation, form filling, information organization, and auto-searching in Browser Agent scenarios.

This case is based on the Agnes Harness architecture and implemented with the self-developed text model Agnes-2.0-Flash.

These scenarios demand more from models than just answering questions; they require more stable contextual memory, more accurate tool invocation, and stronger long-chain execution capabilities.

In image editing, Agnes-Image-2.0-Flash already supports completing complex editing tasks directly through natural language, including background replacement, style transfer, multi-image fusion, font modification, and product image editing.

Extract the female character and place her in a new background

High-density infographic

This approach essentially turns traditionally complex image-editing workflows into a single, unified interaction model: "natural language + image."

Currently, all three models are officially available on the website (https://agnes-ai.com), and developers can directly invoke APIs through the Agnes AI Platform.

A growing issue in the industry is that while model capabilities continue to improve, the cost of using AI is rising at the same time. In scenarios such as Codex, Agent workflows, multi-Agent systems, and Browser Use in particular, large-scale deployment has become the main source of cost pressure for many teams.

As a result, more and more AI companies are once again competing on "inference costs" and "developer ecosystems."

Judging from what has been disclosed so far, Agnes's strategy resembles the growth path some leading model companies have followed over the past year: first establish model recognition through international benchmarks, then rapidly expand developer usage through low-priced APIs, and ultimately drive Agent and multimodal ecosystem deployment.

Disclaimer: The copyright of this article belongs to the original author. This article is reprinted solely to share information. If the author information is marked incorrectly, please contact us immediately so that we can correct or delete it. Thank you.
