Large Model Wars: Agents Become the Key, Ranking No Longer Matters

12/10 2024

2025 is expected to be the inaugural year for the deployment of agents, with clients placing greater emphasis on their effectiveness. All parties in the ecosystem are gearing up for this.

The procurement of agents is becoming a market hotspot.

“In 2023, project bids mainly revolved around intelligent computing centers and model middleware platforms. Since the second half of this year, numerous client tenders have shifted entirely toward applications and have become increasingly specialized,” Wang Zhong, co-founder of Zhongshu Xinke, told Digital Intelligence Frontline, adding that agents are now stepping into the spotlight.

Agents have become the central narrative for clients, large model enterprises, and service providers of all kinds. A company that excels at agents can even bypass larger firms and win bids.

For large model enterprises, agent technology and toolchain support have become basic requirements. A model that performs poorly here struggles to compete with other models, and clients now give little weight to leaderboard rankings alone.

However, the industry has not yet reached a consensus on what agents are or what they cover. Some consider OpenAI's GPTs to be agents; others believe that only systems capable of invoking tools qualify. In terms of form, chatbots and the various kinds of large model applications all broadly fall under the category of agents.

But one point is universally acknowledged in the industry: to gain client recognition, agents must effectively solve business problems and deliver at least a 10- to 20-fold improvement in cost reduction and efficiency. The industry increasingly recognizes that a commercial closed loop is harder to achieve under the large model paradigm, and that simply burning cash is basically unsustainable.

This industry awareness is not only restructuring software architecture but also prompting large model ecosystem enterprises to reshape their business models. As Dr. Wang Jian said, AI should not be viewed as a tool revolution but rather as a revolutionary tool.

“Clients Demand 10- to 20-Fold Effectiveness”

“You say digital humans for e-commerce live streaming are good, so help me sell things. I'll give you two yuan for every order worth twenty to thirty yuan.” Song Jian, CTO of Zhongke Shenzhi, which develops generative AI avatars, told Digital Intelligence Frontline. In the e-commerce sector, clients have become particularly cautious when purchasing tools this year, preferring to pay based on effectiveness, adopting revenue-sharing or CPS (Cost Per Sale) models. Song predicts that by next year, 100% of their live e-commerce digital human clients will adopt this model.

“There are significant differences between domestic and foreign agents. Overseas, it may still resemble the traditional SaaS model, merely leveraging agents to reconstruct previous software architectures.” Song noted that the domestic situation is more aggressive, with some highly competitive industries, such as e-commerce, not only reconstructing software architectures but also reshaping business models.

This is because clients focus on whether agents can practically solve problems. “Whether it's cost reduction or efficiency enhancement, there must be at least a 10- to 20-fold improvement combined,” Song further explained.

“Clients are now disillusioned with large model technology,” Wang Zhong of Zhongshu Xinke told Digital Intelligence Frontline candidly. These clients have shifted from purely foundational procurement to procurement oriented toward application effectiveness, demanding that agents create value through cost reduction, efficiency gains, or new business expansion.

“Clients don't care how you achieve it; they only look at the results,” said Wang. In the current procurement model, technology vendors typically run a quick Proof of Concept (PoC) to validate a scenario for the client. The corpus may be small and the interaction form relatively simple, but the PoC must show that the agent can complete the business logic in the client's scenario and apply the client's proprietary business knowledge. Only then will clients initiate the procurement process. They are also willing to pay a certain price for these innovations.

A recent PoC conducted by Zhongshu Xinke was an emergency warning assistant for the meteorological department. Xiamen is hit by typhoons every year, and once the meteorological department issues a forecast, emergency notifications must be sent to departments such as the port authority and urban management. These emergency reports originally required four experts and three hours to complete. The PoC showed that with the assistant, only one expert was needed for one hour, an approximately 12-fold efficiency improvement.
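
The 12-fold figure is consistent with a comparison of person-hours, as the short calculation below shows. Treating effort as the number of experts multiplied by elapsed hours is an assumption about how the figure was derived, and the function name `person_hours` is illustrative.

```python
def person_hours(experts: int, hours: float) -> float:
    """Total effort, assuming effort = number of experts x elapsed hours."""
    return experts * hours


before = person_hours(experts=4, hours=3)  # manual process, per the article
after = person_hours(experts=1, hours=1)   # with the assistant, per the PoC
speedup = before / after

print(f"{before:g} person-hours -> {after:g} person-hour ({speedup:g}x)")
```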

In another case, what was previously unachievable is now possible. College student training programs involve numerous evaluation dimensions. With tens of thousands of students, relying solely on existing programming technology would only allow for general evaluations, making personalized customization difficult. In this case, multiple agents collaborated to bring innovation to the classroom. For example, some agents transcribed lecture audio, while others analyzed and compared lecturing levels; some generated quizzes, and others graded them on-site; still others recommended further learning materials based on each student's weaknesses. As a result, multiple evaluation dimensions were added to the student evaluation system.

“Every step is supported by agents,” Wang summarized. As basic model capabilities have developed, agents have become more versatile in capability and form, and they now have a preliminary foundation for collaboration. Wang said they have realized in practice that each agent must play a human social role, take part in a division of labor, and produce outputs, potentially forming collective intelligence. Agents are no longer a single tool, and in the future they will not simply take the form of a single super-agent.

To achieve this state, agents need three main characteristics: the ability to communicate and understand; the ability to reflect and self-plan based on feedback and results; and the ability to interact and collaborate with external capability units, such as business systems, other agents, and tool-level applications.

As agents are deployed, the trend of clients shifting from purchasing products to purchasing services is increasingly apparent. Unlike previous information technologies, agents require continuous optimization and adjustment, which clients currently find hard to do on their own. Wang found that for projects worth around 2 million yuan, annual service fees have risen from the 10% to 15% typical of traditional information technology projects to 25% to 30%.

In the more competitive e-commerce sector, the changes are even more profound. Song noticed that the iteration speed of e-commerce live streaming digital humans is now measured in days. Once a vendor shifts from selling tools to providing a service, the processes involved become extremely complex. Close attention must be paid to platform rules and their adjustments. The previous model of separating R&D, product, and business is no longer viable: technicians must go to the front lines, reviewing data, analyzing operations, and iterating on optimizations every day. When optimizations no longer work, products and client groups must be decisively replaced or adjusted. “The advantage of the CPS model is that it's easier for all parties to reach a consensus.”

The Ecosystem is Fully Mobilized

Although large enterprises are still the primary drivers of change in agent deployment, industry agents are expected to come mainly from two types of players in the future, because agents depend so heavily on responding to client needs:

One type is service providers with AI-native capabilities. The other type is traditional industry information technology service providers. Deploying agents is not easy, with core skills including business scenario selection, knowledge extraction, agent training, and orchestration. Without understanding business scenarios and the capability boundaries of large models, technical responsiveness, experience, and efficiency will be very low. This requires a two-way effort between AI professionals and business professionals.

Large enterprises will play a role in popularizing the ecosystem during this process. Major companies like Baidu, Alibaba, ByteDance, and Tencent have all launched one-stop agent development platforms. They follow a standardized approach aimed at lowering the barriers to agent deployment and expanding the breadth and depth of agent applications. For example, recent products launched by Baidu, Zhipu, and Inspur Cloud have focused on agents' ability to take over devices such as smartphones and perform some human-like operations.

“Everyone is currently building an overall agent ecosystem based on their large models,” Yang Wen, a senior analyst at IDC China, told Digital Intelligence Frontline. The difference lies in the varying application scenarios and data accumulation among companies, leading to different priorities and directional choices. Industry observers note significant differences in these enterprises' agent support strategies:

Industry feedback indicates that Baidu has a comprehensive ecosystem layout, encompassing basic models, the AppBuilder and AgentBuilder platforms, and corresponding hardware such as all-in-one machines that can be deployed directly in client data centers. Baidu also supports joint solution development and customized product adaptation. Industry analysis links this to Baidu CEO Robin Li's call for the industry to focus on applications rather than models.

Some suggest that the Qianfan platform could strengthen its management of commercial models beyond Wenxin and support the development of more common internet service plugins.

In November this year, Baidu launched the Agent tool flow. “Agents were particularly popular when they first came out last year, but soon, a bucket of cold water was poured on them as everyone realized how difficult they were to use. Ninety percent of our clients use RAG, and only 10% use Agents,” Zhu Guangxiang, head of the Qianfan AppBuilder product, told Digital Intelligence Frontline. Completing an enterprise-level task may require several steps or more. If the accuracy rate for each step is 95%, overall accuracy declines rapidly as the steps multiply, so the agent cannot sustain long chains of thinking and reasoning. With a workflow approach, experts describe the process, which makes agents much more stable and grounded.
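
The compounding effect Zhu describes can be made concrete with a rough calculation. The sketch below assumes every step succeeds independently with the same probability, a simplification that the article does not state; the function name `chain_accuracy` is illustrative.

```python
def chain_accuracy(per_step: float, steps: int) -> float:
    """End-to-end success rate of a task whose steps must all succeed.

    Assumption (not from the article): steps fail independently,
    so the per-step probabilities multiply.
    """
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be a probability in [0, 1]")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        rate = chain_accuracy(0.95, n)
        print(f"{n:>2} steps at 95% each -> {rate:.1%} end to end")
```

Under this assumption, a ten-step task at 95% per step succeeds end to end only about 60% of the time, and a twenty-step task about 36% of the time.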

Alibaba's Tongyi Qianwen model has strong capabilities and positive feedback domestically, with its open-source model widely used in the industry. Alibaba primarily supports applications through cloud-based forms with limited private deployment support. This may be related to Alibaba Cloud's strategy of “AI-driven, public cloud priority.” In terms of agent products, Alibaba's Tongyi Lab has newly launched OmniSearch, an adaptive planning multimodal retrieval agent, which can simulate humans by gradually breaking down complex problems for intelligent retrieval planning.

ByteDance is vigorously promoting HiAgent, the private-deployment counterpart of Kouzi and an enterprise-exclusive AI application innovation platform launched in August this year. Just as it started a large model price war earlier this year, ByteDance is again competing on price to capture the market. However, HiAgent currently does not come with a model, meaning the ByteDance Doubao model is not available for private deployment. Nor does ByteDance offer applications or hardware; instead, it tries to teach clients how to build agent applications on the HiAgent platform. Its strategy and support conditions still need to be systematized, which is likely related to how recently its large model private-deployment business line was launched.

Tencent's agent deployment strategy aligns with its “all-in-one” approach of prioritizing empowering its own product matrix in large model deployment, with a greater focus on integrating with its own products that have massive traffic. For example, in September this year, Tencent Yuanqi launched a new feature supporting public account operators in independently creating exclusive agent applications, providing intelligent functions such as companionship, interaction, Q&A, and knowledge exchange within public accounts, aiming to enhance user experience and public and private domain operation efficiency.

Besides large enterprises, Zhipu AI, a member of the “AI Six Little Tigers,” has been continuously updating its agent technology. Zhipu's advantage lies in its strong research capabilities. Some developers have reported that its models surpass Tongyi Qianwen in certain aspects of performance. However, its ecosystem support still needs to be strengthened.

At the recent Zhipu Agent OpenDay, Zhipu CEO Zhang Peng stated that agents can be seen as the prototype of a general operating system for large models. Theoretically, they can be extended to various smart devices such as smartphones, PCs, and in-vehicle systems, enabling large model-based interconnectivity.

“Future agents will definitely operate across systems, marking a fundamental difference rather than a marginal one,” observed a senior industry insider. The ability to operate across systems and apps has now become a hotly contested area.

China Telecom began developing its agent platform in September this year and recently showcased its Xingchen Agent Application Platform. Relevant personnel from China Telecom told Digital Intelligence Frontline that benchmark projects for government and enterprise clients are currently under development.

Although large enterprises are continuously taking action, the industry expects them to support agents more quickly. “The current pace cannot keep up with extensive client demands. More business logic is not fully reflected, particularly the integration of client-specific knowledge with large model technology, which is also a significant industry concern regarding deployment. This indirectly indicates that it will take some time for large model enterprises to align more closely with client needs.”

When Will Agents Truly Explode?

The general direction for agents is set, but when will they truly explode?

Multiple industry insiders told Digital Intelligence Frontline that this largely depends on the development of model capabilities and market education processes.

“We have a simple rule of thumb: the release of GPT-5 will be an intuitive time marker,” said Wang Zhong, co-founder of Zhongshu Xinke.

Liu Xiao, the technical lead of Zhipu's AutoGLM, also told Digital Intelligence Frontline that last year, agents could only meet 10% to 20% of user expectations, leading users to be reluctant to buy in. This year, they met 50% to 60% of expectations, causing some users to realize their usefulness. When agents can meet 70% to 80% of user expectations, the application rollout speed will be very fast. He predicts that large model capabilities will reach the required standard in about half a year.

Song Jian, CTO of Zhongke Shenzhi, expressed a different view. He believes that for agents to truly explode, they must penetrate deep into industries, but growth on the B2B side “should not reach a particularly large scale next year.” Yang Wen, a senior analyst at IDC China, also told Digital Intelligence Frontline that for business clients, large-scale applications remain difficult to achieve until the hallucination problem of large models is fully solved and accuracy reaches 100%. The explosion of agents is expected to take one to one and a half years.

“Currently, agents are still in a market turmoil phase, with some distance to go before becoming the ultimate entry standard,” Yang said.

Still, it is undeniable that on the eve of the agent explosion, with the direction settled, players are uniformly using engineering capabilities and a range of technologies to compensate for model deficiencies and seize opportunities early.

"We have tested a large number of basic models, and their Function Call ability is relatively accurate when selecting from 10 tools. However, once it exceeds 10, the accuracy drops significantly. Yet, in real-world application scenarios, the number of execution steps often exceeds 10. To better achieve business controllability, we pioneered an agent workflow based on state machines. This ensures that the agent's self-planning and autonomous capabilities remain unaffected while achieving precise business control," said Wang Zhong.

Liu Xiao from Zhipu also revealed that they are using better reinforcement learning strategies to enable agents to carry out operations with more steps.
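
The state-machine-based workflow that Wang Zhong describes can be pictured roughly as follows. This is not Zhongshu Xinke's implementation, which the article does not detail; the states, the tool names, and the `choose_tool` callback standing in for a model's function call are all hypothetical. The idea illustrated is that each state exposes only a few tools, so the model never selects from a long list, while the permitted transitions stay under business control.

```python
from typing import Callable, Dict, List, Tuple

DONE = "done"

# Hypothetical workflow: each state maps its allowed tools to the next state.
WORKFLOW: Dict[str, Dict[str, str]] = {
    "collect": {"read_forecast": "draft", "ask_expert": "collect"},
    "draft": {"write_notice": "review"},
    "review": {"approve": DONE, "request_changes": "draft"},
}


def run_workflow(
    choose_tool: Callable[[str, List[str]], str],
    start: str = "collect",
    max_steps: int = 20,
) -> List[Tuple[str, str]]:
    """Drive the agent through the state machine; return the (state, tool) trace.

    `choose_tool` stands in for a model's function call. It sees the current
    state and only the tools allowed in that state.
    """
    state = start
    trace: List[Tuple[str, str]] = []
    while state != DONE:
        if len(trace) >= max_steps:
            raise RuntimeError("workflow did not finish within max_steps")
        allowed = sorted(WORKFLOW[state])
        tool = choose_tool(state, allowed)
        if tool not in allowed:
            # Business control: the model cannot step outside the workflow.
            raise ValueError(f"tool {tool!r} is not allowed in state {state!r}")
        trace.append((state, tool))
        state = WORKFLOW[state][tool]
    return trace


# Deterministic stand-in for the model, used only to exercise the sketch.
SCRIPT = {"collect": "read_forecast", "draft": "write_notice", "review": "approve"}


def scripted_chooser(state: str, allowed: List[str]) -> str:
    return SCRIPT[state]


if __name__ == "__main__":
    for state, tool in run_workflow(scripted_chooser):
        print(f"{state}: {tool}")
```

In this sketch the agent still decides which tool to call at every step, but a choice outside the current state's tool set is rejected, and a run that loops without finishing is cut off by `max_steps`.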

The industry is currently also researching workflows. "Because of workflows, we have found that although only one layer has been added, the growth of Agent deployment has been very rapid, potentially reaching 20% soon," said Zhu Guangxiang from Baidu. He predicts that in the next two years, Agents will gradually surpass RAG applications because RAG scenarios are limited to question-and-answer interactions, whereas Agents can realize customer service, marketing, enterprise scheduling, and one-stop platforms, offering a higher ceiling.

Some industry insiders believe that the application of agents can be divided into at least three levels from low to high. Currently, many primary and intermediate applications have emerged in the market, but advanced applications still need further development.

For example, Doubao, Kimi, Wenxiaoyan, etc., are considered the most basic agents. They have simple language interaction and task understanding abilities, understand human speech, and can perform simple operations based on instructions. "Currently, most agents are primary applications of single agents, including many GPTs, capable of simple tasks such as question-and-answer interactions," said Yang Wen from IDC.

Intermediate applications go deeper, moving away from the simple chatbot form and no longer confined to a dialog box. Their planning capabilities and complexity are further enhanced. Instead of using simple plugins like online search and weather inquiries, they require specialized plugins and capabilities tailored to specific scenarios to complete more complex tasks.

"For instance, intelligent customer service is a very typical Agent application, potentially seeing a 10- or even 20-fold change. Many phone calls we receive today are actually made by new Agents. You might chat with it for a while and still think it's a real person," said Song Jian. Moreover, compared to traditional customer service, software design has become simpler.

"The third level is what we are striving to achieve now. Besides richer interaction forms and the ability to complete complex tasks, it will have two additional traits," Wang Zhong told Digital Intelligence Frontline.

The first is the ability to understand the business logic and knowledge framework behind the scenario. "It may be less versatile but more specialized. Where its own knowledge falls short, it will actively seek support from industry knowledge."

Second, in terms of plugin and tool usage, it can understand existing business systems and use them as part of its capability plugin sources, rather than being limited to custom-made plugins.

Wang Zhong gave an example, stating that they are conducting a pilot research and development project with a shipping company on a port scheduling assistant, which he considers an advanced application.

After a ship docks, it often requires a lot of work, including unloading, transshipment, tank cleaning, crew registration, and supply replenishment. Originally, these tasks were mainly arranged and scheduled manually, then registered into corresponding systems such as the vehicle management system and supply management system for execution by the responsible units. However, customers now hope to use AI agents to assist on-site employees in preliminary information collection, analysis, and business recommendations amid complex information and workflow, making them "digital colleagues" for on-site staff.

Of course, more advanced applications still rely on improvements in model capabilities and engineering capabilities.

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.
