01/20 2026
Written by | Hao Xin
Edited by | Wu Xianzhi
"The true pivotal moment that will shape the future of enterprises is unfolding now. It's not the distant prospect of Artificial General Intelligence (AGI), but the current rise of intelligent agents."
This bold prediction from Google at the dawn of the year posits 2026 as a watershed for AI Agents.
A significant trend is that AI is transitioning from merely answering queries to comprehending objectives, devising strategies, and executing actions across diverse systems. This shift implies that future agents will transcend conversational AI, evolving into productivity tools capable of tackling complex tasks and delivering tangible results.
In China, the development trajectory of AI Agents can be broadly divided into two phases. The first phase centered on conversation and search and was marked by a proliferation of AI assistant apps with little to distinguish them. The names most familiar to users include products from major tech companies, such as Douyin's Doubao, Tencent Yuanbao, and Alibaba Qianwen, alongside startups like DeepSeek and Kimi. At its core, this stage was a race to capture traffic entry points and cultivate user habits.
Between the end of 2025 and the start of this year, the field moved into its second phase, and the evolution of AI Agents has diverged notably. Each company has staked out a distinct value proposition based on its strategic vision and resources.
Doubao delves deeper into entertainment applications, including voice conversation and image and video generation. Qianwen leverages the strength of Alibaba's ecosystem to specialize in lifestyle services, acting as an 'administrative steward.' Kimi, in contrast, emphasizes productivity, integrating AI into workflows through its proprietary Agent model.
Returning to Google's initial assertion, signs of this evolution are already evident with Doubao, Qianwen, and Kimi. Beneath the surface of differentiated competition lies an emerging consensus: the true value of AI Agents lies in their capacity to solve real-world problems.
Input Shapes Output
To unravel the rationale behind the distinct paths chosen by Doubao, Qianwen, and Kimi, we must revisit a fundamental principle: input shapes the quality of output.
In the era of AI Agents, this principle takes on a new dimension. What an agent works from is no longer just a simple user instruction; a good output also depends on a thorough understanding of task context, precise invocation of available tools, and reliable planning of multi-step processes.
From an input-output perspective, Doubao's input scenarios are deeply rooted in ByteDance's entertainment and content ecosystem. Its inputs are open-ended, multimodal creative prompts. Users can provide a snippet of text, an image, a voice clip, or even a vague idea. Such uncertain, entertainment-centric input requires a model with strong associative and content generation capabilities.
Doubao tends to confine task boundaries to creative content generation, prioritizing the process of stimulating creativity over solving a well-defined problem. Its core metrics are the novelty, fun, and shareability of the content.
The delivered results often manifest as a short video script, a whimsical image, or a voiceover, with value derived from inspiring users' secondary creation and social sharing. Popular trends on Douyin, such as 'simulating a fan's live photo' and 'I want to dominate your smooth transition,' originated from Doubao, completing the relay from AI generation to user interaction.
Qianwen constructs an input-output model centered on service scheduling. Backed by Alibaba's mature ecosystem encompassing clothing, food, housing, and transportation, Qianwen's input comprises structured lifestyle service demands. Users typically provide clear instructions like 'book a flight ticket to Shanghai' or 'buy a cup of milk tea.' These inputs inherently encompass factors such as time, location, goods, and services.
Qianwen's input has a clear direction, and the agent's task boundaries are defined by the services accessible through Alibaba's ecosystem. Its core task is to translate natural language instructions into precise API calls, with success measured by service completion rate, efficiency, and user experience.
What you envision is what you receive. Qianwen ultimately delivers completed service results, such as a ticketed order, a food delivery, or an itinerary plan. Qianwen aggregates Alibaba's traffic, and its value lies in replacing traditional app interactions and becoming a unified intelligent entry point for the service ecosystem. Its future reach hinges on how deeply it connects with that ecosystem and how well it interacts with the external world.
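The translation step described above, from a natural language instruction to a structured service call, can be sketched roughly as follows. This is only an illustrative sketch: the `ServiceCall` type, the service names `flight_booking` and `food_order`, and the regular-expression patterns are all hypothetical and do not represent Qianwen's or Alibaba's actual interface. A production agent would presumably rely on a language model for intent recognition rather than hand-written patterns.

```python
import re
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ServiceCall:
    """A structured request that a (hypothetical) service API could accept."""
    service: str
    params: dict = field(default_factory=dict)


# Hypothetical intent patterns, one per service. Named groups become parameters.
PATTERNS = [
    (re.compile(r"book a flight(?: ticket)? to (?P<destination>[A-Za-z ]+)", re.I),
     "flight_booking"),
    (re.compile(r"buy (?P<quantity>a|an|\d+) cups? of (?P<item>[A-Za-z ]+)", re.I),
     "food_order"),
]


def parse_instruction(text: str) -> Optional[ServiceCall]:
    """Map an instruction to a ServiceCall, or return None if no pattern matches."""
    for pattern, service in PATTERNS:
        match = pattern.search(text)
        if match is None:
            continue
        params = {name: value.strip() for name, value in match.groupdict().items()}
        if "quantity" in params:
            # Normalize "a" / "an" to 1; otherwise parse the digits.
            word = params["quantity"].lower()
            params["quantity"] = 1 if word in ("a", "an") else int(word)
        return ServiceCall(service=service, params=params)
    return None
```

For example, `parse_instruction("book a flight ticket to Shanghai")` yields a `flight_booking` call whose only parameter is the destination `Shanghai`. A real system would still need to collect the missing details, such as the travel date, before invoking any service.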
Kimi's direction is a strategic choice typical of a startup: it steers clear of lifestyle entertainment and multimodal generation. Its agent focuses instead on in-depth research, data analysis, PPT creation, website development, and other complex, productivity-oriented tasks, which often demand long-term planning and intricate tool invocation and carry high potential economic value.
Following this logic, Kimi Agent's input consists of complex professional workflows. Users predominantly submit industry documents spanning tens of thousands of words, multi-step project requirements, or datasets that need analysis. This type of input is characterized by ultra-long context, high information density, and tight logical structure.
When Kimi expands its task boundaries to encompass workflows requiring deep planning, multi-tool invocation, and long-chain reasoning, its success metrics shift to complete task delivery, professionalism, and efficiency enhancement. Consequently, Kimi delivers directly usable work outcomes, such as a structurally sound industry report or a set of data analysis charts.
Similar to OpenAI's Deep Research approach, its core value lies in directly replacing repetitive, low-creativity professional labor.
The Productivity Trajectory of Agents
Not long ago, Doubao garnered attention for unveiling a phone equipped with an AI assistant, while Qianwen also made waves by further integrating with Taobao's ecosystem. Hence, our focus here shifts to Kimi, the representative of startups.
After Zhipu and MiniMax, two of the companies once dubbed the 'AI Six Little Tigers,' successively went public on the Hong Kong Stock Exchange, outside observers turned to Kimi, all but asking, 'What about you?'
In an internal letter, Kimi's founder, Yang Zhilin, responded that the company had secured a Series C funding round worth approximately 3.5 billion yuan and that its current cash holdings exceed 10 billion yuan. The company is in no rush to go public in the near term; instead, it is focusing on scaling the K3 model and concentrating its products and commercialization on agents.
Reflecting on the past year, after Kimi pivoted towards foundational models and agent research at the beginning of the year, it achieved significant milestones while maintaining a low profile.
In 2025, Kimi adopted 'Token Efficiency + Long Context' as its core technological pathway, crafting agents with proactive planning and complex task execution capabilities, and surpassing existing intelligence limits through algorithmic and architectural innovations.
Token Efficiency and Long Context are two pivotal technological directions for Kimi. To enhance training efficiency, Kimi verified the value of the second-order optimizer Muon for the first time in the pre-training of ultra-large-scale models, achieving approximately double the token efficiency compared to the traditional Adam optimizer, which has been industry-standard for over a decade. This equates to training a model with higher intelligence levels using the same resources.
Industry experts remarked, 'It's remarkable to witness such substantial progress in a fundamental area like optimizers now.' As one of the biggest advancements in model architecture in 2025, the Muon optimizer has subsequently been adopted by Chinese open-source models, including Zhipu GLM and DeepSeek Engram, fully demonstrating the strength of China's open-source ecosystem.
In terms of expanding context capabilities, Kimi proposed the 'Kimi Linear' architecture based on improved linear attention. For the first time, it surpassed the full-attention Transformer in performance on long-context tasks, achieving a 6–10 times end-to-end speed increase at the million-level context length while maintaining stronger memory and expressive capabilities.
Yang Zhilin mentioned that Kimi's K2 model is 'China's first agent model.'
Through the upgrade of K2 Thinking, Kimi can complete complex tool invocations and aid in solving difficult problems. Kimi K2 can execute complex tasks with over two hundred steps in practice, already assisting users in completing a series of challenging tasks, demonstrating its ability to compete with the world's leading agent models.
Kimi's in-depth research function is tailored for professional users. Users can state their research requirements and visualization needs directly, with no preamble. Kimi quickly grasps the user's intent and then confirms and clarifies key points, albeit in a somewhat abstract manner. It then automatically invokes browser tools, searching and analyzing as it confirms details, and on completion generates a detailed research report and a formatted visualization webpage.
Building on capabilities such as in-depth research, PPT creation, and data analysis in its general agent mode (OK Computer), Kimi has begun commercializing its agent capabilities, primarily through a subscription model in which membership tiers come with different usage quotas for those capabilities. According to a letter from Kimi's entire team, the number of global paying users has grown at a monthly rate of 170%, a hard-won first step in a domestic market where such products are generally free.
In a recent opening speech, Marc Andreessen, co-founder of the renowned venture capital firm a16z, specifically mentioned that the Kimi model from China is one of the leading open-source models. Judging by benchmark tests, it has largely replicated the reasoning capabilities of GPT-5. Besides the global 'supernova' DeepSeek, Qwen, ByteDance, Kimi, and others are also strongly competitive. Among them, Kimi is the only startup.
The Essence of Intelligence
From Doubao to Kimi, the strategic choices of these three players in the AI Agents landscape not only reflect differences in product functionality but also offer insights into the core value of an agent.
Diverse interpretations shape their future competitive dimensions.
Doubao defines how to leverage agents to process unstructured creative inputs and deliver emotional and interactive value. This necessitates a model with robust multimodal generation and style imitation capabilities. The ecosystem determines where the competitive moat lies; Doubao's ecosystem is a traffic network for content creation and distribution, with its barrier lying in its ability to consistently produce viral content and inspire User-Generated Content (UGC).
Qianwen defines how to utilize agents to comprehend structured business intentions and deliver transactional and efficiency value, requiring its model to possess extremely high intent recognition accuracy and API invocation reliability. Qianwen relies on Alibaba's business operating system, with its barrier lying in the seamless integration depth of services such as payment, logistics, and local life.
Kimi Agent is attempting to define how to employ agents to handle complex professional tasks and deliver productivity and solution value. This requires the model to possess deep logical reasoning, task planning, and long-term memory capabilities. By establishing standards for 'model + tool + workflow' in professional scenarios, Kimi is strengthening its ability to understand and meet complex demands in vertical industries, attracting professional users and organizations with a strong willingness to pay.
Ultimately, Doubao, Qianwen, Kimi, and numerous other companies are defining and quantifying the value of intelligence in diverse forms and productizing it.
In this new era, AI Agents further amplify the value of intelligence.
The initial step is the tokenization of value, where each company breaks down vague intelligence capabilities into standardizable and measurable minimum units. This is akin to assigning the kilowatt-hour unit to electricity, making the consumption and pricing of intelligence feasible and laying the foundation for commercialization.
Next is the circulation of value. Once the value of intelligence is quantified, it can freely combine and circulate within the ecosystem, with agents becoming the trading interface for the value of intelligence. A typical example is Qianwen, where transactional intentions and services circulate, and the value of tokens multiplies across various scenarios such as e-commerce and local life.
Finally, there is value recombination, which corresponds to the deepening from the tool layer to the work and organizational layers that Google describes.
If cost-effective intelligence can be accessed on-demand like water and electricity, the underlying logic of enterprises may be rewritten. Companies would not need to hire expert teams but could obtain top-tier capabilities in a field simply by accessing professional vertical agents, thereby breaking through original capability barriers. Innovation may not be confined to internal generation but could also stem from the creative combination of external intelligent services.
As the co-founder of a16z said, we are witnessing a historic encounter between a 'hyper-deflationary' unit cost of intelligence and a 'hyper-inflationary' demand for intelligent applications.
AI Agents happen to be the key to unlocking intelligent value and influencing its flow.