01/08 2025
In 2024, the global technology sector witnessed numerous breakthroughs, with advancements in artificial intelligence (AI), quantum computing, clean energy, and biotechnology profoundly transforming our lives. As we look ahead to 2025, technological development will further accelerate, aligning more closely with social, economic, and environmental needs. The upcoming year will mark a crucial juncture not only for technological innovation but also for the transition from groundbreaking discoveries to mature applications.
Among these advancements, the AI industry stands at the forefront of global attention. So, what trends can we expect in the AI industry in 2025?
01. Possible Trend 1: The Surge of Agents
The first notable trend is the proliferation of agents as an application form. In 2024, agents began to attract attention from major technology companies, and in 2025 they may see significant development.
At the 2025 ICT Industry Trends Annual Conference, Wu Hequan, academician of the Chinese Academy of Engineering, stated that 2025 will be a pivotal year not only for agents but also for AI terminals.
Sam Altman, CEO of OpenAI, recently remarked, "We will continue to develop better models, but I believe the next major breakthrough will emerge from agents."
The rationale behind this surge is that basic large models, although equipped with vast knowledge bases and able to respond to questions and carry out tasks, still have limitations. As assistants, large models think swiftly and broadly but lack depth and precision, and the quality of their responses depends on how clearly the question or task is described. They also lack hands-on experience: without experimental or clinical practice in fields like engineering or medicine, they struggle to comprehend real-world scenarios. And for narrow, specific tasks, a general-purpose large model is often overqualified and inefficient.
AI agents, on the other hand, are AI-driven software tools capable of performing multi-step tasks with minimal supervision. Beyond natural language processing, they can make decisions, solve problems, and interact with their environment during task execution. These agents are small programs that accept natural language commands, interact with their scenarios, and possess a rudimentary chain of thought. They can break tasks down and exhibit memory, planning, tool invocation, and action execution abilities. By closing the loop between extended reasoning and action, agents can turn a large model's knowledge into long-term memory or even comprehension, enabling them to perform specific tasks independently.
By endowing large models with "a priori" world knowledge, AI is learning to perceive, retrieve, analyze, reason, plan, decide, and execute, transforming into agents that can work alongside people, accompany them, and integrate into human scenarios. In 2025, some companies will develop agents much as they train employees, enabling them to use tools, invoke functions and features across different applications and platforms, and assist with or complete tasks independently. Agents will also collaborate with each other, reshaping software and services. In this sense, AI agents are creators of value.
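The plan, tool-invocation, and memory loop described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not how any particular product works: `ToyAgent`, `TOOLS`, and `TASK` are hypothetical names, and the hard-coded lookup in `plan` merely stands in for the step where a large model would decompose the task.

```python
from dataclasses import dataclass, field

# Hypothetical tools; a real agent would call external applications or APIs.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}

TASK = "total for 2 items at 3 yuan each plus 4 yuan shipping"


@dataclass
class ToyAgent:
    # Memory: a log of every action taken and its result.
    memory: list = field(default_factory=list)

    def plan(self, task):
        # Assumption: a fixed lookup stands in for model-driven planning.
        # "prev" means "use the result of the previous step".
        plans = {TASK: [("multiply", 2, 3), ("add", "prev", 4)]}
        return plans[task]

    def run(self, task):
        result = None
        for tool, a, b in self.plan(task):
            # Resolve references to the previous step's result.
            a = result if a == "prev" else a
            b = result if b == "prev" else b
            # Tool invocation and action execution.
            result = TOOLS[tool](a, b)
            self.memory.append(f"{tool}({a}, {b}) = {result}")
        return result


agent = ToyAgent()
print(agent.run(TASK))
print(agent.memory)
```

The point of the sketch is the shape of the loop: the task is split into steps, each step calls a tool, and the intermediate results are carried forward and recorded so later steps can build on them.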
A report by Capgemini, an information technology services and consulting company, reveals that while only about 10% of companies currently use AI agents, 82% plan to integrate them into their workflows within the next three years.
02. Possible Trend 2: Growth in the Multimodal Market
The second notable trend is the growth of the multimodal market. Multimodality is the natural state of the human world, and the development of AGI (Artificial General Intelligence) is inevitably moving in this direction. Technology will facilitate the expansion from text, images, and video to sound, light, electricity, and even molecules, atoms, and other modalities, enabling cross-modal transfer.
As general AI approaches, large models are shifting towards multimodality. At the same time, these models are moving to the edge, which offers higher local data processing efficiency, lower cloud bandwidth and computing costs, stronger protection of user data privacy, and new modes of interaction and experience. Edge models could become a new entry point for future interactions.
Meanwhile, AI is making significant strides in mathematical reasoning, new drug development, material discovery, protein synthesis, and other fields, with "AI scientists" expected to arrive sooner rather than later. The accelerated integration of digital interaction engines and technologies like GenAI will create more super digital scenarios in the future, pushing digital-physical integration to new heights.
Therefore, in 2025, multimodal AI may become the primary driver for enterprises to adopt AI. This technology integrates multiple data sources like images, videos, audio, and text, enabling AI to learn from a broader range of contextual sources with unprecedented accuracy. It provides more precise and customized outputs, creating natural and intuitive experiences.
According to Google reports, the global multimodal AI market is expected to reach $2.4 billion by 2025 and $98.9 billion by 2037.
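To put the two market figures in perspective, the compound annual growth rate they imply can be worked out directly. The sketch below assumes smooth compounding over the 12 years from 2025 to 2037, which the forecast itself does not state; `cagr` is a helper name introduced here for illustration.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two values `years` apart."""
    return (end_value / start_value) ** (1 / years) - 1


# Figures from the text, in billions of US dollars.
rate = cagr(2.4, 98.9, 2037 - 2025)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption, the forecast works out to growth of roughly 36% per year.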
03. Possible Trend 3: Enhanced Accessibility and Convenience
The third notable trend is enhanced accessibility and convenience. Large models are inherently application-oriented technologies with two primary development trajectories: increasing capability and decreasing cost, leading to rapid implementation and application of technological capabilities.
Throughout human technological development, the goal has been to create better products at lower prices, ensuring widespread access. For example, under Moore's Law, the density of transistors on chips has increased rapidly, while the manufacturing cost per transistor has fallen even faster, making televisions, computers, mobile phones, and the internet affordable for everyone.
With capital expenditures often amounting to tens of billions, large models, as infrastructure, need to demonstrate their scale effects promptly and clarify their commercialization paths as early as possible.
Regarding the democratization of AI, major Chinese companies have already taken action. For instance, the Doubao visual understanding model under Volcano Engine is priced at 0.003 yuan per 1,000 input tokens. It is reported that 1 yuan can process 284 720P images, which is 85% cheaper than prevailing industry prices.
On December 31, 2024, Alibaba Cloud announced its third round of large-model price cuts of the year, with the Tongyi Qianwen visual understanding models reduced by more than 80% across the board. Specifically, Qwen-VL-Plus was cut by 81% to an input price of only 0.0015 yuan per 1,000 tokens, a new all-time low. The higher-performance Qwen-VL-Max was reduced to 0.003 yuan per 1,000 tokens, a decrease of 85%. At these latest prices, 1 yuan can process up to about 600 720P images or 1,700 480P images.
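The per-token prices and the images-per-yuan figures can be related with simple arithmetic. The sketch below assumes the "up to about 600" and "1,700" figures refer to the cheaper Qwen-VL-Plus price, which the text does not specify, so the resulting tokens-per-image values are implied estimates rather than published numbers. The function names are introduced here for illustration.

```python
def tokens_per_yuan(price_per_1k_tokens):
    """How many input tokens 1 yuan buys at a given price per 1,000 tokens."""
    return 1000 / price_per_1k_tokens


def implied_tokens_per_image(price_per_1k_tokens, images_per_yuan):
    """Token cost of one image implied by a price and an images-per-yuan claim."""
    return tokens_per_yuan(price_per_1k_tokens) / images_per_yuan


QWEN_VL_PLUS = 0.0015  # yuan per 1,000 input tokens, from the text
QWEN_VL_MAX = 0.003    # yuan per 1,000 input tokens, from the text

print(round(tokens_per_yuan(QWEN_VL_PLUS)))  # tokens bought by 1 yuan (Plus)
print(round(tokens_per_yuan(QWEN_VL_MAX)))   # tokens bought by 1 yuan (Max)

# Assumption: the image counts are quoted at the Qwen-VL-Plus price.
print(round(implied_tokens_per_image(QWEN_VL_PLUS, 600)))   # per 720P image
print(round(implied_tokens_per_image(QWEN_VL_PLUS, 1700)))  # per 480P image
```

Under this assumption a 720P image costs on the order of 1,100 tokens and a 480P image about 400, which is why lower-resolution inputs stretch the same budget almost three times as far.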
04. Conclusion: The AI Industry Embarks on a New Starting Point
As 2025 begins, the AI industry stands at a new starting point. The burgeoning development of agents, the rapid growth of the multimodal market, and the democratization of technology collectively paint a picture of a world that is more intelligent, interconnected, and personalized. These trends not only herald technological innovation but will also profoundly affect how we work, how we live, and even how society is structured.
The rise of agents showcases the transformation of AI from passive response to active execution. They will become capable assistants in our daily work, enhancing efficiency and demonstrating capabilities that surpass humans in certain fields. The development of multimodal AI will enable large models to better understand and respond to complex queries, providing richer and more intuitive interactive experiences. The democratization of technology means that more people and enterprises can afford and utilize AI technology, significantly promoting overall social progress and innovation.