From Thought to Action: The Explosive Rise of AI Agents and Manus

March 14, 2025

In March 2025, an AI product named Manus ignited a social media frenzy. Its closed beta invitation codes were so sought after that they fetched prices in the tens of thousands of yuan, and the excitement propelled more than 150 AI agent-related stocks to their daily trading limits.

Manus's popularity is not merely due to its outperforming OpenAI's Operator on the GAIA benchmark, reaching the current state of the art (SOTA). It also represents a revolutionary new form of AI: the AI Agent (Artificial Intelligence Agent).

Unlike the generative AI we have grown familiar with, such as ChatGPT, AI Agents have made the leap from "thinking" to "acting." ChatGPT functions more like a super brain, adept at answering questions and generating content, but it remains confined to the realm of thought. AI Agents go beyond mere thinking: they can act like humans, perceiving the environment, planning tasks, using tools, and independently completing the entire process from understanding a problem to resolving it.

As AI evolves into AI Agents, artificial intelligence transitions from a mere thinker to an autonomous decision-maker and actor.

The concept of the AI Agent did not emerge in a vacuum; its intellectual roots can be traced back to ancient Greek philosophers' visions of "automata" and, in China's Tao Te Ching, the metaphor of the "Tao" as an autonomously evolving entity.

In 1950, Alan Turing proposed the "Turing Test," introducing the concept of the "highly intelligent organism" into artificial intelligence and laying a theoretical foundation for AI Agents.

In the 1960s, Marvin Minsky first used the term "Agent" in his research, defining it as an autonomously operating computational or cognitive entity capable of perceiving the environment, making decisions through reasoning, and executing tasks.

Since then, the evolution of AI Agents has undergone decades of iteration: from rule-based expert systems (like IBM's Deep Blue) to reactive agents relying on machine learning (like Roomba vacuum robots) to complex decision-making entities based on deep learning (like AlphaGo).

Initially, AI relied primarily on Symbolic AI, solving problems through predefined rules and logical reasoning. This approach excelled at simple, structured problems but struggled in complex, dynamic environments.
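A minimal sketch of that rule-based style (the rules and symptoms below are invented for illustration) shows both its clarity and its brittleness: behavior is fixed by hand-written if-then rules, so the system handles exactly the cases its authors anticipated and nothing else.

```python
# Minimal sketch of Symbolic AI: hand-written if-then rules plus simple
# logical matching. Rules and facts here are illustrative, not from any
# real expert system.

RULES = [
    ({"fever", "cough"}, "suspect flu"),
    ({"fever", "rash"}, "suspect measles"),
]

def diagnose(symptoms: set[str]) -> str:
    for conditions, conclusion in RULES:
        if conditions <= symptoms:       # all rule conditions present
            return conclusion
    return "no rule matched"             # brittle outside predefined rules

print(diagnose({"fever", "cough", "fatigue"}))  # -> suspect flu
print(diagnose({"headache"}))                   # -> no rule matched
```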

Subsequently, Reactive AI emerged, capable of making rapid responses based on immediate environmental feedback but lacking long-term planning and memory capabilities. For example, early autonomous driving systems could adjust vehicle direction in real-time based on road conditions but could not perform complex path planning.
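A reactive agent can be sketched just as briefly. The proportional lane-keeping controller below is an invented toy, not any real driving stack, but it captures the defining trait: each observation maps directly to an action, with no memory of the past and no plan for the future.

```python
# Minimal sketch of a reactive agent in the spirit of early lane-keeping
# systems: the current observation maps straight to an action. The sensor
# reading and gain are illustrative assumptions.

def steer(lane_offset_m: float) -> float:
    """Map the instantaneous lane offset to a steering correction."""
    GAIN = 0.5                       # proportional gain (assumed)
    return -GAIN * lane_offset_m     # steer back toward the lane center

# The same offset always yields the same correction; the agent cannot
# plan a route, because it carries no state between observations.
for offset in [0.4, -0.2, 0.0]:
    print(f"offset={offset:+.1f} m -> steering={steer(offset):+.2f}")
```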

It was not until the advent of Reinforcement Learning (RL) that AI Agents truly saw a breakthrough. Reinforcement Learning uses a reward mechanism to allow agents to learn optimal strategies through interactions with the environment. This approach enables AI Agents to dynamically break down tasks, select tools, and adjust action strategies based on feedback.
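The sketch below shows tabular Q-learning, one of the simplest RL algorithms, on an invented five-state corridor: the agent starts with no strategy at all and, from reward feedback alone, converges on "always move right." The environment and hyperparameters are illustrative.

```python
# Minimal tabular Q-learning sketch of the reward mechanism described
# above: trial, error, and reward feedback gradually shape a policy.
import random

N_STATES, GOAL = 5, 4            # states 0..4; reward only at state 4
ACTIONS = [-1, +1]               # step left or step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # value estimate per (state, action)

for _ in range(500):             # episodes of interaction with the environment
    state = 0
    while state != GOAL:
        if random.random() < EPSILON:        # explore occasionally
            a = random.randrange(2)
        else:                                # otherwise exploit current estimate
            a = max((0, 1), key=lambda i: Q[state][i])
        nxt = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if nxt == GOAL else 0.0
        # Nudge the estimate toward reward plus discounted future value.
        Q[state][a] += ALPHA * (reward + GAMMA * max(Q[nxt]) - Q[state][a])
        state = nxt

print([max((0, 1), key=lambda i: Q[s][i]) for s in range(GOAL)])
# -> [1, 1, 1, 1]: the learned policy is "always move right" toward the goal.
```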

In 2022, the rise of large language models endowed AI Agents with comprehension abilities akin to human brains. Large models not only provide AI Agents with powerful language understanding and logical reasoning capabilities, making their interactions with humans more natural, but also enhance their memory modules. This allows AI Agents to optimize decision-making logic based on historical interaction data, thereby offering more personalized services.

In essence, an Agent = Large Language Model (LLM) + Memory Module + Planning Engine + Tool Library. Its defining feature is "autonomy": it can not only answer questions but also break down tasks, select tools, and complete the entire workflow much as a human would.

For instance, when a user asks it to "plan an in-depth trip to Japan," the AI Agent not only recommends attractions but also automatically calls airfare-comparison APIs, analyzes visa policies, books hotels, and generates a complete itinerary document.

This "end-to-end" execution loop transforms AI from a "giant of thought" into a "giant of action."

The explosive growth of AI Agents is no coincidence. The dual demand from the To B and To C markets is driving technology giants to compete fiercely in this space.

In the To B sector, AI Agents serve as 24/7 "digital employees," offering enterprises new ways to cut costs and raise efficiency. For example, Salesforce's Agentforce boosts sales, customer service, and marketing efficiency through intelligence and automation, with AI-related order volume more than doubling year on year in the third quarter. The BuffGPT platform uses multi-agent collaboration and dynamic task scheduling to coordinate more than 100 agents working together, achieving a cross-system API call success rate of 99.2% and easing enterprises' data-silo problem.

While the enterprise market competes on "efficiency," the consumer market is reshaping "experience." Honor's MagicOS "YOYO Agent" can complete food delivery orders and taxi bookings with a single command. Manus can automatically screen and optimize resumes based on the user's professional background, even simulating interview conversations. Notably, the role of AI Agents is evolving from a cold tool to a partner with "empathy." For instance, the mental health management app Wysa uses AI Agent technology to analyze users' text dialogues and voice tones in real-time, identifying anxiety or depression with an accuracy rate of 89%, helping over 5 million users improve their mental state.

Faced with potential market demand, giants' strategic layouts revolve around three dimensions: platformization, verticalization, and hardware integration.

Platformization aims to build an open ecosystem that attracts developers and enterprises, thereby forming technological barriers and a closed business loop. Microsoft upgraded Copilot Studio into an Agent development platform, offering 1,800 models and drawing in more than 100,000 enterprises. Baidu Intelligent Cloud's Qianfan AppBuilder provides an "enterprise-grade" AI Agent development platform covering the full workflow from data management and model training to inference-service deployment. The Beijing Municipal Market Supervision Bureau, for instance, accesses the DeepSeek large model through the Qianfan platform, enabling "digital civil servants" to offer 24/7 online consultation services.

Verticalization focuses on specific industries or scenarios, providing deeply customized solutions. YC partner Jared believes the market for vertical AI Agents will be enormous, potentially producing companies worth more than $300 billion. OpenAI, for example, has reportedly planned a PhD-level agent service priced at $20,000 per month, targeting high-end professional markets such as law and finance with customized services.

Hardware integration combines terminal devices with AI technology, locking in the user entry point and enhancing the interaction experience to build a moat for "edge-side Agents." Meta's Ray-Ban smart glasses integrate a multi-modal Agent, enabling real-time translation of menu text, hands-free photo and video capture, voice-controlled music playback, and smart AI reminders.

Every move made by giants is an attempt to define future rules: platformization to build ecological barriers, verticalization to harvest high-value scenarios, and hardware integration to lock in user entry points. This competition knows no bounds, driven only by continuous innovation and evolution, charting the path to the intelligent awakening of AI Agents.

The rise of AI Agents marks the transformation of technology from "+AI" toolization to AI-native assistants.

In the past, AI was more often embedded into existing processes as an add-on feature (i.e., "+AI"). Future AI Agents, however, will redesign interaction logic around task scenarios and become the core of the process (i.e., "AI-native"). In medicine, traditional AI might assist doctors in analyzing imaging data, whereas an AI-native application can independently handle the entire chain from patient consultation and examination recommendations to treatment-plan generation.

Another key trend is the deep integration of multi-modal capabilities. Future AI Agents will perceive the environment through multiple channels, such as vision, language, and hearing, much as humans do. Google's RT-1, combined with a vision model, can identify ingredients, operate kitchen tools, and perform more than 700 common tasks in complex kitchen settings with a 97% success rate. Paired with vision-language models, AI Agents evolve from "single-task executors" into "multi-scenario adapters," truly gaining the ability to "observe the environment."

The specialized development of industry agents cannot be overlooked either. In the financial sector, AI Agents can autonomously execute high-frequency trading by analyzing market data in real-time. In education, they can dynamically generate personalized question banks based on students' knowledge blind spots. This vertical deep dive precisely matches the needs of niche groups, holding broader market potential.

Of course, despite their vast prospects, AI Agents still face multiple challenges.

First, there are technological barriers. Current AI Agents are mostly built on the LLM-based agent path, so inherent weaknesses of the underlying large models, such as hallucination, may be further amplified in multi-task processing. At the same time, their long reasoning chains and task complexity place higher demands on computing power, requiring further upgrades across cloud services, servers, and the domestic computing-power supply chain.

Second, ethical issues warrant vigilance. The explosion of AI Agents is, in essence, the result of resonance among technology, demand, and capital. Even as they boost enterprise efficiency, they are becoming deeply embedded in everyday life, raising a series of potential concerns. How should an autonomous vehicle make moral decisions in an emergency: should it prioritize protecting passengers or pedestrians? Could an AI counselor turn dark and induce users to self-harm, like the algorithmic program in "Detective Profile"? Could human biases seep into future AI government systems, exacerbating existing inequality and discrimination?

As AI evolves from "answering questions" to "solving problems" and from "assisting humans" to "autonomous execution," a deeper concern surfaces: the anthropomorphic design of AI Agents may lead to the "mechanization of humans." As machines gradually acquire human-like behavior and decision-making abilities, the boundary between algorithms and human souls gradually blurs, raising the question of whether we will subconsciously lose our emotional and moral judgment.

The answers to these questions may be waiting for us to write, and each of our choices will become the opening words of this future history book.
