12/04 2024 504
Content/Ah Wen
Editor/Singing Goose
Proofreader/Brute
AI Agents are on the eve of a major boom.
Recently, at the Agent OpenDay, Zhipu AI showcased its latest achievements in AI Agents, releasing three types of agents that use AI to replace humans in tasks: AutoGLM for mobile phones, GLM PC for computers, and GLM-Web capabilities for web pages.
Prior to this, Microsoft also announced that it had established the world's largest enterprise-level AI Agent ecosystem. Coupled with the accidental leak of Google's Jarvis and OpenAI's upcoming launch of Operator... it seems that it won't be long before AI Agents take over our lives and work.
So what exactly is an AI Agent?
Conceptually, an AI Agent is an AI system that does not require continuous human intervention. It can independently analyze problems based on environmental and contextual information, make logical decisions, and handle multiple tasks on its own. AlphaGo is a typical example: playing Go against a human opponent, it independently decides its next move based on the current board position and the opponent's actions.
AI technology has undergone a long and complex evolution, from basic models, to the rise of the Agent concept, to today's software that can reason independently and perform specific tasks.
Reinforcement learning is one of the key technologies behind AI Agents, and large language models (LLMs) have opened up new possibilities for them. As the core brain of an AI Agent, an LLM can decompose complex problems and enable human-like natural language interaction.
AI Agents represent a new stage in which AI technology moves towards more intelligent and autonomous interaction. Instead of simply executing instructions, they can independently plan, decide, and execute tasks based on complex situations and goals, much like a human assistant. Imagine you are hungry: you only need to tell an AI assistant to "order takeout," and it can automatically complete every step, from searching and inquiring to ordering and confirming.
This is more than an efficiency gain; it is a brand-new mode of human-computer interaction that brings people and machines closer together. Last December, Bill Gates predicted that Agents will not only change how people interact with computers but also disrupt the software industry. Baidu founder Li Yanhong (Robin Li) likewise believes that Agents will be the websites of the AI era, with millions or more emerging to form a vast ecosystem.
Part.1
Evolution of AI Agents:
From Simple Dialogue to Personal Assistants
The concept of Agents is not a product of the third wave of artificial intelligence. It is the result of the continuous evolution of the idea of the 'intelligent entity,' which has accompanied AI from the start.
In 1966, Joseph Weizenbaum from MIT's Artificial Intelligence Laboratory developed ELIZA, the first chatbot in history. Named after the protagonist in George Bernard Shaw's play 'Pygmalion,' ELIZA had only 200 lines of code and a limited dialogue library, and could do no more than respond to keywords in the user's questions.
ELIZA was not intelligent in any sense. It operated on rules, neither understanding what the other party said nor knowing what it was saying itself. Even so, it pioneered human-computer dialogue and can be considered the ancestor of question-and-answer tools like Siri and Xiaoai Classmate.
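To make the rule-based idea concrete, here is a minimal Python sketch of a keyword-matching responder. It is not Weizenbaum's original program: the keywords, the canned replies, and the `respond` function are all invented for illustration.

```python
import re

# Hypothetical keyword -> reply rules, invented for illustration.
# Rules are checked in order; the first keyword found in the input wins.
RULES = [
    ("mother", "Tell me more about your family."),
    ("sad", "Why do you feel sad?"),
    ("always", "Can you think of a specific example?"),
]
DEFAULT_REPLY = "Please go on."


def respond(user_input: str) -> str:
    """Return the canned reply for the first rule whose keyword appears."""
    words = set(re.findall(r"[a-z']+", user_input.lower()))
    for keyword, reply in RULES:
        if keyword in words:
            return reply
    # No keyword matched: fall back to a content-free prompt.
    return DEFAULT_REPLY


if __name__ == "__main__":
    print(respond("I am sad today"))
    print(respond("The weather is nice"))
```

The program never models what the user means. It only looks up surface keywords, which is why such a system can appear conversational without understanding anything.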
In the 21st century, the development of AI Agents entered a period of steady progress. The rise of machine learning made Agents more intelligent, and breakthroughs in deep learning brought revolutionary progress, including major advances in image recognition, speech recognition, and natural language processing.
Today, artificial intelligence is widely used in fields such as healthcare, education, transportation, and finance, where AI Agents have improved work efficiency.
2011 was a pivotal year. First, IBM Watson defeated human contestants on the quiz show Jeopardy!, demonstrating the immense potential of AI. Second, Apple launched Siri that year, ushering in a new era of mobile intelligent assistants. In 2014, Microsoft launched the AI chatbot 'Xiaoice' in China, showcasing for the first time the potential of AI in affective computing and social interaction.
Strictly speaking, however, the true advent of AI Agents began in November 2022, when OpenAI released ChatGPT and sparked a global AI craze. On March 14, 2023, the multimodal large model GPT-4 was released, supporting image input and offering significantly enhanced understanding and generation capabilities, which paved the way for autonomous AI Agents. The advent of ChatGPT marked a shift from "talking" to "doing," towards AI that can autonomously execute complex tasks beyond mere dialogue.
The rapid development of AI Agents is inseparable from breakthroughs in key technologies: advances in deep learning and neural networks, large-scale pre-trained language models, reinforcement learning from human feedback (RLHF), multimodal interaction, tool use, and environmental adaptability.
It is reported that the number of Agents is exploding: the total number of Agents in China exceeded 10 million within one year, 85 times the number of apps launched annually on the Apple App Store. Bill Gates considers Agent creation platforms to be the next generation of application development platforms after Android, iOS, and Windows.
Part.2
Major Companies Enter the Fray
AI Agents Flood the Terminal Market
Agents may become the next breakthrough point after PCs and mobile devices. Li Yanhong has publicly stated that the value of foundation models can only be realized through applications, and Agents are close to a universal form of application built on large models.
Data shows that from January to October 2024, the top five AI-native apps in China by cumulative downloads were Doubao, Wenxiaoyan (formerly Wenxin Yiyan), Kimi, Xingye, and Tiangong AI, with 108 million, 22.6 million, 21 million, 17.9 million, and 11.7 million downloads, respectively.
Baidu was the first in China to launch a large model native application, releasing Wenxin Yiyan on March 16, 2023, just three months after OpenAI released ChatGPT. Internet giants and emerging startups then launched their own large model applications in quick succession, including Alibaba's Tongyi Qianwen large model in April 2023, iFLYTEK's Xinghuo large model in May 2023, and Zhipu AI's in September 2023. The up-and-coming Kimi intelligent assistant followed in October 2023, just 10 months after the release of ChatGPT.
On June 25, 2024, OpenAI announced that China was not on the list of 188 countries and regions supported by its API services, meaning ChatGPT would end its services in China. For domestic AI players and Agent developers, this is undoubtedly a huge opportunity.
It is reported that internet giants including Baidu, Alibaba, Tencent, ByteDance, and Huawei have all moved into the Agent sector and launched one-stop Agent development platforms. For a time, platforms such as ByteDance's Kouzi, Tencent Cloud's Tencent Yuanti, Baidu Intelligent Cloud's Qianfan AgentBuilder, Alibaba Cloud's Damo Platform, and iFLYTEK's Xinghuo Agent Platform flourished.
Major companies have the computing power, data, and talent to develop models, applications, and middleware Agent development platforms together in a closed loop. Other large model vendors, by contrast, each emphasize a different technology path, though all have a solid foundation in large model research and development.
In April 2023, SenseTime Technology launched a large model named 'Ririxin SenseNova,' integrating various powerful functions such as natural language processing, content generation, automated data annotation, and custom model training.
Another example is Baichuan Intelligence, founded by Sogou's founder Wang Xiaochuan, which has developed at an astonishing speed since its inception in 2023. In just over half a year, it released four open-source large models that are free for commercial use (Baichuan-7B/13B and Baichuan2-7B/13B) and two closed-source large models (Baichuan-53B and Baichuan2-53B), averaging a new large model every 28 days.
Since its establishment in 2019, Zhipu AI has been deeply engaged in large model research and development, backed by the technical strength of Tsinghua University's Knowledge Engineering Group (KEG). In 2023, it launched the chat application 'Zhipu Qingyan' for consumer users.
Unlike internet giants, startups like Zhipu AI and Baichuan Intelligence primarily focus on AI application assistant products and have not yet launched Agent development platforms.
Notably, in 2024, more and more smartphone manufacturers have begun to talk about mobile Agents. Recently, vivo launched a mobile Agent named PhoneGPT at its 2024 Developer Conference. A multimodal assistant that acts on user intent within the phone, it can accurately identify content on the screen, automatically operate the phone's applications, and complete tasks assigned by users, such as making calls and sending text messages.
In September this year, Honor released a cross-application, open-ecosystem Agent at IFA 2024 in Berlin. Earlier, in June, Huawei announced 'Harmony Intelligence' at its Developer Conference, upgrading Xiaoyi to a system-level Agent.
Bill Gates has predicted that Agents will be the next platform after large models, and a growing number of large model companies and technology enterprises are moving into Agents. If the first half of the large model arms race focused on basic capabilities, the application and deployment of AI Agents have now become the most important arena of product competition.
Part.3
On the Eve of the Boom:
Can Universal Agents Become a Reality?
The capabilities of AI Agents will continue to improve. First, they can decompose tasks and formulate execution plans; second, they can call APIs, access the internet, and operate software; third, they can learn continuously and accumulate knowledge; and finally, they can make autonomous decisions in complex environments.
Agents can not only converse but also reflect and plan. If user feedback indicates an incorrect result, an Agent reflects on where the problem lies. It can also independently plan an assigned task, working out which tools to use to reach the final goal.
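The plan, act, and reflect cycle described above can be sketched in a few lines of Python. This is an illustrative toy rather than any vendor's implementation: the tools, the deliberately incomplete `plan`, and the hard-coded `reflect` rule are all hypothetical stand-ins for what a real Agent would delegate to an LLM and real APIs, and here the feedback comes from a tool error rather than from a user.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: plain functions standing in for real API calls.
def search_restaurants(state: dict) -> str:
    state["restaurant"] = "Noodle House"
    return "found Noodle House"


def place_order(state: dict) -> str:
    if "restaurant" not in state:
        raise RuntimeError("no restaurant selected")
    state["ordered"] = True
    return f"ordered from {state['restaurant']}"


TOOLS: Dict[str, Callable[[dict], str]] = {
    "search": search_restaurants,
    "order": place_order,
}


def plan(goal: str) -> List[str]:
    """Stand-in planner; deliberately omits the search step."""
    return ["order"] if goal == "order takeout" else []


def reflect(failed_step: str, error: str) -> List[str]:
    """Stand-in reflection: map a known failure to a repaired plan."""
    if failed_step == "order" and "no restaurant" in error:
        return ["search", "order"]
    return []


def run(goal: str, max_revisions: int = 2) -> Tuple[dict, List[str]]:
    state: dict = {}
    log: List[str] = []
    queue = plan(goal)
    revisions = 0
    while queue:
        step = queue.pop(0)
        try:
            log.append(TOOLS[step](state))
        except RuntimeError as err:
            log.append(f"{step} failed: {err}")
            fix = reflect(step, str(err))
            if revisions >= max_revisions or not fix:
                break  # give up instead of looping forever
            revisions += 1
            queue = fix + queue  # retry with the repaired steps first
    return state, log


if __name__ == "__main__":
    for line in run("order takeout")[1]:
        print(line)
```

The `max_revisions` cap is a design choice worth noting: without a bound on replanning, a failing tool could keep the loop running indefinitely.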
Therefore, it can be predicted that as technology matures and application scenarios become clearer, the capability dimensions of AI Agents will gradually improve, leading to better user experiences. This will also be highly valuable for enterprise-level Agent applications. In the future, enterprise-level AI Agents may usher in a period of rapid growth, with various industries beginning to adopt customized Agent solutions on a large scale.
Furthermore, future AI systems will move from single Agents operating independently to multi-Agent collaboration: a network of specialized Agents that divide the labor and cooperate to complete complex tasks together.
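As a rough illustration of this division of labor, the Python sketch below chains three specialized Agents in a fixed pipeline. The roles (`research`, `write`, `review`) and the `coordinate` function are assumptions made up for this example; real multi-Agent systems typically route work dynamically and back each role with a model.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical specialists: each one handles a single kind of subtask.
def research_agent(task: str) -> str:
    return f"notes on {task}"


def writing_agent(task: str) -> str:
    return f"draft based on {task}"


def review_agent(task: str) -> str:
    return f"approved: {task}"


SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "research": research_agent,
    "write": writing_agent,
    "review": review_agent,
}


def coordinate(topic: str) -> List[Tuple[str, str]]:
    """Run research -> write -> review, handing each output to the next role."""
    results: List[Tuple[str, str]] = []
    payload = topic
    for role in ("research", "write", "review"):
        payload = SPECIALISTS[role](payload)
        results.append((role, payload))
    return results


if __name__ == "__main__":
    for role, output in coordinate("AI Agents"):
        print(f"{role}: {output}")
```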
A research report by Galaxy Securities points out that the rise of AI Agents is reshaping the AI industry chain and bringing new investment opportunities. The report estimates that by 2028 China's AI Agent market will surge to 852 billion yuan, a compound annual growth rate of 72.7%. The AI Agent industry chain is a diversified, highly synergistic ecosystem with vast market potential. AI Agents are also driving a gradual shift from the App ecosystem towards an on-device ecosystem, a new trend in AI application development. One pain point of traditional on-device AI is that it cannot operate a device's interface in response to user commands to achieve the user's goals. The AI Agent model addresses this by enabling natural language interaction with hardware.
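The two figures in the estimate are linked by the compound-growth formula, final = base × (1 + CAGR)^years. The text does not state the report's base year, so the number of compounding years in the Python sketch below is a free parameter, and the five-year value in the usage example is purely an assumption for illustration.

```python
from typing import List


def implied_base(final_value: float, cagr: float, years: int) -> float:
    """Starting value that grows to final_value after `years` at rate `cagr`."""
    return final_value / (1.0 + cagr) ** years


def project(base: float, cagr: float, years: int) -> List[float]:
    """Year-by-year values from the base year through base + years."""
    return [base * (1.0 + cagr) ** n for n in range(years + 1)]


if __name__ == "__main__":
    # ASSUMPTION: five compounding years; the base year is not given in the text.
    base = implied_base(852.0, 0.727, years=5)
    for offset, value in enumerate(project(base, 0.727, years=5)):
        print(f"year +{offset}: {value:.1f} billion yuan")
```

Changing `years` shows how sensitive the implied starting market size is to the base year, which is why that detail matters when reading such forecasts.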
While AI Agents inspire great expectations, many challenges remain in real-world deployment. Reliability, performance, and cost are all significant issues. LLMs are known to be prone to hallucinations and inconsistencies, and chaining multiple AI steps together can compound these problems, especially for tasks requiring precise output. Additionally, GPT-4, Gemini-1.5, and Claude Opus perform well at tool/function calling but are still relatively slow and costly, especially when loops and automatic retries are required.
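A back-of-the-envelope Python sketch shows why chaining steps hurts reliability and why retries add cost. It assumes each step succeeds independently with the same probability, which is a simplification; the 95% per-step rate in the usage example is illustrative, not a measured figure for any model.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all n independent steps succeed on the first try."""
    return p_step ** n_steps


def expected_calls(p_step: float, n_steps: int) -> float:
    """Expected model calls if every step is retried until it succeeds.

    Each step needs 1 / p_step attempts on average (geometric distribution).
    """
    return n_steps / p_step


if __name__ == "__main__":
    # ASSUMPTION: 95% per-step success, chosen only for illustration.
    for n in (1, 5, 10, 20):
        print(
            f"{n:>2} steps: success {chain_success(0.95, n):.3f}, "
            f"expected calls with retries {expected_calls(0.95, n):.1f}"
        )
```

Under these assumptions, even a step that is right 95% of the time gives a ten-step chain only about a 60% chance of finishing without a single error.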
Amidst this trend, every enterprise and individual wants to seize the opportunity, but ensuring user data security and privacy in Agents has always been one of the most discussed issues in the industry.
Once Agents are deployed, they are bound to access customers' core data, and data leaks can cause significant losses to users and society. Establishing user trust is difficult for sensitive actions involving payments or personal information, such as billing, passwords, and shopping. Yet data is indispensable as a "means of production" for model training.
China is now at a critical juncture in the rapid development of AI Agent applications, and facing challenges is normal. How quickly an enterprise can adapt to and leverage Agent technology will directly affect whether it stands out in future market competition. Will the rise of AI Agents usher in a golden age even more vibrant than the Internet era? We will wait and see.