12/03 2024
As technology matures and application scenarios become clearer, the capabilities of AI Agents are gradually improving, leading to better user experiences. This will also be valuable for enterprise-level Agent applications.
Content/Awen
Editor/Yong'e
Proofreader/Mangfu
Agents are on the eve of a great explosion.
Recently, at the Agent OpenDay, AISpectrum showcased its latest achievements in AI Agents, unveiling three Agents designed to replace humans in tasks: AutoGLM for mobile phones, GLM PC for computers, and GLM-Web capabilities for web pages.
Earlier, Microsoft also announced the establishment of the world's largest enterprise-level AI Agent ecosystem. Add to that the accidental leak of Google's Jarvis and OpenAI's upcoming Operator, and it seems it won't be long before AI Agents take over our lives and work.
So what exactly is an AI Agent?
Conceptually, an AI Agent is an AI system that does not require continuous human intervention. It can independently analyze problems based on environmental and contextual information, make logical decisions, and handle multiple tasks. For example, AlphaGo is a typical AI Agent: during a game of Go, it makes autonomous decisions based on the current board position and its opponent's moves.
AI technology has undergone a long and complex technical evolution, from basic models to the rise of Agent concepts, to software that can now independently reason and perform specific tasks.
Reinforcement learning is one of the key technologies driving the development of AI Agents, and the development of Large Language Models (LLMs) has opened up new possibilities for them. As the core brain of an AI Agent, an LLM can break down complex problems and enable human-like natural language interaction.
AI Agents represent a new stage of AI technology, one that moves towards more intelligent and autonomous interaction. Instead of simply executing commands, they can autonomously plan, decide, and execute tasks based on complex situations and goals, like human assistants. Imagine you are hungry: you only need to give an AI assistant the command "order takeout," and it can automatically complete every step, from searching and making inquiries to placing the order and confirming the delivery.
This is not just about improving efficiency but also a new mode of human-computer interaction that can bring machines and humans closer. Last December, Bill Gates predicted that Agents will not only change how people interact with computers but also disrupt the software industry. Robin Li also believes that Agents are the websites of the AI era, and millions, or even more, Agents will emerge to form a vast ecosystem.
Part.1
Evolution of AI Agents:
From Simple Conversations to Personal Assistants
The concept of Agents is not a product of the third wave of artificial intelligence but rather the result of the continuous evolution of the "intelligent entity" concept accompanying AI.
In 1966, Joseph Weizenbaum from MIT's Artificial Intelligence Laboratory developed ELIZA, the first chatbot in history. Named after the protagonist in George Bernard Shaw's play Pygmalion, ELIZA had only 200 lines of code and a limited dialogue library, capable of responding to keywords in questions.
ELIZA had no intelligence; it operated based on rules, neither understanding the content nor knowing what it was saying. However, it pioneered human-computer dialogue and can be considered the ancestor of modern question-and-answer interaction tools like Siri and XiaoAi Classmate.
In the 21st century, as technology continued to advance, AI Agents entered a period of steady development. The rise of machine learning made Agents more intelligent, and breakthroughs in deep learning brought major advances in image recognition, speech recognition, and natural language processing.
Currently, artificial intelligence is widely used in various fields such as healthcare, education, transportation, and finance, where AI Agents have improved work efficiency.
2011 was a pivotal year. First, IBM Watson defeated human contestants on the quiz show Jeopardy!, demonstrating the great potential of AI. Second, Apple introduced Siri that year, ushering in a new era of mobile intelligent assistants. In 2014, Microsoft launched the AI chatbot "Xiaoice" in China, showcasing for the first time the potential of AI in affective computing and social interaction.
Strictly speaking, the true advent of AI Agents began in November 2022, when OpenAI released ChatGPT and sparked a global AI craze. On March 14, 2023, the multimodal large model GPT-4 was released, supporting image input with significantly enhanced understanding and generation capabilities and paving the way for autonomous AI Agents. The advent of ChatGPT marked a shift from "talking" to "doing," toward systems capable of autonomously executing complex tasks beyond mere dialogue.
The rapid development of AI Agents is inseparable from breakthroughs in key technologies: deep learning and neural networks, large-scale pre-trained language models, reinforcement learning from human feedback, multimodal interaction, tool use, and environmental adaptability.
It is reported that the number of Agents has exploded, with over 10 million Agents in China in one year, 85 times the number of apps launched annually on the Apple App Store. Agent creation platforms are also considered by Bill Gates to be the next generation of application development platforms after Android, iOS, and Windows.
Part.2
Major Tech Companies Enter the Fray
AI Agents Flood the Terminal Market
Agents may become the next explosion point after PCs and mobile terminals. Robin Li has publicly stated that the value of foundation models lies in their applications, and Agents, built on large models, can be applied in almost any scenario.
Data shows that from January to October 2024, the top five downloaded AI-native apps in China were Doubao, Wenxiaoyan (Ernie Bot), Kimi, Xingye, and Tiangong AI, with cumulative downloads of 108 million, 22.6 million, 21 million, 17.9 million, and 11.7 million, respectively.
Baidu was the first in China to launch a large model native app, Wenxiaoyan, on March 16, 2023, just three months after OpenAI released ChatGPT. Subsequently, domestic internet giants and startups successively launched their AI large model applications, such as Alibaba Cloud's Tongyi Qianwen in April 2023, iFLYTEK's iFLYTEK Spark in May 2023, AISpectrum in September 2023, and the up-and-coming Kimi intelligent assistant in October 2023, just 10 months after ChatGPT's release.
On June 25, 2024, OpenAI announced that China was not on the list of 188 countries and regions currently supported by its API services, meaning that access to those services from China would be cut off. For domestic AI players and Agent developers, this is undoubtedly a significant opportunity.
Major internet companies such as Baidu, Alibaba, Tencent, ByteDance, and Huawei have all moved into the Agent sector and launched one-stop Agent development platforms. ByteDance's Kouzi, Tencent Cloud's Tencent Element, Baidu Intelligent Cloud's Qianfan AgentBuilder, Alibaba Cloud's ModelScope, and iFLYTEK's Spark Agent platform are all thriving.
The major companies have the computing power, data, and talent to pursue coordinated, closed-loop development across models, applications, and the Agent development platforms that sit between them. Other large model vendors lack those resources and differ in their technical focus, but all of them have a foundation in large model research and development.
In April 2023, SenseTime Technology launched a large model called "SenseNova," integrating various powerful functions such as natural language processing, content generation, automated data annotation, and custom model training.
Another example is Baichuan Intelligence, founded by Wang Xiaochuan, the creator of Sogou. Since its founding in 2023, it has developed at an astonishing speed. In just over half a year, Baichuan Intelligence released four open-source large models that are free for commercial use (Baichuan-7B/13B and Baichuan2-7B/13B) and two closed-source large models (Baichuan-53B and Baichuan2-53B), averaging one new large model every 28 days.
Since its establishment in 2019, AISpectrum has been deeply involved in large model research and development, supported by the powerful technology of Tsinghua University's Knowledge Engineering Group (KEG). In 2023, it launched the chat application "AISpectrum Qingyan" for C-end users.
Unlike internet giants, startups like AISpectrum and Baichuan Intelligence focus more on AI application assistant products and have not yet launched Agent development platforms.
Notably, in 2024, more and more mobile phone manufacturers have begun to frequently mention the concept of mobile phone Agents. Recently, vivo launched PhoneGPT at its 2024 Developer Conference, a multimodal assistant that can actively complete tasks based on user intent on mobile phones. It can accurately recognize content on the phone screen, automatically operate various applications, and complete tasks assigned by users, such as making phone calls and sending text messages.
As early as September this year, Honor unveiled a cross-application open ecosystem Agent at the 2024 IFA Berlin. Earlier, in June, Huawei also announced "Harmony Intelligence" during its Developer Conference, upgrading Xiaoyi to a system-level Agent.
Bill Gates predicted that Agents will be the next platform after large models, and more and more large model companies and technology enterprises are beginning to deploy Agents. In the competition for large models, if the first half focused on basic capabilities, the application and implementation of AI Agents have now become the most important form of product competition.
Part.3
On the Eve of the Explosion:
Can Universal Agents Become a Reality?
The capabilities of AI Agents will continue to improve. Firstly, they can decompose tasks and formulate execution plans. Secondly, they can call APIs, access networks, and operate software. Thirdly, they possess continuous learning and knowledge accumulation abilities. Finally, they can make autonomous decisions in complex environments.
Agents can not only converse but also reflect and plan. If user feedback shows that a result is incorrect, an Agent can work out where the problem lies, replan the assigned task, and decide which tools to use to achieve the final goal.
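To make the plan, act, and reflect cycle more concrete, here is a minimal Python sketch built around the "order takeout" example. Everything in it is an assumption made for illustration: the tool functions, the fixed plan lookup that stands in for an LLM, the made-up restaurant name, and the simulated timeout. It is not how any particular product works.

```python
from typing import Callable

State = dict
Tool = Callable[[State], State]


def search(state: State) -> State:
    # Hypothetical search tool; the restaurant name is made up.
    return {**state, "restaurant": "Noodle House"}


def make_flaky_order() -> Tool:
    """Ordering tool that fails on its first call to simulate a transient error."""
    calls = 0

    def order(state: State) -> State:
        nonlocal calls
        calls += 1
        if calls == 1:
            raise RuntimeError("payment page timed out")
        return {**state, "order_id": "A1"}

    return order


def confirm(state: State) -> State:
    # Delivery can only be confirmed once an order exists.
    return {**state, "confirmed": "order_id" in state}


def plan(goal: str) -> list[str]:
    # A real Agent would ask an LLM to decompose the goal; this is a fixed lookup.
    return {"order takeout": ["search", "order", "confirm"]}.get(goal, [])


def run_agent(goal: str, tools: dict[str, Tool], max_attempts: int = 2):
    state: State = {"goal": goal}
    memory: list[tuple[str, str]] = []  # running log of what happened at each step
    for step in plan(goal):
        for _ in range(max_attempts):
            try:
                state = tools[step](state)
                memory.append((step, "ok"))
                break
            except RuntimeError as err:
                # "Reflection" here is only: record what went wrong, then retry.
                memory.append((step, f"failed: {err}"))
        else:
            raise RuntimeError(f"gave up on step {step!r}")
    return state, memory


tools = {"search": search, "order": make_flaky_order(), "confirm": confirm}
final_state, log = run_agent("order takeout", tools)
print(final_state)
print(log)
```

The log records the failed first attempt at ordering before the successful retry, which is the kind of trace an Agent would need in order to reason about where a task went wrong.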
Therefore, it can be predicted that as technology matures and application scenarios become clearer, the capabilities of AI Agents will gradually improve, leading to better user experiences. This will also be valuable for enterprise-level Agent applications. In the future, enterprise-level AI Agents may experience rapid growth, with various industries beginning to adopt customized Agent solutions on a large scale.
Furthermore, future AI systems will increasingly be multi-Agent collaborations rather than single Agents operating independently. A collaborative network of specialized Agents can divide a complex task among its members and complete it together.
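The divide-and-conquer idea can be sketched in a few lines of Python. The specialist roles, their names, and the sample subtasks below are hypothetical; in a real system each specialist would be an Agent with its own model and tools, and the split into subtasks would itself be produced by a planner.

```python
from typing import Callable

Specialist = Callable[[str], str]


# Hypothetical specialists; each stands in for an Agent with its own model and tools.
def research_agent(task: str) -> str:
    return f"[research] findings for: {task}"


def booking_agent(task: str) -> str:
    return f"[booking] reservation made for: {task}"


def billing_agent(task: str) -> str:
    return f"[billing] payment handled for: {task}"


SPECIALISTS: dict[str, Specialist] = {
    "research": research_agent,
    "booking": booking_agent,
    "billing": billing_agent,
}


def coordinate(subtasks: list[tuple[str, str]]) -> list[str]:
    """Route each (role, task) pair to the matching specialist and collect results."""
    results = []
    for role, task in subtasks:
        if role not in SPECIALISTS:
            raise KeyError(f"no specialist registered for role {role!r}")
        results.append(SPECIALISTS[role](task))
    return results


trip = [
    ("research", "flights to Berlin"),
    ("booking", "hotel near the venue"),
    ("billing", "trip expenses"),
]
for line in coordinate(trip):
    print(line)
```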
A research report by China Galaxy Securities points out that the rise of AI Agents is reshaping the AI industry chain and bringing new investment opportunities. It estimates that by 2028, China's AI Agent market will surge to 852 billion yuan, with a compound annual growth rate of 72.7%. The AI Agent industry chain is a diversified and highly coordinated ecosystem with vast future market potential.
AI Agents are also driving a transition from the App ecosystem towards the terminal-side ecosystem, which is becoming a new trend in AI application development. One pain point of traditional terminal-side AI is that it cannot operate the device's interface in response to user commands to achieve the user's goals. The AI Agent model addresses this pain point by enabling natural language interaction with hardware.
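For readers who want to sanity-check the market estimate cited from China Galaxy Securities, compound growth relates the starting and ending sizes by end = start × (1 + rate)^years. The report's base year is not given here, so the horizons in the sketch below are assumptions, and the starting sizes it prints are only what each assumption would imply, not figures from the report.

```python
def implied_start(end_value: float, cagr: float, years: int) -> float:
    """Starting value implied by an ending value and a compound annual growth rate."""
    return end_value / (1 + cagr) ** years


END_2028 = 852.0  # billion yuan, the 2028 estimate quoted in the article
CAGR = 0.727      # 72.7% compound annual growth rate, as quoted

# Assumed horizons: the article does not state which year the growth is measured from.
for years in (3, 4, 5):
    start = implied_start(END_2028, CAGR, years)
    print(f"{years} years of growth implies a starting size of about {start:.1f} billion yuan")
```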
While AI Agents offer great potential, many challenges remain in their actual deployment. Reliability, performance, and cost are all significant issues. As is well known, LLMs are prone to hallucinations and inconsistencies, and chaining multiple AI steps together can compound these problems, especially for tasks requiring precise output. Additionally, while GPT-4, Gemini-1.5, and Claude Opus perform well at tool use and function calling, they are still slow and costly, especially when loops and automatic retries are required.
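A rough calculation shows why chaining steps hurts reliability and why retries add cost. The sketch below assumes every step succeeds independently with the same probability, which is a simplification, and the 95% figure is an illustrative number rather than a measured one. Under that assumption a ten-step chain finishes without any error only about 60% of the time.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return p_step ** n_steps


def expected_calls(p_step: float, n_steps: int) -> float:
    """Expected number of model calls if each step is retried until it succeeds."""
    # Each step needs 1 / p_step attempts on average (geometric distribution).
    return n_steps / p_step


P_STEP = 0.95  # illustrative per-step success rate, not a benchmark result

for n in (1, 5, 10, 20):
    print(
        f"{n:>2} steps: success without retries = {chain_success(P_STEP, n):.3f}, "
        f"expected calls with retries = {expected_calls(P_STEP, n):.1f}"
    )
```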
At the forefront of this trend, every company and individual wants to seize the opportunity, but ensuring user data security and privacy with Agents has always been one of the most discussed issues in the industry.
Once Agents are deployed, they will inevitably access customers' core data. If that data leaks, it may cause significant losses to users and society. Establishing user trust is difficult for sensitive actions involving payments or personal information, such as billing, passwords, and shopping. At the same time, data is an indispensable "means of production" for model training.
Currently, China is at a critical juncture in the rapid development of AI Agent applications, and it is normal to face challenges. How quickly companies adapt to and leverage Agent technology will directly influence their ability to stand out in future market competition. Will the popularity of AI Agents usher in a golden era even more explosive than the internet era? We'll have to wait and see.