Tech giants are going crazy for 'Agents': have the 'big guys' finally sprouted from large models?

09/02 2024

AI is shifting from technology to scenarios.

In the past few days, fresh details have emerged about OpenAI's highly anticipated next project.

The Information quoted internal sources and reported that OpenAI plans to launch a new AI codenamed "Strawberry" as early as this fall, which boasts unprecedented "reasoning" capabilities, enabling it to handle complex mathematical and programming tasks, and even non-technical issues in daily life.

Furthermore, the report emphasized the significance of this technology for future AI products, particularly 'Agents' designed to tackle multi-step tasks.

Agents again.

After ChatGPT exploded in popularity at the end of 2022, 'Agents' quickly emerged from obscurity and attracted attention across the industry. From open-source projects like AutoGPT to the GPTs and GPT Store officially launched by OpenAI, these early iterations have, to some extent, showcased the potential and necessity of AI Agents.

In 2023, development and competition in the AI industry focused largely on large models themselves, and the exploration of Agents was only taking its first steps. In 2024, by contrast, players at home and abroad, from Google to Baidu, Alibaba, ByteDance, and OpenAI, have significantly accelerated the pace of Agent deployment.

Everyone's talking about 'Agents,' but what exactly are they?

If you follow the AI field regularly, I'm sure you've seen or heard the term 'Agent' quite often. But what exactly is an Agent? It might be difficult to articulate clearly.

In fact, Microsoft co-founder Bill Gates already mentioned the concept of 'Agents' in his 1995 book "The Road Ahead." Over the past three decades, however, the definition of 'Agents' has changed significantly, particularly after ChatGPT, with the emergence of Agents based on large models.

Image/OpenAI

Even today, there is no universally accepted definition of 'Agents' in academia. However, it is generally believed that an Agent is an intelligent entity capable of autonomously perceiving its environment, making plans, and executing tasks. It is not the "co-pilot" but the "main driver."

This can also be expressed using a straightforward formula:

Agent = LLM (Large Language Model) + Planning + Feedback + Tool Use

Consider a person writing an article with the help of ChatGPT. To ensure quality, we typically pick a topic first and then ask the AI to help generate an outline. We use the AI's search capabilities for analysis and research, have it generate a first draft, and then keep giving feedback to refine the content until we arrive at the final version.

Building upon large models, AI Agents further reduce manual intervention through autonomous planning, feedback, and tool utilization capabilities. Specifically, AI Agents can independently employ tools such as information search, reading comprehension, and numerical calculations, and plan multi-step tasks including outlining, researching, drafting, and feedback-driven optimization, achieving the effect of "one command from humans, endless AI work."

In short, AI Agents work in an iterative, conversational mode. They evolve beyond simple instruction executors into participants that can reflect on their own output, plan, and correct themselves.
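The formula above (LLM + Planning + Feedback + Tool Use) can be illustrated with a toy loop based on the article-writing example. This is only a minimal sketch under stated assumptions: `fake_llm_plan` and `fake_llm_critique` are hypothetical stand-ins for real model calls, the `search` and `word_count` tools are stubs, and the word-count check is an arbitrary placeholder for real feedback. It does not represent any vendor's actual Agent implementation.

```python
from typing import Callable, Dict, List

# Hypothetical tools; a real Agent would call search engines, calculators, etc.
def search(query: str) -> str:
    return f"notes on {query}"

def word_count(text: str) -> str:
    return str(len(text.split()))

TOOLS: Dict[str, Callable[[str], str]] = {"search": search, "word_count": word_count}

def fake_llm_plan(goal: str) -> List[str]:
    """Stand-in for the planning step: break the goal into ordered steps."""
    return ["outline", "research", "draft"]

def fake_llm_critique(draft: str, min_words: int) -> str:
    """Stand-in for the feedback step: 'ok' or a request to expand the draft."""
    n = int(TOOLS["word_count"](draft))  # tool use inside the feedback step
    return "ok" if n >= min_words else "expand"

def run_agent(goal: str, min_words: int = 12, max_rounds: int = 5) -> dict:
    """One command in; the loop plans, uses tools, and revises on its own."""
    log: List[str] = []
    notes, draft = "", ""

    # Planning: execute each planned step in order.
    for step in fake_llm_plan(goal):
        if step == "outline":
            draft = f"Outline for {goal}."
        elif step == "research":
            notes = TOOLS["search"](goal)
        elif step == "draft":
            draft = f"{draft} Draft using {notes}."
        log.append(step)

    # Feedback: keep revising until the critique passes or rounds run out.
    rounds = 0
    while fake_llm_critique(draft, min_words) != "ok" and rounds < max_rounds:
        draft += " More detail from " + TOOLS["search"](goal) + "."
        rounds += 1
        log.append("revise")

    return {"draft": draft, "log": log, "rounds": rounds}

if __name__ == "__main__":
    result = run_agent("AI agents")
    print(result["log"])
    print(result["draft"])
```

With the default settings, the human supplies only the goal; the plan, the tool calls, and the single revision round all happen inside `run_agent`. The `max_rounds` cap is a design choice in this sketch to keep the feedback loop from running indefinitely.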

Agents: The 'big guys' that have grown on large models

"Agents will not only change the way each individual interacts with computers. They will also disrupt the software industry, ushering in the biggest computing revolution since we transitioned from typing commands to clicking icons."

In November 2023, Bill Gates published a blog post titled "AI is about to completely change how you use computers" on his personal website, asserting that AI Agents would revolutionize the way people interact with computers over the next five years.

Bill Gates is not alone in his bullish outlook on AI Agents.

Andrew Ng's speech, Image/YouTube

In March 2024, Andrew Ng, a professor at Stanford University, noted that Agent workflows his team built on GPT-3.5 outperformed GPT-4 used on its own, and that workflows built on GPT-4 performed better still. He further predicted that AI Agents would drive significant AI progress in 2024, potentially even more than the next generation of foundation models.

At the World Artificial Intelligence Conference held in July 2024, Agents based on large models were the absolute focus. In his speech, Eric Jing, Chairman and CEO of Ant Group, stated that from a practical standpoint, professional Agents are an effective path for putting large models to work in rigorous industries. Robin Li, Founder of Baidu, went even further, explicitly stating that Agents are the direction Baidu is most optimistic about for AI application development.

Meanwhile, Google also introduced Oscar, an AI Agent platform, enabling developers to generate various AI Agents with minimal configuration. However, Google is not alone in this space:

Baidu Wenxin has AgentBuilder, ByteDance has Coze and HiAgent, Alibaba has Bailian Agent and DingTalk Agent, and Tencent WeChat has Cloud Development AI Agent. Agent platforms are gradually becoming a "standard" offering for large model vendors. Baidu has even coined the slogan "Everyone is a developer."

Image/Coze

After a year of exploration and contemplation, AI Agents have emerged as the new consensus in the AI industry in 2024.

Agents shift AI from technology to scenarios

In November 2023, OpenAI introduced GPTs, followed by the GPT Store in January 2024, allowing users to create their own versions of GPT without coding. However, GPTs still play the role of a "co-pilot": they offer more customization options but cannot break tasks down and execute them step by step.

In reality, many so-called 'Agents' today are closer to chatbots than to true Agents. By contrast, DingTalk's AI Assistant (known as AI Agent in English) comes closer to the essence of an Agent.

Determining whether an entity qualifies as an Agent is not difficult. The core lies in the degree of human intervention during task execution and the extent to which large models participate in planning and decision-making. This assessment helps distinguish between genuine Agents and conventional AI chatbots.

Image/DingTalk

It must be noted that current Agents, still in transition from "co-pilot" to "main driver," have significant room for technical improvement and have yet to deliver a game-changing experience. What matters, though, is where they are heading. For AI to penetrate deeper into our lives and transform them, it needs greater autonomy than chat-based interaction allows.

Ideally, AI Agents should be capable of making intelligent decisions and plans based on various conditions. For instance, when planning a trip, an Agent could autonomously search for transportation, accommodations, and various travel information, taking into account the user's historical preferences and habits, and iteratively refining the plan.

Another scenario could involve an AI Agent anticipating the user's arrival home after a tiring day at work, based on the car or phone's location, and autonomously turning on the air conditioning, robot vacuum cleaner, and lights at appropriate times.

As envisioned by Gates, in the future, we won't need to switch between different apps for different tasks. Instead, we'll tell our computers and phones what we want to do in everyday language, and based on the data we're willing to share, Agents will respond personally.

Closing Thoughts

Essentially, Agents draw inspiration from human thinking to construct more specialized reasoning and decision-making capabilities on top of AI, thereby delivering a more intelligent user experience. To some extent, AI Agents represent a step forward from ChatGPT.

However, a single Agent clearly cannot satisfy the diverse needs of countless individuals. Eric Jing of Ant Group therefore believes that future intelligent experiences will require many specialized Agents working together, each fulfilling its own role. Robin Li predicts that millions of Agents will emerge in the future.

Competition for the next platform is inevitable.

Just as the App Store defined the mobile era, the AI Agent Store is emerging as a new focal point of competition as Agents rise. Beyond competing over scenarios, major players with their own foundation models see the ecosystem as central to Agent development and as a strategic battleground.

Source: Leitech
