11/15 2024
The annual Baidu World Conference has become a window to observe Baidu's AI strategy and industry trends.
At the 2024 Baidu World Conference, Li Yanhong (Robin Li) seemed more confident than last year. He said that foundation model capabilities are ready, and that we are about to witness the shining moment of AI applications.
As the battle of large models intensified in the second half of 2023, Li Yanhong proposed that "focusing on models is less effective than focusing on applications." However, subsequent industry developments have shown that domestic applications have mainly focused on AI assistants similar to ChatGPT, with few disruptive AI applications in other fields.
One reason for this is that innovation has a cognitive threshold, and developing AI applications also has a technical threshold.
Perhaps realizing this, Li Yanhong and Baidu have sharpened the focus of their large model strategy this year, betting on an explosion of agents.
If asked what Baidu's first principle is now, the answer is becoming ever clearer: application-driven. Li Yanhong qualified this "application-driven" principle to distinguish it from the mobile internet era, saying, "It's not about launching a 'super app' but continuously helping more people and businesses create millions of 'super useful' applications."
01
"There's no shame in doing engineering"
Over the past 24 months, the much-anticipated AI super application remains elusive. Some people can't help but wonder: Is the global frenzy for large models a new technological revolution or just another bubble?
But onlookers might feel more certain if they recall what Stefan Zweig wrote in "Decisive Moments in History": "Those peak moments in history require a long gestation period, and every event with far-reaching impacts needs a process of development."
As one of the earliest and deepest participants in this AI wave, Li Yanhong foresaw that "peak moment." His judgment that AI applications are about to explode rests on two main factors:
Firstly, breakthroughs in underlying technology. Over the past 24 months, the biggest change in the AI industry has been that large models have largely eliminated hallucinations, significantly improving the accuracy of their answers. Building on advances in retrieval-augmented generation, Baidu began working on iRAG (image-based RAG) at the beginning of the year to reduce hallucinations in text-to-image generation as well, allowing AI to be applied in fields such as film and television, comics, and poster production.
▲Image generated based on the prompt "Einstein at the Temple of Heaven"
Left: Wenxin iRAG work, Right: Other large model works
Secondly, the surge in large model invocations and the awakening of developers. In May this year, the daily API call volume of Wenxin's large model reached 200 million. At the time, Baidu executives discussed a question in a meeting: what does it mean for a large model to "succeed"? Li Yanhong offered a quantitative indicator: daily average API call volume should increase tenfold within a year, since growth on that scale would indicate genuine demand. By early November, the daily API call volume of Wenxin's large model had exceeded 1.5 billion, about 7.5 times the May figure in roughly six months and faster than expected.
Another practical reason not discussed at the conference was invocation costs. Over the past two years, the cost per token for large models has decreased by over 99%. Taking the United States as an example, the cost of processing one million tokens using a large model is now only 60 cents. In China, some of Baidu's models are even provided free of charge.
Based on the information shared at the conference, Baidu is currently focusing on two major application directions: consumer-facing agents and industry applications for businesses.
According to our observations, large model technology may be spreading more widely on the consumer side than even technology optimists expect, as many usage scenarios are not limited to traditional product forms like apps.
For example, one of my lawyer friends sometimes asks AI to help draft a simple legal agreement, which he then revises and presents to clients. If he were to draft the agreement himself, it would take at least half an hour. To complete this interaction, he doesn't even need to download a dedicated AI app; he can access the tool through a legal-industry official account.
It can be said that AI application scenarios are everywhere, but the product form may not be the same as before. At the beginning of last year, when large models first emerged, people expected AI to have its own iPhone moment, with super applications on the scale of "Angry Birds" and Instagram from the mobile internet era soon appearing on top of large models.
A year later, super native AI applications have yet to emerge, but a new consensus is gradually forming: agents will become the mainstream form of AI applications.
A few months ago, OpenAI published an article stating that, driven by technology giants like Google and Apple, 2025 will be the year when AI agents finally become mainstream. At the Baidu World Conference, Li Yanhong reiterated his judgment on agents: they are the most important development direction for large models, and they are about to reach their tipping point.
According to the framework of AGI's evolutionary sequence, agents are considered the next stage after Copilot. They have a certain degree of autonomy and can perform long-running tasks: interacting with the environment multiple times, collaborating, and evolving on their own.
Many top technology companies worldwide are currently focusing on agents, and Li Yanhong revealed at the conference that Baidu has gone further, making agents the most important strategic direction for its large models.
The reason is that the threshold for agents is low enough, and the ceiling is high enough. Last year, the industry realized that applications should be the focus, but there was much uncertainty about how to specifically implement this and how to transform models into applications, deterring many people. Judging from the changes in product forms during the mobile internet era, each reduction in product thresholds has led to an explosion of application paradigms. The most typical example is the migration from apps to mini-programs, from which the WeChat ecosystem has greatly benefited.
Agents are similar. When a platform gives developers an efficient and simple path to building agents on large models, momentum gathers. The Wenxin Agent Platform, for example, has attracted 150,000 enterprises and 800,000 developers, with tens of thousands of new agents created each week, including one built by an 11-year-old student. These agents are then distributed through search and other channels.
However, judging from Li Yanhong's speech, Baidu's goal is not to launch a super app but to help more people and businesses create millions of "super useful" applications.
In other words, the width of Baidu's ecosystem moat in the AI era will be determined by these applications.
It is thus not difficult to understand why, among AI leaders, no one is more focused on application implementation than Li Yanhong. Baidu's AI advantage is more evident in the underlying technology and model layers. However, the success of technology ultimately depends on its application in real life. As emphasized by Mustafa Suleyman, a co-founder of DeepMind and CEO of Microsoft AI, an AI model without a clear purpose is nothing more than an intriguing computer science demonstration.
02
The Emergence of "Free Canvas"-Style Agents
The explosion of agents is inseparable from "usefulness."
Li Yanhong previously cautioned against falling into the trap of "super apps." In the AI era, "super capable" applications are perhaps more important than just focusing on DAU-driven "super apps."
Judging from cases on major domestic agent platforms, the main development directions at present are corporate agents, role agents, tool agents, and industry agents.
Among these, corporate agents are considered the corporate websites of the AI era. Considering Baidu's roots in search engines, we believe that this type of agent represents an incremental opportunity for Baidu Search, serving as a touchpoint for Baidu to upgrade and strengthen its relationships with key account customers, providing them with more value.
According to official Baidu data, after the launch of BYD's official agent, the conversion rate of sales leads increased by 119%.
However, among all the presentations at the World Conference, the most interesting and eye-catching for participants was the tool agent "Free Canvas." Jointly launched by Baidu Wenku and Baidu Netdisk, this new species of AI creation attracted over 200,000 people to queue for invitations on the first day of its public beta.
According to the live demonstration, "Free Canvas" is a versatile whiteboard powered by Wenxin's multimodal large model, representing the industry's first content operating system from Baidu Wenku and Baidu Netdisk.
Using large model technology, the "canvas" connects public content with privately authorized content. Through a minimalist drag-and-drop operation, users can understand, generate, and create across file formats and modalities, then share and store rich media documents with one click. The result is freedom in input, editing, creation, and sharing.
In September this year, Baidu Group announced that Netdisk's consumer (C-end) business was reassigned to the Mobile Ecosystem Business Group (MEG), to be managed by Wang Ying, Vice President of Baidu and Head of the Wenku Business Unit. Many people were puzzled by the decision at the time; the unveiling of "Free Canvas" now shows the logic: integrated further through AI capabilities, Wenku and Netdisk have become brand-new intelligent productivity tools.
Among the above four types of agents, "Noise Reduction NoNoise" is also watching the commercial prospects of industry agents. Taking the insurance industry as an example, entrepreneurs building B2B products on large models have told us that in this 30 trillion-yuan market, some human insurance agents earn commissions of 20%-40% of their clients' premiums. If large models can turn insurance expertise into AI products that draw up plans for clients buying simple insurance products, the prospects would be promising.
For example, the legal agent "Faxingbao," known as the "free AI lawyer," has answered over 16.6 million legal questions in just six months since its launch. Product information shows that this agent was created by Baidu itself, with its main interface providing dialogue interaction, free tools such as legal calculators, intelligent legal documents, and online legal opinions, as well as links to similar judgment cases for reference.
Users can inquire about compensation schemes and how to calculate claim amounts for traffic accidents or work-related injuries, or ask the agent to draft a complaint. In the past, ordinary people who needed such legal consultation either paid for professional legal advice or searched the sprawling internet themselves, bearing the risk of incomplete or inaccurate information.
However, some lawyers have pointed out that professional legal services such as legal strategies, analyses, and judgments based on professional knowledge and personal experience are still difficult for AI to achieve at present.
From another perspective, this might be exactly where the future upside of agents lies, as agents are inherently autonomous products that will continue to learn and evolve.
It's worth noting, however, that agents will not take off just because the infrastructure is in place. The explosion of an application direction is inseparable from clear commercial incentives. In 2012, with "Angry Birds" already a huge success on iOS, Rovio, the game development company behind it, saw its revenue increase by 101% over the previous year to 150 million euros, thanks to downloads and in-app purchases on iOS and advertising placements on Android.
Regarding this, Sam Altman, co-founder of OpenAI, has also issued a warning. In a recent interview, he reminded entrepreneurs that while embracing new technologies faster may lead to short-term explosive growth, in the long run, you still need to build a product or service that continuously provides value. "Everyone can now create fantastic demos, but the key is actually building a successful business. That's the toughest part, and business rules still apply."
According to our understanding, the Wenxin Agent Platform has gradually established a commercial closed loop that takes agents from development to distribution to monetization. The highest revenue a single agent has earned from one conversion has reached 100,000 yuan, supported by a full range of commercial components such as link mounting, product conversion, lead conversion, affiliate advertising, and capsule placements.
03
Long-termism, Idealism, Realism: All Are Essential
Chinese and American AI giants are often compared. Whereas their American counterparts pursue a grand vision of AGI, driven by scientific research and underlying breakthroughs, Baidu, judging by its statements at the World Conference, adopts a typical engineering approach and an application-driven strategy.
In an interview with Jiazi Guangnian after the conference, Li Yanhong bluntly stated, "There's no shame in doing engineering. Engineering is likely to discover opportunities and laws earlier than science." After all, airplanes flew before people began to study aerodynamics. Baidu's AI roadmap therefore prioritizes solving the most common technical problems encountered in scenarios and applications.
At first glance, this contrasts with Li Yanhong's social labels. The story of Baidu's 170 billion yuan in R&D investment over 10 years is already well-known to the public, and Li Yanhong's labels of "long-termism" and "futurism" have long been recognized within the industry.
For instance, when Robin Li was selected as one of the global AI leaders by Time magazine along with Elon Musk, Jen-Hsun Huang, and Sam Altman in September last year, Time commented, "Robin Li is China's most prominent futurist, who has long been committed to the wave of AI development." In Sullivan's recently released "2024 Global AI Ecosystem Overview" report, Baidu was listed as an AI-Native Giant, occupying the same quadrant as Google and OpenAI.
Futurism is inherently idealistic, yet an application-driven strategy appears very practical.
This apparent contradiction may depend on how participants perceive the essence of the AI revolution. If AI heralds a new industrial revolution, this transformation will not conclude in just three to five years but will continue to permeate all aspects of society for decades to come. Participants in this transformation need both patience and continuous, substantial investment. Only then, through commercial success and ecosystem barriers, can they maintain a leading position in global competition.
As Robin Li predicted in a previous interview, AI competition will be fierce in the next two to three years. "As for who will be the ultimate winner, my view is that whoever makes money will survive."
This might be the original intention behind Robin Li's tireless preaching and urging everyone to create agents and utilize AI. It's also the motivation behind Baidu's decision to both develop its own agent applications like "Free Canvas" and "Faxingbao" and create tools and foundational platforms, continuously lowering the threshold for developers to create applications.
For example, another significant announcement at the Baidu World Conference was the no-code development tool "Miaoda," which Robin Li described as "the most complex multi-agent collaboration tool in human history so far." Its purpose is to give everyone the abilities of a programmer, allowing anyone who can speak to create applications. In this field, overseas products like Cursor and Replit can already generate application interfaces from users' natural language commands alone, without the user writing a single line of code.
▲Baidu's no-code tool "Miaoda" allows multiple agents to collaborate to complete the development of a registration system by simply describing the requirements in Chinese and providing a document with the conference's time and location.
These tools share a common underlying logic: enabling ordinary people to earn money through creativity. If similar ideas can be implemented, AI applications will undoubtedly be one step closer to large-scale adoption. We also noticed that although "Miaoda" won't launch until the first quarter of next year, it has reportedly already attracted more than 5,000 companies for testing.
Following this line of thinking, it's also understandable why Baidu didn't join the race to build a Chinese version of Sora after OpenAI's model made its impressive debut. In Robin Li's view, Sora and multimodality are two different things. Sora essentially aims to provide video generation in any scenario, which is very meaningful, but achieving this requires long-term investment.
However, this doesn't mean that Baidu isn't investing in multimodality. "We are very optimistic about multimodality and have made very long-term investments in it. In scenarios with real applications, our multimodality capabilities are very strong." Baidu simply chooses to advance in a more down-to-earth direction. For example, "Free Canvas" is a platform that integrates multiple AI functions into one.
When asked for his final thoughts at the end of the exclusive interview on the conference day, Robin Li said that the AI revolution won't end in three to five years; it's more like a comprehensive societal reconstruction over the next three to five decades. It requires a combination of long-term thinking, idealism, and realism.
For Baidu, the future and reality may be two sides of the same coin in its AI strategy.
Reference:
[1] "Dialogue with Robin Li: Application-Driven is Baidu's First Principle, and a Major Update to the Basic Model Every Two Years is Sufficient," Jiazi Guangnian