11/25/2024
Baidu's Smart Move: Making Apps Rely on Its Large Models
Original by Digits·Digital Economy Studio
Author | You Shu
WeChat ID | yds_sh
China's internet once went through a C2C (Copy to China) phase, in which overseas innovations were adapted for the Chinese market. In the mobile internet era, however, China surged ahead, and overseas companies now look to China for successful models: a reversal dubbed 2CC (to China Copy).
Since generative AI applications emerged, large models have developed along different paths in China and abroad. Foreign large models emphasize intelligence at the expense of user cost and application experience, which has slowed new user acquisition. The US focuses on computing infrastructure, whereas the Chinese market stays closer to users and has launched many AI applications integrated with various industries. As AI's next battleground takes shape, giants in China and abroad are gearing up.
'Don't Compete on Models, Compete on Applications'
'In the past 24 months, the biggest change in the AI industry is that large models have largely eliminated hallucinations, significantly improving the accuracy of answers. This makes AI truly usable and reliable.' At the recent Baidu World 2024 conference, Baidu founder Robin Li delivered a keynote speech titled 'Applications Are Here.' In his view, the large-scale deployment of AI applications is becoming a reality.
Baidu has invested over 100 billion yuan in AI over a decade. As one of the earliest domestic enterprises to commit deeply to AI, its every move sets the direction for China's large models. In 2023, when ChatGPT was all the rage and China was still absorbed in the 'hundred models war,' Robin Li proposed, 'Don't compete on models, compete on applications.' He believes that large models should compete not only internally, on computing power and parameters, but also externally, on scenarios and problems, in order to realize their full potential and value.
Robin Li believes that the development path of AI in China is driven by applications, and agents are the direction he is most optimistic about for native AI applications.
An 'agent' is an intelligent entity capable of autonomously perceiving the environment, planning, and executing tasks. In a straightforward formula, an agent = large model + planning + feedback + tool use. Compared to traditional monolithic large language models, which are 'black box models,' agents have the advantage of making it easier to understand and analyze how different components contribute to the system's overall behavior.
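The formula above can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not Baidu's or any vendor's implementation: the `Agent` class, the `toy_model` stand-in for a large model, the `add_tool` tool, and the `action:argument` text convention are all hypothetical names invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types: a "model" maps a prompt to text, a "tool" maps an argument to a result.
Model = Callable[[str], str]
Tool = Callable[[str], str]


@dataclass
class Agent:
    model: Model                 # large model (here a stub)
    tools: Dict[str, Tool]       # tool use
    max_steps: int = 5
    history: List[str] = field(default_factory=list)

    def run(self, goal: str) -> str:
        observation = ""
        for _ in range(self.max_steps):
            # Planning: the model picks the next action from the goal and latest feedback.
            decision = self.model(f"goal={goal}; observation={observation}")
            self.history.append(decision)
            action, _, argument = decision.partition(":")
            if action == "finish":
                return argument
            # Tool use: call the chosen tool if it exists.
            tool = self.tools.get(action)
            # Feedback: the tool result becomes the observation for the next planning step.
            observation = tool(argument) if tool else f"unknown tool {action}"
        return "gave up"


def toy_model(prompt: str) -> str:
    # Stub standing in for a real large model (assumption for illustration only).
    observation = prompt.rpartition("observation=")[2]
    if observation == "":
        return "add:2,3"             # no feedback yet: plan a tool call
    return f"finish:{observation}"   # feedback received: report it as the answer


def add_tool(argument: str) -> str:
    # Hypothetical tool: sums comma-separated integers.
    return str(sum(int(part) for part in argument.split(",")))


agent = Agent(model=toy_model, tools={"add": add_tool})
result = agent.run("add 2 and 3")
print(result)
```

Because each planning decision is recorded in `history`, the sketch also hints at why such a system is easier to inspect than a single monolithic model: every step between goal and answer is visible.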
Great minds think alike. In November 2023, Bill Gates wrote on his personal website, 'Agents will not only change how we interact with computers but will also disrupt the software industry, bringing about the biggest computing revolution since the advent of the graphical user interface.' He claimed that agents will become the next platform, much like Android, iOS, and Windows in the past.
Mark Zuckerberg has also predicted that the number of agents could eventually reach the billions, surpassing even the global population. In March 2024, Professor Andrew Ng of Stanford University pointed out that the agent workflows his team built on GPT-3.5 performed better in applications than GPT-4 alone, and that workflows built on GPT-4 would, of course, perform even better. He believes that AI agents will drive significant progress in AI in 2024, potentially surpassing the impact of the next generation of foundation models.
Even an 11-year-old elementary school student can participate in creating agents
Asked at the 2024 World Government Summit what people should learn, Jen-Hsun Huang replied, 'Because our job is to create computing technology so that no one needs to "program" in the traditional sense, making everyone in the world a programmer.'
The most obvious feature of agents is their low barrier to entry, which makes them accessible to everyone. For example, on Baidu's ERNIE Bot platform, an 11-year-old elementary school student created an agent that was distributed through search and other channels. As large model hallucinations fade, more and more people can create useful agents with natural language prompts.
While agents lower the barrier to entry for users, their potential is vast, enabling powerful applications. Collaboration among multiple agents can also solve more complex problems.
For instance, a corporate agent can serve as an upgraded version of the traditional corporate website. Take BYD's agent as an example: it not only covers basic website functions such as the company introduction, product images and specifications, and offline store locations, but also adds AI capabilities like proactive recommendations, timely responses, and one-on-one service. After BYD's official agent was launched, sales conversion rates increased by 119%.
For example, the agent can accurately understand a request for a 'well-balanced' car model, recommend one, and display product images.
In our personal and professional lives, we encounter various legal issues, but not everyone can immediately find a professional lawyer for advice. This is where the legal industry agent 'Faxingbao' comes in handy.
Suppose there is a traffic accident dispute. Faxingbao provides a four-step guide: retain the relevant evidence, request mediation by the traffic management department, file a civil lawsuit if mediation fails, and then apply for court enforcement. If you need to calculate accident losses and compensation amounts, you can enter the details and Faxingbao's 'compensation calculator' will do the math for you. Faxingbao can also help you draft a complaint.
The tool agent 'Free Canvas' allows users to freely drag and drop documents, audio, video, and other rich media materials onto a 'canvas'-like interface to quickly generate multimodal content. Robin Li calls it 'immediately available off-the-shelf.' It bridges the gap between public and private domain materials. For example, industry analysts can use it to write research reports, clipping documents, videos, audio files, and other formats into Free Canvas with one click.
AI assistants may become the new entry point for mobile internet
With agents as the entry point, domestic internet companies have found a direction for AI applications that is also being echoed across the ocean, as leading US companies are racing to launch AI assistant products.
An AI assistant (AI Agent) is an intelligent entity that can perceive the environment, make decisions, and execute actions, with the ability to think independently and use tools to work step by step toward a given goal. It can provide personalized applications for consumers and cost-reduction and efficiency-enhancement solutions for businesses. For ordinary users, the core function of an AI assistant is its ability to operate a mobile phone autonomously and assist in complex reasoning tasks.
OpenAI is preparing to launch a new AI assistant product code-named 'Operator' that can automatically perform various complex operations, including coding, travel booking, and automated e-commerce shopping. According to insiders, OpenAI's leadership expects to release the product in January 2025, initially as a research preview and development tool, with API access for developers.
OpenAI CEO Sam Altman said, 'We will have better and better models, but I believe the next big breakthrough will be AI assistants.' From OpenAI's perspective, it faces increasing pressure in its commercialization process, and incremental improvements in ChatGPT may not attract users to pay higher prices. Executives are eager for a breakthrough product to justify the huge investments in AI development.
Meanwhile, Microsoft recently quietly open-sourced the AI tool OmniParser, which helps users create personalized agents to operate personal computers. On October 22, Microsoft announced the integration of 10 autonomous AI agents into Dynamics 365, supporting OpenAI's latest model o1, with self-learning capabilities to automatically execute complex cross-platform business operations.
Google plans to preview its large action model 'Project Jarvis' in December, which will help users perform tasks such as 'gathering research, purchasing products, or booking flights.'
Apple has chosen to integrate Siri with ChatGPT for smarter human-computer interaction. Some netizens have also discovered that Apple has quietly released two implementation versions of Ferret-UI (based on Gemma 2B and Llama 8B, respectively), a technology Apple released in May this year that allows AI to understand mobile phone screens.
In an era where hardware manufacturers invariably tout AI, AI assistants may become the breakthrough for terminal intelligence. Even more imaginatively, AI assistants, on which both Chinese and foreign giants are placing their bets, may hold the new entry point for mobile internet. Due to their strong interactivity and convenience, AI assistants are expected to break down the natural barriers between different apps on the same terminal, which would inevitably reshape the traffic distribution landscape.
Whether it is Baidu, KIMI, Tongyi, or Doubao in China, or ChatGPT, Apple, Google, or Microsoft abroad, the player that ultimately seizes the new mobile internet entry point and secures a ticket to the future will be the one with the smartest and most user-friendly applications.
The battle for the super entry point of the AI mobile platform has just begun. Whether moving first or moving late proves to be the real advantage depends on who is first to build an AI super application platform. Such a platform is not a single application function but an all-purpose super complex.
The future is worth looking forward to.