11/14 2024
It is becoming popular in the tech industry to use AI to buy coffee.
In early September this year, when Alipay launched its AI life assistant "Zhi Xiao Bao" at the Inclusion·Bund Conference, it used the assistant to order a cup of Starbucks coffee. On October 28, at the 2024 CNCC Conference, WisdomAI likewise demonstrated ordering a cup of coffee with its latest agent application, AutoGLM.
It is not only software vendors: more and more mobile phone manufacturers are also choosing to open their AI demonstrations by ordering a cup of coffee.
In early October, vivo's mobile phone agent PhoneGPT showed that it could buy coffee on a user's instruction; in late October, Honor CEO Zhao Ming used the intelligent assistant YOYO to order coffee with a single sentence at the company's launch event.
As products move from generative AI, which only outputs content through dialogue, to AI agents that can carry out specific tasks for people, companies have chosen the small act of "ordering a cup of coffee" to make that shift concrete.
Of course, the ambitions for AI do not stop here. Besides ordering coffee, major companies clearly expect AI to run more errands, such as ordering takeout, booking hotels, and shopping, all from a single sentence. And compared with the broad and vague term "agent," "task-oriented AI" is clearly easier for people to understand and accept.
As task-oriented AI products multiply, the industry is also beginning to ask: since software and hardware each have irreplaceable advantages, how can all parties break down ecosystem barriers and provide users with services that are genuinely useful and convenient?
01. Task AI: Doing matters more than talking
Over the past year or two, the industry's primary focus has been generative AI, but generative AI alone can no longer fully meet consumers' expectations for what AI should deliver in practice.
As previously mentioned in an article from Xinlichang, generative AI and agent AI are two different directions in artificial intelligence. The former mainly generates new content (text, images, audio, etc.) through learning data, with the most common application being chatbots; the latter can not only chat but also focuses more on simulating intelligent behavior, interacting with the environment, and making decisions and performing tasks based on collected data.
Agent AI can be seen as a progression of generative AI. The explosive popularity of OpenAI's ChatGPT pointed out a development direction for AI, so most domestic exploration of consumer-facing AI over the past year or two has revolved around generative AI. However, as the technology becomes more practical and reaches a broader range of consumers, generative AI that can only listen and speak is no longer sufficient; AI needs to help people get more things done.
As a result, major manufacturers have gradually begun to focus on agent AI.
This is not only because the industry has perceived the evolution of user needs. The barriers to generative AI lie in model quality and data quality, while the barriers to agent AI lie in the richness and connectivity of the ecosystem. After the initial "battle of a hundred models," in the current period of relative calm, manufacturers have gradually realized that when it comes to putting AI into real applications, competing on generative AI alone is no longer enough, and how AI can truly serve daily life has become more important. The ecosystems that various industries built up during the mobile internet era are fertile ground that AI has barely begun to tap.
Recently, it seems that more and more manufacturers have realized the importance of developing agent AI by starting with a specific small task.
Take Alipay's conference in September as an example: afterward, having AI buy coffee became a hot topic in tech and media circles. At the subsequent Honor event, CEO Zhao Ming even demonstrated using AI to order 2,000 cups of coffee.
After all, an act like buying coffee meets the core needs of both users and manufacturers at this stage. For users, getting a cup of coffee with a single sentence shows whether AI has actually become more efficient at completing tasks. For manufacturers, using AI to buy coffee covers all the key steps of agent AI.
Having AI buy coffee gives the idea of "handling a task with one sentence" a tangible, practical example, and it opens up further possibilities for the future.
02. Capability assessment: Each has its strengths in the technical ecosystem
When looking for a term to describe this type of agent AI, we find that "task AI" is more concrete than "agent" and carries more of a local flavor.
Currently, task AI can be divided into three main categories based on the nature of the manufacturer: platforms, hardware, and large models. Taking the scenario of users buying coffee as an example, a comparison of the specific similarities and differences between the three types of task AI is shown in the image below.
The first category is internet platforms, represented by Alipay's "Zhi Xiao Bao." When a user gives an instruction, Zhi Xiao Bao retrieves the corresponding service, such as buying coffee, booking flights, or hailing a taxi, and can complete the entire process from ordering to payment within the app. Judging from current tests, the range of services it can retrieve and the cities it supports are still being opened up gradually.
Zhi Xiao Bao's advantage lies in the richness of its ecosystem. In April of this year, Alipay began a gray-release test of a brand-new intelligent assistant, accessed by pulling down on the homepage, which was the predecessor of the current Zhi Xiao Bao. In September, "Zhi Xiao Bao" was officially launched as an independent app for lifestyle services. Built on Alipay's service ecosystem, it lets users reach a vast array of services with a single sentence.
Alipay is an app with nationwide reach, and its more than 4 million mini-programs and over 8,000 digital lifestyle services are fertile ground that Zhi Xiao Bao can continue to cultivate. Looking ahead, Zhi Xiao Bao's advantage lies in its ability to span different hardware platforms, making AI services accessible to as many people as possible.
The second category is hardware, represented by Honor's YOYO. When a user gives a voice command to the phone assistant, the AI assistant simulates the user's on-screen interactions: it automatically taps the screen to open an app, finds the merchant within the app, and places the order, with the final payment left for the user to confirm.
Unlike independent apps such as Zhi Xiao Bao, phone AI assistants like Honor YOYO can call other apps or services on the phone at the system level, which makes them an upgraded version of earlier phone voice assistants. The advantage of AI products from hardware manufacturers is that they occupy the user entry point and hold system-level permission to invoke apps.
The disadvantage is the high threshold for continuous optimization and iteration. Manufacturers may need to invest in high-performance chips and develop supporting operating systems, and they must also account for users switching phones. Mobile phone manufacturers therefore have to keep balancing product, cost, and performance as they explore the potential of AI agents.
For example, before the Honor Magic7 conference in October, Honor had already released the AI operating system MagicOS 9.0 in advance and presented it at the conference along with YOYO, which had been upgraded to an agent.
Apple, likewise, has paved the way for AI not only by releasing new phones and systems compatible with AI but also by recently launching a Mac lineup equipped with the new M4 chip, which outside observers see as a further step toward AI PCs.
Due to the greater costs involved, hardware-based task AI is now taking more cautious steps.
The third category is large model manufacturers. Although they neither occupy user entry points nor own a lifestyle service ecosystem, they still have a certain advantage in developing task AI: they can simulate on-screen user interactions as hardware manufacturers do, but they are not tied to any one hardware brand, so they can aim to become a platform that is open both upstream and downstream.
WisdomAI's AutoGLM is an example. After the user gives an instruction, the application also simulates the user's on-screen interactions to open other applications. The user needs to step in for key operations, and the final payment must be confirmed by the user. In terms of positioning, the task AI of large model manufacturers falls between the hardware and platform categories.
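The two execution styles described above can be contrasted in a minimal Python sketch. It is an illustration only, not any vendor's implementation: `TaskResult`, `platform_order`, `ui_simulation_order`, and the service registry are hypothetical names, and the sketch glosses over how in-app payment is actually authorized.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TaskResult:
    """Record of what the assistant did (hypothetical structure)."""
    steps: List[str] = field(default_factory=list)
    paid: bool = False


def platform_order(item: str, services: Dict[str, str]) -> TaskResult:
    """Platform style: retrieve a registered in-app service and run
    ordering and payment inside the same app."""
    result = TaskResult()
    if item not in services:
        result.steps.append(f"no service found for {item!r}")
        return result
    result.steps.append(f"retrieved service: {services[item]}")
    result.steps.append("placed order in app")
    result.steps.append("completed payment in app")
    result.paid = True
    return result


def ui_simulation_order(item: str, confirm_payment: Callable[[], bool]) -> TaskResult:
    """Hardware / large-model style: simulate screen taps across apps,
    then hand the final payment back to the user."""
    result = TaskResult()
    for action in ("open ordering app", f"find merchant selling {item}", "place order"):
        result.steps.append(f"simulated tap: {action}")
    if confirm_payment():  # the user stays in the loop for payment
        result.steps.append("user confirmed payment")
        result.paid = True
    else:
        result.steps.append("payment left pending")
    return result
```

For instance, `platform_order("coffee", {"coffee": "hypothetical coffee mini-program"})` completes in three steps, while `ui_simulation_order("coffee", lambda: False)` stops with the payment pending because the user declined to confirm.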
03. Interconnection and interoperability: Broad entry points and diverse applications
To predict where the various types of task AI are heading, we can compare them with some relatively mature products.
For example, the voice control systems in smart homes and in-car entertainment systems are AI products for everyday scenarios that already closely resemble the task AI discussed here, but they have limitations. In essence, they still perform tasks inside the closed environment of a "home" or a "car," where the services on offer are relatively limited and the commands given to the AI are simpler.
Tasks like having AI "buy coffee," by contrast, take place in a more open everyday setting, where AI faces a more complex and diverse real world. On one side of the AI is the user's personalized lifestyle; on the other are merchants and services from every industry.
In this setting, hardware and platform manufacturers will each leverage their own strengths to develop task AI. Mobile phones, wearable devices, smart home appliances, smart cars, and other hardware are the most direct entry points through which users reach AI, and this is the inherent advantage of hardware manufacturers.
AI software products like Zhi Xiao Bao, on the other hand, rely on Alipay, which is not only a payment platform but also the largest digital lifestyle service platform in China, capable of mobilizing a vast number of merchants and institutions to provide users with more convenient and seamless task services. This laborious groundwork is the platform's advantage in the service ecosystem. After all, Alipay did exactly this during the mobile payment era, rolling out QR codes and building mini-programs. Today it has 4 million mini-programs and connects over 80 million merchants.
In the AI era, Zhi Xiao Bao is reconstructing the platform ecosystem using AI logic. Alipay previously launched the intelligent agent development platform "Treasure Box," allowing merchant organizations to quickly create exclusive intelligent agents with zero code and publish them with one click to Alipay mini-programs, the Alipay app, and the Zhi Xiao Bao app.
Open Zhi Xiao Bao today and many third-party service providers already have their own intelligent agents, such as the "Huang Xiaosong" agent launched with Huangshan Scenic Area and the "Hang Xiaoyi" agent launched with Hangzhou Culture and Tourism. It is foreseeable that other industries will also upgrade from mobile internet services to agent services, and this shift may arrive faster and with more force than the mini-program era did.
However, just as the prosperity of the mobile internet required open collaboration between mobile phone manufacturers and internet platforms, the AI era also requires concerted effort from multiple parties if AI is to truly grow hands and feet, integrate into daily life, and help people get things done.
After all, whether they come from hardware or software manufacturers, today's AI products still need to get better at performing tasks. At present, both camps struggle to break down complex instructions and complete a task in one go.
For example, when hardware calls software, user intervention is still required for multiple key steps, and WisdomAI's AutoGLM cannot skip ad pop-ups; when software like Zhi Xiao Bao calls services within Alipay, collaboration with merchants also needs further development. For example, when users need to order food but have difficulty choosing, can merchants make intelligent recommendations based on user preferences?
Therefore, although the task AI businesses of different types of manufacturers overlap to some extent, they are still far from competing for AI entry points. In open scenarios, both software and hardware manufacturers need each other's capabilities to complete the closed loop of task processes.
Hardware manufacturers need the ecosystem resources of internet platforms to truly reach tens of millions of merchants and institutions, work deeper into the fabric of daily life, and expand the supply of services. AI products from internet platforms, in turn, need to be integrated with hardware such as car infotainment systems, watches, and glasses to give their services new forms.
Previously, ByteDance's Doubao launched the AI agent earphone Ola Friend, which allows users to check information, ask for tips, learn English, etc., just by wearing the earphone; there were also rumors that Rokid would cooperate with Zhi Xiao Bao, allowing users to have AI help order coffee, hail a taxi, and handle various tasks with just one sentence by wearing AR glasses. It can be seen that various manufacturers are now actively exploring the combination of software and hardware to find the optimal path for the implementation of task AI.
Looking back at the mobile internet's development over the past decade or so, we also find that as interconnection and interoperability among major platforms kept improving, life became more convenient. For AI to truly help people with tasks, interconnection and interoperability are likewise unavoidable, and they will largely determine how quickly consumer-facing AI is adopted in the next stage.
This is a long road, but at least the direction is clear. How the native agents of hardware manufacturers and the agents of internet platforms can combine and collaborate, what standards they should follow, and how to further shape the upstream and downstream ecosystem are all questions the industry needs to consider.
*The lead image and images in the text are sourced from the internet.