11/13/2024
A new competitor has arrived in the AI glasses market.
On November 12, Li Ying, CEO of Xiaodu Technology, officially unveiled Xiaodu AI glasses at Baidu World 2024.
Xiaodu AI glasses look almost identical to traditional glasses, yet they integrate a pair of earphones and a camera. More importantly, as the world's first native AI glasses equipped with a Chinese large language model, they support a series of AI functions built on multimodal interaction, such as asking questions on the go, object recognition, audio-visual translation, and intelligent reminders.
Xiaodu's entry into the market has further fueled the fire of "AI glasses".
Since the beginning of this year, the influx of large models into hardware has rapidly given rise to a large number of AI hardware products. AI glasses are among the hottest items in this category: Ray-Ban Meta sold one million units just six months after its launch. Meanwhile, a large number of technology companies are attempting to enter this emerging field. Bloomberg even reported recently that Apple is pushing forward internally with an AI glasses project codenamed "Atlas".
This enthusiasm is not hard to understand. Behind the popularity of AI glasses lies a very clear logic: while fulfilling the optical functions of traditional glasses, AI glasses can also integrate open-ear headphones and cameras, meeting many consumers' needs for smart features in a more convenient way. The shift resembles the upgrade from traditional watches to smartwatches, which add smart functions on top of the basic need to check the time conveniently.
However, AI glasses, or smart glasses, did not appear overnight. For several years, Huawei and other manufacturers have been launching similar products that integrate headphones or cameras into traditionally styled glasses, but these never sparked consumers' interest. The key variable behind this transformation is:
The arrival of large models.
Simply put, large models have brought smarter AI, making voice interaction and even multimodal interaction between humans and AI usable and user-friendly, and giving meaning to the intelligent upgrade of AI glasses. In the same way, the intelligence of the first-generation iPhone would have been meaningless without the support of multi-touch.
But what changes have large models brought to AI glasses? What capabilities are required for a good pair of AI glasses? And what role will Xiaodu play in the development of AI glasses?
Large models pouring into glasses, AI reshaping the smart experience
It is an indisputable fact that many people need glasses. But what changes will occur when large models pour into glasses?
Take the newly released Xiaodu AI glasses as an example. They not only eliminate the need for a pair of headphones but also integrate a 16-megapixel camera, allowing users to capture exciting moments in life and travel from a first-person perspective. More importantly, the application of large model technology enables AI glasses to understand users' intentions, bringing revolutionary multimodal interaction and truly surpassing single-function devices.
As we all know, one of the biggest changes brought by large models is their enhanced natural language understanding, which makes users willing to interact with AI more often. Xiaodu is a case in point: earlier this year it swapped out its "brain" for a new one, the DuerOS operating system built on Baidu's ERNIE Bot large model, resulting in a qualitative leap in intelligence. The number of multi-round interactions between users and Xiaodu in AI conversations has increased by 700% compared with traditional voice conversations.
In other words, the "AI intelligent evolution" based on large models lays the foundation for AI glasses to use voice as the primary human-computer interaction method. Even basic experiences like voice-activated photography, video recording, and playing music/podcasts will see fundamental improvements.
Moreover, large models have brought unprecedented functional and experiential upgrades, especially when paired with the visual capabilities of cameras. For example, with the translation and summarization functions on Xiaodu AI glasses, Xiaodu can "read and comprehend" airport signs, hotel information, and restaurant menus while you travel, and then help you understand them.
When traveling and encountering an unfamiliar building or attraction, AI glasses can serve as an "AI tour guide," introducing attractions directly by combining vision, location, and knowledge bases.
In short, AI glasses driven by large models not only make human-computer interaction natural and smooth but also transform from "smart hardware" to "smart assistants," enabling users to understand the world from different perspectives and depths through AI anytime, anywhere. If they also have the ability to share the user's visual perception, that would be an even greater step forward.
What is the key to making a good pair of AI glasses?
AI glasses are becoming increasingly popular, and more and more new products are being launched. On the whole, however, they fall into two broad types. The first gains AI capabilities by connecting AR glasses to large models. The second consists of AI glasses developed natively around large models, integrating audio and AI modules, and even cameras, but without optical and display modules. Typical representatives of the second type are Ray-Ban Meta and Xiaodu AI glasses.
The differences between these two types of AI glasses are not only reflected in their capabilities but also directly impact their daily user experience.
Because they integrate optical display modules, AR glasses can display content, but they are also significantly heavier and more expensive, face serious battery life challenges, and are difficult to wear for extended periods.
On the other hand, AI glasses with cameras, while not supporting display, can engage in more natural and smooth human-computer interaction due to their native intelligence derived from large models. Eliminating the optical display module not only means lower costs and improved battery life but also allows for a thinner and lighter design, enabling longer wear times and broader usage scenarios.
In reality, making a truly good pair of AI glasses involves much more than simply assembling supply chain resources. It encompasses various aspects, with the foundation lying in software and hardware integration and deep integration with large models.
This is also the key basis on which Meta and Xiaodu can make a good pair of AI glasses. First, Xiaodu AI glasses are equipped with the DuerOS AI operating system based on Baidu's ERNIE Bot large model. They not only support visual AI like Ray-Ban Meta but also excel in natural language interaction, multimodal perception, and anthropomorphic presentation. Second, leveraging its strengths in software and hardware integration, Xiaodu has created a variety of smart devices such as smart screens, fitness mirrors, smart study machines, and buddy machines, accumulating deep hardware capabilities and supply chain resources.
At the same time, users' demands for the intelligence of AI glasses are distinctly local: differences in language, culture, and usage scenarios also shape the experience of AI glasses to a certain extent. In other words, in the Chinese market, AI glasses must have a deeper understanding of the Chinese language and of Chinese users' needs and usage scenarios.
In this regard, domestically produced AI glasses clearly have an advantage. This is especially true of Xiaodu AI glasses. Backed by Baidu's ERNIE Bot, one of the world's most proficient large models in Chinese, they can more accurately understand users' intentions and generate higher-quality responses in a Chinese-language environment. Additionally, Xiaodu can seamlessly access the rich data and applications of Baidu and its industry partners, further expanding the "soft power" of AI glasses.
At the same time, Xiaodu also understands domestic users better. As mentioned earlier, Xiaodu has a wide range of devices, all of which have achieved remarkable results. According to official data, Xiaodu's various smart devices currently cover more than 46 million households.
This is precisely where Xiaodu AI glasses' localization advantage lies. Coupled with the application support of Xiaodu, Baidu, and the industry ecosystem, users can also enjoy comprehensive services in scenarios closely related to glasses, such as travel, study, work, entertainment, and health.
In short, by drawing deeply on a Chinese large model and on the real-world needs of Chinese users, and by fully leveraging its content and ecosystem advantages, Xiaodu AI glasses can serve the diverse application scenarios of the domestic market more flexibly and accurately.
On the eve of market explosion, AI glasses need new breakthrough makers
AI glasses are a new product category, and as products gain popularity and market interest rises, they have achieved a certain level of user awareness. However, for AI glasses to reach a broader user base in the domestic market, many aspects still need to be addressed and improved.
Weight, battery life, imaging, and audio, together with more functions and stronger intelligence, collectively make up the features and experiences that shape users' perception and acceptance of this new product category. These improvements are far from simple. They place stringent requirements on software, hardware, and large models, especially since AI glasses must stay light and keep a long battery life.
Therefore, the AI glasses industry needs new breakthrough makers, especially in the domestic market. By vertically integrating software, hardware, and large models, they can address both users' pain points and their unmet wants, create a pair of AI glasses that impresses Chinese consumers, and thereby drive the entire market forward.
From this perspective, Xiaodu's entry into the market is not only a continuation of its own "AI+hardware" strategy and a product upgrade for AI glasses but also an exploration of the product paradigm for Chinese AI glasses, promoting the iterative optimization of the entire industry.
Source: LeiTech