November 14, 2024
On November 12, at Baidu World 2024, Li Yanhong delivered a speech titled "Applications Are Here." He argued that "the biggest change in the AI industry over the past 24 months is that large models have essentially eliminated hallucinations" and unveiled iRAG (image-based retrieval-augmented generation), claiming it could "address the hallucination problem in image generation by large models, significantly enhancing practicality."
However, in real-world testing, Xinshi Research Institute found that Wenxin Yiyan's supposed elimination of hallucinations in text and image generation may itself be Li Yanhong's own "hallucination."
1. Li Yanhong says large models have eliminated hallucinations, but does Wenxin Yiyan fail to reproduce his results?
A technology demonstrated at a conference should not only work on stage; users should also be able to reproduce the results themselves. Li Yanhong's on-stage demonstration, however, raises doubts about whether the material was prepared in advance.
At the conference, Li Yanhong grandly introduced the iRAG technology. According to the introduction, this technology combines Baidu Search's billions of image resources with basic model capabilities to generate highly realistic images. Li Yanhong emphasized that iRAG's effectiveness far surpasses native text-to-image systems, successfully eliminating the "machine flavor" and significantly enhancing the practicality of AI-generated images.
So, how does it eliminate the "machine flavor" and prove the authenticity of the generated images? Li Yanhong used the Temple of Heaven as an example.
First, Li Yanhong used an open-source model to generate an image of Beijing's Temple of Heaven and told the audience that the image was wrong: the Temple of Heaven has only three tiers, while the open-source model drew four. He presented this as a textbook case of hallucination in multimodal models, here in image generation.
After pointing out the open-source model's error, Li Yanhong showed an image of Albert Einstein at the Temple of Heaven, generated by Wenxin Yiyan with iRAG, to demonstrate that iRAG can eliminate hallucinations in multimodal models.
Image source: Xiaoxiong Caijing
This all seemed to go smoothly, and the image generated with iRAG no longer showed hallucinated details. Problems arose, however, when we tried to reproduce it.
Given the same prompt, the "Temple of Heaven" generated by the Wenxin large model did not resemble the one Li Yanhong showed on stage. Instead it had four tiers, ironically repeating the very mistake Li Yanhong had pointed out in other models.
Image source: Wenxin Yiyan
Beyond the number of tiers, the generated image also got the balustrades at the base wrong. The real Temple of Heaven has three layers of balustrades, while the generated images showed four or even five.
Image source: Wenxin Yiyan
Some might consider these requirements nitpicky, but across five identical requests, Wenxin Yiyan produced results that contradicted Li Yanhong's claims three times, which is somewhat embarrassing.
For another on-stage demonstration, a picture of a Volkswagen Talagon leaping over the Great Wall, Wenxin Yiyan again produced a result very different from the image shown at the conference.
Image source: Wenxin Yiyan
The top-left image is the real Volkswagen Talagon, the bottom-left is the one displayed at the conference, and the right image is newly generated. The image Li Yanhong showed was not far off from the real Talagon, so why is the Volkswagen logo hallucinated in the newly generated one?
Beyond images, Li Yanhong stated that RAG already performs well on text and has essentially eliminated hallucinations in large models. However, as a frequent user of large models such as Doubao, Kimi, and Wenxin Yiyan, I find that reality falls far short of his claims.
(In fact, Toronto is not the capital of Canada; the capital is Ottawa.)
Technically, RAG (Retrieval Augmented Generation) uses information retrieval to mitigate hallucinations in LLMs (Large Language Models): retrieved material is supplied to the model as context for its answer. It reduces hallucinations but cannot eliminate them entirely, because hallucination appears to be an inherent issue with the Transformer architecture. RAG is also less effective on tasks that require reasoning, such as writing code and solving math problems. Is it really defensible to assert so categorically that RAG has essentially eliminated hallucinations in large models?
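To make the retrieve-then-generate idea concrete, here is a minimal sketch in Python. It is not Baidu's implementation: the tiny document list, the word-overlap scoring, and the names `retrieve` and `build_prompt` are all hypothetical stand-ins for a real search index and a real model call. It only shows how retrieved text is placed in front of the question.

```python
import re
from collections import Counter

# Hypothetical toy knowledge base; a real system would query a search index.
DOCUMENTS = [
    "Ottawa is the capital of Canada.",
    "Toronto is the largest city in Canada.",
    "The Hall of Prayer for Good Harvests at the Temple of Heaven has a three-tiered roof.",
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Count word occurrences shared by the query and the document."""
    shared = Counter(tokenize(query)) & Counter(tokenize(document))
    return sum(shared.values())


def retrieve(query, documents, k=2):
    """Return the k documents with the highest word overlap with the query."""
    ranked = sorted(documents, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that puts the retrieved context before the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

Calling `build_prompt("What is the capital of Canada?", DOCUMENTS)` lists the Ottawa sentence ahead of the Toronto one in the context. Whether the model then answers correctly still depends on the model itself, which is why retrieval mitigates hallucinations rather than ruling them out.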
2. Is Li Yanhong the one being "deceived"?
If such incidents were isolated, they might be understandable. However, given the statements and actions of Li Yanhong and Baidu since the AI era began, it is not unreasonable to suspect that Li Yanhong has been "deceived" by the people inside the company who build and promote its products and services.
As the first company to release a large model, Baidu was in the spotlight early last year, and Wenxin Yiyan was all but crowned the star of China's large models. A year and a half later, however, Wenxin Yiyan's MAU (Monthly Active Users) is only a quarter of Doubao's, and Kimi, a product of similar age from Moonshot AI, is catching up fast.
Data source: AI Product Rankings public account
As for the reasons, Doubao's exclusive promotion on Douyin and Kimi's aggressive spending have certainly played a role in recent months, but ultimately the cause is Baidu's failure to maintain its lead in large models.
At the Create 2024 Baidu AI Developer Conference in April this year, Baidu unveiled three AI development tools: AgentBuilder for agent development, AppBuilder for AI-native application development, and ModelBuilder for customizing models of various sizes. Of these, AgentBuilder seemed highly innovative, and Baidu had high hopes for it.
However, when it comes to creating agents, Baidu's advantage is not as great as claimed.
Take Doubao as an example: the ability to discover and create AI agents has long been available to its consumer users, and in real-world use those agents do not lag behind agents built with Baidu's AgentBuilder.
Image source: Doubao
Li Yanhong's statements and judgments about large model technology and trends also often seem at odds with real-world developments.
Also at the Create 2024 Baidu AI Developer Conference in April this year, Li Yanhong stated, "Open-source models will become increasingly outdated." He attributed this to the fact that while open-source models were previously considered cheap, in the context of large models, they are actually the most expensive, leading to their decline.
Does the high cost of open-source models necessarily lead to their decline? Clearly, this is not a direct causal relationship. Moreover, in the technology sector, almost every developer believes in the power of open source, which drives most technological innovations. So, why does Li Yanhong hold a different view?
Going further back, when Baidu launched Wenxin Yiyan last year, Li Yanhong claimed that "the gap between Baidu Wenxin Yiyan and OpenAI might be around two months." However, Wang Xiaochuan bluntly stated, "This might be something Li Yanhong from a parallel universe said, not from our world." As for the actual gap between Baidu Wenxin Yiyan and ChatGPT at that time, and whether Wenxin Yiyan's user experience has caught up with GPT-4o now, anyone with discernment can see the truth.
Add to this Baidu's long-standing habit, in iterating on its large model applications, of putting announcements ahead of implementation and promotion ahead of user experience. It is hard not to wonder whether Li Yanhong, who has a technical background, has truly led Baidu into the AI era, or has gradually been absorbed into an information cocoon and become a "megaphone" for product developers, business staff, and even public relations staff.
References:
"Was Li Yanhong Deceived?", Digital Evolution Island
"Large Model 'Hallucinations': Read This Article and You'll Know It All | Produced by Harbin Institute of Technology and Huawei", QbitAI
"(Full Text) Li Yanhong's Latest Speech: Wenxin Large Model Has 1.5 Billion Daily Calls", DataView
"Li Yanhong Announces: Baidu's iRAG Technology Makes AI-Generated Images More Realistic, Resolving Large Model Hallucination Issues", Xiaoxiong Caijing
Edited by Ding Li