Digital Humans Are Heating Up: Firing the First Shot in Deploying Large Model Products

07/15/2024

In the era of large models, what does a truly disruptive product look like? Should it have the self-evolving capabilities of large models, give industries new productivity tools, or even reshape how businesses are managed?

Digital humans are offering an answer.

Author | Piye

Produced by | Industry Expert

"How should large models move forward?"

At the end of June, at a closed-door large model conference hosted by an investment institution, this question was raised and quickly sparked wide discussion among participants, discussion that might more accurately be called exploration.

This level of attention is a microcosm of the entire large model market. After a year-long race over technical parameters, people increasingly realize that although AI, unlike earlier technologies, has broad capabilities for cognition and for reshaping industries, it is still far from true industrial AGI.

This gap shows most concretely in the fact that, in 2024, large models are still struggling to be deployed in specific, niche industry scenarios.

According to incomplete statistics, although countless enterprises have experimented with AI recently, no more than 10% have actually deployed it internally.

Where should large models go next? Or, to break down this question further: How can large models continue to progress and be deployed?

At the recently concluded World Artificial Intelligence Conference (WAIC), two widely discussed topics were reducing hallucination rates and AI applications, which were showcased across sectors such as manufacturing, finance, education, and agriculture. Within enterprises, the applications also covered internal management, marketing, logistics, and data operations.

Among these, digital humans drew some of the most attention. "Digital humans are now one of the few AI applications that can be deployed, put to use, and produce visible results," a conference visitor told Industry Expert.

Looking closely at this sector, which emerged before large models, it is clear that the arrival of large models is reshaping it: not only its product forms and technology, but also the value digital humans bring to industry scenarios.

"We believe that digital humans may be the representative disruptive product of the AIGC era," the head of AI business at JD Technology told Industry Expert. One telling figure: JD Cloud Yanxi digital humans have now served more than 5,000 brands and driven over 10 billion yuan in GMV, and that figure is climbing faster as more enterprises adopt digital humans.

"Deploy large model applications first, then let those applications drive the models' evolution." Digital humans are firing the first shot in this flywheel.

I. "Digital Humans + Large Models": Crossing the Industrial "Uncanny Valley"

The uncanny valley effect has long held back the digital human sector. Although many enterprises and service providers have entered the field over the past few years, deployment has remained lukewarm because the effect persisted.

On closer analysis, the uncanny valley shows up in several dimensions: how fluidly digital humans move, how they respond in conversation (interaction), and how natural their posture looks.

The CTO of a digital human company once told Industry Expert, "In academia and industry, people often use the word 'subtle' when talking about building digital humans, because even the tiniest difference is noticeable."

"The industry chain for this sector, hardware included, is still immature. Many enterprises buy digital human services for live streaming and training, but the core technology still falls short, and many even build their own with free tools and get similar results," said a local cultural tourism official.

This constraint was shaken on the evening of April 16, during a JD procurement-and-sales live broadcast. At 6 p.m., the "Procurement and Sales Dongge AI Digital Human" debuted on JD Live, with modeling, accent, and movements extremely close to those of a real person; it even occasionally slipped into the Suqian dialect. In less than an hour, the stream drew more than 20 million views and over 50 million yuan in GMV.

"Digital humans have crossed the uncanny valley," a JD official told us. According to him, JD works toward a "120-second test": if viewers cannot tell within 120 seconds that the person on screen is a digital human, the digital human is considered to have crossed the uncanny valley. That goal has now largely been met.

This is no easy feat. Put simply, the industry usually builds digital humans through a "modeling, driving, rendering" pipeline, but for a digital human to look natural and indistinguishable from a real person for more than 120 seconds, every step has to be flawless, including the NLP and TTS (text-to-speech) components.

"JD Cloud Yanxi digital humans are built on an end-to-end video generation model," the official told us. "Sora is a typical end-to-end model, but we found that the videos it generates still often contain implausible elements, such as distorted body movements. True commercialization must also confront hallucinations, and we have put a lot of effort into reducing them, because such errors are not acceptable in commerce," he added.

Besides the "Procurement and Sales Dongge AI Digital Human," more than 18 corporate leaders' digital humans appeared during JD's 618 shopping festival this year, including Gree's Dong Mingzhu, Hisense's Hu Jianyong, LG's Lee Dong-seon, Miniso's Ye Guofu, and Jieliya's Shi Zhancheng, and became new live-streaming favorites.

With the emergence of large models, AI digital humans are delivering greater practical value and clearer business models. The Dongge digital human and the increasingly frequent appearance of digital human anchors in brand live streams both signal that AI digital humans are maturing into disruptive large model products.

But beyond digital humans themselves, from an AI perspective, what does "large models + digital humans" mean exactly?

II. A True Data Loop and New "AI Productivity" Tools

The industry has reached a consensus on large models: "Let large models run first." The phrase has been echoed at large model forums and roundtables throughout the past six months.

Why is that?

The answer, again, is data. For OpenAI, one of the protagonists of this wave, one of the largest expenses over the past two years has been computing power. GPUs such as the A100 and H800 represent enormous investments, and that spending is what made the progression from the original GPT to GPT-4 possible.

However, observers have noticed that since GPT-4, OpenAI's pace toward GPT-5 and beyond has slowed.

Beyond computing power, another cost is snowballing into a large and growing share of OpenAI's spending: data. If schooling from primary to university level corresponds to the basic data samples available online, then advanced study in specialized fields requires more authentic, higher-quality data for training.

But such data is limited. At a recent technology forum, Yang Zhilin, founder of Moonshot AI (Moon's Dark Side), said plainly that the current challenge for large models is finding more authentic data, that such data is now hard to find, and that he is "unsure" whether it even exists.

The industry has two mainstream ways of obtaining advanced training data for large models. One is to have large models generate the data themselves, which only works if hallucinations are eliminated. The other is to "create authentic data" by finding AI applications that can actually run in production.

For the former, the authenticity of the data remains doubtful. For the latter, an answer has already emerged: digital humans.

As AI digital humans are used again and again, their real interactions generate higher-quality data, which feeds back into model training, closing the loop and ultimately creating a flywheel effect for large models.

This closed loop did not form by accident; it is the product of a long-running AI effort. JD has been developing its multimodal human-computer interaction project since 2018, and generative AI has now pushed that work a step further.

JD's internal team has also invested heavily in tackling hallucinations. "We believe that unless large models solve the hallucination problem, or at least reduce it drastically, they will struggle to become the true foundation of future industry," the head of AI at JD Technology told us.

JD's main levers for reducing hallucination rates are reportedly vector databases and high-quality data.

JD has been developing vector databases since 2019, refining them in e-commerce promotion scenarios. Today its vector database, Vearch, supports high-performance search at the scale of billions of entries with millisecond-level latency. JD's accumulated knowledge in vertical industries is richer still: the Yanxi large model was trained on 70% general data and 30% native supply chain data.
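The article does not say how the vector database is wired into the model, so the following is only a minimal sketch of the general idea, not JD's implementation and not Vearch's API. It assumes a retrieval-grounding setup: the system looks up the stored passage closest to the question and answers only from that passage, declining when nothing is similar enough. The knowledge base, the toy three-dimensional embeddings, and the names `KNOWLEDGE_BASE`, `retrieve`, `grounded_answer`, and `min_score` are all hypothetical.

```python
import math
from typing import List, Tuple

# Hypothetical knowledge base: each entry pairs a vetted passage with a toy
# embedding. Real systems compute embeddings with a model; these 3-d vectors
# are invented purely for illustration.
KNOWLEDGE_BASE: List[Tuple[str, List[float]]] = [
    ("Product A ships within 24 hours.", [0.9, 0.1, 0.0]),
    ("Product B has a one-year warranty.", [0.1, 0.9, 0.1]),
    ("Returns are accepted within 7 days.", [0.0, 0.2, 0.95]),
]


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query_vec: List[float], k: int = 1) -> List[Tuple[float, str]]:
    """Brute-force nearest-neighbour search, returning (score, passage) pairs.

    A vector database replaces this linear scan with an index so that search
    stays fast over very large collections.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in KNOWLEDGE_BASE]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


def grounded_answer(query_vec: List[float], min_score: float = 0.8) -> str:
    """Answer only from retrieved text; decline when nothing matches closely.

    Declining below a similarity threshold is an assumed hallucination guard:
    the system never invents an answer that no stored passage supports.
    """
    hits = retrieve(query_vec, k=1)
    if not hits or hits[0][0] < min_score:
        return "Sorry, I don't have verified information on that."
    return hits[0][1]


if __name__ == "__main__":
    print(grounded_answer([0.85, 0.15, 0.0]))  # close to the Product A passage
    print(grounded_answer([0.5, 0.5, 0.5]))    # no close match, so it declines
```

The threshold of 0.8 is arbitrary; raising it trades coverage for a lower risk of answering from a weakly related passage, which is the trade-off a commerce setting that "does not allow" hallucinations would lean toward.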

From the perspective of large models, the flywheel built around AI digital humans is also distinctive. Both the model capabilities behind the product and the field where it now delivers value, e-commerce platforms, exercise not just a single model capability but all modalities together, testing and refining them in practice.

Through this extensive real-world use, the "large model → application → data feedback → training" flywheel is accelerating.

III. The "AI Product Revelation" Behind Digital Humans

The value of digital humans goes further still. Beyond e-commerce live streams, they are taking on new AI productivity roles in a growing range of scenarios, such as finance, education, and employee training, and are serving as corporate digital employees.

Throughout the current development of large models, we have been trying to answer one question: what does a truly disruptive product look like in the era of large models? Should it have the self-evolving capabilities of large models, give industries new productivity tools, or even reshape how businesses are managed?

Digital humans are offering an answer. As large model technology shifts from incremental improvement to disruptive industrial change, digital humans, already deployed as a large model product, have crossed the "AI+" threshold and entered the stage in which AI reshapes industries.

First, technologically, digital humans bring together a wide range of large model capabilities. Both the way they are generated and what they can now do rest on technology mature enough to cross the uncanny valley and reach true commercialization, and today's low-cost configurations let enterprises try them with a low barrier to entry.

Second, in practical terms, enterprises can reshape how they run their business around digital humans. In e-commerce, for example, digital humans are creating new forms of interaction in live streams, automated customer replies, and AI outbound calls, helping enterprises get their message across and create value.

Third, the commercial case is even stronger. Delivered as SaaS, AI digital humans have overturned the market's long-held view of SaaS products as low-stickiness, high-churn, and hard to customize; to some extent, they function as an enterprise's permanent staff.

As one of the few AI products that can be deployed and used at scale today, digital humans are demonstrating the disruptive nature of large model technology across countless industry scenarios and contributing AI-native, disruptive gains.

This is also JD's thinking. Throughout the large model era, JD's consistent message has been to go deep into industries.

Over the past year, building on its large model foundation, JD has concentrated on digital humans: Dongge's live streams, digital humans for brand partners' CEOs, and deployments in sectors such as finance and cultural tourism. Some outside observers, however, have questioned this singular focus.

But behind it lies JD's consistent focus on industrial large models. While other large model vendors increasingly debate concepts such as small parameter counts, open source, and closed source, JD has concentrated on one thing: finding the best and most practical outlet for large model capabilities at this stage, one that combines AI capability, product strength, business value, and productivity. Today it has its answer: digital humans.

JD Cloud Yanxi digital humans have become a large model product that a great many enterprises genuinely use, in e-commerce and in live streaming alike. Almost without notice, in digital humans, the most grounded direction for AI large models, JD has become the leading domestic player in technology, deployment, scenarios, and commercial service systems.

And it does not stop there. Inside JD, work is accelerating to discover and develop other disruptive product directions that, like digital humans, combine product value with productivity value.

"With large models, we are pursuing a win-win model. Ultimately, we hope AI products like digital humans bring enough value to brand merchants that they are willing to pay for them, creating a virtuous cycle," the official quoted earlier told us.

Starting with digital humans, JD is offering its own answer for the large model era.
