Robotic Innovations Highlighted at the 2025 World Artificial Intelligence Conference

07/30/2025

Produced by ZhiNeng Technology

The 2025 World Artificial Intelligence Conference (WAIC) served as a comprehensive showcase of the technical strength of China's AI industry chain.

Departing from previous years' focus on models or chip displays, this year's exhibition took a highly systematic approach, encompassing embodied intelligence, humanoid robots, large vertical industry models, and domestic computing power, and forming a closed loop that spans perception, cognition, execution, and computational support.

Driven in particular by new infrastructure such as the 'Qilin' Embodied Intelligence Training Ground, embodied intelligence has become a new technological frontier, redefining the core methods of robot training, industrial automation, and service collaboration.

We will delve into the core dynamics of AI's systematic development at WAIC 2025 through three lenses: training platforms, application models, and domestic computing power.

01

Embodied Intelligence Takes Center Stage:

Evolution of Training Infrastructure

A standout trend at this year's WAIC was embodied intelligence transitioning from concept to engineering reality.

In the simulated "WAIC Skills Stage" within the exhibition hall, over 20 humanoid and structural robots executed a series of high-precision tasks, from peeling eggs and writing to lion dancing, stringing, cooking, and carrying, indicating that motion control and scenario generalization have attained practical proficiency. Behind these demonstrations lie the enhanced support capabilities of the underlying training system.

As China's first heterogeneous humanoid robot training platform, the 'Qilin' Embodied Intelligence Training Ground was systematically unveiled for the first time. Situated in Pudong, Shanghai, the training ground spans 4,000 square meters and can accommodate over a hundred robots for parallel interactive training.

Its core mechanism generates structured data through a real physical interactive environment, aggregating behavioral data including motion execution, task understanding, path planning, and human-robot interaction.

It produces approximately 50,000 high-quality interaction records daily, with plans to accumulate up to 10 million data entries by the end of 2025. This volume of data is crucial for fine-tuning complex control systems, verifying cross-modal transfer capabilities, and constructing generalized models.
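As a rough check on these figures, the arithmetic can be sketched as follows. The constant daily rate and the start date (taken here as this article's publication date) are assumptions made for illustration; the article does not say when collection began or whether the rate will grow.

```python
from datetime import date

DAILY_RECORDS = 50_000        # "approximately 50,000" per the article
TARGET_RECORDS = 10_000_000   # "up to 10 million" per the article


def days_to_target(target: int, daily: int) -> int:
    """Days of collection needed at a constant daily rate (ceiling division)."""
    return -(-target // daily)


def records_between(start: date, end: date, daily: int) -> int:
    """Records produced between two dates at a constant daily rate."""
    return (end - start).days * daily


days_needed = days_to_target(TARGET_RECORDS, DAILY_RECORDS)

# Assumed start date: the article's publication date, not a stated fact.
by_year_end = records_between(date(2025, 7, 30), date(2025, 12, 31), DAILY_RECORDS)

print(days_needed)   # 200
print(by_year_end)   # 7700000
```

Under these assumptions, 10 million records take 200 days at the current rate, while the 154 days remaining in 2025 yield about 7.7 million. The stated target would therefore rely on data collected earlier or on a higher daily rate.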

Technologically, the training ground is not merely a sensor data collector; it can also accommodate heterogeneous systems and standardize training processes.

On the same physical platform, multiple humanoid robot bodies, actuators, visual modules, and backend large models from various enterprises operate collaboratively. The key challenge lies in establishing a unified task decomposition standard and motion evaluation system.

Currently, the 'Qilin' platform enables collaborative training among robots of diverse models and architectures, forming the beginnings of a low-cost, high-frequency "embodied learning pipeline".

The training ground is not an isolated facility but is integrated within the Pudong Model Power Community, an urban-scale AI resource cluster. Leveraging the community's local computing power, cloud data scheduling, and hardware resource support, the training ground achieves a swift closed loop of "task driving - data collection - model training - feedback iteration".
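The closed loop quoted above can be pictured with a toy simulation. This is a minimal sketch, not the platform's actual software: `ToyPolicy`, `Record`, the task names, and the skill-update rule are all hypothetical, chosen only to make the four stages visible.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One interaction record: which task was attempted and whether it succeeded."""
    task: str
    success: bool


@dataclass
class ToyPolicy:
    # Hypothetical stand-in for a trained model: per-task skill in [0, 1].
    skill: dict[str, float] = field(default_factory=dict)

    def attempt(self, task: str) -> Record:
        # Toy rule: a task succeeds once its skill reaches 0.5.
        return Record(task, self.skill.get(task, 0.0) >= 0.5)

    def train(self, records: list[Record]) -> None:
        # Toy update: every record raises skill; failures teach more.
        for r in records:
            gain = 0.1 if r.success else 0.2
            self.skill[r.task] = min(1.0, self.skill.get(r.task, 0.0) + gain)


def closed_loop(tasks: list[str], rounds: int) -> tuple[ToyPolicy, list[int]]:
    policy = ToyPolicy()
    queue = list(tasks)
    failures_per_round = []
    for _ in range(rounds):
        records = [policy.attempt(t) for t in queue]   # task driving + data collection
        policy.train(records)                          # model training
        failed = [r.task for r in records if not r.success]
        queue = failed or list(tasks)                  # feedback: retry failures, else all tasks
        failures_per_round.append(len(failed))
    return policy, failures_per_round


policy, history = closed_loop(["peel_egg", "carry_box"], rounds=4)
print(history)   # [2, 2, 2, 0]
```

In this toy run, both tasks fail for three rounds while the recorded data raises the skill values, then succeed in the fourth. The point is only the shape of the loop: each round's data feeds training, and the results decide what is attempted next.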

This collaborative, multi-agent infrastructure offers a new paradigm for validating and iterating on embodied intelligence capabilities across scenarios including manufacturing, elderly care, logistics, and rescue.

02

From Multi-modal Models to Computing Power Platforms:

Domestic AI Ecosystem Accelerates Integration

Apart from the evolution of embodied intelligence and humanoid robot infrastructure, another technological highlight of WAIC 2025 was the deep integration of large models and domestic computing power, alongside a systematic improvement in real-world deployment capabilities.

At the exhibition site, it was evident that model capabilities have broadened from text-only language processing to multi-modal inputs such as images, video, audio, and spatial perception, closely intertwined with vertical industry applications.

Models centered on 3D content generation debuted in large numbers at this year's exhibition.

Whereas constructing virtual three-dimensional environments previously required professional modeling tools and manual work, current models can generate complex scenes within minutes from natural-language or image input, significantly lowering the barrier to content production.

This generative capability is not only suited to virtual reality applications in entertainment and education but can also be fed back into physical-space tasks such as robot path planning and scene understanding, serving as upper-level perceptual support for embodied intelligence.

On the industrial front, model design is evolving towards an end-to-end architecture, supplanting the earlier approach of modular task decomposition and task-specific logic.

Some models directly output actions or control commands from collected sensor data, eliminating intermediate state modeling and logical structure design, thereby enhancing generalization performance. Especially in high-risk environments (such as mines and post-disaster areas) or weakly structured environments (such as home services), this integrated approach markedly boosts deployment efficiency and system robustness.
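The contrast between the two designs can be sketched in a few lines. Everything here is illustrative: the two-sensor obstacle setup, the function names, and the hand-set weights (standing in for parameters that a real end-to-end model would learn from data) are assumptions, not details from any exhibited system.

```python
from typing import Callable, Sequence

Sensors = Sequence[float]   # hypothetical: [left_distance, right_distance] in meters
Command = float             # hypothetical steering value in [-1.0, 1.0]


# Modular style: explicit intermediate state plus hand-written logic.
def perceive(sensors: Sensors) -> dict:
    return {"obstacle_left": sensors[0] < 1.0, "obstacle_right": sensors[1] < 1.0}


def plan(state: dict) -> str:
    if state["obstacle_left"] and not state["obstacle_right"]:
        return "turn_right"
    if state["obstacle_right"] and not state["obstacle_left"]:
        return "turn_left"
    return "straight"


def control(action: str) -> Command:
    return {"turn_left": -1.0, "straight": 0.0, "turn_right": 1.0}[action]


def modular_policy(sensors: Sensors) -> Command:
    return control(plan(perceive(sensors)))


# End-to-end style: one mapping from sensor readings to a command,
# with no named intermediate state.
def make_end_to_end_policy(weights: Sequence[float], bias: float) -> Callable[[Sensors], Command]:
    def policy(sensors: Sensors) -> Command:
        raw = sum(w * s for w, s in zip(weights, sensors)) + bias
        return max(-1.0, min(1.0, raw))   # clamp to the valid command range
    return policy


# Hand-set weights stand in for learned parameters (assumption).
end_to_end_policy = make_end_to_end_policy(weights=[-1.0, 1.0], bias=0.0)

for reading in ([0.5, 2.0], [2.0, 0.5], [2.0, 2.0]):
    print(modular_policy(reading), end_to_end_policy(reading))
```

Both policies steer away from the nearer obstacle on these three readings, but the modular version needs a rule for every case its designer anticipated, while the end-to-end version has only parameters to fit. That is the trade-off the paragraph above describes; whether it improves generalization in practice depends on the training data.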

Supporting this model training and real-time deployment is the rapid expansion of the domestic computing power base. During WAIC 2025, multiple local chip enterprises showcased their new-generation AI chips and platform systems, spanning architectural pathways from TPUs and GPUs to heterogeneous computing nodes.

In scenarios where inference and training are integrated, locally developed chip platforms are already basically usable.

For instance, the new-generation integrated GPU system increases data throughput by combining self-developed architectures with HBM (high-bandwidth memory), fulfilling the training requirements of models with tens of billions of parameters.
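To give a sense of scale for "tens of billions of parameters", a back-of-the-envelope memory estimate follows. The figures are assumptions for illustration: 10 billion parameters as the low end of the range, 2 bytes per parameter for 16-bit weights, and roughly 16 bytes per parameter for mixed-precision training with an Adam-style optimizer (16-bit weights and gradients plus 32-bit master weights and two optimizer moments). Activations and the exhibited systems' actual configurations are not covered.

```python
def weights_bytes(num_params: int, bytes_per_param: int = 2) -> int:
    """Memory for the weights alone (default: 16-bit precision)."""
    return num_params * bytes_per_param


def training_state_bytes(num_params: int, bytes_per_param: int = 16) -> int:
    """Weights + gradients + optimizer state under the stated assumption."""
    return num_params * bytes_per_param


def to_gb(n_bytes: int) -> float:
    """Convert bytes to decimal gigabytes."""
    return n_bytes / 1e9


params = 10_000_000_000   # assumed low end of "tens of billions"

print(to_gb(weights_bytes(params)))          # 20.0
print(to_gb(training_state_bytes(params)))   # 160.0
```

Under these assumptions, the training state alone reaches about 160 GB before activations, which is why memory capacity and bandwidth are emphasized for training at this scale.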

Another type of TPU platform emphasizes energy consumption control and large-scale cluster interconnection, supporting high-speed communication among 1024 chips to serve the low-latency, high-concurrency computing scenarios of large models.

A more systemic advance is that domestic computing power platforms are moving from single-point chip capabilities towards the construction of a full-stack ecosystem.

From compilers, driver protocols, and development kits to computing power scheduling platforms and server-side systems, multiple booths presented complete solutions ranging from terminal chips to cluster platforms. This not only responds to the need for "autonomy and controllability" but also signifies that China's AI infrastructure construction has entered a mature stage oriented towards scenario adaptation.

The construction of the computing power base is also embedded within Shanghai's urban-scale resource layout. In Zhangjiang Robot Valley, for example, upstream makers of key components (such as reducers and sensors) are clustered alongside downstream algorithm companies and high-performance computing platforms, forming a tightly coupled industrial closed loop.

Areas like the Model Power Community and Model Speed Space also provide services such as local computing power, data governance, and application scheduling, constructing a comprehensive AI ecosystem encompassing "models - chips - scenarios - services".

Summary

The 2025 WAIC not only exhibited phased advancements in China's artificial intelligence technical capabilities but also presented a structured system landscape for the first time.

The parallel development of large models, embodied intelligence, and domestic computing power, alongside their ability to interconnect through scenarios, data, and platforms, indicates that China's AI is transitioning from "single-point innovation" to "system integration".
