April 22, 2026

Author | Li Murong
The excitement surrounding embodied AI is escalating rapidly.
Valuations are skyrocketing, and the narratives are becoming increasingly ambitious, yet truly deployable products remain rare.
Amid the current frenzy in the embodied AI sector, where billion-dollar valuations and record-breaking funding rounds are the norm, the relatively modest multi-million dollar funding secured by Xingcan Intelligence in late March this year stands out for its pragmatic approach.
However, if we shift our focus from the funding amounts and delve into Xingcan Intelligence's technological approach and business strategy, a clear signal emerges:
This company may represent an alternative path—one that is closer to real-world commercialization.
The Xingcan Intelligence team is distinctly characterized by its "autonomous driving DNA"—founder Li Zhanbin and his team members hail from intelligent driving teams at Geely, Baidu Apollo, and other leading companies.
Rather than diving headfirst into humanoid robots, they have focused on two products: lawn-mowing robots and smart wheelchairs.
Their core strategy is clear: bring L4-grade spatial intelligence and embodied AI into the essential scenarios of home care and rehabilitation, building "spatial intelligence + embodied interaction" mobility companion robots that master perception and control first, then advance to cognitive intelligence.
It is evident that Xingcan Intelligence is validating a key point: embodied AI does not have to start with the "most complex form" but can begin with the "most implementable scenarios."

Beyond Roads: The Next Frontier for Autonomous Driving Talent
An embodied AI team built entirely from autonomous driving professionals, as at Xingcan Intelligence, is no longer an isolated case.
Since 2024, core talent from the intelligent driving industry has been leaving in droves, flocking to the embodied AI sector.
Wang Kai, former CTO of Li Auto; Chen Wei, former Chief AI Scientist at Li Auto; and Yu Yinan, former President of Horizon Robotics' Intelligent Driving Division, are among those who have transitioned into embodied AI.
This talent migration is not coincidental but stems from the growth bottlenecks in the entire intelligent driving industry.

The root cause lies in the fact that the intelligent driving industry no longer guarantees returns.
Over the past decade, Robotaxi has been the "ultimate dream" of the intelligent driving industry, with capital pouring tens of billions of dollars into it.
Companies such as Apollo Go, Pony.ai, WeRide, and Didi Autonomous Driving have achieved regular passenger transport in Beijing, Guangzhou, Shenzhen, and other cities. However, the large-scale implementation of L4 autonomous driving still faces significant bottlenecks.
Vehicle intelligence still runs into hard technical limits in long-tail extreme scenarios, all-weather robustness, and full-scenario redundancy, so the safety baseline is not yet fully established.
Meanwhile, fragmented scenarios and weak coordination between business-side and consumer-side demand have kept a replicable commercial loop from forming.
With repeated delays in technology implementation, capital patience is wearing thin, and many investors are adopting a wait-and-see approach.
More critically, automakers are scaling back their in-house R&D investments in intelligent driving, shifting toward external collaboration models. Against the backdrop of accelerated automotive intelligence, the drawbacks of high-investment, long-cycle in-house R&D routes have become increasingly apparent. What was once a "core growth engine" is gradually transforming into a "cost center."
For algorithm engineers within the industry, the more pressing issue is the decline in marginal value.
The intelligent driving industry has passed its "rapid expansion phase" and entered an engineering convergence phase.
Perception, decision-making, and planning modules have become relatively mature, with leading solutions performing near their ceilings in structured scenarios such as highways and urban expressways.
The remaining long-tail problems depend more on massive data collection and coverage of extreme scenarios than on single-point algorithmic breakthroughs. Competition is intensifying, and the room for talent to grow is narrowing.
In contrast, embodied AI is still in its early, explosive growth stage: its technology stack overlaps heavily with intelligent driving, yet it offers far more room for imagination and far stronger capital enthusiasm.
For this talent, the move is almost a natural extension of their technology and a renewed outlet for their value.
Therefore, the "shift" of intelligent driving elites to embodied AI is less of an active choice and more a result of the fading "certainty" in intelligent driving. Embodied AI provides a new outlet with high technology reuse, rigid demand, and broader scenarios.

Understanding the Physical World: The Key to Bringing Robots into Homes
For home robots, the question has never been about "whether there is a large model" but whether they can understand the physical world.
So, what is spatial intelligence, and why is it the optimal solution for navigating complex home environments?
Simply put, while large language models enable AI to understand text, spatial intelligence enables AI to understand the physical world.
Fei-Fei Li calls spatial intelligence the "cornerstone of world models" and has offered a precise definition:
"Spatial intelligence is the ability for AI to perceive, reason, and act in a three-dimensional world."
She explains that spatial intelligence addresses the closed loop between the "cerebellum" and "senses," enabling AI to possess real-time perception of three-dimensional space and physical coordination abilities.
Without spatial intelligence, even the most powerful language models remain confined to screens. With spatial intelligence, AI can truly enter our living rooms, kitchens, bedrooms, and other complex scenarios.
Robots in home scenarios face familiar, widespread dilemmas: they cannot cross sliding-door thresholds, climb onto living-room carpets, or thread between chair and sofa legs; instead they simply spin in circles.
Not long ago, a household robot entered homes to perform chores, but its shortcomings were evident:
It took nearly 10 minutes to fold a single piece of clothing, moved clumsily, dropped shoes, and could operate only in highly constrained scenarios.
The core reason is that these robots act largely on preset programs: even a slight environmental change leaves them at a loss, because they never close the loop of perception, prediction, and control, and so have little resilience in unfamiliar environments, as the contrast below sketches.
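Purely as an illustration (the robot interface, method names, and 20 Hz rate are assumptions, not any vendor's actual code), the contrast looks like this:

```python
# Hypothetical sketch: a preset program replays fixed actions, while a
# closed-loop controller re-plans from fresh observations on every tick.

def run_preset_program(robot, recorded_waypoints):
    # Blindly replays a recorded path; any change in the scene breaks it.
    for wp in recorded_waypoints:
        robot.move_to(wp)

def run_closed_loop(robot, goal, dt=0.05):
    # Perception -> prediction -> control, repeated at ~20 Hz.
    while not robot.at(goal):
        obstacles = robot.sense()                              # perception
        predicted = [o.position_after(dt) for o in obstacles]  # prediction
        command = robot.plan_step(goal, predicted)             # control
        robot.execute(command, duration=dt)
```

The first function fails the moment a shoe lands on the recorded path; the second re-observes the scene every 50 ms and can adapt.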
This also reveals an industry status quo:
While robot hardware is advancing rapidly, their "brains" and "cerebellums"—the abilities to perceive, understand, and respond quickly to the physical world—are lagging.
Why is spatial intelligence the optimal solution for home scenarios?
Because the core characteristic of home environments is their "unstructured, unpredictable, and highly diverse" nature.
Children's toys are scattered randomly, furniture layouts vary from household to household, and sudden changes in lighting, obstructions, and floor irregularities are commonplace.
Every home has a unique layout, item storage, and living habits, placing extremely high demands on embodied AI entering homes.
Traditional robots operate based on preset rules and fixed paths, making them prone to "malfunctioning" when faced with unexpected situations.
In contrast, robots equipped with spatial intelligence can build a real-time semantic understanding of three-dimensional space, much as humans do, and autonomously decide whether to detour, wait, brake immediately, or proceed slowly.
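Purely as a toy illustration (the thresholds, names, and inputs are assumptions, not any published policy), that four-way decision might be sketched like this:

```python
from enum import Enum

class Maneuver(Enum):
    BRAKE = "brake immediately"
    WAIT = "wait"
    DETOUR = "detour"
    PROCEED_SLOWLY = "proceed slowly"

def choose_maneuver(obstacle_distance_m, obstacle_speed_mps, detour_exists):
    """Toy policy mapping a semantic 3D reading of the scene to a maneuver."""
    if obstacle_distance_m < 0.3:
        return Maneuver.BRAKE           # imminent contact: stop now
    if obstacle_speed_mps > 0.5:
        return Maneuver.WAIT            # fast-moving obstacle: let it pass
    if detour_exists:
        return Maneuver.DETOUR          # static blockage with a free path around it
    return Maneuver.PROCEED_SLOWLY      # tight but passable: creep through
```

The inputs themselves (distance, speed, whether a free path exists) are exactly what a semantic 3D map supplies and what preset rules lack.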
Xingcan Intelligence's approach with smart wheelchairs is to first address spatial understanding and motion control before adding intelligence.
They have developed the proprietary XcanSense 5D Perception System as the technological foundation for spatial intelligence.
Through multi-sensor fusion, it achieves real-time centimeter-level semantic mapping and understanding of indoor and outdoor environments, granting robots a universal "spatial cognition" ability.
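Xingcan has not published implementation details, but the general pattern behind such a system, fusing range-sensor geometry with vision-derived semantics into one live grid, can be sketched as follows; the class, the 5 cm cell size, and the update weight are illustrative assumptions:

```python
import numpy as np

CELL_M = 0.05  # 5 cm cells as a stand-in for "centimeter-level" resolution

class SemanticGridMap:
    """Toy semantic occupancy grid fed by multiple sensors."""

    def __init__(self, size_m=20.0):
        n = int(size_m / CELL_M)
        self.occupancy = np.zeros((n, n), dtype=np.float32)   # 0 free .. 1 occupied
        self.labels = np.full((n, n), "unknown", dtype=object)

    def _cell(self, x_m, y_m):
        # Assumes map-frame coordinates are non-negative for simplicity.
        return int(x_m / CELL_M), int(y_m / CELL_M)

    def fuse(self, lidar_points, vision_detections):
        # Geometry from range sensors: mark where something is.
        for x, y in lidar_points:
            i, j = self._cell(x, y)
            self.occupancy[i, j] = min(1.0, self.occupancy[i, j] + 0.3)
        # Semantics from vision: label what that something is,
        # e.g. {"xy": (1.2, 0.4), "label": "chair_leg"}.
        for det in vision_detections:
            i, j = self._cell(*det["xy"])
            self.labels[i, j] = det["label"]
```

The division of labor is the point: range sensors say where things are, cameras say what they are, and a planner that sees both can treat a chair leg differently from a lawn edge.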

Simultaneously, they have "lightweighted" the automotive-grade technical architecture of L4 autonomous driving, proposing a "cerebrum-cerebellum collaboration" architecture.
The "cerebrum" (decision-making layer) handles task planning and scenario understanding;
The "cerebellum" (control layer) specializes in high-real-time motion control and dynamic obstacle avoidance.
This equips robots with a "universal high-precision map for indoor and outdoor use" and an "environment-understanding brain," enabling them to navigate around desks and chairs in studies and identify the boundaries between lawns and flower beds in yards.
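A hedged sketch of how such a split can be wired (the toy route, loop rates, and threading are illustrative assumptions, not Xingcan's implementation):

```python
import queue
import threading
import time

waypoints = queue.Queue()

def cerebrum():
    # Decision layer: scene understanding and task planning at ~1 Hz.
    route = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.5)]  # toy route around a desk
    for wp in route:
        waypoints.put(wp)
        time.sleep(1.0)
    waypoints.put(None)  # sentinel: route complete

def cerebellum():
    # Control layer: real-time tracking and obstacle avoidance at ~100 Hz.
    while (wp := waypoints.get()) is not None:
        for _ in range(100):
            # ...read sensors, adjust velocity around dynamic obstacles...
            time.sleep(0.01)  # one 100 Hz control tick
        print(f"reached waypoint {wp}")

threading.Thread(target=cerebrum, daemon=True).start()
cerebellum()
```

The value of the split is rate isolation: slow, expensive scene reasoning can never stall the fast, safety-critical control loop.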
Now, more and more companies are beginning to explore spatial intelligence.
The Unitree G1 humanoid robot, equipped with OpenClaw, can now achieve a preliminary understanding of space and time, recognizing the positions of rooms, people, and objects.
At the 2026 AWE event, cleaning robots are gradually developing "brains" (AI chips), "eyes and ears" (multimodal sensors), "hands and feet" (robotic arms and wheeled legs), and learning to "think" (autonomous decision-making via large models).
For example, Roborock's G30S Pro introduces a fusion perception system combining an RGB camera and triple-line structured light, along with Reactive AI obstacle avoidance algorithms, enabling it to recognize over 280 common obstacles.
The Dreame X60 series offers large-model-powered natural language interaction, allowing the robot to understand vague user requests and autonomously plan and execute tasks.
In summary, the transition from virtual to physical and the cognition and reconstruction of real space have become essential paths for AI evolution. Spatial intelligence is the optimal solution for robots entering home environments.

Commercializing Embodied AI: Functionality Trumps Form
Why is Xingcan Intelligence considered one of the players closest to "real-world commercialization"?
In the current embodied AI sector, most players are fixated on developing general-purpose humanoid robots but often fall into the trap of prioritizing form over functionality, making short-term technological implementation unattainable.
Xingcan Intelligence's core strength lies in its precise grasp of industry needs and technological strategies.

Instead of chasing the "humanoid robot" trend, it has anchored itself in the essential sector of home care, concentrating resources on the most critical perception and decision-making capabilities to rapidly close the loop from technology to product.
Traditional electric wheelchairs, and even most comparable smart wheelchairs, have only addressed the basic need for "mobility" without solving safety, independence, and companionship issues for elderly users.
Comparable smart wheelchairs either overemphasize feature stacking, simply adding basic functions like navigation and voice control, or become mired in "technological one-upmanship," blindly incorporating high-end hardware, leading to soaring costs and making them inaccessible to ordinary households.
Xingcan Intelligence has precisely captured the core needs of elderly mobility, deeply integrating multimodal perception technologies from the intelligent driving field into smart wheelchairs.

In terms of perception, Xingcan employs a multimodal solution combining LiDAR, binocular vision, millimeter-wave radar, and IMU, constructing a perception network denser than those of comparable products. It achieves centimeter-level environmental perception and dynamic prediction, ensuring stable operation at night and in bright or dim environments.
In decision-making and planning, Xingcan adopts a hybrid architecture combining on-device small models and rule-based fallback mechanisms. Common commands such as following, obstacle avoidance, and path planning can be executed with millisecond-level responsiveness.
For complex semantic understanding, it calls on a local lightweight LLM, avoiding the latency and network-outage risk of cloud-based large models.
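A hedged illustration of that routing (the intent set, confidence threshold, and stand-in functions are assumptions for the sketch, not Xingcan's published design):

```python
# Hypothetical sketch: frequent commands take a millisecond on-device path with
# a rule-based safe fallback; only open-ended requests reach the local LLM.

FAST_INTENTS = {"follow", "stop", "avoid"}

def classify_on_device(text):
    """Stand-in for an on-device small model: trivial keyword matching."""
    for intent in FAST_INTENTS:
        if intent in text.lower():
            return intent, 0.95
    return "complex", 0.0

def plan_with_local_llm(text):
    """Stand-in for a local lightweight LLM handling open-ended requests."""
    return f"llm_plan({text!r})"  # runs on device: no cloud latency or outages

def handle_command(text):
    intent, confidence = classify_on_device(text)    # millisecond-level path
    if intent in FAST_INTENTS and confidence > 0.9:
        return intent                                # execute immediately
    if intent in FAST_INTENTS:
        return "stop"                                # deterministic safe fallback
    return plan_with_local_llm(text)                 # complex semantics only
```

So "follow me" resolves on the fast path in milliseconds, while "take me somewhere sunny" is handed to the local model; either way, nothing depends on a cloud round trip.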
Compare the Strutt ev¹ in the same sector: according to public information, it carries two LiDARs and ten time-of-flight sensors, delivering 360° obstacle avoidance and relatively complete hardware specifications. Yet its positioning leans toward "high-precision control" rather than "deep environmental cognition": it remains limited in dynamic-object recognition and semantic understanding, and its core capabilities depend on continued algorithmic iteration.
Additionally, its price tag of $7,499 makes it prohibitively expensive for ordinary households, posing significant challenges for commercial popularization.
In contrast, Xingcan Intelligence, with "spatial intelligence" at its core, achieves performance breakthroughs through technological optimization. Its 5D Perception System can run on embedded chips, meeting the cost requirements of consumer-grade products, with mass production expected in Q4 2026.
Its "perception-first, lightweight-model" strategy keeps costs inherently controllable and matches the dynamic demands of home environments more precisely.
In other words, the hardware cost of Xingcan's approach has already fallen to a level that supports widespread adoption.
From an industry standpoint, what is more noteworthy is that Xingcan takes a "laying eggs along the way" approach, with each product funding itself while feeding the next:
Lawn-mowing robots generate extensive data from outdoor unstructured environments.
Terrain variations, lighting fluctuations, and dynamic obstacles constitute precisely the training materials most essential for spatial intelligence systems. Once amassed, this data can be directly repurposed for the indoor and outdoor perception R&D efforts related to smart wheelchairs.
Subsequently, the technology can be incrementally extended to humanoid companion robots. Each product functions both as a profitable commercial entity and as infrastructure for the next phase of technological advancement, achieving synergistic effects through the accumulation of data and technology.
This model, which "avoids seeking a one-step solution, starts with fundamental scenarios, and iterates through technological reuse," also emphasizes that the crux of embodied AI commercialization lies not in form but in "usability, effectiveness, and profitability."
While humanoid robots may epitomize the ultimate objective of embodied AI, fundamental scenarios represent the essential pathway to achieving that goal.
Xingcan's decision hints at a potential scenario:
In this industry, the entity that first attains a sustainable commercial closed loop may not be the most glamorous contender but rather the one that initially addresses genuine needs and resolves practical challenges.