10/09 2025
Editor's Note:
Embodied AI represents not a singular technological breakthrough but a global trend propelled by the synergy of capital, engineering prowess, and real-world applications. Much like the Age of Discovery five centuries ago, we are now redefining the boundaries of the physical world through intelligence.
Galaxy Frequency is proud to present the "Embodied AI Great Navigation" series, offering a comprehensive global perspective on key sectors including general-purpose robots, companion robots, robot dogs, and large-scale robot models. Through in-depth analysis, we spotlight the industry's leading players.
Dubbed "Great Navigation," this series chronicles how pioneers harness the power of algorithms, hardware, and capital to chart new territories.
Just as every great voyage reshaped the world, the journey of Embodied AI is redefining the relationships between humans and machines, and between technology and society. In this series, our focus extends beyond who reaches the shore first; we delve into who sets the course, how to navigate through the foam and turbulence, and where the truly valuable future lies.
Previous article: "The Billion-Dollar Club of Embodied AI: 10 Global Players"
Author | Mao Xinru
Recently, Figure AI set a new benchmark for the Embodied AI industry with a staggering $1 billion financing round, propelling its valuation to $39 billion. In just three years, the company has emerged as the highest-valued startup in the global Embodied AI sector.
For a time, Figure AI was labeled by some media as the "American counterpart of Unitree." However, a closer examination of the criteria behind this label reveals that it stems more from public discourse than technical merit. From a technical standpoint, the two companies exhibit significant differences.
Unitree is a quintessential "hardware-centric" company, with 65% of its current revenue derived from the quadruped robot dog business. In contrast, Figure AI is a full-stack software and hardware enterprise, boasting self-developed core components such as robot bodies, batteries, and sensors. On the software front, its Helix robot brain empowers robots to perform tasks like towel folding and package sorting.
Undoubtedly, any company aiming for absolute dominance in the humanoid robot market must possess exceptional capabilities in both software and hardware. This is why numerous companies are actively addressing their weaknesses to enhance market competitiveness.
Companies like Figure AI, which excel in "brain + hardware + capital," have become benchmarks for Embodied AI companies worldwide.
In the Embodied AI competition spearheaded by China and the United States, four Chinese contenders have emerged as potential "Chinese counterparts of Figure AI": Galbot, Starrobo, Sping Intelligence, and Astribot.
Astribot and Sping Intelligence: Breakthroughs in Brain Technology
In the realm of Embodied AI, hardware iterations are often the most visible and easily dissected. However, it is the software and models that ultimately determine the upper limits of robot intelligence.
In other words, even with the most exquisite arms and joints, robots will struggle in complex real-world environments without a smart brain, cerebellum, and coordinated joints.
Astribot and Sping Intelligence differ from Figure AI in robot morphology, opting for wheeled-legged designs rather than Figure AI's bipedal humanoid. On the brain dimension alone, however, both exhibit ambitions akin to those of the Helix model.
Astribot's DuoCore adopts a fast-slow brain architecture. The fast brain handles real-time reactions and basic motion control, while the slow brain processes complex decision-making and long-term planning. This design shares similarities with Figure AI's Helix model.
Astribot's fast-slow brain and Figure's Helix both split decision-making across different time scales in order to balance reasoning against real-time control.
Figure packages this approach as a combination of System 1 (fast) and System 2 (slow), where System 1 executes high-speed control, and System 2 handles high-level planning, language, and scene understanding. Astribot modularizes a similar hierarchy into "DuoCore + real-time trajectory generation," emphasizing strategies like imitation learning and incremental end-effector space control to enhance robustness.
The distinction is that Figure's Helix focuses on applying one general-purpose VLA (vision-language-action) model across multiple agents and refining it through a closed loop of large-scale demonstration data. Astribot, in contrast, emphasizes top-down software-hardware synergy, accumulating and transferring real-world samples at scale through engineered teleoperation and simulation-based collection.
Sping Intelligence, dubbed the "Chinese counterpart of Figure AI" in media reports, unveiled its large model Spirit V1 and humanoid robot Moz1 this year. The former conquered the challenge of long-horizon manipulation of flexible objects for the first time, while the latter is China's first embodied AI robot with high-precision whole-body force control.
Like Figure with Helix, Sping Intelligence treats end-to-end VLA as a key capability, aiming for a single model that understands natural language instructions, processes perceptual inputs, and outputs continuous actions.
Meanwhile, Sping Intelligence faces challenges similar to those of Figure, namely how to transition from prototype demonstrations to stable operation in numerous real-world scenarios while continuously collecting high-quality training data.
From an architectural perspective, all three models improve upon the traditional trade-off between speed and precision by dividing labor between high-frequency systems for action execution and low-frequency systems for decision-making and planning.
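The division of labor described above can be sketched as a dual-rate loop. This is a minimal illustration, not any vendor's implementation: the 250Hz fast rate is the DuoCore figure cited in this article, while the 10Hz slow rate, the one-dimensional state, the gain, and the names `slow_planner`, `fast_controller`, and `run` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    """Hypothetical output of the slow loop: a 1-D goal position."""
    target: float


def slow_planner(tick: int) -> Plan:
    """Stand-in for the slow brain: switches the goal halfway through the run."""
    return Plan(target=1.0 if tick < 500 else 0.5)


def fast_controller(position: float, plan: Plan, gain: float = 0.05) -> float:
    """Stand-in for the fast brain: one small incremental step toward the goal."""
    return position + gain * (plan.target - position)


def run(fast_hz: int = 250, slow_hz: int = 10, seconds: float = 4.0):
    """Simulate both loops on a shared tick counter (no real-time sleeping)."""
    ticks_per_plan = fast_hz // slow_hz      # fast ticks between slow updates
    total_ticks = int(fast_hz * seconds)
    position = 0.0
    plan = slow_planner(0)
    slow_updates = 0
    for tick in range(total_ticks):
        if tick % ticks_per_plan == 0:       # low-frequency planning step
            plan = slow_planner(tick)
            slow_updates += 1
        position = fast_controller(position, plan)  # high-frequency control step
    return position, slow_updates, total_ticks


if __name__ == "__main__":
    final_position, updates, ticks = run()
    print(f"fast ticks: {ticks}, slow updates: {updates}, final position: {final_position:.4f}")
```

Under these assumptions, four simulated seconds give 1,000 fast ticks but only 40 slow updates, and the position settles on the last goal the slow loop issued. The fast loop never waits for the planner; it simply keeps tracking whichever plan is current.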
Helix's advantage lies in its extremely lightweight design and low data dependency, which allow it to run on embedded GPUs without cloud computing support.
DuoCore enhances dynamic correction capabilities, with a fast-brain response frequency of 250Hz that surpasses Helix's 200Hz and makes it better suited to high-precision industrial applications.
Spirit V1's innovation lies in flexible object manipulation, achieving complex long-duration tasks like full-process clothes folding through multi-source data fusion, with generalization capabilities closer to household scenarios.
Galbot: Multidimensional Similarities in Capital and Brain Power
In the capital-intensive Embodied AI sector, technological prowess alone is insufficient; financing capacity directly shapes the pace of technological iteration. Substantial funding sustains long-term R&D and fuels talent acquisition.
Figure AI's high profile stems not only from its technological narrative but also from its consecutive large-scale financing rounds, which provide ample resources for prototype development and for building a large-scale data closed loop.
In terms of capital, China's Galbot shares similarities with Figure AI. Both companies have achieved rapid valuation surges through a strategy of few financing rounds with large single investments.
Founded two years ago, Galbot has secured three rounds of financing, raising over RMB 2.4 billion with a valuation reaching RMB 10 billion, ranking among the top three domestic Embodied AI companies. Figure AI, founded three years ago, has completed four rounds of financing, raising over $1.754 billion with a valuation of $39 billion.
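As a rough check on the figures above, the ratio of valuation to total capital raised can be computed directly. Because both totals are reported as "over" a given amount, the results should be read as upper bounds. The function name `valuation_multiple` is an illustrative assumption, and each ratio stays within a single currency, so no exchange rate is needed.

```python
def valuation_multiple(valuation_bn: float, raised_bn: float) -> float:
    """Valuation divided by total capital raised, both in the same currency."""
    return valuation_bn / raised_bn


# Figures as reported in this article (billions, own currency).
galbot_multiple = valuation_multiple(valuation_bn=10.0, raised_bn=2.4)    # RMB
figure_multiple = valuation_multiple(valuation_bn=39.0, raised_bn=1.754)  # USD

if __name__ == "__main__":
    print(f"Galbot:    about {galbot_multiple:.1f}x capital raised")
    print(f"Figure AI: about {figure_multiple:.1f}x capital raised")
```

On these numbers, Galbot is valued at roughly 4 times the capital it has raised and Figure AI at roughly 22 times, which suggests how much more of Figure's valuation rests on expectations than on invested cash.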
Additionally, both companies have benefited significantly from the "capital-industry" dividends brought by their investors. For example, CATL, the lead investor in Galbot's Pre-A round, provided deployment scenarios in real factory environments for robot testing and data collection.
Brookfield, a global top-tier alternative asset management company and participant in Figure's Series C round, will assist in building the world's largest and most diversified pre-training dataset for humanoid robots and provide access to numerous real household scenarios for testing.
Beyond "investing in people" and "investing in stories," the strong "capital-raising" abilities of these two companies ultimately stem from their hardcore strengths at the model level.
Like Astribot and Sping Intelligence, Galbot opts for wheeled-legged robots, differing from Figure's bipedal humanoid design. However, Galbot's large model GraspVLA, like Helix, adopts an end-to-end VLA approach and has garnered widespread attention in the industry.
GraspVLA's standout feature is its exceptional generalization capability. It is the world's first model capable of zero-shot generalization with pre-training alone, adapting to variations in height, planar position, object category, lighting, interference, and background. It also possesses autonomous decision-making and strong anti-interference capabilities.
Unlike Helix, GraspVLA is more scenario-specific. Galbot primarily targets commercial scenarios such as pharmacies and retail, having already secured contracts with 100 smart pharmacies. Galbot's strategy is to first address scalability in specific scenarios before pursuing broader model generalization.
The robotics industry may appear as a trillion-dollar market from afar, but upon closer inspection, it consists of 10,000 "$100 million markets." Thus, Galbot chooses to first penetrate measurable and controllable commercial scenarios.
In contrast, Figure's Helix model emphasizes versatility, aiming to perform various tasks in diverse environments with the goal of large-scale applications in both factories and households.
From its early collaboration with BMW's factory to its recent partnership with Brookfield, Figure follows a "factory-to-home" generalization path. This approach requires higher upfront investment, longer cycles, and greater risk, but it offers potentially far larger returns.
Starrobo: Similarities in Full-Stack Software and Hardware Routes
Among the four contenders, Starrobo most closely resembles Figure AI's full-stack layout in product design. Starrobo has developed the humanoid robot L7, wheeled robot Q5, and dexterous hand XHAND1, forming a complete product matrix.
This diversified hardware strategy differs slightly from Figure AI's focus on bipedal humanoid robots, reflecting Chinese companies' flexible adaptation to market demands.
Starrobo comprehensively benchmarks Figure AI across three dimensions: body morphology, brain architecture, and scenario deployment, constructing a complete framework of "similar hardware, aligned software, and comparable scenarios."
In body design, Starrobo's humanoid robot L7 won the high jump championship at this year's World Humanoid Robot Games, demonstrating explosive power and athletic performance. L7 can also perform tasks such as package sorting, intelligent assembly, dancing, and bartending.
Figure's humanoid robot Figure 02 similarly showcases capabilities like package sorting, towel folding, and dishwasher operation. Powered by the Helix brain, it can even achieve dual-robot collaboration.
Although Figure's robots have not extensively demonstrated their athletic abilities, in structured or semi-structured environments like factories, robots from both companies have mastered similar skills such as package sorting.
Another critical commonality on the hardware side is that both companies utilize self-developed dexterous hands to demonstrate fine manipulation capabilities. This full-chain self-development capability, from robot bodies to key actuators, is one of the core reasons for their high market recognition.
The high-performance displays of humanoid robots from both companies rely on support from high-performance models. Starrobo also bets on an end-to-end VLA model, with its ERA-42 model integrating vision, understanding, prediction, and action into a unified system.
ERA-42 and Figure's Helix share three commonalities:
High-frequency response capability: ERA-42's reasoning frequency exceeds 30Hz. Although lower than Helix's 200Hz, it still meets real-time action feedback requirements, enabling proactive action planning through video prediction to avoid operational errors.
Data-driven logic: Both models learn skills by watching human operation videos. ERA-42 can directly extract operational logic from videos, significantly reducing data costs, akin to Helix's automated annotation technology.
End-to-end architecture: Both models eliminate intermediate conversion steps, directly translating natural language instructions into actions, simplifying the application development process.
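To put the frequencies quoted in this article on a common footing, each one can be converted into the time available per control cycle. The sketch below is plain arithmetic under one assumption: ERA-42 is taken at exactly 30Hz, the lower bound of the "exceeds 30Hz" figure, so its budget is a ceiling. The name `cycle_budget_ms` is illustrative.

```python
def cycle_budget_ms(frequency_hz: float) -> float:
    """Time available per cycle, in milliseconds, at a given update frequency."""
    return 1000.0 / frequency_hz


# Frequencies as quoted in this article.
quoted_rates_hz = {
    "ERA-42 reasoning (lower bound)": 30,
    "Helix": 200,
    "DuoCore fast brain": 250,
}

if __name__ == "__main__":
    for name, hz in quoted_rates_hz.items():
        print(f"{name}: {hz} Hz -> {cycle_budget_ms(hz):.1f} ms per cycle")
```

At 200Hz and 250Hz a cycle lasts 5 ms and 4 ms respectively, while 30Hz leaves about 33 ms. That larger budget is one plausible reason a model running at the lower rate can afford a step such as video prediction within each cycle.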
Although Starrobo's financing does not rank among the top tier in China, it has built a globalized ecosystem in terms of orders, having delivered 300 units with an additional 500 units pending delivery.
Furthermore, nine out of the top 10 global tech giants by market capitalization are Starrobo's clients, along with domestic partners like Lenovo, Haier, and Beijing Universal Technology. This market recognition follows the same logic as Figure AI's BMW order, reflecting a virtuous cycle of "technological strength - scenario deployment - market recognition."
Who Is More Like the Chinese Version of Figure AI?
After comparing the four companies (Astribot, Sping Intelligence, Galbot, and Starrobo) with Figure AI, we can break down "being like Figure AI" into three more specific dimensions:
Is there an end-to-end VLA large-model capability validated across multiple scenarios?
Is there hardware engineering capability that is integrated with the model?
Is there adequate funding and access to the industrial chain to support long-term iterative development?
When the four companies are evaluated across these three dimensions, it is clear that none can truly replicate Figure AI in its entirety.
Galbot boasts strengths in funding and scenario-based implementation. Its large-scale simulation data model, pilot store orders, and substantial financial resources provide it with an unassailable competitive advantage.
Astribot exhibits methodological similarities in model architecture, betting that its software framework will ultimately define its performance ceiling. Similarly, Sping Intelligence shares many technical aspects with Figure's approach, but it needs to produce more verifiable results through real-world prototype testing and scenario validation.
Starrobo most closely resembles Figure in both hardware and software, whether in terms of technical alignment, closed-loop full-stack capabilities, or the commercial strategy of prioritizing industrial scenarios.
Of course, we must remember that this year marks only the inaugural year of mass production for Embodied AI. In an industry with such a long horizon, it is premature to declare winners. However, we can still glean insights into each company's development from four key indicators.
First, long-term operational data from robots in real-world settings. Demo showcases have become common in the industry, but the key to better performance lies in turning one-time demonstrations into consistent, reliable execution. The specific test is whether the same system can maintain low failure rates over extended periods across multiple stores, workshops, or even households under real-world conditions.
Second, real-world mass production and delivery data from enterprises. Since the beginning of the year, an increasing number of companies have announced order signings. It is conservatively estimated that humanoid robot sales in China will exceed 10,000 units this year.
Meanwhile, many investors have indicated that the period from the end of this year to early next year will be a critical evaluation phase. The number of robots a company can ultimately deliver, along with post-delivery operations, maintenance, and reputation, will all test the company's supply chain and service capabilities.
Third, the efficiency of the "model-hardware" closed loop. Whether a company can effectively convert simulation, teleoperation, and other data into model updates and deploy them to real machines is crucial for enhancing robot intelligence. Once intelligence reaches a certain level, robots will be able to execute any command in unfamiliar environments, marking the industry's long-awaited "ChatGPT moment" for robots.
Unitree founder Wang Xingxing predicts that this moment could arrive within the next 1-2 years at the earliest, and the company that reaches this milestone first will gain the core momentum for accelerated progress.
Fourth, the sustainability of funding and industrial resources. Zhao Tongyang from Zhongqing once stated that without 5 billion yuan in cash reserves, a company is likely to be forced out of the competition due to a funding crisis. Therefore, the company that can maintain its R&D pace while balancing cost reduction, mass production, and supply chain control across multiple iterations will possess greater long-term competitiveness.
In summary, whether it is the comprehensive layout of Starrobo, the financial strength of Galbot, or the algorithmic advantages of Astribot and Sping Intelligence, the different strategies of these four companies reflect the diversified ecosystem of China's Embodied AI sector.
As industry hype gradually fades, true competition will depend on each company's ability to tie together technology, scenarios, data, and commercial deployment.
The ultimate outcome of this race will not be the emergence of a 'carbon copy' of Figure AI but rather which company can forge a development path that better aligns with industrial needs.
It's even possible that the ultimate winner will not be a single company but rather entities that can simultaneously achieve a closed loop between the brain (intelligence), body (hardware), and commercialization, or ecological players that integrate different strengths through collaboration.
After all, the ultimate outcome of this intelligence revolution has never been about the victory of a single model but rather a future jointly shaped by diverse technological paths.