RoboScience Machine Science Makes ICRA Best Paper List for Two Years Running with Its 'Embodied Brain' Innovation

06/07 2026

Editor: Lv Xinyi

For the second year in a row, RoboScience Machine Science has secured a spot on the ICRA best papers list.

Once a year, university labs, industrial firms, and leading researchers across the robotics community turn their attention to a single stage. ICRA is one of the most prominent.

ICRA, the IEEE International Conference on Robotics and Automation, serves as the flagship event of the IEEE Robotics and Automation Society and ranks among the most influential top-tier academic conferences in the global robotics and automation arena.

At ICRA, papers not only undergo standard peer review but also face the toughest competition from the global robotics community. For robotics researchers, having a paper accepted by ICRA is a significant achievement; making the best paper list places their research at the cutting edge of global robotics technology.

It was on this prestigious stage that the ICRA 2026 awards were announced. Among the Best Paper Finalists in the 'Robotic Manipulation and Motion' category, alongside top institutions such as UC Berkeley, Stanford University, MIT, and Tsinghua University, was a team from the National University of Singapore (NUS) led by Lin Shao, Chief Scientist at RoboScience Machine Science. Their paper, titled 'Bi-Adapt: Few-Shot Bimanual Adaptation for Novel Categories of 3D Objects Via Semantic Correspondence,' received recognition.

This marks the second consecutive year that Shao's team has been featured on the ICRA best paper list.

A year ago in Atlanta, another paper by their team, 'D(R,O) Grasp,' stood out among thousands of global submissions and clinched the 'Best Paper Award in Robotic Manipulation and Motion' at ICRA 2025. According to the official ICRA 2025 award list, the paper was lauded for its contribution to 'generalizable dexterous grasping representations across different robot hands and objects.'

In essence, RoboScience Machine Science has been recognized by ICRA for two consecutive years precisely because it continues to provide solutions to the most challenging problems in robotic manipulation.

Out of more than 4,000 submissions worldwide, fewer than 1% were shortlisted, and Shao's team was the only Asian entry selected two years in a row; within the robotics community, the weight of that result needs little explanation. Beyond the best paper recognition, Shao's team had 10 papers accepted at ICRA 2026 alone, covering core directions such as dexterous grasping, social navigation, low-cost force sensing, and hybrid task planning, and systematically pushing the technical boundaries of embodied intelligence.

This demonstrates that a Chinese company established just a year and a half ago is putting its technological roadmap to the test in the most rigorous arena of the global robotics community—and successfully convincing its peers.

Looking back at the papers themselves, both address the same fundamental issue: enabling robots to move beyond 'one object, one policy' limitations.

D(R,O) Grasp allows a single AI 'brain' to control dexterous hands with 3, 4, or 5 fingers, enabling cross-agent grasping of hundreds of objects with a success rate of over 87% and generation times under 1 second. Bi-Adapt enables robots to transfer learned bimanual collaboration actions to unseen object categories after seeing just a few examples, and even achieves zero-shot generalization across categories.

One addresses 'hand-switching,' while the other tackles 'object-switching.' These seemingly different technical approaches are driven by the same insight: the next phase of embodied intelligence will be won through 'generalization.'

This is also RoboScience Machine Science's story and its bet: to 'break the generalization bottleneck.' The two founders, Chief Scientist Lin Shao and CEO Tian Ye, represent the two most critical ends of this bet: the methodology of cutting-edge research and the execution of large-scale engineering.

In short, within a year and a half of its founding, RoboScience Machine Science has pushed the hardest generalization problems to the forefront of a robotics community that prizes long-term commitment to a research path, and ICRA has recognized it two years in a row. On that evidence, RoboScience Machine Science is no longer just another startup name but a Chinese exemplar moving toward the center stage of embodied intelligence.

In the wave of embodied intelligence over the past two years, nearly all leading players have converged on the same technical paradigm, VLA (Vision-Language-Action), which uses vision and language to drive robot actions directly. RoboScience Machine Science is one of the few companies that has openly taken a different route: its core technical architecture is called VLOA (Vision-Language-Object-Action). The extra 'O' stands for Object.

This seemingly minor difference—just one additional letter—represents a fundamentally different insight: for robots to truly become 'general-purpose,' they must first learn to understand how objects evolve in the physical world before deciding what to do.

The biggest issue with the VLA approach is that it skips 'understanding the physical world itself' and jumps directly from perception and language to action. While it can work in static, fixed scenarios, it requires re-collecting data and retraining models whenever a new object, task, or robot configuration is introduced. In essence, VLA provides hard-bound 'instance-action' pairs that struggle to achieve true 'generalization.'

VLOA aims to solve precisely this issue—enabling a single 'brain' to command any robot, manipulate any object, and complete any task.

In terms of architecture, it consists of two layers: the 'Embodied World Model' and the 'General Manipulation Model,' connected by an interface called Object Trajectory:

The upper-layer Embodied World Model is responsible for 'understanding the world.' Before acting, it lets the robot mentally preview the future: which object will move where, how its pose will change, and what it will interact with. It outputs an intermediate representation called a '3D point cloud trajectory,' which shows the object's motion path intuitively while naturally satisfying physical and geometric constraints, avoiding issues common in 2D video generation such as gravity errors or objects passing through each other.

Caption: The Embodied World Model outputs 3D point cloud trajectories.
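
To make the idea of a '3D point cloud trajectory' concrete, the following minimal Python sketch represents a trajectory as a sequence of point cloud frames and checks one geometric constraint: a rigid object keeps all of its pairwise point distances from frame to frame. The names RigidStep, rollout, and is_rigid are hypothetical, and the rigid-body assumption is made only for illustration; the article does not describe the actual representation, and soft objects such as fabrics would need a different check.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class RigidStep:
    """Hypothetical trajectory step: rotate about the z axis, then translate."""
    yaw: float
    translation: Point


def apply_step(points: List[Point], step: RigidStep) -> List[Point]:
    """Apply one rigid motion to every point of the cloud."""
    c, s = math.cos(step.yaw), math.sin(step.yaw)
    tx, ty, tz = step.translation
    return [(c * x - s * y + tx, s * x + c * y + ty, z + tz) for x, y, z in points]


def rollout(points: List[Point], steps: List[RigidStep]) -> List[List[Point]]:
    """Return the point cloud at every time step, starting with the initial cloud."""
    frames = [points]
    for step in steps:
        frames.append(apply_step(frames[-1], step))
    return frames


def pairwise_distances(points: List[Point]) -> List[float]:
    """Distances between all unordered pairs of points, in a fixed order."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def is_rigid(frames: List[List[Point]], tol: float = 1e-9) -> bool:
    """True if every frame preserves the pairwise distances of the first frame."""
    ref = pairwise_distances(frames[0])
    return all(
        all(abs(a - b) <= tol for a, b in zip(ref, pairwise_distances(frame)))
        for frame in frames[1:]
    )


# Illustrative data only: the eight corners of a unit cube moved by two steps.
CUBE = [(float(x), float(y), float(z)) for x in (0, 1) for y in (0, 1) for z in (0, 1)]
STEPS = [
    RigidStep(yaw=math.pi / 6, translation=(0.1, 0.0, 0.05)),
    RigidStep(yaw=-math.pi / 4, translation=(0.0, 0.2, 0.0)),
]
frames = rollout(CUBE, STEPS)
```

A trajectory built this way cannot stretch or squash the object by construction, which is one simple sense in which a 3D representation can respect geometry where a generated 2D video might not.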

The lower-layer General Manipulation Model is responsible for 'changing the world.' It translates this trajectory into robot joint angles, contact points, and force control signals, accurately reproducing it in the physical world. Instead of a fragmented pile of 'one task, one model,' it jointly trains all skills, sharing a unified underlying representation.

Caption: The General Manipulation Model drives dexterous hands based on input 3D point cloud trajectories.

The intermediate Object Trajectory is VLOA's most ingenious touch—it completely decouples 'cognition' from 'execution': the upper layer doesn't need to care about the hardware used, while the lower layer doesn't need to care about the specific task. They communicate via a universal language—the '3D point cloud trajectory of objects'—which is both human-readable and machine-executable.
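
The decoupling described above can be sketched as two interfaces that share only a trajectory type. This is a minimal illustration, not RoboScience's implementation: ObjectTrajectory, WorldModel, ManipulationModel, LiftWorldModel, and ToyHand are hypothetical names, and the 'joint commands' are placeholder numbers. The 12 and 16 degrees of freedom echo the X-hand and LEAP Hand figures quoted in the article.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class ObjectTrajectory:
    """Shared interface: frames[t] is the object's point cloud at step t."""
    frames: List[List[Point]]


class WorldModel(Protocol):
    """Upper layer: knows the task, knows nothing about the hardware."""
    def predict(self, instruction: str, cloud: List[Point]) -> ObjectTrajectory: ...


class ManipulationModel(Protocol):
    """Lower layer: knows the hardware, knows nothing about the task."""
    def execute(self, trajectory: ObjectTrajectory) -> List[List[float]]: ...


class LiftWorldModel:
    """Toy upper layer: 'predicts' that the object rises by dz at each step."""
    def __init__(self, steps: int = 3, dz: float = 0.01) -> None:
        self.steps = steps
        self.dz = dz

    def predict(self, instruction: str, cloud: List[Point]) -> ObjectTrajectory:
        frames = [
            [(x, y, z + t * self.dz) for x, y, z in cloud]
            for t in range(self.steps + 1)
        ]
        return ObjectTrajectory(frames=frames)


class ToyHand:
    """Toy lower layer: one placeholder joint vector of length dof per frame."""
    def __init__(self, dof: int) -> None:
        self.dof = dof

    def execute(self, trajectory: ObjectTrajectory) -> List[List[float]]:
        commands = []
        for frame in trajectory.frames:
            height = sum(p[2] for p in frame) / len(frame)
            commands.append([height] * self.dof)  # placeholder, not real control
        return commands


def run(world: WorldModel, hand: ManipulationModel,
        instruction: str, cloud: List[Point]) -> List[List[float]]:
    """The only thing passed between the two layers is the ObjectTrajectory."""
    return hand.execute(world.predict(instruction, cloud))


cloud = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]
cmds_12 = run(LiftWorldModel(), ToyHand(dof=12), "lift the cup", cloud)
cmds_16 = run(LiftWorldModel(), ToyHand(dof=16), "lift the cup", cloud)
```

Swapping ToyHand(dof=12) for ToyHand(dof=16) changes nothing in the upper layer, and swapping the world model changes nothing in the hand, which is the property the article attributes to the Object Trajectory interface.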

This decoupling delivers precisely what VLOA aims to achieve in three ways:

Cross-object: From smooth shampoo bottles to transparent cotton swab boxes, from rigid parts to soft fabrics, the same model adapts automatically without requiring separate training for each new object.

Cross-task: Opening an envelope requires millinewton-level insertion force, standing a coin on its edge requires dynamic balancing, grasping a potato chip requires not crushing it, and a needle injection requires precise speed control. Tasks that previously needed separate algorithm development are now unified under one model.

Cross-agent: The model is fully decoupled from hardware. A different dexterous hand can be swapped in and used immediately. For example, the same policy can transfer seamlessly between entirely different morphologies such as the X-hand (12-DOF gear quasi-direct drive) and the LEAP Hand (16-DOF direct drive).

The most compelling case occurred last May when, based on VLOA, RoboScience Machine Science completed the world's most complex, highest-precision, and most step-intensive embodied manipulation task—assembling furniture. This task hit nearly all the challenges of robotic manipulation: in-hand manipulation, bimanual coordination, millimeter-level precision, long-horizon task planning, and force feedback control. The model could start assembly after reading the instructions and, if disrupted by human interference, could automatically recover and complete the task.

More critically, the ceiling on this generalization can keep being raised. RoboScience Machine Science has accumulated over 1 million hours of object-centric multimodal video data for the Embodied World Model, growing by hundreds of thousands of hours each week, and aims to build a 10-million-hour dataset by the end of 2026. For the General Manipulation Model, using its self-developed multimodal physics simulation platform RoboMirage, the company has accumulated 10 billion high-quality manipulation trajectories, with a target of 1 trillion by 2026. Both models follow engineering-validated Scaling Laws: the larger the dataset, the stronger the generalization capability, with predictable power-law improvements.
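
What 'predictable power-law improvements' would mean in practice can be sketched as follows. The article publishes no measurements, so every number below is synthetic and the functional form error = a * N^(-b) is an assumption; fit_power_law and predict_error are hypothetical helper names. A power law is a straight line in log-log space, so the fit is ordinary least squares on the logarithms.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], errors: List[float]) -> Tuple[float, float]:
    """Fit errors ~ a * sizes**(-b) by least squares in log-log space."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    count = len(xs)
    mean_x = sum(xs) / count
    mean_y = sum(ys) / count
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # log(error) = intercept + slope * log(size), so a = e^intercept, b = -slope
    return math.exp(intercept), -slope


def predict_error(a: float, b: float, size: float) -> float:
    """Extrapolate the fitted power law to a new dataset size."""
    return a * size ** (-b)


# Synthetic, noise-free data generated from a = 2.0, b = 0.25 (not real results).
SIZES = [1e3, 1e4, 1e5, 1e6]
ERRORS = [2.0 * n ** -0.25 for n in SIZES]

a, b = fit_power_law(SIZES, ERRORS)
projected = predict_error(a, b, 1e7)
```

If real measurements followed such a curve, the fitted exponent b would say how much each tenfold increase in data reduces error, which is what makes the returns from scaling forecastable rather than guessed.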

This means the path RoboScience Machine Science has bet on is not just a clever technical architecture but an engineering system that can continuously snowball—where greater scale leads to greater advantage.

In short, VLOA is more aggressive and far-sighted than VLA in that it aims to let robots truly break free from remote controls, autonomously understanding and changing the world.

When viewed alongside RoboScience Machine Science's two consecutively awarded papers, this model represents the same underlying effort: reconstructing robots from hardcoded 'instance-action' pairs into generalizable 'relationship-trajectory' systems. This foundational paradigm is the most valuable bet RoboScience Machine Science has placed.

The interdisciplinary nature of embodied intelligence makes it clear that relying on a single type of talent rarely closes the loop.

A team skilled only in academia can write papers but cannot build products; a team skilled only in engineering can build hardware but cannot develop truly cutting-edge algorithms. A direction like embodied large models, which requires both frontier originality and scalable deployment, demands bringing these two rare talent types together in one team.

What makes RoboScience Machine Science rare is precisely that it has assembled both.

Consider Chief Scientist Lin Shao. He is a key figure in China's 'Stanford school' of embodied intelligence. He earned his undergraduate degree at Nanjing University and his Ph.D. at Stanford University, where he studied under Jeannette Bohg, a renowned robotics scholar. His co-advisor was Leonidas J. Guibas, a member of the U.S. National Academies and one of the foundational figures in computer graphics and geometric processing. Today, Shao serves as an Assistant Professor at the National University of Singapore (NUS).

Stanford's robotics circle has become an excellent vantage point for observing China's embodied intelligence landscape in recent years. Su Hao (founder of Hillbot), Wang He (founder of Galaxy General), Lu Cewu (founder of Qiongche Intelligence), and others share the same academic lineage as Shao. These individuals now support much of China's embodied intelligence sector, which means Shao naturally serves as a reference point within the country's most cutting-edge academic circles.

More critically, Lin Shao's main research focus has been the "generalization" of robotic manipulation since his doctoral days. From the early UniGrasp to D(R,O) Grasp, Bi-Adapt, and now T(R,O) Grasp, part of the ICRA 2026 program, the research trajectory is clear: continuous abstraction, unification, and the pursuit of universal representations across different robot bodies and objects.

This long-standing and consistent research direction means that the technological foundation of RoboScience is not a hastily assembled product, but rather a methodology refined over nearly a decade by a top scholar.

Now consider Tian Ye. His rarest asset is engineering prowess, honed on a path from the Physics Department at the University of Science and Technology of China (USTC) to the Stanford AI Lab and finally to Apple's AI Platform. He earned a bachelor's degree in physics from USTC and a master's degree at the Stanford AI Lab, where he was mentored by Andrew Ng, renowned as the "Evangelist of AI."

After graduation, Tian Ye joined Apple as the technical lead for the AI Platform—a position whose significance can only be truly appreciated by those within the AI circle. The core platform he spearheaded is dubbed "Apple's PyTorch and CUDA" by industry insiders: it supports the large-scale deployment of multiple key AI technologies within the Apple ecosystem, serving as the infrastructure that enables Apple's AI to run stably across billions of devices.

In essence, Tian Ye transcends the conventional image of a "tech-savvy CEO." He is a rare engineering leader who possesses a deep understanding of both state-of-the-art algorithms and the intricacies of integrating them into industrial systems designed for billions of users.

This proficiency is exactly what the industrialization of embodied AI demands. The industry increasingly likens model training, once seen as esoteric, to "industrial production" because it has realized that turning cutting-edge models into products that can be deployed at scale, run stably, and keep iterating hinges not on algorithmic wizardry but on robust engineering foundations. That means a stable data pipeline to feed the models, an inference framework capable of supporting real-time control, and an engineering paradigm that ensures consistent behavior across diverse hardware platforms. These challenges are not researchers' forte; they are precisely the problems that only industrial-grade AI engineers can tackle.

Hence, the collaboration between Lin Shao and Tian Ye forms one of the most formidable partnerships in the industry: a Stanford academic luminary paired with a Silicon Valley engineering leader.

The rationale behind this assertion is evident when examining China's embodied AI landscape. Most companies either concentrate on 0-to-1 technological breakthroughs led by academic heavyweights or excel at 1-to-10 expansion under engineering-driven founders. A partnership like RoboScience's, which unites strength on both sides, is rare. It means the company can stay abreast of cutting-edge original research while sidestepping engineering pitfalls during product implementation.

In brief, it is positioned to ride out technology cycles and flourish over the long term in the high-potential field of embodied AI.

Established in December 2024, RoboScience has swiftly ascended to the forefront of China's embodied AI scene within just a year and a half.

Firstly, let's consider its capital structure. The company has disclosed multiple rounds of financing, with investors including JD Group, SenseTime, Fortune Capital, China Merchants Capital Innovation, 01VC, and Puhua Capital, among other CVCs and financial institutions. It has recently secured additional funding rounds from a multitude of domestic and international industry leaders, internet giants, and top-tier financial institutions.

Next, its layout. RoboScience has established R&D and production networks across Beijing, Shenzhen, Suzhou, and Hangzhou. Its team members hail from prestigious institutions such as Stanford, USTC, and the National University of Singapore, as well as leading companies like Apple, ByteDance, Tencent, and DJI. This diverse background covers both cutting-edge AI algorithm research and engineering capabilities for mass-producing smart hardware.

In terms of product strategy, RoboScience adopts a full-stack, software-hardware integration approach. At the upper layer, it boasts the VLOA large model, while at the lower layer, it concurrently develops robotic bodies, end-effectors, and its self-developed multimodal physical simulation platform, RoboMirage. This signifies that the company aspires not merely to be a supplier of "embodied brains" but to forge a complete product closed loop, spanning from models and bodies to end-effectors and data training.

Finally, let's examine its implementation. The company has already embarked on pilot collaborations with multiple retail, logistics, and healthcare service enterprises, as well as robotic body and dexterous hand companies. According to public plans, it will also achieve mass production of standardized robotic body products for industrial and commercial scenarios this year.

Returning to the initial question: How has a Chinese company, founded merely a year and a half ago, managed to garner consecutive international recognition in the robotics academic community, which places a premium on long-term path adherence?

The answer is no longer a mystery. This company is not packaging existing, fragmented technologies into isolated products; it is rebuilding the technology stack for embodied AI from the most fundamental methodologies. The continued recognition from ICRA is merely the tip of the iceberg for this stack, which aims to create a universal intelligent system applicable to any task, object, and robot.
