A Humanoid Robot Becomes an Office Intern: The 'Reinforcement Learning' Journey of a Former NVIDIA Engineer

07/02 2026 417

Humanoid robots can run and jump, but they are still one step away from being truly useful. A Swiss startup called Flexion, founded by a former NVIDIA engineer, uses 'reinforcement learning' to enable robots to autonomously learn tasks like opening doors, climbing stairs, and moving boxes—could the end of office chores be near?

Humanoid robots can run, dance, and occasionally kick. But to truly 'act human,' they need to learn various office tasks. Now, Swiss startup Flexion Robotics, founded by a former NVIDIA robotics researcher, believes it has found the answer. The company has developed a method to train robots to perform complex tasks, including opening doors, climbing stairs, and moving boxes. The key is to first teach robots individual skills in a simulated environment and then let a 'master AI algorithm' decide how to combine them.

Most robot demonstration videos showcase robots trained to perform a specific task, such as folding shirts or stocking shelves. Typically, this training is done through 'teleoperation'—where a person controls the robot's movements behind the scenes. However, this method becomes unreliable when robots enter unfamiliar environments. Flexion claims its system is different—and more efficient—because it trains robots in simulated environments with limited human instructions.

A Unitree Robot's 'Workplace Debut'

In a demonstration video, a modified Unitree humanoid robot receives the instruction: 'A snack package has been delivered to Flexion. Please take the stairs to retrieve it, then take the elevator up. Open it and place the snacks in an empty drawer on the snack area shelf.' The robot executes this series of actions completely autonomously. It achieves this by combining different AI systems.

The master AI model learns what to do by 'watching' videos of humans performing various tasks. To go to the mailroom, for example, it knows it needs to open certain doors and use the elevator. But the videos only teach it 'when to perform which action,' not 'how to physically execute it.' The software then triggers skills it learned in a simulated environment and executes them in the real world. At the same time, the system controls the robot's motors, enabling it to walk, move its limbs, and maintain balance.
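The split between deciding 'when to perform which action' and knowing 'how to physically execute it' can be illustrated with a small Python sketch. This is not Flexion's software: every name here (RobotState, SKILLS, plan, run), the floor numbers, and the keyword lookup standing in for the master model are assumptions made up for illustration, and the skills are hand-written functions rather than policies learned in simulation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RobotState:
    """Toy world state. Every field is a hypothetical stand-in for real sensing."""
    floor: int = 2                      # assumed: the office is on floor 2
    holding_package: bool = False
    snacks_stored: bool = False
    log: List[str] = field(default_factory=list)


# Low-level skills. In the article these are learned in simulation;
# here each one is a hand-written function so the example can run.
def take_stairs_down(s: RobotState) -> None:
    s.floor = 1                         # assumed: the package waits on floor 1


def pick_up_package(s: RobotState) -> None:
    if s.floor != 1:
        raise RuntimeError("the package is on floor 1")
    s.holding_package = True


def take_elevator_up(s: RobotState) -> None:
    s.floor = 2


def store_snacks(s: RobotState) -> None:
    if not s.holding_package or s.floor != 2:
        raise RuntimeError("need the package on floor 2 first")
    s.holding_package = False
    s.snacks_stored = True


SKILLS: Dict[str, Callable[[RobotState], None]] = {
    "take_stairs_down": take_stairs_down,
    "pick_up_package": pick_up_package,
    "take_elevator_up": take_elevator_up,
    "store_snacks": store_snacks,
}


def plan(instruction: str) -> List[str]:
    """Stand-in for the master model: decides which skills run, and in what order.

    A fixed keyword lookup, not a learned model.
    """
    if "snack" in instruction.lower():
        return ["take_stairs_down", "pick_up_package",
                "take_elevator_up", "store_snacks"]
    raise ValueError("no plan known for this instruction")


def run(instruction: str, state: Optional[RobotState] = None) -> RobotState:
    """Execute the planned skills one after another and record each one."""
    state = state if state is not None else RobotState()
    for name in plan(instruction):
        SKILLS[name](state)             # the skill handles the 'how'
        state.log.append(name)
    return state
```

Calling run() with the snack instruction from the demonstration walks through the four skills in order. The planner never touches the robot state directly, so a skill could be retrained or swapped without changing how tasks are sequenced.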

'Reinforcement Learning' as the Secret Weapon

Image source: Wired

According to Nikita Rudin, co-founder and CEO of Flexion and a former NVIDIA robotics scientist, the 'secret sauce' of this software is the large-scale use of 'reinforcement learning'—allowing computers to master tasks through repeated trial and error. From the master AI model to the simulated environment to motor control, every layer of the software adopts this approach. 'Humanoid robots themselves are not the interesting or revolutionary part,' says analyst George Chowdhury. 'What really matters are the AI models that support them.'
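For readers unfamiliar with the idea, 'repeated trial and error' can be shown with a textbook technique, tabular Q-learning, on a toy problem. The sketch below is an illustration only and makes no claim about the algorithms Flexion uses: the five-cell corridor, the reward of 1.0 at the goal, and all parameter values are assumptions chosen to keep the example small.

```python
import random

N_STATES = 5              # corridor cells 0..4; cell 4 is the goal (assumed toy task)
GOAL = N_STATES - 1
LEFT, RIGHT = 0, 1


def step(state, action):
    """Move one cell and return (next_state, reward). Reward is 1.0 only at the goal."""
    nxt = max(0, state - 1) if action == LEFT else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0)


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Learn action values by acting, observing the reward, and updating estimates."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # q[state][action]
    for _ in range(episodes):
        s = 0
        while s != GOAL:
            # Try a random action with probability epsilon, or when the
            # two estimates tie; otherwise take the better-looking action.
            if rng.random() < epsilon or q[s][LEFT] == q[s][RIGHT]:
                a = rng.randrange(2)
            else:
                a = RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT
            nxt, r = step(s, a)
            # Q-learning update: nudge the estimate toward the reward plus
            # the discounted value of the best next action (0 at the goal).
            target = r if nxt == GOAL else r + gamma * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
    return q


def greedy_policy(q):
    """Best action for each non-goal cell according to the learned values."""
    return [RIGHT if q[s][RIGHT] > q[s][LEFT] else LEFT for s in range(GOAL)]
```

The agent starts with no knowledge of the corridor and wanders at random. After training, greedy_policy(train()) picks RIGHT in every cell: the reward found at the goal has propagated back through the value estimates, one cell at a time.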

ABI Research estimates that the market for robot foundation models could reach $150 billion by 2036. Flexion is collaborating with multiple robotics companies and emphasizes that its software can be used across different humanoid robot platforms. Chowdhury points out that Flexion needs to work closely with hardware manufacturers to succeed and will face fierce competition. However, without the programming capabilities demonstrated by Flexion, 'this market simply wouldn't exist.'

The Business Logic Behind the 'Office Intern'

Tech leaders like Elon Musk and Jensen Huang believe that humanoid robots will have a massive impact on the economy, as they could eventually replace a significant amount of human labor. However, Flexion's demonstrations reflect the reality that making humanoid robots useful requires fundamental advances in AI. While humanoid robot hardware on the market is becoming increasingly mature, what is lacking is the 'brain' that allows the robots to learn and adapt autonomously. If Flexion's reinforcement learning approach can be scaled, it could become the key to unlocking the commercialization of humanoid robots.

Notably, Flexion has chosen a 'software-first' approach rather than manufacturing its own robot hardware. This strategy is similar to NVIDIA's approach of providing the 'robot brain': both aim to capitalize on software and AI as hardware becomes commoditized. In the robotics industry's 'gold rush,' it remains to be seen whether the hardware vendors 'selling shovels' or the software vendors 'selling maps' will come out on top.

