What Robots Lack to Develop a Master's Touch: Tactile Data

06/26 2026 536

Author｜Li Murong

Around the Dragon Boat Festival, a video of a robot wrapping zongzi (traditional Chinese rice dumplings) attracted significant attention.

In the video, the robot completed 13 consecutive steps, including picking up bamboo leaves, folding them into funnel shapes, filling them with rice and stuffing, tying them with cotton strings, vacuum sealing, and packaging them into gift boxes. While these steps appeared to be routine and repetitive, each action continuously tested the same capability: tactile feedback.

When picking up bamboo leaves, too much force would tear them, while too little force would make them unstable. When folding the funnel shape, even a slight deviation in angle would cause rice grains to spill out through the gaps. When tying with cotton strings, the robot had to ensure the right tightness without breaking the bamboo leaves.

These seemingly simple operations essentially pointed to the same question: whether the robot knew what it was touching and how much force to apply.

Since the beginning of this year, the tactile technology sector has continued to heat up. Paxini has secured over 1 billion yuan in financing, with its valuation exceeding 10 billion yuan. Daimeng Robotics has also completed multiple rounds of financing worth hundreds of millions of yuan in a short period.

At the same time, more than 25 tactile sensing and data collection companies, including Lingchu Intelligence, Tujian Technology, and Xinzhi Embodied, have received capital support.

Tactile sensing is transforming from a long-underestimated niche sector into a new focal point of the industry.

Vision Reaches Its Limits; Tactile Sensing Determines Operational Capability Ceilings

For some time, the main focus of embodied intelligence has been almost entirely on "enabling robots to see more clearly." Higher-resolution cameras, more complex visual algorithms, and visual understanding and spatial reasoning capabilities based on large models have continuously raised the upper limits of robots' abilities to recognize objects, judge positions, and plan actions.

However, once robots enter the physical world and begin performing manipulation tasks, a problem arises: they can see, but they cannot confirm whether a contact is correct.

In dim or highly reflective environments, visual systems become markedly less stable, which matters most when grasping fragile objects. When handling plates coated in detergent, a robot may see the position clearly yet still be unable to determine the appropriate contact force or the friction state.

Unscrewing bottle caps is a typical example. Vision can tell the robot where the cap is and how to plan the path, but during the actual unscrewing process, it cannot provide key information such as whether the force is sufficient, whether slippage occurs, or whether the threads are engaged.

In other words, vision excels at solving "where" and "what" questions, presenting object positions, shapes, and spatial structures in the form of discrete image frames. However, when tasks enter the contact phase, robots face continuously changing physical processes.

Tactile sensing is the critical variable that fills this gap.

It describes the continuous physical state after contact occurs, including changes in contact force, distribution of surface deformation, fluctuations in friction coefficients, micro-slip trends, and dynamic feedback from materials under force.
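
One way to make these contact quantities concrete is the classic Coulomb friction model, in which a contact holds only while the tangential load stays below the friction coefficient times the normal force. The following is a minimal sketch under that assumption, not any vendor's method; the function name `slip_margin` and the force and friction values are hypothetical.

```python
import math

def slip_margin(normal_force, tangential_x, tangential_y, friction_coeff):
    """Distance from slipping under a Coulomb friction model, in newtons.

    Positive: the tangential load is inside the friction limit (grip holds).
    Zero or negative: the load has reached the limit (slip is expected).
    """
    tangential = math.hypot(tangential_x, tangential_y)
    if normal_force <= 0.0:
        # No compressive contact, so friction cannot resist any load.
        return -tangential
    return friction_coeff * normal_force - tangential

# Hypothetical readings: same grip force and load, different surfaces.
dry = slip_margin(2.0, 0.3, 0.4, friction_coeff=0.6)  # roughly +0.7 N of margin
wet = slip_margin(2.0, 0.3, 0.4, friction_coeff=0.2)  # negative: slip expected
```

The same grip that holds a dry plate fails on a slippery one, and only a measurement taken at the contact can tell the two cases apart.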

Therefore, when robots transition from "seeing and moving objects" to "contacting and changing object states," relying solely on vision is insufficient.

For humans, many delicate operations are essentially based on tactile feedback.

Especially in industrial precision operations, so-called master craftsmen rely on their fingers to perceive changes in resistance when gears mesh and use their fingertips to control subtle differences in force during pipetting. This is essentially the result of long-term acquisition of high-dimensional tactile signals, forming a "feel" developed over time.

Visual data can record motion trajectories but cannot reconstruct force feedback. Without force feedback, robots cannot truly understand physical interaction processes, which makes it even harder for them to develop human-like operational experience.

This is why tactile sensing has been placed back at the core in 2026:

Vision has done everything it can, but the last mile of precision operations remains unresolved.

To achieve millimeter-level or even sub-millimeter operational capabilities, acquiring tactile data is essential. Jensen Huang has also explicitly pointed out that achieving fine motor skills is extremely difficult, and tactile sensing is the key to addressing this gap.

Moreover, the value of tactile data has been preliminarily validated in model training.

Previously, real-world data only covered visual and motion data, lacking the tactile dimension, resulting in models that could "see but not feel accurately."

Paxini's specialized experimental data shows:

The VTLA model, which incorporates tactile data, can significantly improve robotic operational success rates. The success rate of ordinary grippers increased by 21.9 percentage points to 96.9%, while the success rate of six-degree-of-freedom dexterous hands increased by 6.2 percentage points, reaching 100%.
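
The baseline success rates implied by these figures can be recovered with simple subtraction. This is a small arithmetic sketch that assumes the reported gains are absolute percentage-point increases, as stated; the helper name `baseline_rate` is hypothetical.

```python
def baseline_rate(final_rate_pct, gain_pp):
    """Success rate before the gain, given the final rate and the
    improvement in percentage points."""
    return round(final_rate_pct - gain_pp, 1)

gripper_before = baseline_rate(96.9, 21.9)  # ordinary gripper
hand_before = baseline_rate(100.0, 6.2)     # six-degree-of-freedom dexterous hand

print(gripper_before, hand_before)
```

Under that reading, the ordinary gripper started from about 75.0% and the dexterous hand from about 93.8%.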

Duan Jianghua, CEO of Daimeng Robotics, also pointed out that training on data that includes tactile information takes significantly fewer training sessions than training on purely visual data. Introducing tactile data therefore reduces the robot's reliance on the sheer volume of training data.

It is evident that tactile sensing not only raises the ceiling for "doing better" but also lowers the cost of "learning things."

In short, when robots begin to encounter the limits of vision and physical interactions in the real world remain full of uncertainties, tactile data has shifted from being "nice to have" to "essential."

Why Tactile Data Collection Is an Order of Magnitude More Difficult Than Visual Data

The industry now broadly agrees on the importance of tactile data, but a more practical problem has emerged: collecting high-quality tactile data at scale is an engineering challenge an order of magnitude harder than collecting visual data.

This difficulty does not arise from a single factor but from the combined effects of "the physical properties of tactile sensing itself" and "the data collection system."

First, from the perspective of tactile sensing, its data generation mechanism differs from that of visual data.

Vision can continuously collect data from a distance using cameras, while tactile data must be generated at the moment of actual contact. This means that every piece of tactile data must be based on real mechanical interactions rather than passive observation.

This directly raises the hardware threshold. Tactile sensors require not only high resolution and high sampling rates but also long-term stability in high-frequency contact environments.

However, in reality, these devices are often installed on robot fingertips or gripper ends, where repeated collisions, squeezing, and friction inevitably lead to material aging, signal drift, and even structural damage. This causes the data distribution collected by the same system to change continuously over time.

A more fundamental issue is that the industry has not yet established a unified technical approach. Current mainstream solutions include resistive, capacitive, optical tactile sensing, and solutions based on six-axis force sensing and magnetic encoding. These approaches differ significantly in sensitivity, resolution, integration difficulty, and cost.

The data collected by different approaches are essentially inconsistent. Some output pressure distribution maps, some output deformation images, and some output six-axis force vectors, resulting in a lack of a unified representation space for tactile data at the fundamental level.
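
To illustrate what a shared representation would have to reconcile, the sketch below reduces two of these output types to one common record. It is purely hypothetical and not an industry standard: the `ContactSummary` type, its fields, and the converter functions are assumptions, and the reduction is lossy (the spatial layout of the pressure map and the torques of the six-axis reading are discarded). Deformation images are left out because converting them to force requires sensor-specific calibration.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ContactSummary:
    """Hypothetical common record for readings from different sensor families."""
    source: str                     # which sensor family produced the reading
    normal_force_n: float           # net force pressing into the surface, newtons
    shear_force_n: Optional[float]  # None when the sensor cannot measure shear

def from_pressure_map(pressure_pa: List[List[float]],
                      taxel_area_m2: float) -> ContactSummary:
    # Net normal force is the sum of pressure times area over all sensing cells.
    total = sum(p * taxel_area_m2 for row in pressure_pa for p in row)
    return ContactSummary("pressure_map", total, None)

def from_six_axis(fx: float, fy: float, fz: float) -> ContactSummary:
    # Assumes z is the surface normal; the three torque components are ignored.
    shear = (fx ** 2 + fy ** 2) ** 0.5
    return ContactSummary("six_axis", abs(fz), shear)

a = from_pressure_map([[1000.0, 2000.0], [3000.0, 4000.0]], taxel_area_m2=1e-4)
b = from_six_axis(3.0, 4.0, -5.0)
```

Even in this toy case the two records are not equivalent: the pressure map yields no shear value at all, which is one reason models trained on one sensor family do not transfer cleanly to another.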

As Duan Jianghua, CEO of Daimeng Robotics, pointed out: "The data representation methods for vision are already relatively unified, but there is no standard for tactile sensing, nor is there a large-scale, multimodal real-world collection system."

At the data collection level, remote operation (teleoperation), UMI (Universal Manipulation Interface, a handheld-gripper setup), and egocentric video ("Ego") are currently the mainstream methods. However, these methods struggle to reproduce the delicate finger coordination of human hands, so they cannot fully capture tactile data.

Take UMI as an example. The operator holds a gripper to complete tasks, recording the operation trajectory through an end-effector camera and sensors. Its advantages lie in portability, low cost, and the ability to reuse data across different robots.

However, the gripper cannot replicate the delicate movements of five-finger coordination, and very little tactile information is captured. In essence, this approach yields motion trajectory data rather than complete contact physics data.

Although remote operation systems are closer to human hand movements, their mechanical transmission structures subject force signals to delay, attenuation, and nonlinear filtering. True fingertip sensations are often weakened in transmission, making it difficult to fully recover subtle force changes during operation.

Even if the problem of single-modality collection is resolved, multimodal synchronization remains another challenge.

Visual signals arrive at camera frame rates (tens of milliseconds between frames), tactile signals are sampled at millisecond intervals, and joint angle and pose data have their own time references.

If synchronization precision is insufficient, vision may capture the scene of Action A while tactile sensing records the force changes of Action B, leading to catastrophic misalignment in model training.
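
A common first step toward alignment is nearest-timestamp matching with a rejection threshold. The sketch below assumes both streams are already stamped against one shared clock, in integer microseconds, and that the tactile stream is non-empty and sorted. The function names and the 500 µs tolerance are hypothetical choices for illustration.

```python
from bisect import bisect_left

def nearest_index(timestamps, t):
    """Index of the entry in sorted `timestamps` that is closest to t."""
    i = bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick whichever neighbour is closer; ties go to the earlier sample.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1

def align(frame_times, tactile_times, max_gap):
    """Pair each video frame with its nearest tactile sample.

    A frame with no tactile sample within max_gap is paired with None,
    so it can be dropped instead of being matched to the wrong action.
    """
    pairs = []
    for frame_idx, t in enumerate(frame_times):
        k = nearest_index(tactile_times, t)
        close_enough = abs(tactile_times[k] - t) <= max_gap
        pairs.append((frame_idx, k if close_enough else None))
    return pairs

frames_us = [0, 33_333, 66_667]            # about 30 frames per second
tactile_us = list(range(0, 50_000, 1000))  # 1 kHz stream that stops at 49 ms
pairs = align(frames_us, tactile_us, max_gap=500)
print(pairs)  # [(0, 0), (1, 33), (2, None)]
```

The third frame is rejected because the tactile stream ended before it. Without the threshold it would have been silently paired with a sample taken almost 18 ms earlier.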

This explains why, despite the rapid rise in industry enthusiasm, tactile data as a whole is still in its early stages or even laboratory phases, with very limited publicly available tactile datasets.

Earlier this year, Weitai Robotics, in collaboration with the National Humanoid Robot Innovation Center and tactile sensor companies, released the "Baihu-VTouch" dataset. With a scale exceeding 60,000 minutes, it fills the missing piece of physical understanding for embodied intelligence.

In April, Daimeng released the Daimon-Infinity dataset, aiming to build a full-modality tactile data system with "millions of hours" of data, attempting to advance tactile data collection to a larger scale and higher dimension, providing robots with the long-lacking physical interaction information for fine operations.

In terms of data richness and information completeness, Daimon-Infinity is currently the highest-resolution and most complete tactile embodied dataset in the industry.

Even so, in terms of data scale, openness, and cross-scenario generalization capabilities, existing datasets still fall significantly short of the needs for industrial-grade model training.

Ultimately, the challenge of tactile data is not a single technical issue but a systemic engineering challenge.

From the lack of a unified sensor approach to the ongoing evolution of collection methods, from the difficulty of multimodal alignment to the absence of annotation systems, every link continues to amplify the complexity of tactile data production.

Who Is Filling the Gap in Tactile Data and Moving Toward Scalability?

The current challenges in tactile data persist, but the deadlock is being broken.

The industry is beginning to shift tactile data collection from "custom experiments" to "standardized production," with a clearer direction emerging: lightweight tactile gloves are becoming an important entry point for large-scale tactile data production.

Compared to traditional collection methods, tactile gloves offer three core advantages:

First, they provide significant cost advantages, eliminating the need for complete robotic systems and professional workstations, thereby drastically reducing overall data collection costs.

Second, they offer higher degrees of freedom, enabling simultaneous collection of multimodal tactile, temperature, and deformation information, covering a full range of grasping and interaction actions.

Third, they are portable and easy to deploy widely, allowing wearers to move freely into real-life and work scenarios. Data is no longer confined to laboratories, so the pool of data samples can expand rapidly.

Around this entry point, companies such as Paxini, Tashan Technology, and Tujian Technology have introduced different types of tactile gloves, forming their own technological approaches.

The first approach is the high-density multidimensional tactile sensing route, which pursues ultimate precision and spatial resolution in tactile information.

Paxini is a representative company in this route. Its PXCap series gloves are equipped with 10 self-developed 6D Hall array tactile sensors, forming a high-density sensing network across multiple fingers.

The five-finger version contains more than 30 six-axis tactile modules, covering 82 degrees of freedom and capable of collecting over 3,000 channels of tactile signals. Each degree of freedom is also equipped with an encoder.

In Paxini's view, the value of tactile data comes not only from force changes but also from the locations where forces occur. Tactile data without location information has almost zero practical value.

Yuansheng Xianda follows the same route, achieving a small form factor and thin packaging while maintaining a high number of sensing points and precise multidimensional force sensing per unit of space.

However, unlike Paxini, Yuansheng Xianda follows a spatially encoded multidimensional piezoresistive technology path, extending its layout beyond fingertips and hands to full-body electronic skin, providing robots with full-domain tactile sensing capabilities.

Another route is the full-link interaction collection route, which focuses on the completeness of the interaction process and the comprehensive expression of tactile information.

For example, TactileEcho Capture data collection finger sleeves from Tashan Technology transform the complete interaction link from approaching → contacting → applying pressure → operating → detaching during human operations into structured data that can be trained, synchronized, and annotated.

They support three-dimensional force sensing, slip sensing, and proximity sensing, enabling pre-judgment capabilities before contact occurs.
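
Turning a continuous interaction into annotated phases can be pictured as a small state machine over the force signal. The sketch below is an illustrative assumption, not Tashan Technology's method: the thresholds, phase names, and function name are hypothetical, and it uses only a single normal-force channel, ignoring the proximity and slip signals mentioned above.

```python
def label_phases(forces_n, contact_n=0.05, working_n=1.0):
    """Label each force sample (newtons) with an interaction phase.

    contact_n: force above which the finger is considered to be touching.
    working_n: force above which the task itself is considered under way.
    """
    labels = []
    touched = False  # has any contact occurred in this episode?
    worked = False   # has the working force been reached?
    for force in forces_n:
        if force >= working_n:
            touched = worked = True
            labels.append("operating")
        elif force >= contact_n:
            if worked:
                labels.append("detaching")          # force falling after work
            elif touched:
                labels.append("applying_pressure")  # force still building up
            else:
                touched = True
                labels.append("contacting")         # first sample of contact
        else:
            labels.append("detaching" if touched else "approaching")
    return labels

labels = label_phases([0.0, 0.0, 0.1, 0.5, 1.2, 1.5, 0.4, 0.0])
print(labels)
```

A fixed-threshold labeler like this would mislabel tasks in which force dips briefly in the middle of an operation, which hints at why structured annotation of tactile data is harder than it looks.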

The third route is the high-fidelity contact route, which pursues a high degree of consistency between data and real tactile sensations, emphasizing low interference and high durability.

Representative company Tujian Technology takes "intrinsically stretchable materials" as its core differentiator. Its self-developed flexible materials can stretch over 100% and fully recover, partially solving the durability issues associated with long-term wear and high-frequency use.

Its tactile gloves form a multi-point tactile perception network in key areas such as fingertips, finger pads, and palms. They filter out invalid data through "grasping without signal" to prevent noise from entering the training set. They also enhance the resolution of contact information through high-density flexible tactile arrays, enabling the recording of minute force changes and sliding trends.

There are three routes, each with a distinct technical focus:

Some focus on improving perception density, others on ensuring data integrity, and still others on addressing the authenticity and durability of data collection. Although the technical paths differ, they all aim for the same goal: to transition tactile data from laboratory samples to sustainable, scalable production in the real world.

This is precisely the true dividing line in the current industry.

Today, what is truly scarce in embodied intelligence is not the model itself, but the "experience" that supports the model's understanding of the physical world.

In the era of large models, the competition revolves around internet data. Whoever possesses more text, images, and code can train a stronger model.

In the era of embodied intelligence, the source of data undergoes a fundamental change. Internet data is no longer applicable, and data from the physical world becomes the new focal point of competition.

During this process, visual data represents the first stage of competition. A relatively mature data collection system has been established, but it is also beginning to reach the limits of what can be achieved with vision alone.

Therefore, we have now entered the second stage, where tactile data has become the focus of competition. It is transitioning from a supplementary module in the puzzle of embodied intelligence to the most critical last centimeter in the robot's understanding of the world.
