JD May Be Cracking the Key Challenge of Physical AI

05/27 2026

JD Aims to Solve Embodied AI's 'Data Scarcity' in the Physical World

On May 20th, JD announced the launch and official operation of its first embodied AI data collection community in Suqian. This marks another key move in JD's 'Embodied AI Super Supply Chain' strategy.

Prior to this, JD had already taken two major steps: In March, JD announced the construction of the world's largest embodied data collection center. It plans to mobilize up to 600,000 people to accumulate 10 million hours of real-world human video data within two years.
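To put the target in perspective, here is a rough back-of-the-envelope sketch of what 10 million hours means per collector. It assumes, purely for illustration, that all 600,000 people take part for the full two years and contribute equally. The source says only "up to 600,000", so the real distribution is unknown, and the function name is hypothetical.

```python
def hours_per_collector(total_hours: float, collectors: int, years: float):
    """Return (hours per collector overall, hours per collector per week).

    Assumes an even split across collectors and 52 weeks per year;
    both are simplifying assumptions, not figures from JD.
    """
    if collectors <= 0 or years <= 0:
        raise ValueError("collectors and years must be positive")
    per_person = total_hours / collectors
    per_week = per_person / (years * 52)
    return per_person, per_week


# Figures quoted in the article: 10 million hours, up to 600,000 people, two years.
per_person, per_week = hours_per_collector(10_000_000, 600_000, 2)
print(f"{per_person:.1f} hours per collector in total")
print(f"{per_week * 60:.1f} minutes per collector per week")
```

Under these assumptions the target works out to roughly 16.7 hours per collector in total, or under ten minutes a week. The scale comes from headcount rather than from heavy individual workloads.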

Following that, in April, at the JD Embodied AI Ecosystem Launch Event, a series of achievements were unveiled, including the world's first full-link embodied data infrastructure, self-developed collection terminal JoyEgoCam, embodied large model JoyAI-RA, and the embodied AI data trading platform.

These initiatives are part of a broader strategy—JD's goal, reiterated at this year's 618 launch event, is to build the world's largest physical world operation center.

JD's decisiveness stems from its clear understanding of embodied AI's Achilles' heel: data scarcity.

This is the current reality for embodied AI, which faces a critical bottleneck of mismatched data supply and industrial demand. Data shows that training general-purpose embodied large models requires tens of millions of hours of high-quality real-world data. However, the global industry's existing data stockpile is only in the hundreds of thousands of hours, leaving a gap of over 95%.
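The gap figure can be checked with simple arithmetic. The sketch below uses illustrative values chosen to fit the ranges quoted above (10 million hours required, 300,000 hours available). These are assumptions, not numbers from the source, which gives only orders of magnitude.

```python
def data_gap(required_hours: float, available_hours: float) -> float:
    """Return the fraction of the required data that is still missing."""
    if required_hours <= 0:
        raise ValueError("required_hours must be positive")
    # Clamp at zero so a surplus is never reported as a negative gap.
    return max(0.0, 1.0 - available_hours / required_hours)


# Illustrative values only: "tens of millions" vs. "hundreds of thousands" of hours.
required = 10_000_000
available = 300_000
print(f"gap: {data_gap(required, available):.0%}")
```

Any stockpile below 500,000 hours, set against a requirement of 10 million hours or more, yields a gap above 95%, which is consistent with the claim in the text.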

Why Must 'First-Person Data' Be Recreated?

Currently, embodied AI data collection can be broadly categorized into three types: robotic teleoperation and UMI (Universal Manipulation Interface) data; internet video data; and the first-person human data that JD is now recreating.

Industry experts explain that teleoperation data is limited in quantity—even many leading companies have only 20,000 to 30,000 hours. This reflects the high cost and slow pace of acquiring teleoperation data.

For example, collecting one hour of real-world robot data can cost hundreds of yuan and requires a professional motion capture setup. In terms of collection speed, human operators remotely controlling robotic arms via screens struggle to keep pace with real-world production rhythms.

Some even argue that the path of robotic teleoperation data may be unsustainable. The data generated this way cannot simultaneously support large-scale training and industrial deployment.

Internet video data is abundant but largely ineffective for robots. Such data provides a 'third-person perspective,' telling AI what happened but not explaining why actions were taken.

For instance, consider a seasonal video tutorial on stir-frying spring bamboo shoots with pork. The footage shows ingredients, steps, and the finished dish, but robots cannot learn details like oil temperature, heat adjustments, or the force needed for stir-frying when adding ingredients.

Robots trained on these two types of data are essentially 'actors.' They can replicate standard movements in controlled environments but struggle in the unstructured, random, and dynamic physical world—excelling at basic actions like running and jumping but failing at practical tasks like housework or delicate operations.

Fundamentally, the data fed to robots is disconnected from the real physical world.

The core of embodied AI lies in hands-on manipulation, physical perception, and dynamic decision-making. It requires interacting with the world through a 'first-person perspective,' as humans do: seeing with eyes, touching with hands, sensing force, judging environments, and accumulating error-tolerant experience. This is first-person data, also called egocentric data.

This type of data faithfully replicates human perception, judgment, and operational logic in the physical world, preserving implicit information: gaze trajectories, hand-eye coordination micro-movements, and spatial relationship judgments. It can be considered the 'native language' for physical AI.

Compared to internet and teleoperation data, first-person data explains 'why actions occur' and 'how actions are executed.' For AI to deliver true value in the physical world, first-person data must be recreated.

Over the past year, companies like NVIDIA, Tesla, Figure, and 1X have begun collecting human operation videos at large scale for robot imitation learning. NVIDIA's EgoScale framework, introduced this year, explicitly identifies large-scale first-person data as core infrastructure for robot training.

In a sense, first-person data is becoming the scarcest resource in the global embodied AI field.

More importantly, a Scaling Law similar to that of large models is emerging: the more videos robots 'watch,' the smarter they become.

This means the upper limit of model capability depends on the scale and quality of real-world behavioral data. The ultimate competition in embodied AI is not just about model capability but about who can continuously access large-scale first-person human data from real-world scenarios.

This is the core value of JD's strategic positioning in embodied AI data collection and of its renewed commitment at 618 to 'build the world's largest physical world operation center.' By stepping into real-life consumer and industrial settings to collect first-person operational data, JD addresses the shortcomings of teleoperation and internet video data.

Suqian serves as JD's core hub for scalable, industrialized, and ecosystem-driven embodied AI data infrastructure. JD has already established the world's largest embodied AI data collection center there.

Figure: Data collection in a supermarket scenario · Photographed by Tang Chen

In Suqian, I observed scenarios like households, medical care, nursing homes, garment factories, and fruit farming. Citizen collectors upload data from their daily lives and work.

For example, a stay-at-home mother performs household chores while capturing key parameters—upper body trajectories, force distribution, and human-environment interactions—during tasks like table wiping, clothes folding, organizing, and floor cleaning, using JD's self-developed collection terminals.

After uploading, quality inspection, and annotation, this data becomes high-quality 'data fuel' for embodied AI models, helping them better understand the real physical world.

This initiative, dubbed 'the largest data collection effort in human history,' now covers over 100 sub-scenarios, from households and offices to factories, logistics, stores, and sanitation.

Through data collection, JD is transforming human operational experience from 'physical actions' into 'digital assets,' serving as 'human behavior samples' for robots.

'3+1' Foundation Supports the 'World's Largest Physical World Operation Center'

In other words, JD is refining 'first-person data elixirs' for embodied AI.

This is undoubtedly arduous work. Among domestic AI giants vying for first-person data, JD was the first to enter the field.

Why JD? How has it led in reconstructing the physical world's data ecosystem? In my view, JD benefits from a '3+1' core competitive edge.

First, AI infrastructure. This is an advantage that pure algorithm firms, hardware companies, or single-scenario players cannot replicate. It is supported by JD's 'hard commercial strength + AI infrastructure.'

The former stems from JD's deep integration with the real economy, including a nationwide offline infrastructure network: over 3,600 warehouses, 10,000+ JD Mall stores, 200,000+ partner pharmacies, and 50,000+ professional home service personnel. These real-world settings serve as natural treasure troves of physical AI data, allowing high-quality data collection during daily operations without laboratory 'staging.'

The latter reflects JD's willingness to invest heavily in R&D, forging a replicable path for AI-real economy integration. JD's Q1 2026 financial report shows R&D spending surged over 59% YoY.

This is JD's version of 'brute force creating miracles.' JD AI has achieved multi-dimensional breakthroughs across 'model layer - technical foundation - industrial ecosystem,' providing technical guarantees for its focus on 'solving real problems.'

For example, at the model layer, JD's self-trained embodied large model JoyAI-RA achieves world-leading success rates in real-robot experiments. At the technical foundation layer, JD Cloud has formed a closed-loop, domestically adapted computing power stack. In the industrial ecosystem, JD drives digital transformation across sectors, from digital humans to embodied AI, building a trillion-yuan ecosystem through technological inclusivity.

Notably, JD has built the industry's only full-link embodied AI data infrastructure covering 'collection, storage, labeling, training, evaluation, simulation, and testing' while relying on JD Cloud for one-click data cloud uploads and full-process visualization. JD's data efficiency now reaches 95%, with overall processing costs reduced by 60% and daily processing capacity in the hundreds of thousands of data points, supporting its 10-million-hour data collection project with scalable, low-cost infrastructure.

Second, industrial depth. This distinguishes JD AI from pure laboratory innovations. JD's super supply chain is deeply embedded in thousands of retail, logistics, health, and industrial scenarios, granting it inherent supply chain and business advantages.

For example, in warehousing, delivery, home services, and factory operations, data such as the force couriers use to sort packages, the angles at which warehouse workers lift goods, and the techniques home service personnel use to clean windows cannot be replaced by teleoperation or internet video data, yet it is essential for robot training.

A greater value I see is JD AI's ability to transfer capabilities across supply chain-embedded industries, further deepening sectoral expertise. For instance, JD Logistics' super brain model drives iterative improvements in wolf-pack robots; JD Industrial's JoyIndustrial model addresses long-standing pain points in industrial supply-demand-fulfillment cycles, serving over 10,000 Fortune 500 and state-owned enterprise clients.

This dynamic, evolving industrial depth ensures JD's data collection inherently carries industry know-how. Every data point aligns with industrial needs and real-world scenarios, offering far higher precision and practicality than laboratory-standardized data, guaranteeing model deployability from the ground up.

Third, user experience. An industry consensus is that large models have moved beyond parameter competition: their ultimate goal is deployment and value creation. JD AI's strategic pillar is user experience, emphasizing 'technological usability and tangible experience.' This drives AI's shift from 'technical showmanship' to 'practical deployment.'

For example, in embodied AI, JD Retail aims to help robot brand partners exceed 10 billion yuan in cumulative sales by 2026; JD Industrial will achieve 100% coverage of robot manufacturing materials through its one-stop industrial supply chain technology and services; JD JoyInside partners with nearly 200 home appliance, robotics, and toy brands to implant 'high-EQ brains' into smart devices, enabling consumers to experience and adopt AI.

By integrating AI infrastructure, industrial depth, and user experience, JD's ultimate competitive edge lies in its unique super supply chain thinking.

This is JD's trump card, distinguishing it from all other AI players. The essence of supply chains is efficient orchestration of physical space, information flows, logistics, and manual operations—highly aligned with embodied AI's core needs to 'perceive the physical world, adapt to physical rules, and complete practical tasks.'

This mindset has positioned JD uniquely in the embodied AI race: as an 'infrastructure provider' and 'super supply chain service provider' for the embodied AI era. By building full-link data infrastructure, opening an embodied AI data trading platform, and co-creating an 'embodied data ecosystem,' JD positions itself as the 'utilities provider' (water, electricity, gas) for the embodied AI age.

Simply put, while large models remain 'armchair theorists' in the digital world, JD has already charged into the physical world, refining 'first-person data elixirs' to redefine embodied AI's deployment logic.

This is the true confidence behind JD's announcement to build the 'world's largest physical world operation center.' In the future, all robotics firms, model developers, and industrial partners can leverage this 'utilities' system to rapidly iterate models, deploy products, and adapt to scenarios.

Figure: China's first embodied AI data collection community launches in Suqian · Photographed by Tang Chen

Fueled by JD's refined data, the value of embodied AI is now spilling over.

On one hand, JD's first-person data fills over 95% of the industry's real-world data gap, reducing overall R&D costs by more than 60%. This means embodied AI will rapidly move beyond 'laboratory demonstrations' to industrialization and scaling, accelerating the arrival of physical AI's 'ChatGPT moment.'

On the other hand, by 'mining data to refine elixirs,' JD transforms physical operations into computable, reusable digital assets. This upgrades robots from 'world perception' to 'world understanding,' fostering new productive forces and injecting core momentum into the digital and intelligent transformation of the real economy. It will also drive urban transformation.

For example, Suqian is upgrading from an 'e-commerce city' to a digital and intelligent hub, aiming to become a Yangtze River Delta leader in embodied AI industries. Meanwhile, the initiative helps China's embodied AI sector break free from overseas technical and data dependencies, seizing the global high ground in physical AI development.

2026 has been defined by authoritative media and industry consensus as the 'first year of embodied AI data.' When JD's network of 600,000 citizen collectors establishes a 10-million-hour real-world database within two years, JD will secure dominance over the 'utilities' of embodied AI.

Perhaps the 'ChatGPT moment' for the physical world will begin with the refinement of these 10 million hours of data fuel.
