03/24 2026
These days, discussions about NVIDIA's GTC conference have been dominated by Jensen Huang's 'token economics.'
'The data centers of the future are not storage warehouses but factories producing intelligent tokens; and performance per watt is the only hard metric in this race.' With this statement, Jensen Huang outlines a new paradigm for future corporate competition.
From computing costs to inference efficiency, from token pricing to AI business models, market attention has converged on a familiar question: how can 'intelligence' be produced and consumed more efficiently? Yet if we shift our gaze slightly away from the cloud, we find another equally important, and easily overlooked, message from NVIDIA: on March 16, the company announced a partnership with T-Mobile and Nokia to deploy Physical AI applications on distributed edge AI networks, aiming to upgrade wireless communication networks into high-performance edge AI computing platforms.
Compared with 'token economics,' which further optimizes efficiency and cost, this message points to a more fundamental question: when AI is no longer just generating content but entering the real world and participating in real-time decisions, do the network and computing architectures we rely on to run AI need to be rewritten?
Jensen Huang's answer to this question is straightforward: 'Networks are evolving into AI infrastructure, enabling billions of devices—from visual AI agents to robots and autonomous vehicles—to see, hear, and act in real time. By collaborating with T-Mobile and Nokia to transform 5G networks into distributed AI computers, we are creating a scalable blueprint for global edge AI infrastructure.'
For practitioners who have long focused on IoT and edge computing, this may be the most noteworthy signal from this year's GTC.
Breaking Through the Key Bottleneck in Scaling Physical AI
Jensen Huang has on several occasions laid out his predicted stages of AI development: after perceptual AI and generative AI, AI has now entered the age of agentic AI, and the future belongs to Physical AI. While generative AI addresses the challenge of 'understanding and generating information,' Physical AI faces a more complex proposition: understanding the world and acting within it.
According to NVIDIA's definition, 'Physical AI refers to models that use motor skills to understand and interact with the real world, typically embodied in autonomous machines such as robots and autonomous vehicles.' We know that large language models like GPT and Llama excel at generating human language and abstract concepts, but they have limited understanding of the physical world and are constrained by its rules. In contrast, Physical AI can comprehend the spatial relationships and physical behaviors of the three-dimensional world we inhabit, thus extending current generative AI capabilities.
With Physical AI, autonomous machines can perceive, understand, and perform complex operations in the real (physical) world. For example, autonomous vehicles can use sensors to perceive and understand their surroundings, enabling informed decision-making in various environments (from open highways to urban landscapes), including but not limited to more accurately detecting pedestrians, responding to traffic or weather conditions, and automatically changing lanes. In industrial and logistics settings, autonomous mobile robots (AMRs) in warehouses can navigate complex environments and avoid obstacles, including humans, using direct feedback from onboard sensors. Robotic arms can adjust their grip and position based on the orientation of objects on a conveyor belt for precise operations. In urban spaces, systems composed of numerous cameras and sensors are attempting to understand and respond to environmental changes in real time.
It is precisely in this transformation that AI's requirements for underlying infrastructure are fundamentally altered—because once it enters the physical world, latency, reliability, and real-time performance can shift from 'experience issues' to 'life-and-death issues.'
Many systems cannot tolerate high latency or rely on the classic path of 'uploading to the cloud first, then processing.' As current industry practices demonstrate, scenarios such as autonomous driving, robotics, and smart cities require millisecond-level responsiveness and highly reliable connectivity. The problem thus becomes clear: A key bottleneck in scaling Physical AI lies in the 'lack of low-latency, secure, and ubiquitous connectivity.'
Under traditional architectures, there are two approaches to this problem, but neither is ideal:
'Cloud-Only': Terminal devices collect data, upload it to the cloud for processing, and then return the results. The issue with this model is that the link is too long, making latency and stability uncontrollable and nearly unusable in critical scenarios.
'Edge-Only': Stacking computing power on the device itself as much as possible. However, this approach also faces bottlenecks, as terminal devices are constrained by power consumption, cost, and size, making it impossible to sustain the operation of complex models. At the same time, the isolation of computing power on devices makes it difficult to support continuous model iteration and unified scheduling.
Precisely between these two paths, a new architecture is emerging: shifting computing power down from the cloud, but not all the way onto the terminal, and instead placing it 'within the network.' This is the core logic of the AI-RAN architecture being promoted by NVIDIA, T-Mobile, and Nokia: deploying AI inference capabilities at network edge nodes close to terminals, so that Physical AI systems can offload substantial computing tasks from devices to the nearest base stations or edge data centers.
The direct result of this change is that developers no longer need to stack expensive computing power on every camera, robot, or terminal device. Instead, they can rely on distributed computing resources on the network side to deploy more complex AI capabilities at lower costs. Under this architecture, communication networks are no longer just 'data transmitters' but become computing platforms for intelligence, supporting AI applications at the scale of billions of devices.
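The latency argument above can be made concrete with a back-of-envelope calculation. The numbers below are illustrative assumptions, not figures from NVIDIA or T-Mobile; the point is simply that moving inference from a distant cloud region to an in-network edge site removes most of the round-trip transport budget while the inference time itself stays the same.

```python
# Back-of-envelope latency budget (all numbers are illustrative assumptions)
# comparing a cloud round trip with an in-network edge round trip for one
# inference request.

def round_trip_ms(propagation_ms, hops, per_hop_ms, inference_ms):
    """Uplink + downlink propagation, plus switching/queuing delay per hop,
    plus model inference time at the compute site."""
    return 2 * (propagation_ms + hops * per_hop_ms) + inference_ms

# Cloud-only: long haul to a regional data center (assumed values).
cloud = round_trip_ms(propagation_ms=25.0, hops=12, per_hop_ms=0.5,
                      inference_ms=20.0)

# AI-RAN style: inference at the nearest base station / edge site (assumed).
edge = round_trip_ms(propagation_ms=1.0, hops=2, per_hop_ms=0.5,
                     inference_ms=20.0)
```

Under these assumed numbers the cloud path spends roughly three quarters of its budget on transport alone, which is exactly the portion an in-network deployment collapses.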
Leading Developers Deploy Inference and Visual AI to the Edge
To transform networks into distributed AI computing platforms, ultra-low latency and spatiotemporal consistency must be provided for billions of terminals at the network edge, and this is a core capability of NVIDIA's partner T-Mobile. Unlike Wi-Fi, with its limited coverage and weaker security guarantees, T-Mobile's standalone 5G network offers wide-area coverage and quality-of-service guarantees, enabling complex AI agents to operate at busy urban intersections, in industrial facilities, and in remote areas.
According to an official press release, T-Mobile is collaborating with NVIDIA-certified Physical AI developers (including Fogsphere, LinkerVision, Levatas, Vaidio, and Siemens Energy) to demonstrate 'how base stations and mobile switching centers can support distributed edge AI workloads' while fully leveraging public 5G network connections. They will integrate NVIDIA's Metropolis Blueprint on this platform for video search and summarization (VSS) functionality.
NVIDIA's latest VSS Blueprint introduces multimodal visual understanding and intelligent search capabilities in a modular architecture that can be reconfigured for different environments, 'from retail stores to warehouses.' NVIDIA states that there are 1.5 billion cameras worldwide, yet less than 1% of their video is ever manually reviewed. The VSS Blueprint can 'break down complex natural language queries and search video footage within five seconds to find specific events' and 'summarize long videos 100 times faster than manual review.'
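NVIDIA has not published the VSS Blueprint's internals in this announcement, so the following is only a generic sketch of the pattern such video-search systems typically follow: index short per-segment captions (which a vision-language model would produce), then score a natural-language query against them to localize an event. The word-overlap scorer and the sample captions here are invented for illustration; a production system would use learned embeddings.

```python
# Illustrative only: NOT the NVIDIA VSS Blueprint API. Sketches the general
# pattern of natural-language event search over captioned video segments.

def score(query, caption):
    """Crude word-overlap similarity between a query and a segment caption."""
    q = set(query.lower().split())
    c = set(caption.lower().split())
    return len(q & c) / max(len(q), 1)

def search(captions, query, top_k=1):
    """captions: list of (start_seconds, caption_text) per video segment.
    Returns the top_k segments ranked by query similarity."""
    ranked = sorted(captions, key=lambda seg: score(query, seg[1]),
                    reverse=True)
    return ranked[:top_k]

# Hypothetical per-segment captions for a warehouse camera feed.
index = [
    (0,  "forklift moving pallets in aisle three"),
    (30, "worker standing under a suspended load"),
    (60, "empty loading dock at night"),
]
hits = search(index, "worker under suspended load")
```

The design choice worth noting is that the heavy lifting (captioning every segment) happens once at ingest, so each query reduces to a cheap similarity ranking, which is what makes the 'search within five seconds' claim plausible at scale.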
Many leading developers are already working with NVIDIA and T-Mobile to bring Physical AI agents capable of driving real-time actions onto T-Mobile's distributed edge network, built on the NVIDIA Metropolis Blueprint for video search and summarization (VSS). Pilot application scenarios include:
Smart City Operations: LinkerVision, Inchor, and Voxelmaps are testing an integrated 'urban operations agent' based on computer vision and digital twins, capable of perceiving, simulating, and optimizing traffic signal timing, with the aim of improving accident response speed in San Jose fivefold.
Utility (Power) Facility Automated Inspections: Levatas is leveraging NVIDIA's computing power to conduct 5G network-based automated inspections of hundreds of thousands of miles of transmission lines, detecting and quickly addressing issues such as leaning poles, corrosion, and abnormal heating, achieving a fivefold speedup. The two parties are currently evaluating AI-RAN infrastructure to further reduce costs, shorten fault recovery times, and accelerate the transition from reactive to predictive maintenance.
Vision-Based Facility Management: Developers like Vaidio are building facility management agents based on the VSS Blueprint for threat detection and fault prediction, triggering automated workflows to enhance facility management efficiency.
Real-Time Industrial Safety: Fogsphere provides safety AI agents for SAIPEM to detect and respond to hazardous events in real time in high-risk onshore, offshore, and drilling construction environments, such as workers under suspended loads or hydrocarbon leaks.
How Is AI Reshaping the Role of Communication Networks?
From a broader perspective, the changes described above also signify a fundamental transformation in the telecommunications industry's role.
For a long time, communication networks have been viewed as 'connectivity infrastructure': their core mission is to move data efficiently between devices. Yet the scale of this infrastructure rivals the entire IT industry. The global telecommunications industry is worth nearly $2 trillion, with base stations distributed throughout cities and rural areas, making it one of the most widely deployed technological systems in human society. In the past, these networks carried information flows; under the AI-RAN architecture, nodes primarily responsible for 'transmission' will be redefined as distributed computing nodes, becoming the infrastructure platform for AI operations at the edge.
The reshaping of communication networks by AI has in fact been quietly underway for some time. Previously, in 'Is LoRa Vying for 'Discourse Power' in the New Round of IoT Development?', I noted that it is no coincidence that LPWAN camps, represented by the LoRa Alliance, have begun emphasizing concepts like 'Physical AI' and 'action loops.' In the past competitive landscape of LPWAN, whether NB-IoT, LTE-M, or satellite IoT, technological narratives long revolved around coverage, power consumption, and cost. LoRaWAN, too, was widely recognized for its 'low power consumption, low cost, private network flexibility, and strong deployment elasticity.' In the AI era, however, it is attempting to redefine its role: not just a data connectivity protocol, but an AI data entry point, an action outlet, and a communication nervous system for Physical AI.
This trend will become even more pronounced in future network architectures. The design philosophy of 6G points toward being 'built for AI,' not just delivering higher speeds. In February 2026, the 3GPP SA2 #173 meeting concluded in Goa, India, and its R20 architecture overview sent an important signal: industry consensus has moved beyond mere 'connectivity pipelines' toward 'native intelligence platforms.' Under this architecture, the core network element AIMF (AI Management Function) changes how terminals interact with the network: previously the core network was responsible only for bit transmission, while the R20 architecture begins to provide MaaS (Model as a Service). Through a gradient-splitting mechanism, terminals compute only the low-level gradients, protecting privacy, while the core network handles the high-level gradient computations. Network computing power will thus directly participate in the training and optimization of user-side large models, rather than serving as a passive pipeline for information transmission.
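The gradient-splitting idea described above resembles what the research literature calls split learning. The toy sketch below is my own illustration, not behavior specified by 3GPP: a device holds the low-level layer of a tiny scalar model, an edge/core node holds the high-level layer, and only the cut-layer activation and its single gradient cross the network, never the raw data or the counterpart's weights.

```python
# Minimal split-learning sketch (illustrative, not a 3GPP-defined procedure).
# Model: y = w2 * relu(w1 * x). The device owns w1, the network node owns w2.

def device_forward(w1, x):
    """Device-side low-level computation: a = relu(w1 * x)."""
    z = w1 * x
    return max(z, 0.0), z

def edge_forward(w2, a):
    """Network-side high-level computation: y = w2 * a."""
    return w2 * a

def train_step(w1, w2, x, target, lr=0.01):
    a, z = device_forward(w1, x)    # device -> network: send activation a only
    y = edge_forward(w2, a)
    loss = 0.5 * (y - target) ** 2
    dy = y - target                 # network computes its own gradient ...
    dw2 = dy * a
    da = dy * w2                    # ... and sends back just dL/da (the cut)
    dz = da if z > 0 else 0.0       # device finishes backprop through relu
    dw1 = dz * x
    return w1 - lr * dw1, w2 - lr * dw2, loss

w1, w2 = 0.5, 0.5
for _ in range(200):
    w1, w2, loss = train_step(w1, w2, x=1.0, target=2.0)
```

Both halves of the model improve jointly, yet the device never reveals its input beyond the cut-layer activation, which is the privacy argument behind placing the high-level computation in the network.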
Looking at the big picture, it is clear that AI is consuming communication networks, and communication networks are also reshaping themselves. Whether it is edge computing, Physical AI, or the future 6G native intelligence network, all herald the formation of a new paradigm: from 'transmitting bits' to 'providing intelligence,' from 'passive pipelines' to 'active computing platforms.' Under this new paradigm, AI will not only be software but will also become an intrinsic attribute of telecommunications networks; networks will not only be infrastructure but real-time ecosystems carrying intelligence.
Today, we may truly stand at the starting point of an intelligent world that is 'touchable everywhere and intelligent everywhere.'
References:
"Nvidia positions AI-RAN with Nokia, T-Mobile in $1tn AI infrastructure market," RCR Wireless
"Agents, inference and token economics – Nvidia pitches the AI future," RCR Wireless
"State of enterprise IoT 2026: The shift from IoT to autonomous connected operations," IoT Analytics
"NVIDIA, T-Mobile, and Partners Integrate Physical AI Applications on AI-RAN-Ready Infrastructure," NVIDIA official website
"Physical AI," NVIDIA official website
"3GPP Latest Meeting Review: The Latest Evolution Trends in 6G Architecture," Wireless AI Perspective