05/29 2025
With 2025 heralded as the "inaugural year of edge generative AI" by the industry, the semiconductor landscape is witnessing its most profound architectural transformation since the mobile internet era. Driven by smart terminal devices, industrial IoT, and the imperative for real-time decision-making, this technological revolution poses significant challenges to traditional computing power distribution models. IDC data reveals a 217% year-on-year growth in the global edge AI chip market size in Q1 2025, outpacing the cloud AI chip market's growth rate. Amidst this transformation, GPU, NPU, and FPGA architectures are charting distinct evolutionary paths, reflecting diverse visions of semiconductor companies regarding future computing paradigms.
01
GPU
During the recent AI wave centered on large models, general-purpose GPUs have shone thanks to their massively parallel computing capabilities and programmability. However, edge hardware must handle not only inference for individual models but also branch acceleration, user interaction, and device-management tasks. An edge AI design therefore needs a holistic approach so that AI workloads coexist smoothly with other functions. Furthermore, as performance and transistor density climb, thermal management has become a critical constraint. In future edge AI applications, power efficiency (TOPS/W) will matter more than absolute computing power (TOPS).
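The TOPS-versus-TOPS/W distinction is simple arithmetic but worth making concrete. The sketch below uses hypothetical chip figures (not any vendor's published specs) to show why a smaller accelerator can be the better edge choice:

```python
# Hypothetical numbers for illustration only -- not real vendor specs.
def efficiency_tops_per_watt(tops: float, watts: float) -> float:
    """Energy efficiency: sustained compute delivered per watt of power draw."""
    return tops / watts

# Two imaginary edge accelerators: raw TOPS alone is misleading.
chip_a = efficiency_tops_per_watt(tops=40.0, watts=25.0)  # big, hot chip
chip_b = efficiency_tops_per_watt(tops=10.0, watts=2.5)   # small, cool chip

print(f"Chip A: {chip_a:.1f} TOPS/W")  # Chip A: 1.6 TOPS/W
print(f"Chip B: {chip_b:.1f} TOPS/W")  # Chip B: 4.0 TOPS/W
```

In a thermally limited, battery-powered enclosure, the 10-TOPS part with 4.0 TOPS/W can sustain more useful work than the nominally faster chip.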
Another lesson proven by large-model applications is that AI models and algorithms evolve constantly. Hardware designers must improve accelerators' energy efficiency and programmability to future-proof their devices. The diversity of terminal and edge devices and applications matters too: hardware must serve today's popular models and applications while supporting next-generation models and rapidly changing application needs. That demands software-hardware co-design, with a software stack able to adapt to future advances rather than being locked into model- or application-specific accelerators. This is especially important for mainland Chinese system vendors and their main chip suppliers, who prioritize rapid product launches.
As versatile accelerators, GPUs exhibit remarkable performance, scalability, and programmability in cloud AI workloads. Imagination's newly launched E-Series GPU IP raises INT8/FP8 compute to 200 TOPS through its Neural Cores and Burst Processors, claiming a 400% performance boost and a 35% improvement in power efficiency over the previous generation.
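INT8 throughput figures like these assume models have been quantized from FP32 down to 8-bit integers. As a rough illustration of why that works, here is a minimal symmetric per-tensor quantization sketch in NumPy (a generic textbook scheme, not Imagination's actual toolchain):

```python
import numpy as np

def quantize_int8(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Symmetric per-tensor quantization: map FP32 weights onto [-127, 127]
    using a single scale factor derived from the largest magnitude."""
    scale = float(np.abs(weights).max()) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original FP32 values."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.27, 0.02, 1.0], dtype=np.float32)
q, s = quantize_int8(w)
print(q.tolist())                   # [50, -127, 2, 100]
print(dequantize(q, s).tolist())    # close to the original weights
```

Each weight now occupies one byte instead of four, and the accelerator can use narrow integer multipliers, which is where most of the INT8 energy-efficiency gain comes from; the reconstruction error is bounded by half the scale factor.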
Phil Solis, Research Director at IDC, stated, "AI capabilities on various devices are evolving rapidly, yet AI system designers confront multiple challenges in performance, efficiency, and flexibility. Imagination, with its extensive experience in low-power GPUs, has successfully integrated flexible support for graphics and AI in GPU architectures. The E-Series combines GPU programmability with a leap in AI performance, providing an appealing solution for edge AI system developers."
02
NPU
As AI applications accelerate from the cloud to the edge, traditional processors such as CPUs and GPUs reveal limitations in power consumption, latency, and resource utilization. Neural Processing Units (NPUs), designed specifically to address these challenges, are increasingly valuable in edge computing. NPUs focus on accelerating and optimizing the inference stage of AI models. Unlike general-purpose CPUs and GPUs that juggle many kinds of tasks, NPUs eliminate redundant processing steps and thereby execute AI tasks far more efficiently, particularly in latency-sensitive scenarios such as object detection, speech recognition, and anomaly monitoring.
NPUs also excel in balancing power and performance, enabling high-performance AI operations with low power consumption. This makes them ideal for edge devices with limited heat dissipation, stringent energy consumption controls, or compact spaces, including fanless computers, embedded IoT systems, and industrial automation controllers.
Furthermore, NPUs facilitate local AI processing on devices, reducing reliance on cloud computing power, minimizing data transmission latency, and enhancing data privacy protection. With their parallel computing architecture and compact design, NPUs can be flexibly deployed in diverse scenarios like smart cities, intelligent surveillance, mobile robots, and autonomous driving, serving as a core driving force for the large-scale implementation and continuous expansion of edge intelligence.
NPUs' core advantages are energy efficiency and task specialization. For instance, NXP's i.MX 95 series processors integrate the eIQ Neutron NPU with 2 TOPS of compute, delivering a fourfold speedup in image-recognition tasks over the previous generation while cutting power consumption by 30%. These traits make NPUs dominant in scenarios with stringent real-time requirements, such as smart security and medical equipment.
03
FPGA
FPGAs play a unique role in edge AI thanks to their reconfigurable nature. In April 2025, after becoming independent from Intel, Altera announced a focus on the edge AI inference market. FPGAs' parallel processing and low latency suit scenarios requiring rapid algorithm iteration.
While both FPGAs and GPUs offer parallelism, FPGAs can parallelize at the finer granularity of individual logic units. For data-heavy tasks such as 8K video, serial CPU instruction processing falls short and GPU multi-core rendering has its own limits, whereas FPGAs can process the video stream in pipelined stages, achieving pixel-level parallelism. For example, NovaStar's MX2000 Pro display controller uses an AMD FPGA to let a single device drive an 8K ultra-large screen, meeting requirements such as high-definition LED display control and frame-rate conversion in virtual film production.
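The throughput benefit of stage-level pipelining can be seen with a back-of-the-envelope model. Assuming, hypothetically, one clock cycle per stage, a D-stage hardware pipeline emits its first result after D cycles and then one result every cycle, versus D cycles per frame when processing is fully serial:

```python
def serial_cycles(n_frames: int, n_stages: int) -> int:
    """Fully serial: each frame traverses every stage before the next begins."""
    return n_frames * n_stages

def pipelined_cycles(n_frames: int, n_stages: int) -> int:
    """Hardware pipeline: n_stages cycles to fill, then one frame per cycle."""
    return n_stages + n_frames - 1

# Hypothetical 4-stage video path: receive -> scale -> color-convert -> output.
frames, stages = 1000, 4
print(serial_cycles(frames, stages))     # 4000
print(pipelined_cycles(frames, stages))  # 1003
```

Once the pipeline is full, throughput approaches one frame per cycle regardless of pipeline depth, which is why FPGA video paths sustain high resolutions at modest clock rates.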
Moreover, FPGAs can implement specific algorithms through hardware, bypassing traditional CPUs and GPUs' software stack bottlenecks, achieving ultra-low latency. Taking color space conversion as an example, FPGA processing latency is only 1/100 of that of CPUs and GPUs. Additionally, FPGAs' fixed circuit structure ensures deterministic latency, while CPUs and GPUs experience latency jitter due to system scheduling. In scenarios with stringent latency requirements, such as medical 8K endoscopic video processing and high-frequency trading, FPGAs offer significant advantages.
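Color-space conversion illustrates why the mapping to hardware is so direct: the standard full-range BT.601 RGB-to-YCbCr transform is three fixed multiply-accumulate equations, so an FPGA can compute all three channels for a pixel in the same clock cycle on parallel DSP slices. A reference Python version of the math (the 1/100 latency figure above is the source's claim, not derived here):

```python
def rgb_to_ycbcr(r: int, g: int, b: int) -> tuple[int, int, int]:
    """Full-range BT.601 RGB -> YCbCr. Each channel is one fixed
    multiply-accumulate chain -- exactly what maps onto FPGA DSP slices."""
    clamp = lambda v: max(0, min(255, round(v)))
    y  = clamp( 0.299    * r + 0.587    * g + 0.114    * b)
    cb = clamp(-0.168736 * r - 0.331264 * g + 0.5      * b + 128)
    cr = clamp( 0.5      * r - 0.418688 * g - 0.081312 * b + 128)
    return y, cb, cr

print(rgb_to_ycbcr(255, 255, 255))  # white -> (255, 128, 128)
print(rgb_to_ycbcr(0, 0, 0))        # black -> (0, 128, 128)
```

Because the coefficients never change, a hardware implementation needs no instruction fetch, cache, or scheduler in the data path, which is the structural reason its latency is both low and deterministic.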
Currently, the development threshold for FPGAs has been lowered, with vendors providing professional IP modules and comprehensive solutions. High-level synthesis and other development methods are also applicable to audio and video processing. Furthermore, FPGAs' long lifecycle meets the long-term service needs of professional audio and video equipment, achieving "once developed, lifetime use," making them more valuable in this field compared to GPUs reliant on continuous computing power upgrades.
04
Vendor Strategies
Based on their technological strengths and market positioning, major vendors are deploying diverse technical strategies.
In the NPU camp, vendors such as STMicroelectronics, Renesas, and Huawei Ascend are capturing IoT market share with an "MCU+NPU" combination strategy. Pairing microcontrollers with neural processing units leverages MCUs' mature control and management capabilities alongside NPUs' AI compute to meet IoT devices' low-power, real-time AI processing needs. Allwinner Technology's V821 chip has entered mass production and been integrated into Leadwin's AI glasses, providing robust support for AI applications in smart wearables and demonstrating NPUs' broad potential on terminal devices.
In the GPU camp, Imagination, which lost Apple as a customer years ago, is seeking new breakthroughs with its "AI+Graphics" fusion architecture. Its E-Series GPU IP offers powerful parallel processing and supports 16 virtual-machine instances running concurrently, making it well suited to complex scenarios such as multi-screen interaction in automotive cockpits and ADAS monitoring, and offering efficient graphics-plus-AI solutions for automotive intelligence upgrades. Industry giant NVIDIA, meanwhile, leverages its Jetson product line to penetrate robotic vision. The high-performance, low-power Jetson platform has become a preferred choice for robot developers, enabling precise visual recognition and decision-making in complex environments.
In the FPGA camp, Altera focuses on the data-center and edge-inference markets, using FPGAs' programmability, flexibility, and efficiency to provide customized solutions for data processing and AI inference. These meet data centers' high-concurrency, low-latency processing needs as well as edge devices' real-time inference requirements in complex scenarios. Lattice, with its low-power FPGA products, has successfully entered the smart camera and sensor market. In these power- and size-sensitive applications, Lattice's low-power FPGAs ensure stable long-term operation while meeting real-time data processing and AI analysis needs, supporting intelligent upgrades in fields such as smart security and environmental monitoring.
05
Merger and Acquisition Trends
In addition to expanding technological territories through independent research and development, major vendors are integrating resources and strengthening their advantages through mergers and acquisitions to seize opportunities in the rapidly evolving market.
STMicroelectronics (ST) acquired AI software company DeepLite to deepen its AI algorithm optimization capabilities. DeepLite's core technology enables extreme AI model compression, allowing complex AI algorithms to run efficiently on low-power devices. Post-acquisition, STMicroelectronics can deeply integrate DeepLite's technology into its "MCU+NPU" product system, further consolidating its IoT market dominance and providing more competitive AI solutions for terminal products like smart appliances and wearable devices.
Qualcomm's acquisition of the edge AI development platform Edge Impulse is a significant step to enhance its edge computing ecosystem. The Edge Impulse platform focuses on simplifying the AI development process on edge devices, supporting developers in quickly creating, training, and deploying AI models. Through this acquisition, Qualcomm can integrate Edge Impulse's development tools with its chip technology, lowering the development threshold for edge AI applications, attracting more developers to innovate using Qualcomm chips, and accelerating AI technology implementation in smart homes, industrial IoT, and other fields.
NXP's acquisition of AI chip startup Kinara focuses on strengthening its high-performance AI inference capabilities. Kinara's AI processors are renowned for their high efficiency and low power consumption, particularly suitable for scenarios requiring high real-time performance, such as smart cars and industrial automation. Post-acquisition, NXP will integrate Kinara's technology into its product line, providing more powerful AI processing solutions for automakers and industrial customers, further consolidating its leading position in automotive semiconductors and industrial control.
06
Conclusion
Amidst the technological wave ignited by edge generative AI, the semiconductor industry is undergoing a profound and widespread transformation. From GPUs' flexibility and versatility to NPUs' efficiency and specificity, to FPGAs' reconfigurability, different architectures are expanding their domains within their respective fields of expertise, reflecting the industry's diverse explorations of future computing forms.
Technological evolution is never a linear substitution process but rather a quest for optimal solutions in the continuous adaptation to scenarios and problem-solving. Faced with fragmented and rapidly changing edge AI application scenarios, a single architecture is insufficient. True competitiveness lies in how to combine software and hardware advantages to build more efficient, flexible, and scalable system solutions.
Simultaneously, vendors are shoring up weaknesses and strengthening their ecosystems through mergers and acquisitions. This dual strategy of "internal growth plus external expansion" not only accelerates product iteration but also opens more possibilities for collaborative innovation across the industry chain.
Looking back from 2025, dubbed the "first year of edge generative AI," it is evident that this transformation is only beginning.