11/06 2024
Qualcomm needs to attract as many Chinese automakers as possible to win a battle it cannot afford to lose.
Written by: Zhang Jixin
Edited by: Mao Shiyang
Original content by Autopix (ID: autopix)
01.
Qualcomm fires a shot at Orin
On October 23, executives from Li Auto and Great Wall Motors attended an unusual summit together.
The meeting was held in Hawaii, USA, and was organized by Qualcomm, the American chip giant.
The organizers gave Li Auto and Great Wall Motors dozens of minutes to present their intelligent-vehicle strategies to an audience of global technology companies. Li Auto even made it into Qualcomm's main venue, a first for a Chinese automaker.
Those who follow the new energy vehicle industry are familiar with two names: Qualcomm 8295 and NVIDIA Orin X. Thanks to the rapid growth of new energy vehicle sales in China, these two chips have made their respective companies extremely profitable in the past few years.
In 2023 alone, Qualcomm shipped 2.26 million smart cockpit chips in the Chinese market, for a market share of over 59%. NVIDIA has benefited as well: the spread of advanced urban intelligent driving has been one of this year's defining trends in new energy vehicles. From the Leapmotor C10, priced from 130,000 yuan, to the 500,000-yuan NIO ET7, Orin X covers every segment from economy to luxury models.
With this trend, a record 115,600 new energy vehicles sold in the Chinese market in September were equipped with NVIDIA's computing platform, with an expected annual delivery volume of millions of units. As mid-to-high-end intelligent driving continues to gain popularity, NVIDIA's shipments will continue to increase.
Originally, the two companies specialized in smart cockpits and intelligent driving, respectively, sharing the dividends of China's new energy market growth. However, as AI spreads rapidly through vehicles, the deployment of end-to-end intelligent driving and large cockpit models is pushing up the demand for on-board computing power. The electronic and electrical architecture of automobiles is also evolving, moving towards central computing and cross-domain integration. Future cars may need only one "brain" to handle everything. For chips, the computing power deployed in the cockpit domain and that deployed in the intelligent driving domain are no longer strictly separate; one may stand in for the other.
Both Qualcomm and NVIDIA want to be the "brain" of future cars. To become this "brain," it is essential to have better overall capabilities, placing increasingly high demands on chip performance.
NVIDIA's solution is to continue increasing the computing power of a single chip, leading to Thor. NVIDIA announced plans for this chip in 2022, with an AI computing power of up to 2000 TOPS, eight times that of Orin X. This chip quickly excited OEMs, with Li Auto, XPeng, Zeekr, BYD, and GAC Aion all having plans to integrate it.
Besides significantly higher computing power than the current Orin, Thor offers computing power isolation, separating the computing power required for autonomous driving from functions like in-vehicle infotainment. It can also run Linux, QNX, and Android, the three mainstream in-vehicle systems, in isolation. In effect, NVIDIA is building one very large swimming pool divided into multiple sections, each operating independently and resizable on demand.
In other words, NVIDIA aims to create a computing center that meets both cockpit and intelligent driving needs. However, based on the final four versions, Thor seems unable to become the "brain" for an integrated cockpit and driving system due to insufficient computing power.
As development progressed, and in response to demand from major customers, NVIDIA cut the computing power of a single chip, and Thor gradually settled into four main versions:
Thor X with 1000 TOPS of computing power;
Thor S with 700 TOPS of computing power;
Thor U with 500 TOPS of computing power;
Thor Z with 300 TOPS of computing power.
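To put the lineup in perspective, the short Python sketch below compares each version with the 2000 TOPS figure announced in 2022. The numbers are taken from the text above; the dictionary and function names are illustrative only.

```python
# Thor versions and their AI computing power in TOPS, as listed in the article.
THOR_TOPS = {"Thor X": 1000, "Thor S": 700, "Thor U": 500, "Thor Z": 300}

# Single-chip figure NVIDIA announced in 2022, per the article.
ANNOUNCED_TOPS = 2000


def share_of_announced(tops: float, announced: float = ANNOUNCED_TOPS) -> float:
    """Return a version's computing power as a fraction of the announced figure."""
    return tops / announced


for name, tops in THOR_TOPS.items():
    print(f"{name}: {tops} TOPS, {share_of_announced(tops):.0%} of the announced figure")
```

On these figures, even the top version delivers half of what was originally announced.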
Why was the single-chip 2000 TOPS design shelved? Cost is a crucial factor. A larger chip drives costs up steeply, which may not trouble NVIDIA but could push the part beyond what automakers' BOM budgets can bear.
Facing the prospect of losing its market share, Qualcomm began planning a counterattack early on and chose a different path. Unlike NVIDIA, Qualcomm is releasing two separate chips, one for the cockpit and one for intelligent driving. On October 23, Qualcomm announced the Snapdragon Cockpit Premium Platform and the Snapdragon Ride Premium Platform, the former for smart cockpits and the latter for intelligent driving.
▍Qualcomm's New Solution
The Qualcomm solution's unique selling point is that automakers can integrate the cockpit and intelligent driving chips into a single SoC if they use both in one vehicle. The computing power of the two chips is interconnected, with the cockpit chip serving as a redundant computing resource for the intelligent driving chip and vice versa.
It's like having two similarly sized swimming pools that operate independently but can share water when needed. This solves the cost issue associated with increasing the computing power of a single chip.
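The swimming-pool analogy can be made concrete with a toy model. The Python sketch below is not Qualcomm's actual mechanism, which the article does not describe; the class, the function, and the capacity numbers are all hypothetical and use arbitrary units rather than real TOPS figures.

```python
from dataclasses import dataclass


@dataclass
class ComputePool:
    """One 'pool' of computing power (hypothetical, arbitrary units)."""
    name: str
    capacity: float
    demand: float

    @property
    def spare(self) -> float:
        # Capacity left over after the pool serves its own demand.
        return max(self.capacity - self.demand, 0.0)

    @property
    def shortfall(self) -> float:
        # Demand the pool cannot serve on its own.
        return max(self.demand - self.capacity, 0.0)


def borrowed(needy: ComputePool, donor: ComputePool) -> float:
    """Amount the needy pool can borrow: capped by its shortfall and the donor's spare."""
    return min(needy.shortfall, donor.spare)


cockpit = ComputePool("cockpit", capacity=100.0, demand=60.0)
driving = ComputePool("driving", capacity=100.0, demand=130.0)
print(f"driving borrows {borrowed(driving, cockpit)} units from cockpit")
```

In this toy case the driving pool is 30 units short and the cockpit pool has 40 units spare, so the shortfall is fully covered without either chip having to be sized for the peak alone.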
For Qualcomm to integrate the capabilities of two chips, it must have higher flexibility and freedom. This is akin to training two people to work seamlessly together, which is challenging.
Another challenge is energy consumption. If two people need to eat the equivalent of four people's meals to do the work, it's not suitable for battery-powered mobile devices. For example, the popular Sentry Mode can consume 3 to 4 kWh of electricity overnight. In the context of automotive intelligence, many similar small functions require chip invocation. Reducing chip energy consumption can leave room for more intelligent functions.
Achieving these two points is a complex systems engineering task. However, one crucial foundation in chip design is that both chips use Qualcomm's self-developed Oryon CPU.
A chip's core components include the CPU, GPU, and NPU. The CPU serves as the brain of the entire system, responsible for executing program instructions, processing data, and controlling other components. The self-developed CPU architecture allows Qualcomm to achieve higher coordination between the two chips, unifying their cores first and simplifying other issues.
The self-developed CPU enables highly coordinated work among different computing units in the SoC, improving overall computational efficiency. What else does Qualcomm need? This brings us to the crucial role of Chinese automakers.
02.
Qualcomm vs. NVIDIA: Two Choices
Before the emergence of Qualcomm's Oryon architecture, ARM was the most widely used processor architecture in the mobile device field. Well-known companies like NVIDIA, Intel, Qualcomm, MediaTek, Huawei, Apple, and Samsung all produce chips based on the ARM architecture to varying degrees, leading to ARM being habitually referred to as the "public version architecture" in the industry.
The public version architecture owes its popularity to several factors. Firstly, the technology is mature. It has undergone extensive testing over time, and its development tools are very mature as a result. Chip manufacturers can build on existing designs to shorten product development cycles and reach the market quickly.
Secondly, the ARM architecture has achieved a significant market presence. Mainstream software developers and hardware manufacturers are developing products for it, enhancing compatibility between different devices and systems when chip manufacturers choose it.
More importantly, the ARM architecture offers considerable flexibility. It's like Legos, providing structurally simple yet well-designed blocks. Chip designers can use their creativity to combine these blocks in any way they choose to create desired products.
However, in the era of artificial intelligence, many manufacturers have begun to venture beyond the ARM public version framework, as with XPeng's self-developed Turing chip and NIO's self-developed intelligent driving chip.
From mainstream intelligent driving solutions, we can see that AI's share is increasing, requiring a chip architecture that provides higher computing power and better matches in-house algorithms. Thus, the limitations of the public version architecture are becoming apparent.
This is like a race where intelligent driving competitors are sprinting and need equipment that better matches their athletic habits. Using Legos as an analogy, if ARM wants to continue being the Legos of the AI era, it may need to develop customized Legos tailored to each chip's needs or grant chip manufacturers the authority to modify the blocks according to their requirements.
ARM is indeed doing this, but not every chip manufacturer enjoys this treatment. In choosing a chip architecture, industry leader NVIDIA still firmly adopts ARM. Besides valuing the maturity of the architecture, NVIDIA has a closer relationship with ARM than Qualcomm does.
NVIDIA holds an Architecture License Agreement (ALA), allowing it to custom-design processor IP cores based on ARM's instruction set architecture. In contrast, Qualcomm holds a Technology License Agreement (TLA), which only permits the purchase of ARM-designed IP cores with minor modifications. This gives NVIDIA more freedom in its collaboration with ARM and lets it enjoy customized treatment.
Although ARM regularly updates its architecture to provide better-performing products and improve its "blocks" for the AI era, these updates may not fully align with Qualcomm's needs. As NVIDIA is the frontrunner and Qualcomm the challenger, Qualcomm requires even more powerful products than NVIDIA.
Judging by the results, the Oryon CPU architecture has two highlights: a 2+6 core design and the elimination of the L3 cache in favor of a massive 24MB L2 cache. These designs not only enhance performance but also reduce energy consumption.
How is this achieved? Let's compare it with the MediaTek Dimensity 9400, also released in October this year, which uses TSMC's second-generation 3nm process technology. The difference is that its CPU employs the ARM public version architecture with a primary core of Cortex-X925.
The Oryon CPU has two primary cores with a frequency of 4.32GHz. In contrast, the MediaTek Dimensity 9400 has only one primary core with a frequency of 3.63GHz, lower than that of Oryon.
Simply put, the frequency is like the "heartbeat" of the CPU: the higher the value, the more work can be completed per second. The Oryon CPU allocates more resources to its primary cores by reducing the number of mid and small cores. Both have eight cores, but the Oryon CPU is "2+6" while the Dimensity 9400 is "1+3+4".
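The comparison can be tallied in a few lines of Python. The layouts and clock speeds are the ones quoted above; the helper name and data structure are illustrative. Note that a clock ratio is not a performance ratio, since work done per cycle also differs between designs.

```python
def parse_core_layout(layout: str) -> list[int]:
    """Split a layout string such as "1+3+4" into per-cluster core counts."""
    return [int(part) for part in layout.split("+")]


# Core layouts and primary-core clock speeds as quoted in the article.
CPUS = {
    "Oryon CPU": {"layout": "2+6", "primary_ghz": 4.32},
    "Dimensity 9400": {"layout": "1+3+4", "primary_ghz": 3.63},
}

for name, spec in CPUS.items():
    clusters = parse_core_layout(spec["layout"])
    print(f"{name}: {sum(clusters)} cores in total, "
          f"{clusters[0]} primary at {spec['primary_ghz']} GHz")

# Ratio of the primary-core clock speeds only.
ratio = CPUS["Oryon CPU"]["primary_ghz"] / CPUS["Dimensity 9400"]["primary_ghz"]
print(f"Primary-core clock ratio: {ratio:.2f}x")
```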
This is because the existing ARM architecture, in addition to designing large cores for high-performance computing, also needs space for multiple mid and small cores to handle low computational demands and lightweight tasks. For example, when an app moves to the background, it no longer uses the high-energy primary core for computation but switches to lower-energy mid and small cores.
Based on its technical characteristics and usage scenarios, Qualcomm eliminated mid and small cores, opting instead to improve performance and reduce energy consumption by enhancing the memory architecture, specifically by expanding the L2 cache. How is this achieved?
Imagine the CPU as a factory assembly line where the CPU cores are workers and the L2 cache is the workbench closest to them. When workers are on the job (processing data), they need various tools and parts (data). If these tools and parts are stored in a factory warehouse and workers have to fetch them every time, a lot of time is wasted on transportation (data transfer), reducing efficiency and increasing energy consumption.
The Oryon CPU gives its super and performance cores a massive 24MB L2 cache, providing a larger workbench for workers. 12MB is dedicated to the two super cores, while the other 12MB is shared among the six performance cores, so the more capable workers get the larger workbench.
Workers can quickly access what they need from the workbench, reducing the need to constantly go to the warehouse and minimizing back-and-forth movements. This memory architecture makes the Oryon CPU more efficient and power-saving. In comparison, the ARM-based Dimensity 9400 has a total L2 cache of less than 4MB.
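A quick calculation shows how uneven the split is. The Python sketch below uses only the cache sizes and core counts given above; because each cluster's cache is shared, the per-core figure is a simple average rather than a fixed partition, and the names are illustrative.

```python
# L2 cache split described in the article: 12 MB for the two super cores,
# 12 MB shared by the six performance cores.
CLUSTERS = {
    "super": {"cores": 2, "l2_mb": 12},
    "performance": {"cores": 6, "l2_mb": 12},
}


def l2_per_core(cluster: dict) -> float:
    """Average L2 capacity per core in MB (an average, since the cache is shared)."""
    return cluster["l2_mb"] / cluster["cores"]


total_l2 = sum(cluster["l2_mb"] for cluster in CLUSTERS.values())
print(f"Total L2: {total_l2} MB")
for name, cluster in CLUSTERS.items():
    print(f"{name} cores: {l2_per_core(cluster):.0f} MB of L2 per core on average")
```

On average, each super core has three times as much L2 cache to work with as each performance core.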
Overall, compared to the previous-generation Snapdragon 8295, the Snapdragon Premium Automotive Platform delivers a 3x increase in CPU and GPU performance and a 12x boost in NPU performance designed for multimodal AI. With an architecture more closely targeted at new energy vehicles, Qualcomm's new solution cuts energy consumption by 44% while raising performance. The reduction applies to all non-driving energy consumption, such as the Sentry Mode mentioned earlier, which leaves developers room to invoke the chip for more intelligent functions.
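As a rough illustration, the Python sketch below applies the 44% figure to the 3 to 4 kWh overnight Sentry Mode consumption cited earlier. It assumes, purely for illustration, that the reduction applies to the whole of that consumption; in practice the chip accounts for only part of the draw, so the real saving would likely be smaller. The names are illustrative.

```python
REDUCTION = 0.44  # energy reduction quoted in the article
SENTRY_KWH_RANGE = (3.0, 4.0)  # overnight Sentry Mode consumption quoted earlier


def after_reduction(kwh: float, reduction: float = REDUCTION) -> float:
    """Energy remaining after applying a fractional reduction."""
    return kwh * (1.0 - reduction)


# Assumption: the 44% applies to the full Sentry Mode figure (hypothetical).
low, high = (after_reduction(kwh) for kwh in SENTRY_KWH_RANGE)
print(f"Sentry Mode overnight: {low:.2f} to {high:.2f} kWh under this assumption")
```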
Such freedom, performance enhancements, and reduced power consumption make it possible to integrate intelligent driving and cockpit functions.
"Our cockpit chips are now more powerful and can also integrate equally high-performance intelligent driving chips. With high cost-effectiveness and simple development, why not take a look?" Leveraging its advantages in the cockpit market, technological breakthroughs in integrating cockpit and driving functions with a single SoC, and powerful AI computing capabilities, Qualcomm is once again challenging NVIDIA.
03.
Qualcomm Can't Go Back
The cost of Qualcomm's self-developed CPU architecture is its split with ARM's parent company.
To develop its own architecture, Qualcomm acquired a startup named Nuvia for $1.4 billion in 2021. Nuvia's main business is CPU development, serving high-performance computing chip design, which aligns with Qualcomm's expectations. It was this company that later helped Qualcomm create the Oryon architecture.
Qualcomm's self-development triggered strong dissatisfaction from ARM, as Qualcomm is ARM's second-largest customer, and ARM would not stand idly by and watch its market position be shaken. In 2022, the year after Qualcomm acquired Nuvia, ARM took Qualcomm to court on the other side of the ocean. The reason was that Nuvia held a more flexible ALA with ARM, and like NVIDIA, Nuvia could also make improvements based on ARM technology.
ARM was concerned that the acquisition might allow Qualcomm to bypass it and obtain the same license as NVIDIA, which would accelerate Qualcomm's progress on its self-developed CPU architecture. As the dispute escalated, ARM suspended Nuvia's license in March 2022. However, by this time, Qualcomm had already integrated Nuvia's technology into its products, regardless of whether ARM's technology was used or not.
The tug-of-war between the two parties has yet to reach a conclusion, but Qualcomm has embarked on the path of self-developed CPUs since then. In 2023, Qualcomm's fully self-developed Oryon CPU was created and first integrated into the PC chip Snapdragon X Elite. On October 23, 2024, Qualcomm began to launch second-generation Oryon CPU products, targeting the broader mobile market, including smartphones and automobiles.
The automotive chip market is Qualcomm's focus for the next two years. Besides maintaining its advantage in the cockpit domain, Qualcomm also aims to venture into intelligent driving. For Qualcomm, the path of self-developed CPUs is a one-way street with no turning back.
Qualcomm's break from ARM has also led to a complete rupture in relations between Qualcomm and ARM's parent company. Also on October 23 this year, it was revealed that ARM had given its long-term partner Qualcomm a mandatory 60-day notice that it was canceling Qualcomm's architecture license agreement.
If Qualcomm loses ARM's license, most of its current products, such as the 8155 and 8295, may have to be withdrawn from the market. On the path of self-developed CPUs, Qualcomm can be said to be fighting a desperate battle.
However, for the Oryon CPU to be ultimately accepted by the market, it needs to form its software and hardware ecosystem, similar to ARM-based chips, and develop its own development kits, all of which require collaboration with partners.
That means winning over enterprises with both product and software development capabilities, and in the automotive industry the support of Chinese automakers is crucial. It is not difficult to understand why Li Auto and Great Wall Motors have become Qualcomm's honored guests.
At the summit, Qualcomm announced that it would collaborate with Google to provide a standardized reference framework for developing AI-enhanced digital cockpits and software-defined vehicles (SDVs) using the Snapdragon Digital Chassis and Google's in-car technology. Among automotive manufacturers, both Mercedes-Benz and Li Auto will use Qualcomm's Snapdragon Cockpit Premium platform in the future.
In the smartphone era, Qualcomm and NVIDIA once engaged in a battle, which ended with Qualcomm cutting off the supply of cellular communication chips, forcing NVIDIA to withdraw from the smartphone chip market.
In terms of performance alone, both NVIDIA's Thor and Qualcomm's Snapdragon Premium automotive platform can meet the requirements of high computing power and cockpit integration in the future, both having the potential to become the brain of smart cars. However, the outcome of the competition is likely to be a winner-takes-all scenario. The two parties have once again clashed, and this chip war has only just begun.