06/19 2024
End-to-end in China will transform from a buzzword into reality.
Recently, Chentao Capital, in collaboration with several organizations, released the 2024 "End-to-End Autonomous Driving Industry Research Report" (hereinafter referred to as the "Report"), suggesting that domestic autonomous driving companies' modular end-to-end solutions may achieve mass production on vehicles in 2025.
The "Report" divides the development of "end-to-end" into four stages: perception "end-to-end," decision-making planning modeling, modular end-to-end, and One Model end-to-end.
One Model end-to-end systems are expected to arrive 1 to 2 years after modular end-to-end, with mass production on vehicles starting in 2026 to 2027.
This aligns with the plans of leading companies in China.
Currently, leading domestic passenger vehicle autonomous driving companies, including Huawei, Xpeng, Yuanrong Qixing, and SenseTime, have publicly announced their plans for end-to-end autonomous driving solutions to be implemented on vehicles in 2024-2025.
From buzzword to reality, how will end-to-end be implemented?
01
What is end-to-end?
The concept of end-to-end has existed for a long time. In the autonomous driving industry, its initial core definition was "a single neural network model from sensor input to control output."
However, after its research, the "Report" argues that the core criterion for end-to-end should be lossless transmission of perception information, which makes global optimization of the autonomous driving system possible. On this basis, end-to-end development can be divided into four stages: perception "end-to-end," decision-making planning modeling, modular end-to-end, and One Model end-to-end.
Both the first and second stages can achieve lossless transmission of perception information and gradient conduction.
Although modular end-to-end is still divided into multiple modules, each module is implemented as a model, so perception information can be passed on without loss and the system can be optimized globally. Modular end-to-end remains a form of discriminative, supervised learning.
One Model end-to-end is a generative, autoregressive approach built on the concept of generative large models, and it relies on LLMs or world models.
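To make the "lossless transmission" criterion more concrete, here is a toy sketch in Python. It is not taken from the "Report": the two-module setup, the function names, and all numbers are illustrative assumptions. A perception module passes either a hand-defined obstacle flag or its continuous output to a planner, and a finite-difference gradient shows whether the planning loss can still influence the perception weight.

```python
def perception(w, x):
    """Toy perception module: a single weight scales the raw sensor value."""
    return w * x

def hard_interface(feature):
    """Hand-defined interface: collapse the feature into an obstacle flag."""
    return 1.0 if feature > 0.5 else 0.0

def planner(signal):
    """Toy planner: brake in proportion to the incoming signal."""
    return 2.0 * signal

def planning_loss(w, x, target, lossless):
    """Squared error of the planner output, measured at the end of the chain."""
    feature = perception(w, x)
    signal = feature if lossless else hard_interface(feature)
    return (planner(signal) - target) ** 2

def grad_wrt_perception(w, x, target, lossless, eps=1e-4):
    """Central finite difference of the planning loss w.r.t. the perception weight."""
    up = planning_loss(w + eps, x, target, lossless)
    down = planning_loss(w - eps, x, target, lossless)
    return (up - down) / (2 * eps)

# Same weight and input, two different interfaces between the modules.
g_hard_interface = grad_wrt_perception(w=0.8, x=1.0, target=1.0, lossless=False)
g_lossless = grad_wrt_perception(w=0.8, x=1.0, target=1.0, lossless=True)
print(g_hard_interface, g_lossless)
```

In this toy case the thresholded interface yields a zero gradient, so the planning error cannot adjust perception, while the continuous interface yields a nonzero one. That is the sense in which keeping information intact between modules allows the whole system to be optimized toward the driving objective.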
Why is the autonomous driving industry now starting to undergo a technological transformation towards end-to-end?
Wang Panqu, the intelligent driving partner of ZeroOne Auto and the former head of perception at Tusimple, mentioned during the discussion on the release of the "Report" that traditional algorithm systems are facing many challenges in practical implementation:
Modularization leads to complex architecture. Traditional algorithm systems usually have thirty to forty modules. Each module has a low performance ceiling, while passing information between modules and optimizing the system as a whole are both difficult, so the local optimization goals of individual modules conflict with the overall goal of the system.
High research and development costs. Development, maintenance, and labor costs soar as the number of modules increases.
Poor generalization. Rules piled on to cope with delivery pressure degrade maintainability and scalability.
Difficulties in large-scale productization and implementation. Current mainstream products can only operate in limited scenarios (such as specific cities, demonstration zones, or highways) rather than at scale. Deep coupling of algorithms with hardware and software makes it hard to support more vehicle models, platforms, and scenarios.
End-to-end provides a new solution.
As a product, an end-to-end system copes well with long-tail scenarios in autonomous driving and drives in a more human-like way. Moreover, an end-to-end architecture will "simplify organizational structure, optimize development process efficiency, and dismantle departmental walls," said Liu Yudong, an investment manager at Chentao Capital.
In practice, Tesla's FSD V12, which applies an end-to-end neural network architecture, has improved its average takeover mileage from 166 miles to 333 miles.
At the same time, the development of large language models and generative AI suggests that a data-driven approach to autonomous driving, moving toward AGI, is feasible.
According to the "Report," the Chinese intelligent driving and autonomous driving industry has reached a strategic consensus on end-to-end: most companies have either fully embraced it or are actively conducting pre-research on it. Expectations for the future, however, still differ significantly.
On timing, the most aggressive estimates put implementation within 2 years, while conservative respondents believe it will take more than 5 years.
Interestingly, Cathie Wood judges the possibility that Tesla fails to launch autonomous taxis within the next five years to be "minimal."
Half of the respondents believe that end-to-end is the ultimate technological route and will have a disruptive impact on the existing industry landscape, while the other half disagree.
02
Conditions for the implementation of end-to-end
While there is strategic agreement, the "Report" notes that implementing end-to-end still faces many challenges, including technical routes, data and computing power requirements, testing and validation, and organizational resource investment.
Data is the first challenge.
Public information shows that Tesla has mined tens of thousands of hours of video from over 2 billion miles of driving as training data for FSD. One autonomous driving engineer also mentioned that, when training an end-to-end model, only 2% of the road test data the team had previously accumulated turned out to be usable.
Apart from the amount of training data, how to achieve a data loop for end-to-end has also become a new challenge.
In traditional modular systems, each model had a very specific task, such as detecting traffic lights. Once such a model was trained, millions of traffic light samples could be used for closed-loop verification and iteration.
However, for an end-to-end system facing the global task of "driving," a truly "closed-loop" verification method is to let the trained system drive directly on the road and receive feedback from the real world.
But obviously, no one dares to let the system "close the loop" in the real world before ensuring high safety. At the same time, the significantly reduced takeover rate of end-to-end systems also makes it more difficult to rely on "closed-loop" methods such as test vehicles.
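The gap between checking a model against logged data and letting it actually drive can be illustrated with a toy Python sketch. Everything here is a hypothetical assumption made for illustration (the one-dimensional lane-offset dynamics, both policies, and all constants); it is not any company's validation method. The open-loop check scores the model's actions on states recorded from an expert, while the closed-loop rollout lets the model's own actions produce each next state.

```python
def expert_policy(offset):
    """Logged human driver: steers back toward the lane center."""
    return -0.5 * offset

def learned_policy(offset):
    """Hypothetical flawed model: a small bias, plus steering the wrong way
    when off-center, a flaw that centered logs never expose."""
    return 0.02 + 0.1 * offset

def step(offset, steering):
    """Toy dynamics: the steering command shifts the lateral offset directly."""
    return offset + steering

def record_expert_log(n_steps, start_offset=0.0):
    """Roll out the expert to collect the states used for open-loop checks."""
    states, offset = [], start_offset
    for _ in range(n_steps):
        states.append(offset)
        offset = step(offset, expert_policy(offset))
    return states

def open_loop_error(logged_states):
    """Mean action error on logged states; the model never changes the next state."""
    errors = [abs(learned_policy(s) - expert_policy(s)) for s in logged_states]
    return sum(errors) / len(errors)

def closed_loop_final_offset(n_steps, start_offset=0.0):
    """Final offset when the model's own actions produce every next state."""
    offset = start_offset
    for _ in range(n_steps):
        offset = step(offset, learned_policy(offset))
    return offset

log = record_expert_log(n_steps=30)
open_err = open_loop_error(log)
closed_offset = closed_loop_final_offset(n_steps=30)
print(f"open-loop mean action error: {open_err:.3f}")
print(f"closed-loop final offset:    {closed_offset:.3f}")
```

In this toy setup the open-loop score looks small, yet the closed-loop rollout drifts far from the lane center, because each error changes the state the model sees next. This is one way to see why a system with the global task of "driving" needs feedback from the world, or from a sufficiently realistic simulation of it, to be verified.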
Tesla CEO Elon Musk recently stated at a shareholder meeting that companies without a fleet of hundreds of thousands of vehicles running shadow mode for closed-loop testing, as Tesla has, cannot participate in this "game."
Regarding the challenges of data volume and closed-loop, Xie Chen, the founder and CEO of Guanglun Intelligence, said, "Only Tesla has Tesla's data scale and capabilities. Synthetic data is the most effective way to address the shortage of end-to-end data."
Synthetic data needs to meet the requirements of visual and physical authenticity, agent interactivity, and scale efficiency. "Guanglun's synthetic data can achieve high-level closed-loop reproducibility and precise generalization, enabling multi-agent high interaction. Synthetic data will be the primary data source for large models within the next three years. Guanglun Intelligence empowers every company with Tesla's data capabilities, multiplying data value by 100 times."
Computing power is another obvious challenge.
In the survey conducted for the "Report," most respondents indicated that 100 high-computing-power GPUs (such as A100) could start the first stage of end-to-end training.
However, based on the practices of Tesla and other leading players, it is evident that the demand for training computing power to create a good end-to-end system is significantly higher than this order of magnitude.
During the 2024 Q1 earnings call, Tesla stated that the company already has 35,000 H100 GPUs and plans to increase to over 85,000 H100 GPUs in 2024, reaching the same tier as Google and Amazon.
With this expected scale, Musk recently stated that Tesla is no longer facing computing power constraints.
In China, Xpeng's "Fuyao" autonomous driving computing center offers up to 600 PFLOPS of computing power (an estimate based on the NVIDIA A100 GPU's FP32 computing power, equivalent to about 30,000 A100 GPUs). Xpeng has also announced an investment of $100 million this year in computing power, with further increases planned each year.
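The A100-equivalent figure can be checked with a few lines of Python. This sketch assumes NVIDIA's published peak FP32 throughput of 19.5 TFLOPS per A100; the function name is illustrative, and real training throughput depends on numeric precision and utilization, so this is only a sanity check of the conversion.

```python
PFLOPS = 1e15
TFLOPS = 1e12

# Assumption: NVIDIA's published peak FP32 throughput for one A100.
A100_FP32_TFLOPS = 19.5

def a100_equivalents(cluster_pflops, per_gpu_tflops=A100_FP32_TFLOPS):
    """Number of GPUs whose combined peak throughput matches the cluster figure."""
    return cluster_pflops * PFLOPS / (per_gpu_tflops * TFLOPS)

fuyao_equivalents = a100_equivalents(600)
print(round(fuyao_equivalents))  # roughly 30,800, i.e. "about 30,000" A100 GPUs
```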
SenseTime's large-scale facility has deployed a nationwide integrated intelligent computing network, with 45,000 GPUs and an overall computing power scale of 12,000 PFLOPS, which is expected to reach 18,000 PFLOPS by the end of 2024.
Most companies developing end-to-end autonomous driving currently have training computing power scales in the thousands of cards.
03
Domestic implementation of end-to-end
Domestic OEMs and autonomous driving algorithm and system companies have all presented end-to-end systems, and some have already reached mass production on vehicles or secured production nominations from automakers.
In the first half of the year, Huawei and Xpeng successively announced their end-to-end systems.
Huawei's Qiankun ADS 3.0 technical architecture uses a large perception network based on GOD (General Object Detection) for perception and a PDP (Prediction-Decision-Planning) network for prediction, decision-making, and planning.
Xpeng's end-to-end large model consists of the neural network XNet, the planning large model XPlanner, and the large language model XBrain. Xpeng says that once the end-to-end large model is on vehicles, its intelligent driving capabilities will improve 30-fold within 18 months, with the intelligent driving model iterated internally every two days.
Xpeng has been rolling out its end-to-end model to vehicles since May.
ZeroOne Auto, which operates in the truck business, has also launched a pure-vision end-to-end autonomous driving system based on large models. The system takes camera images and navigation information as input and decodes them through a multimodal large language model to generate planning and control signals along with logical reasoning information, reducing system complexity by 90%.
ZeroOne plans to deploy end-to-end autonomous driving by the end of 2024, achieve mass production on both commercial and passenger vehicles in 2025, and aims to achieve large-scale commercial operation of high-level autonomous driving in 2026.
SenseTime Absolute Shadow is currently one of the few intelligent driving companies pursuing one-stage end-to-end. In trying to account for every scenario, the team found that the number of perception and control interfaces that would need to be defined is endless, whereas one-stage end-to-end has a higher capability ceiling. "Therefore, when we first started developing the end-to-end solution, we advanced it in a one-stage manner," said Zhao Xianglei, Director of Intelligent Driving Products at SenseTime Absolute Shadow.
SenseTime Absolute Shadow's end-to-end solution "UniAD"
At the Beijing Auto Show, SenseTime Absolute Shadow launched "UniAD," an end-to-end autonomous driving solution aimed at mass production. It does not require high-precision maps. By learning from data, it observes and understands the external environment, reasons, and makes driving decisions like a human, handling a variety of complex urban driving scenarios on its own.
At the same time, SenseTime Absolute Shadow also released its next-generation autonomous driving technology DriveAGI, a "One Model end-to-end" upgrade of the end-to-end intelligent driving solution based on a multimodal large model.
During the Beijing Auto Show, Du Dalong, co-founder and CTO of Jianzhi Robotics, stated that Jianzhi's original end-to-end autonomous driving model GraphAD is ready for mass production deployment and is currently under joint development with leading automakers. "The reason why we call the end-to-end paradigm GraphAD is because Jianzhi uses a graph structure to represent modeling targets, including the relationship between dynamic and static obstacles. This makes end-to-end model training easier and further reduces the demand for data volume."
The "Report" predicts that, given the end-to-end mass production plans announced by leading players in the autonomous driving industry, modular end-to-end systems will start to be implemented on vehicles in 2025, which will drive upstream technological progress, market evolution, and industrial restructuring.
Technologically, the implementation of end-to-end will accelerate progress in upstream toolchains, chips, and other dependencies.
On the market side, the improved autonomous driving experience brought by end-to-end will increase the penetration rate of high-level assisted driving; due to its strong generalization, end-to-end may also drive the application of autonomous driving across geographical regions, countries, and scenarios.
In terms of industrial structure, end-to-end further enhances the importance of data and AI talent, potentially giving rise to new industrial divisions and business models.