The Ambitions of Lixiang Auto in Intelligent Driving

July 8, 2024

/// Can Lixiang Auto leapfrog Tesla in areas where Tesla has not ventured?

Editor: Xiao Ying

Lacking a technological moat, Lixiang Auto has begun to focus on intelligent driving.

On July 5, at the 2024 Intelligent Driving Summer Conference, Lixiang Auto announced that it would push out "nationwide available" mapless NOA (Navigate on Autopilot) to all Lixiang AD Max users within July.

At the same time, it will also push out fully automatic AES (Automatic Emergency Steering) and omnidirectional low-speed AEB (Automatic Emergency Braking).

Fan Haoyu, Senior Vice President of Product at Lixiang Auto, said that since the rollout to the first batch of 1,000 experience users in May and the expansion to 10,000 users in June, Lixiang Auto has accumulated over one million kilometers of mapless NOA mileage nationwide. The feature will now be pushed in full to 240,000 Lixiang AD Max owners.

In addition, at this conference Lixiang Auto also released a brand-new autonomous driving technology architecture built on an end-to-end model, a VLM (visual language model), and a world model.

This architecture is the industry's first to deploy a dual-system solution on the vehicle side, and it marks the first successful deployment of a VLM on a vehicle-side chip.

In June this year, at the 2024 China Automotive Chongqing Forum, Lixiang Auto CEO Li Xiang said the company would launch nationwide mapless NOA in the third quarter and roll out an end-to-end + VLM supervised autonomous driving system, trained on 3 million clips, to early test users.

Currently, Huawei and Xpeng have already rolled out nationwide mapless advanced urban driving assistance, and NIO has rolled out the urban functions of NOP+. With the full rollout of mapless NOA, Lixiang Auto's intelligent driving capability is catching up to the industry's first tier.

He also said that by the end of this year at the earliest, or early next year at the latest, Lixiang will fully roll out "supervised L3-level autonomous driving," and will achieve "unsupervised L4-level autonomous driving" within three years.

The end-to-end approach is still in the exploratory stage. According to publicly disclosed information, Xpeng announced its end-to-end model rollout in May, and Huawei's ADS 3.0 is about to debut on an Avatr model. Whether Lixiang Auto can turn its late start into an advantage will be a major storyline in the intelligent driving race in the second half of the year.

01

Full Push of Mapless NOA

From the competition over advanced urban intelligent driving to Tesla's end-to-end FSD, the domestic intelligent driving technology roadmap is now basically clear: eliminate high-precision maps and go all-in on the end-to-end approach.

At the same time, as the technology converges, the question every automaker must now answer is how to improve the intelligent driving experience so that it feels as smooth as an experienced human driver.

Lixiang's intelligent driving products and technology roadmap have not deviated from industry trends.

The four major capabilities of the mapless NOA Lixiang launched this time are, in Lixiang's own words: drive anywhere, smooth detours, easy intersections, and a reassuring rapport with the driver.

Specifically, thanks to comprehensive improvements in perception, understanding, and real-time road-structure construction, Lixiang Auto's mapless NOA no longer depends on prior map information. In other words, wherever there is navigation coverage, users anywhere in the country can engage NOA, even on narrow alleys and rural roads.

At the same time, when avoiding and detouring around road obstacles, Lixiang's mapless NOA delivers smoother overall performance, thanks to its spatio-temporal joint planning capability.

According to Lixiang, spatio-temporal joint planning plans the lateral and longitudinal dimensions simultaneously: by continuously predicting the spatial interactions between the ego vehicle and other vehicles, it plans all drivable trajectories within a future time window. Having learned from high-quality samples, the vehicle can quickly select the optimal trajectory and execute detours decisively and safely.
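
To make the idea concrete, here is a minimal, purely illustrative Python sketch of joint lateral-longitudinal trajectory selection over a time window. The candidate set, cost weights, and helper names are assumptions invented for the example; Lixiang has not disclosed its planner's internals.

```python
import numpy as np

# Toy spatio-temporal joint planner: score candidate (lateral, longitudinal)
# trajectories jointly over a future time window. All names and weights are
# illustrative assumptions, not Lixiang's implementation.

HORIZON_S, DT = 4.0, 0.2
STEPS = int(HORIZON_S / DT)

def rollout(lat_offset_m, target_speed_mps):
    """Ego positions over the horizon for one (lateral, longitudinal) choice."""
    t = np.arange(1, STEPS + 1) * DT
    x = target_speed_mps * t                     # longitudinal progress
    y = lat_offset_m * np.minimum(t / 2.0, 1.0)  # ramp to the lateral offset
    return np.stack([x, y], axis=1)

def min_gap(ego_traj, agent_trajs):
    """Smallest ego-agent distance across the whole time window."""
    gaps = [np.linalg.norm(ego_traj - a, axis=1).min() for a in agent_trajs]
    return min(gaps) if gaps else np.inf

def plan(agent_trajs, cruise_speed=12.0):
    best, best_cost = None, np.inf
    for lat in (-1.5, 0.0, 1.5):               # stay / nudge left / nudge right
        for speed in (6.0, 9.0, cruise_speed):
            traj = rollout(lat, speed)
            gap = min_gap(traj, agent_trajs)
            if gap < 0.5:                       # predicted collision: infeasible
                continue
            cost = 5.0 / gap + abs(lat) + (cruise_speed - speed)
            if cost < best_cost:
                best, best_cost = traj, cost
    return best

# A predicted slow vehicle 10 m ahead in the ego lane, moving at 2 m/s.
agent = np.stack([10.0 + 2.0 * np.arange(1, STEPS + 1) * DT,
                  np.zeros(STEPS)], axis=1)
print(plan([agent])[:3])  # first few waypoints of the chosen detour
```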

At complex urban intersections, Lixiang's mapless NOA's route selection capability has also been significantly improved.

Mapless NOA uses a BEV (Bird's Eye View) visual model fused with a navigation matching algorithm. It perceives real-time changes in road curbs, lane arrows, and intersection features, and combines the perceived lane structure with navigation features to solve the hard problem of structuring complex intersections. It also has ultra-long-range navigation route selection, making intersection passage more stable.
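
The navigation-matching step can be illustrated with a toy example: given exit directions perceived in BEV space, pick the one that best matches the navigation route's bearing. The data layout below is an assumption for illustration, not Lixiang's interface.

```python
# Toy navigation-matching step: align perceived intersection exits (from a
# BEV lane-structure model) with the navigation instruction.

def match_exit(exits, nav_bearing_deg):
    """exits: list of (exit_id, bearing_deg) perceived in BEV space.
    Pick the exit whose bearing best matches the navigation route."""
    def angle_diff(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(exits, key=lambda e: angle_diff(e[1], nav_bearing_deg))

# Perceived exits: straight (0 deg), left (-90), right (+90).
exits = [("straight", 0.0), ("left", -90.0), ("right", 90.0)]
print(match_exit(exits, nav_bearing_deg=-80.0))  # -> ("left", -90.0)
```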

At the same time, Lixiang's mapless NOA also respects users' psychological safety margins.

Specifically, through an occupancy network built on early fusion of LiDAR and camera data, the vehicle can identify irregular obstacles over a larger range and with higher perception accuracy, enabling earlier and more accurate prediction of other traffic participants' behavior.
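
As a rough illustration of the representation such a network produces, the sketch below rasterizes LiDAR returns into a BEV occupancy grid. A real occupancy network fuses raw LiDAR and camera features inside a learned model; the grid size and data here are invented for the example.

```python
import numpy as np

# Toy occupancy grid built from LiDAR returns; illustrative only.

GRID_M, RES_M = 40.0, 0.5                  # 40 m x 40 m grid, 0.5 m cells
N = int(GRID_M / RES_M)

def occupancy(points_xy):
    """Mark each cell containing at least one LiDAR return as occupied."""
    grid = np.zeros((N, N), dtype=bool)
    idx = ((points_xy + GRID_M / 2) / RES_M).astype(int)
    ok = (idx >= 0).all(axis=1) & (idx < N).all(axis=1)
    grid[idx[ok, 0], idx[ok, 1]] = True
    return grid

# A scatter of returns from an irregular obstacle ahead of the ego vehicle.
pts = np.random.default_rng(0).uniform(2.0, 6.0, size=(50, 2))
print(occupancy(pts).sum(), "cells occupied")
```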

Thanks to this, the vehicle can maintain a reasonable distance from other traffic participants, with more appropriate timing for acceleration and deceleration, effectively enhancing users' sense of safety while driving.

In addition to mapless NOA, Lixiang also announced upgrades in active safety capabilities at this conference, with the upgrades mainly focusing on four aspects:

First, dedicated AEB (Automatic Emergency Braking) for multi-target, multi-trajectory complex urban intersections;

Second, improved nighttime AEB, capable of braking to a stop from 120 km/h for unlit stationary trucks;

Third, a fully automatic AES (Automatic Emergency Steering) function, which triggers emergency steering automatically when braking alone cannot avoid a collision in physically limiting scenarios, steering around the target ahead without any driver input;

Fourth, omnidirectional low-speed AEB, providing 360-degree active safety protection for parking and low-speed driving scenarios.

Lixiang Auto's strength has always been product definition: the "refrigerator, big-screen TV, and large sofa" formula has kept its sales at the forefront of the industry. In June, monthly deliveries rebounded to over 47,000 units, with its volume model, the Lixiang L6, reaching 20,000 units.

From a technical standpoint, however, Lixiang Auto has long been criticized by the industry. This conference shows that it is now seeking a breakthrough in intelligent driving.

Going forward, sustaining a competitive advantage and breaking through in the intelligent driving race will hinge on continuously improving the driving experience, which in turn depends on breakthroughs in its end-to-end solution.

02

Intelligent Driving, the Decisive Battle of End-to-End

Most current autonomous driving systems are built on rule-based algorithms in a modular architecture. The whole algorithmic process boils down to: find a problem, define the problem, solve the problem. This generates large amounts of data and code and makes the system architecture increasingly complex and cumbersome.

In comparison, end-to-end establishes a single complete learning system that learns directly and continuously from raw data and produces the required output, without manually decomposing the task into intermediate steps. This is the fundamental reason automakers favor it.
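
The contrast can be sketched in a few lines of Python. Every function here is a trivial stand-in, intended only to show where hand-written rules live in a modular stack and where they disappear in an end-to-end one.

```python
# Conceptual contrast between the two paradigms; all bodies are placeholders.

def perceive(sensors):            # detection/tracking rules in a real stack
    return [o for o in sensors if o.get("kind") == "object"]

def predict(objects):             # hand-tuned motion models in a real stack
    return [{**o, "future_x": o["x"] + 1.0} for o in objects]

def plan(objects, futures):       # rule-based trajectory selection
    return "slow_down" if any(f["future_x"] < 5.0 for f in futures) else "cruise"

def modular_stack(sensors):
    """Rule-based pipeline: fixed hand-written interfaces between stages."""
    objects = perceive(sensors)
    return plan(objects, predict(objects))

def end_to_end(sensors, model):
    """One learned model maps raw sensor data straight to a driving action."""
    return model(sensors)

sensors = [{"kind": "object", "x": 3.0}]
print(modular_stack(sensors))                      # -> slow_down
print(end_to_end(sensors, lambda s: "slow_down"))  # learned, not hand-coded
```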

Lixiang's solution, however, is not a single end-to-end system; it adopts a dual-system strategy. Lang Xianpeng, Vice President of Intelligent Driving at Lixiang Auto, said the architecture is inspired mainly by Nobel laureate Daniel Kahneman's theory of fast and slow thinking, simulating human thinking and decision-making in autonomous driving to form a more intelligent, more human-like driving solution.

In simple terms, the core of Lixiang Auto's roadmap is to run the end-to-end model, the fast system, on one Orin X chip, and the VLM, the slow system, on the other Orin X.

System 1, the fast system, handles simple tasks, the analogue of the intuition humans build from experience and habit. Implemented by the end-to-end model, it responds quickly and efficiently and can handle about 95% of routine driving scenarios.

System 2, the slow system, corresponds to the logical reasoning, complex analysis, and computation humans develop through deeper understanding and learning, and it resolves complex or even unknown traffic scenarios while driving. Implemented by the VLM (Visual Language Model), it receives sensor input, reasons over it, and outputs decision information to System 1, and it is invoked in roughly 5% of daily driving.

Lixiang Auto believes the cooperation between System 1 and System 2 mirrors the basis of human cognition, understanding the world and making decisions, ensuring high efficiency in most scenarios and a high capability ceiling in the rest.
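
A toy arbiter conveys the division of labor: the fast path runs every control cycle, while the slow path runs at a fraction of that rate and feeds decisions back in. The 10:1 cadence, function names, and scene labels below are assumptions for illustration, not Lixiang's design.

```python
import random

# Toy dual-system arbiter in the spirit of the fast/slow design described
# above. Structure and thresholds are illustrative assumptions.

def system1(frame, hint=None):
    """Fast path: immediate action from the end-to-end model."""
    action = "follow_lane" if frame["complexity"] < 0.8 else "slow_down"
    return hint or action

def system2(frame):
    """Slow path: VLM-style reasoning over rare, complex scenes."""
    if frame["scene"] == "tidal_lane":
        return "avoid_restricted_lane"
    return None

def drive(frames):
    hint = None
    for i, frame in enumerate(frames):
        if i % 10 == 0:                 # System 2 runs at ~1/10 the rate
            hint = system2(frame)
        yield system1(frame, hint)

frames = [{"complexity": random.random(), "scene": "normal"} for _ in range(10)]
frames.append({"complexity": 0.9, "scene": "tidal_lane"})
print(list(drive(frames))[-1])          # -> avoid_restricted_lane
```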

In terms of specific technical capabilities, Lixiang Auto also introduced the algorithm architecture of the end-to-end model, the VLM visual language model, and the world model.

Among them, the end-to-end model takes camera and LiDAR data as its main input. Multi-sensor features are extracted and fused through a CNN backbone and projected into BEV space. To improve the model's representational capacity, Lixiang Auto also designed a memory module with both temporal and spatial memory.

The model's input also includes vehicle status and navigation information. These are encoded by a Transformer and decoded together with the BEV features to detect dynamic obstacles, road structure, and general obstacles, and to plan the driving trajectory.
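
Assembling the pieces described above into a runnable PyTorch sketch might look as follows. All dimensions and module choices are illustrative guesses; only the overall data flow (CNN features into BEV space, a memory module, then ego/navigation features decoded against BEV features into a trajectory) follows the description.

```python
import torch
import torch.nn as nn

# Minimal sketch of the described end-to-end layout; not Lixiang's model.

class E2ESketch(nn.Module):
    def __init__(self, d=64, bev=16, horizon=30):
        super().__init__()
        self.backbone = nn.Sequential(            # CNN feature extractor
            nn.Conv2d(3, d, 5, stride=4, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((bev, bev)))     # stand-in for BEV projection
        self.memory = nn.GRU(d, d, batch_first=True)  # memory-module stand-in
        self.state_enc = nn.Linear(8, d)          # ego status + nav features
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d, nhead=4, batch_first=True), 2)
        self.head = nn.Linear(d, horizon * 2)     # (x, y) waypoints
        self.horizon = horizon

    def forward(self, images, ego_state):
        b = images.size(0)
        bev = self.backbone(images).flatten(2).transpose(1, 2)  # B x HW x d
        bev, _ = self.memory(bev)                 # recurrent pass over tokens
        query = self.state_enc(ego_state).unsqueeze(1)          # B x 1 x d
        fused = self.decoder(query, bev)          # attend ego query over BEV
        return self.head(fused).view(b, self.horizon, 2)

model = E2ESketch()
traj = model(torch.randn(1, 3, 256, 256), torch.randn(1, 8))
print(traj.shape)  # torch.Size([1, 30, 2])
```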

Because no hand-written rules intervene between input and output, the end-to-end model has significant advantages in information transmission, inference computation, and model iteration.

In actual driving, the end-to-end model demonstrates stronger understanding of general obstacles, ultra-long-range navigation, and road structure, along with more human-like path planning.

The algorithm architecture of Lixiang Auto's VLM is a unified Transformer model. Prompt text is encoded with a tokenizer, while visual information from the front-view camera and the navigation map is encoded separately; a text-image alignment module then aligns the two modalities. Finally, the model performs autoregressive reasoning and outputs its understanding of the environment, driving decisions, and a driving trajectory, which are passed to System 1 to assist vehicle control.
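
A compact PyTorch sketch of that flow, with invented sizes and a stand-in alignment module, might look like this; only the overall structure (tokenized prompt plus encoded image, alignment into one token stream, then autoregressive decoding) follows the description.

```python
import torch
import torch.nn as nn

# Minimal sketch of the described VLM layout; sizes are assumptions.

class VLMSketch(nn.Module):
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab, d)          # prompt tokens
        self.img_enc = nn.Sequential(                  # front-view image encoder
            nn.Conv2d(3, d, 16, stride=16), nn.Flatten(2))
        self.align = nn.Linear(d, d)                   # text-image alignment
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, 2)   # unified Transformer
        self.lm_head = nn.Linear(d, vocab)

    def step(self, prompt_ids, image):
        txt = self.tok_emb(prompt_ids)                          # B x T x d
        img = self.align(self.img_enc(image).transpose(1, 2))   # B x P x d
        seq = torch.cat([img, txt], dim=1)                      # one stream
        L = seq.size(1)
        mask = torch.triu(torch.full((L, L), float("-inf")), diagonal=1)
        h = self.trunk(seq, mask=mask)                          # causal pass
        return self.lm_head(h[:, -1]).argmax(-1)                # next token

model = VLMSketch()
prompt = torch.randint(0, 1000, (1, 5))
frame = torch.randn(1, 3, 128, 128)
for _ in range(3):                        # autoregressive decision decoding
    nxt = model.step(prompt, frame)
    prompt = torch.cat([prompt, nxt[:, None]], dim=1)
print(prompt[0, -3:])                     # emitted decision tokens
```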

Official data shows that Lixiang Auto's VLM visual language model has 2.2 billion parameters.

However, this parameter count is modest for a large model, closer to a small model's scale, likely a concession to what can run in real time on the vehicle-side chip. GPT-3, for comparison, has 175 billion parameters and GPT-4 is reported to be at the trillion level, while domestic models such as Baidu's Wenxin Yiyan and Tencent's Hunyuan have reached the hundred-billion level.

According to Lixiang, the VLM has a strong ability to understand complex traffic environments in the physical world and can handle unknown scenarios it encounters for the first time.

In addition, the VLM can recognize environmental conditions such as road surface quality and lighting, prompting System 1 to adjust vehicle speed for safe and comfortable driving. It also understands navigation maps better and can work with the in-vehicle navigation system to correct routing and prevent wrong turns.

At the same time, the VLM model can also understand complex traffic rules such as bus lanes, tidal lanes, and time-limited traffic restrictions and make reasonable decisions while driving.

In addition to the end-to-end and VLM models, Lixiang Auto also demonstrated a world model that combines reconstruction and generation, using 3DGS (3D Gaussian Splatting) to reconstruct real-world data and generative models to supplement novel views.

During scene reconstruction, dynamic and static elements are separated: the static environment is reconstructed, while dynamic objects are reconstructed and rendered from new viewpoints. Re-rendering the scene yields a 3D physical world in which dynamic assets can be freely edited and adjusted, achieving partial generalization of the scene.

Compared with reconstruction, generative models generalize more strongly: conditions such as weather, lighting, and traffic flow can be customized to generate new scenarios that follow real-world patterns, which are then used to evaluate how the autonomous driving system adapts to varied conditions.

Combining reconstruction and generation creates an excellent virtual environment for training and testing the autonomous driving system, giving it efficient closed-loop iteration and helping ensure safety and reliability.
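
The closed loop described here can be summarized in pseudocode-like Python. Every function body below is a placeholder assumption, not Lixiang's world model; the point is only the shape of the loop: reconstruct, generate variants, evaluate.

```python
import random

# High-level sketch of a reconstruction + generation test loop.

def reconstruct_scene(log):
    """Split a real log into a static background and editable dynamic assets."""
    return {"static": log["road"], "dynamic": list(log["agents"])}

def generate_variant(scene, rng):
    """Generative step: vary weather/lighting/traffic within realistic bounds."""
    variant = dict(scene)
    variant["weather"] = rng.choice(["clear", "rain", "night"])
    variant["dynamic"] = scene["dynamic"] + ["inserted_vehicle"] * rng.randint(0, 2)
    return variant

def evaluate(driving_system, scene):
    """Closed-loop rollout; True if the system passes the scenario."""
    return driving_system(scene) in ("safe_stop", "safe_pass")

log = {"road": "urban_intersection", "agents": ["car", "cyclist"]}
rng = random.Random(0)
scene = reconstruct_scene(log)
results = [evaluate(lambda s: "safe_pass", generate_variant(scene, rng))
           for _ in range(100)]
print(sum(results), "/ 100 scenarios passed")
```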

This series of considerations and planning clearly shows Lixiang Auto's determination to focus on end-to-end.

At this stage, however, although end-to-end integrates multiple modules and greatly simplifies the code, it is a "black box": no one can fully explain its internal decision process, so it suffers from a degree of unexplainability.

This means that, however promising in theory, it is uncertain whether the end-to-end architecture will arrive on schedule. After all, even Tesla, for all its strength, has yet to deliver full autonomous driving.

Can Lixiang Auto leapfrog Tesla and achieve what Tesla hasn't?

03

Ambition and Reality

Lixiang Auto has always been cautious with R&D spending. According to its financial reports, R&D accounted for 15% of revenue in 2022, versus 22% for NIO and over 19% for Xpeng in the same period.

In 2023, Lixiang Auto's R&D expenditure was 10.59 billion yuan, or 8.5% of revenue. By contrast, NIO's 2023 R&D investment was 13.43 billion yuan, or 24% of revenue.

This once earned Li Xiang a reputation as the "stingiest" carmaker. An inevitable consequence of insufficient R&D spending is under-investment in future technology, leaving the company without core technological advantages beyond its current products.

Although the conference shows considerable ambition in the intelligent driving race, whether Lixiang will back that ambition with real investment of matching determination and intensity remains to be seen.

Lixiang Auto has long invested comparatively little in intelligent driving, and its intelligent driving R&D team is the smallest among its direct rivals.

Earlier, hit by the failure of the MEGA, Lixiang Auto entered large-scale layoff mode, cutting more than 5,600 people, over 18% of its workforce.

The intelligent driving department was hit especially hard. According to media reports, by early June its headcount had fallen below 800, a department-level layoff ratio of more than 30%.

For comparison, available statistics as of May this year put the intelligent driving teams at Huawei, BYD, Xpeng, and NIO at more than 7,000, 4,000, 3,000, and 1,300 people, respectively.

According to reports, Lixiang's intelligent driving team is currently split into two lines, mass production R&D and algorithm R&D:

The algorithm R&D team, managed by Jia Peng, is primarily responsible for developing and shipping mapless urban NOA and for pre-research on end-to-end intelligent driving;

The mass production R&D team, led by Wang Jiajia, is primarily responsible for maintaining map-based high-speed NOA on older models and for fixing issues that arise when deploying the existing algorithms.

Although Lixiang Auto has recalled some members of its intelligent driving team, its team remains small and its investment low compared with NIO, Xpeng, and Huawei, and its hesitation over intelligent driving strategy adds uncertainty to its future in the field.

In fact, ever since the MEGA launched, the market has harbored doubts about Lixiang Auto, doubts that have gradually grown into questions about the sustainability of Li Xiang's product-definition prowess. More importantly, while Lixiang Auto has held on to its baseline sales, it faces strong competition from brands such as Wenjie (AITO).

On the sales front, Lixiang Auto sold 47,774 units in June, while Wenjie sold 43,146. Lixiang Auto remains in the lead, but the gap is not large.

After the MEGA setback, Lixiang Auto delayed the release of its all-electric products and quickly launched the Lixiang L6 to protect its baseline sales. But beyond the L series, what other competitive advantages does it have?

Could intelligent driving become Lixiang Auto's new moat? Time will tell. The intelligent driving race is far from over: Tesla's FSD is about to enter China, and the end-to-end solutions from Xpeng and Huawei are rolling out step by step. Who gains the upper hand remains to be seen.