January 14, 2025
Autonomous driving technology is advancing at an unprecedented pace, with groundbreaking innovations emerging annually. In 2024, the autonomous driving industry witnessed the rise of several new technical approaches, including City NOA (Navigate on Autopilot), Robotaxi, end-to-end solutions, the "heavy perception, light map" approach, and pure vision systems. These advancements mark the transition of autonomous driving from a conceptual framework to a tangible reality. Let's delve into the technological highlights of the autonomous driving industry in 2024!
City NOA: The Gateway to Refined Urban Driving
City NOA represents a significant leap for Advanced Driver Assistance Systems (ADAS), enabling them to navigate complex urban scenarios and extending the reach of autonomous driving from highways to city streets. Urban roads are far more challenging than highways, filled with dynamic, unpredictable elements such as heavy traffic, intersections, pedestrians, non-motorized vehicles, and intricate traffic signal systems. City NOA's primary objective is to intelligently adapt to this dynamic environment through tight coordination of perception, localization, decision-making, and execution, offering users a safer and smoother driving experience.
Technologically, City NOA leverages multi-sensor fusion, artificial intelligence algorithms, and deep integration with high-precision maps. Multi-sensor fusion serves as the cornerstone of precise perception, with LiDAR, millimeter-wave radar, and cameras together providing omnidirectional awareness. LiDAR excels at constructing high-resolution 3D point clouds, detecting obstacle positions and shapes; millimeter-wave radar maintains stable performance in adverse weather, detecting dynamic targets at long range; and cameras recognize crucial elements like traffic signs, signal lights, and lane lines through rich visual information. Synchronized in time and space, these data streams coalesce into a unified perception result that supports subsequent decision-making and planning.
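To make the fusion step concrete, the sketch below performs a naive late fusion: time-stamped detections from different sensors are greedily associated by spatial proximity into single object hypotheses. The `Detection` class, the 1.5 m association gate, and the sample values are illustrative assumptions, not any vendor's actual pipeline.

```python
from dataclasses import dataclass
import math

@dataclass
class Detection:
    sensor: str        # "lidar", "radar", or "camera"
    timestamp: float   # seconds, on a common clock
    x: float           # meters, vehicle frame
    y: float
    label: str

def associate(dets: list[Detection], gate: float = 1.5) -> list[list[Detection]]:
    """Greedy spatial association: detections within `gate` meters
    of a cluster's center are fused into one object hypothesis."""
    clusters: list[list[Detection]] = []
    for d in sorted(dets, key=lambda d: d.timestamp):
        for c in clusters:
            cx = sum(m.x for m in c) / len(c)
            cy = sum(m.y for m in c) / len(c)
            if math.hypot(d.x - cx, d.y - cy) < gate:
                c.append(d)
                break
        else:
            clusters.append([d])
    return clusters

frame = [
    Detection("lidar",  0.00, 12.3, -0.4, "pedestrian"),
    Detection("camera", 0.01, 12.1, -0.5, "pedestrian"),
    Detection("radar",  0.02, 45.0,  1.2, "vehicle"),
]
for obj in associate(frame):
    print(obj[0].label, "confirmed by", {d.sensor for d in obj})
```

An object confirmed by multiple sensor types (here, the pedestrian seen by both LiDAR and camera) can be trusted more than a single-sensor detection, which is precisely the redundancy that fusion buys.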
High-precision maps and localization technologies are vital for City NOA's precise navigation. These maps contain detailed static environmental data, including lane positions, road gradients, speed limit signs, and intersection layouts, giving vehicles precise navigation references. By fusing data from GNSS (Global Navigation Satellite System), IMU (Inertial Measurement Unit), and wheel speed sensors, City NOA achieves centimeter-level localization accuracy, ensuring accurate lane keeping and path planning in complex urban settings. Despite the challenges of map production and dynamic updating, and despite the rise of the "heavy perception, light map" approach in 2024, high-precision maps remain indispensable to the autonomous driving industry and play a crucial role in enabling City NOA.
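As a deliberately simplified picture of that localization fusion, the one-dimensional Kalman filter below uses wheel-speed dead reckoning to predict position and a GNSS fix to correct it. The noise parameters are illustrative assumptions; a production stack would fuse the full vehicle state (position, heading, velocity) across GNSS, IMU, and wheel odometry with an extended Kalman filter.

```python
import random

# Minimal 1-D Kalman filter sketch: odometry predicts, GNSS corrects.
# Process noise q and measurement noise r are illustrative values.
def kf_step(x, P, v, dt, z_gnss, q=0.05, r=0.5):
    x_pred = x + v * dt                 # predict from wheel-speed odometry
    P_pred = P + q
    K = P_pred / (P_pred + r)           # Kalman gain
    x = x_pred + K * (z_gnss - x_pred)  # correct with the GNSS fix
    P = (1 - K) * P_pred
    return x, P

x, P = 0.0, 1.0
for t in range(5):
    truth = 1.0 * (t + 1)                  # vehicle advances 1 m per step
    z = truth + random.gauss(0, 0.3)       # noisy GNSS measurement
    x, P = kf_step(x, P, v=10.0, dt=0.1, z_gnss=z)
    print(f"t={t}: estimate={x:.2f} m (truth {truth:.1f} m)")
```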
Artificial intelligence algorithms are central to City NOA's intelligent decision-making and dynamic response. Deep learning technology efficiently converts perception data into driving behavior decisions, such as avoiding pedestrians, automatic lane changes, and traffic signal compliance. Transformer-based and reinforcement learning algorithms have demonstrated remarkable performance in behavior prediction and multi-objective decision-making in urban environments, significantly enhancing City NOA's robustness in dynamic scenarios. Iterative algorithm optimization reduces the response time from perception to action, enabling vehicles to respond in real time to emergencies, such as a vehicle or pedestrian suddenly cutting into the lane.
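The sketch below shows the general shape of such a model: a small Transformer encoder consumes a short history of agent states and emits a distribution over discrete maneuvers. The state features, layer sizes, and three-way action space are assumptions chosen for illustration, not a description of any deployed system.

```python
import torch
import torch.nn as nn

class BehaviorPredictor(nn.Module):
    """Toy Transformer that maps an agent-state history to a maneuver."""
    def __init__(self, state_dim=4, d_model=64, n_actions=3):
        super().__init__()
        self.embed = nn.Linear(state_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Hypothetical actions: keep-lane / change-left / change-right
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, states):           # states: (batch, time, state_dim)
        h = self.encoder(self.embed(states))
        return self.head(h[:, -1])       # decide from the last timestep

model = BehaviorPredictor()
history = torch.randn(1, 20, 4)          # 20 timesteps of (x, y, vx, vy)
print(model(history).softmax(-1))        # probabilities over maneuvers
```

Self-attention over the whole history is what lets this kind of model weigh an interaction that started several seconds ago, which is exactly the long-range dependency argument made above.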
The practical implementation of City NOA is an ongoing journey, fraught with challenges. High-precision maps demand significant update and maintenance costs, especially in rapidly evolving urban environments where timely map data becomes crucial. Sensor collaboration may falter in extreme weather conditions. Moreover, deep learning models still struggle with long-tail problems, making it difficult to handle extreme yet high-risk scenarios, such as sudden traffic accidents or unconventional traffic instructions.
To overcome these hurdles, the industry is exploring Vehicle-to-Everything (V2X) technology to bolster City NOA capabilities. By connecting with intelligent transportation infrastructure, vehicles obtain real-time dynamic traffic information, such as signal light changes and road construction updates, reducing reliance on high-precision maps and enhancing emergency response capabilities. With advancements in computing chip performance and the integration of cloud computing, City NOA algorithms can be iteratively optimized more efficiently, further refining perception and decision-making capabilities.
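As a hedged illustration of how V2X data can feed planning, the snippet below models a simplified signal-phase-and-timing message, loosely inspired by the SAE J2735 SPaT concept, and computes an advisory speed so the vehicle arrives at the stop line on green. The field names and the speed logic are assumptions for this sketch.

```python
from dataclasses import dataclass

@dataclass
class SpatMessage:
    intersection_id: str
    phase: str             # "red", "yellow", or "green"
    time_to_change: float  # seconds until the phase changes

def advisory_speed(distance_m: float, spat: SpatMessage,
                   v_max: float = 13.9) -> float:
    """Return a speed (m/s) aimed at reaching the stop line on green."""
    if spat.phase == "green" and distance_m / v_max <= spat.time_to_change:
        return v_max                      # the current green is reachable
    wait = spat.time_to_change if spat.phase != "green" else 0.0
    # Otherwise glide so the vehicle arrives just after the change
    return min(v_max, distance_m / max(wait + 1.0, 1.0))

msg = SpatMessage("intersection-0042", "red", 8.0)
print(f"advisory: {advisory_speed(120.0, msg):.1f} m/s")
```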
In 2024, City NOA achieved deep technology-scenario integration, significantly advancing intelligent driving applications in urban roads. From perception to decision-making, and from navigation to execution, every technical aspect is evolving towards greater refinement and intelligence. Despite technological bottlenecks and practical challenges, the maturation of V2X technology, improved high-precision map update mechanisms, and continuous AI algorithm optimization position City NOA as a vital component of future urban transportation, driving the transition of intelligent driving technology from "usable" to "user-friendly" and bringing Robotaxi closer to reality.
Robotaxi: Pioneer in Autonomous Commercialization
As a trailblazer in the commercialization of driverless technology, Robotaxi emerged as one of the key technological highlights in the autonomous driving field in 2024. Baidu's Apollo Go (Luobo Kuaipao), for instance, garnered significant social media attention, sparking conversations about the potential replacement of human drivers by technology.
By integrating advanced autonomous driving technology with shared mobility models, Robotaxi showcased its unique technological advantages and market potential, paving the way for the widespread adoption of driverless technology. Robotaxi projects serve not only as technology validation platforms but also as crucial links in transitioning autonomous driving from laboratories to real-world applications, encompassing the entire chain from perception and decision-making to operation and service.
Technologically, Robotaxi relies on sophisticated perception, decision-making, and execution systems. Multi-sensor fusion technology underpins its high-level environmental perception, with LiDAR, cameras, and millimeter-wave radar synergistically generating real-time high-resolution maps of dynamic urban environments, encompassing complex elements like pedestrians, vehicles, and traffic signs. In 2024, the decreased costs and enhanced performance of LiDAR significantly bolstered Robotaxi's perception capabilities, enabling more precise handling of extreme scenarios like night driving, adverse weather conditions, and heavy traffic.
Deep learning-based perception algorithms further optimized target object classification and trajectory prediction, providing more reliable data support for driving decisions in dynamic scenarios. For decision-making and planning, Robotaxi relies on AI algorithms to process multimodal data and generate safe, efficient driving strategies. Traditional rule engines fall short in complex urban traffic, so reinforcement learning and generative model-based algorithms have become mainstream. End-to-end technology also gained prominence in 2024: trained through large-scale simulation and real-road testing, these models accurately predict the behavior of surrounding targets and swiftly formulate driving strategies. In path planning, Robotaxi algorithms balance shortest-path selection against passenger comfort, safety, and traffic efficiency, delivering a refined service experience.
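One simple way to picture that trade-off is a weighted cost over candidate trajectories, as in the hypothetical scoring sketch below. The features, weights, and candidates are invented for illustration; real planners optimize over thousands of sampled trajectories with far richer cost terms.

```python
def score(traj, w_len=1.0, w_jerk=2.0, w_clearance=4.0):
    """Lower is better: penalize length, discomfort, and tight clearance."""
    length = traj["length_m"]
    jerk = traj["max_jerk"]              # passenger-comfort proxy
    clearance = traj["min_clearance_m"]  # distance to nearest obstacle
    return w_len * length + w_jerk * jerk + w_clearance / max(clearance, 0.1)

candidates = [
    {"id": "direct", "length_m": 120, "max_jerk": 2.5, "min_clearance_m": 0.6},
    {"id": "detour", "length_m": 135, "max_jerk": 0.8, "min_clearance_m": 2.0},
]
best = min(candidates, key=score)
print("selected:", best["id"])   # the smoother, safer detour wins here
```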
Robotaxi operation relies on efficient fleet management and scheduling systems. Cloud-based centralized management enables real-time interaction between vehicles and transportation infrastructure, optimizing scheduling and path planning. V2X technology accelerated Robotaxi implementation in 2024, providing real-time traffic data, signal light statuses, and emergency warning information, significantly enhancing operational efficiency and safety. Through machine learning and big data analysis, the operation platform optimizes fleet distribution and resource utilization, ensuring service coverage during peak hours and in key areas.
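At its simplest, the dispatching problem can be sketched as greedy nearest-vehicle assignment, as below. Real platforms solve a global optimization over predicted demand, traffic, and charging state, but the toy version conveys the core matching idea; all coordinates and IDs are made up.

```python
import math

# Illustrative greedy dispatcher: each request takes the nearest idle vehicle.
vehicles = {"v1": (0.0, 0.0), "v2": (5.0, 5.0), "v3": (9.0, 1.0)}
requests = [("r1", (8.0, 2.0)), ("r2", (1.0, 1.0))]

for rid, (rx, ry) in requests:
    vid = min(vehicles,
              key=lambda v: math.hypot(vehicles[v][0] - rx,
                                       vehicles[v][1] - ry))
    print(f"{rid} -> {vid}")
    del vehicles[vid]   # that vehicle is now busy
```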
In 2024, Robotaxi achieved large-scale pilot operations in multiple cities, notably in China and the United States. For instance, Baidu's Apollo Go established hundreds of Robotaxi stations in Wuhan, covering commercial centers, subway stations, and residential areas, with daily order volumes steadily increasing. The Robotaxi commercialization model is gradually maturing, integrating with traditional ride-hailing platforms to reduce per-service operating costs and address driver shortages through automation, thereby enhancing shared mobility's service quality and reliability. Operators are also exploring diversified revenue streams, such as advertisement placement and data services, to mitigate commercialization risks.
Despite its widespread presence, Robotaxi's full-scale promotion faces challenges. Technical long-tail issues, like handling extreme scenarios, require gradual resolution through larger-scale road testing and algorithm optimization. Policy and regulatory uncertainties remain significant obstacles, with varied autonomous driving laws and regulations across countries and regions, particularly regarding liability, testing permits, and regulatory requirements. Robotaxi's profitability is also under scrutiny, with the industry grappling with cost control, operational efficiency, technical reliability, and user experience.
Looking ahead, Robotaxi will mature through technological advancements and innovative operation models. With V2X technology's popularization, enhanced high-precision map dynamic updates, and increasing computing power, Robotaxi's technological bottlenecks are expected to diminish. On the commercialization front, more enterprises will explore diversified operation models, potentially integrating with public transportation systems to offer holistic urban transportation solutions. Gradual policy improvement and international standardization cooperation will clear the path for Robotaxi's large-scale deployment. As a pioneer in autonomous driving commercialization, Robotaxi not only accelerates driverless technology's development but also reshapes the future urban travel ecosystem, laying the foundation for an intelligent and sustainable transportation system.
End-to-End Solutions: Simplifying Autonomous Driving Systems
As an innovative approach to autonomous driving architecture, end-to-end solutions have garnered widespread industry attention. In 2024, they became the favored technology among autonomous driving enterprises; leaving them out of a technology launch event seemed to put a company behind the curve.
The core of end-to-end solutions lies in directly converting sensor input data into driving decisions through artificial intelligence models, bypassing traditional autonomous driving systems' hierarchical module design. This approach aims to simplify system architecture, reduce intermediate complexity, and enhance overall system efficiency and adaptability. Compared to traditional solutions, end-to-end solutions replace hierarchical perception, prediction, planning, and control modules with a unified deep learning model, marking a bold step in autonomous driving's technical landscape.
From a technological implementation standpoint, the crux of end-to-end solutions lies in the creation and training of deep learning models. By gathering and annotating extensive real-world driving data, these models can learn the intricate mapping from raw sensor inputs (like camera images and LiDAR point clouds) to driving commands (such as steering angle and acceleration/deceleration control). This mapping enables end-to-end prediction of driving behavior through multi-layered neural network feature extraction and optimization. Notably, in 2024 the broader adoption of Transformer models further accelerated end-to-end technology. Unlike traditional Convolutional Neural Networks (CNNs), Transformers excel at capturing long-range dependencies in complex traffic scenarios, enhancing the system's decision-making in dynamic environments.
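A minimal sketch of the idea follows: a single network maps one camera frame directly to steering and acceleration commands, with no separate perception, prediction, or planning modules. The architecture and output conventions are illustrative assumptions, not any production system's design.

```python
import torch
import torch.nn as nn

class EndToEndDriver(nn.Module):
    """Toy end-to-end model: camera pixels in, control commands out."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, 2)   # [steering angle, acceleration]

    def forward(self, image):          # image: (batch, 3, H, W)
        return self.head(self.backbone(image))

model = EndToEndDriver()
frame = torch.randn(1, 3, 224, 224)    # one RGB camera frame
steer, accel = model(frame)[0].tolist()
print(f"steer={steer:.3f}, accel={accel:.3f}")
```

Training would regress these outputs against logged human driving commands (behavior cloning), which is also where the paragraph's point about data scale and quality bites hardest.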
The merits of end-to-end solutions are most evident in system simplification and model refinement. Traditional hierarchical architectures, due to their modular design, necessitate the establishment of numerous intricate interfaces between perception, prediction, planning, and control, which may lead to information loss or miscommunication. Conversely, end-to-end solutions streamline the entire process through a unified model, circumventing the efficiency losses associated with multi-module communication in hierarchical systems. End-to-end models are highly data-driven and can continually optimize performance through extensive data training, especially in addressing long-tail issues. Through reinforcement learning in complex scenarios, these models can adeptly handle extremely rare traffic events, such as pedestrians abruptly entering the road or unusual traffic signs.
However, end-to-end solutions also confront numerous challenges. Model interpretability remains a pivotal concern in the industry. Since end-to-end solutions forgo traditional modular designs, driving behavior formation is entirely handled by black-box neural networks, posing challenges for safety and debugging. In the event of an accident, pinpointing the issue's source becomes difficult, impacting user and regulatory trust. Additionally, the scale and quality of data are crucial for end-to-end solution performance. Compared to traditional methods, end-to-end models require larger, higher-quality training datasets, intensifying demands on data collection, cleaning, and annotation. Moreover, end-to-end solutions still struggle with multimodal data fusion, such as integrating camera, LiDAR, and high-precision map data within a unified processing framework.
In 2024, end-to-end technology indeed saw a surge, with leading enterprises successfully deploying end-to-end autonomous driving systems in limited scenarios such as campus shuttles and specific urban road segments. These systems showcase remarkable operational efficiency and safety thanks to their streamlined architecture and adaptable decision-making. Advances in autonomous driving simulation technology also provide crucial support for end-to-end solutions: through large-scale simulation, R&D teams can more rapidly train and validate the reliability and robustness of end-to-end models.
Looking ahead, as computing power and algorithms advance, the potential of end-to-end solutions will continue to unfold. The issue of model interpretability is expected to gradually diminish with the introduction of visualization technology and causal reasoning methods. The challenge of data demand can be addressed through collaborative data annotation and the use of Generative Adversarial Networks (GANs) to generate high-quality virtual data. End-to-end solutions may coexist with traditional hierarchical architectures, forming a hybrid system that leverages each other's strengths in diverse scenarios.
As a novel approach to system simplification, end-to-end solutions are gradually propelling autonomous driving technology towards greater efficiency and intelligence. Despite dual challenges in technology and application, their potential in system efficiency and data-driven performance optimization positions them as a key research direction in the autonomous driving industry. With increasing corporate participation and continuous technological iteration, end-to-end solutions are poised to become a cornerstone of future intelligent driving technology, paving the way for the realization of true driverless vehicles.
Heavy Perception, Light Map: An Attempt to Break Free from Map Dependence
Heavy Perception, Light Map represents an emerging technical approach in autonomous driving, aimed at reducing reliance on high-precision maps by enhancing vehicles' perception capabilities for more flexible and efficient autonomous operation. This concept centers on minimizing the need for pre-made and real-time updated high-precision maps, instead shifting data processing and decision-making to the vehicle's perception system. This approach gained significant traction in the first half of 2024, becoming a hot topic in the industry, particularly showcasing its unique advantages in complex urban settings.
Traditional autonomous driving systems rely heavily on high-precision maps to provide detailed environmental information, including lane positions, traffic signs, and road gradients. However, the cost of creating and maintaining these maps is high, and there are often delays in updating dynamic environments such as temporary constructions or road changes due to accidents. The Heavy Perception, Light Map approach addresses these limitations by enhancing vehicles' perception capabilities, enabling them to autonomously comprehend their surroundings and thus reduce map dependence. This method emphasizes the use of real-time sensor data from cameras, LiDAR, and millimeter-wave radars to dynamically construct environmental models around the vehicle through advanced perception algorithms, enabling precise autonomous decision-making.
Technically, the core of Heavy Perception, Light Map lies in efficient environmental perception and dynamic mapping capabilities. Multi-sensor fusion technology serves as the foundation, enabling vehicles to generate real-time 3D environmental models by integrating visual data from cameras, depth information from LiDAR, and speed data from millimeter-wave radars. At the algorithmic level, deep learning-based perception models can rapidly identify and classify static and dynamic objects on the road, such as lane lines, pedestrians, obstacles, and traffic signs.
The integration of reinforcement learning and generative models further empowers vehicles to swiftly adapt to environmental changes. Dynamic mapping technology plays a pivotal role in this process, constructing local maps based on real-time sensor data rather than relying on pre-set information. These local maps are updated at a high frequency, reflecting changes in the surroundings promptly, thereby enhancing the vehicle's adaptability and safety.
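The sketch below illustrates dynamic local mapping with a rolling occupancy grid: old evidence decays each cycle while fresh sensor returns add new evidence, so the map tracks the live scene rather than a pre-made one. Grid size, cell size, and the decay rate are all assumptions for this example.

```python
import numpy as np

GRID, CELL = 100, 0.5           # 100x100 cells, 0.5 m per cell
grid = np.zeros((GRID, GRID))   # occupancy evidence per cell

def update(grid, points, decay=0.9, hit=1.0):
    """Decay old evidence, then add evidence for each (x, y) return."""
    grid *= decay                # stale observations fade out
    for x, y in points:
        i = int(x / CELL) + GRID // 2
        j = int(y / CELL) + GRID // 2
        if 0 <= i < GRID and 0 <= j < GRID:
            grid[i, j] = min(grid[i, j] + hit, 5.0)   # cap the evidence
    return grid

# A construction barrier ~10 m ahead accumulates evidence over 3 cycles
for _ in range(3):
    grid = update(grid, [(10.0, 0.0), (10.0, 0.5)])
print("occupied cells:", int((grid > 1.5).sum()))
```

Because evidence decays, an obstacle that drives away simply fades from the grid within a few cycles, which is the adaptability argument made above in miniature.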
In urban environments, where unexpected situations and dynamic changes like illegal parking, construction zones, and temporary traffic signs are common, traditional high-precision maps struggle to keep up with timely updates. The perception algorithms of Heavy Perception, Light Map can recognize these changes in real-time and plan routes based on the latest environmental data. For instance, in the event of a sudden construction zone, the vehicle can identify construction signs and barriers through its perception system, swiftly adjusting speed and driving path without waiting for map updates.
Despite its promising technical concept, the practical implementation of Heavy Perception, Light Map faces several technical and application challenges. One key issue is the robustness of the perception system. Under extreme weather conditions like heavy rain, fog, and snow, cameras and LiDAR may have limited perception capabilities, leading to inaccuracies in environmental model construction. This necessitates further improvements in sensor performance and algorithmic fault tolerance. Dynamic mapping technology also faces significant computational burdens in complex road networks, especially at high speeds or with dense sensor data, demanding more powerful onboard computing resources. Additionally, since vehicles' autonomously perceived local environmental information may differ from that of other vehicles or infrastructure, the integration of Vehicle-to-Everything (V2X) technology will be crucial for the development of Heavy Perception, Light Map. By combining autonomous perception with external information, the accuracy and reliability of environmental understanding can be further enhanced.
In 2024, several companies began exploring the commercial application of Heavy Perception, Light Map technology, with many automakers also experimenting with the light map approach. This not only reduces the cost of map creation and maintenance but also enables vehicles to more flexibly adapt to new scenarios and markets. The technology lays the groundwork for future low-cost autonomous driving solutions, fostering the popularization of autonomous driving.
Overall, the Heavy Perception, Light Map approach introduces a novel development paradigm to the autonomous driving industry. By minimizing reliance on high-precision maps and enhancing vehicles' environmental perception capabilities, it not only effectively lowers the cost of autonomous driving but also significantly improves the system's adaptability and flexibility in dynamic environments. Although technical challenges remain, the widespread adoption of this technology will offer more possibilities for the popularization and commercialization of autonomous driving, propelling the industry towards more efficient and flexible development.
Pure Vision Solution: Exploration of a Minimalist Technical Path
As a minimalist approach in autonomous driving, the pure vision solution endeavors to rely solely on cameras for environmental perception, target detection, and driving decisions, eliminating the need for high-cost hardware like LiDAR. Inspired by human driving behavior, which primarily relies on visual judgment, advancements in modern computer vision technology provide the foundation for this approach. In 2024, significant strides were made in exploring the pure vision solution, positioning it as an important branch of autonomous driving technology and a focal point of industry interest.
The crux of the pure vision solution lies in constructing a multi-task visual perception system built entirely on cameras. This system processes vast amounts of real-time image data collected by cameras through deep learning models to achieve a precise understanding of the vehicle's surroundings. The core visual algorithms encompass target detection, semantic segmentation, object tracking, and depth estimation. Target detection identifies and locates key objects like vehicles, pedestrians, and traffic signs; semantic segmentation classifies each pixel in the image as road, obstacle, or another category; object tracking continuously monitors the position changes of dynamic targets; and depth estimation compensates for cameras' inability to directly measure distance, calculating the range to objects using monocular or binocular vision algorithms. These technologies collectively form a complete perception loop, enabling the pure vision solution to match the performance of multi-sensor fusion solutions in specific scenarios.
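Depth estimation is the piece that most directly substitutes for LiDAR, so here is a classical pinhole-model sketch that recovers approximate range from a detection's bounding-box height and an assumed real-world object height. The focal length and height priors are illustrative assumptions; modern systems use learned monocular or stereo depth networks instead.

```python
# Pinhole camera model: distance = focal_length * real_height / pixel_height
FOCAL_PX = 1000.0                     # focal length in pixels (assumed)
KNOWN_HEIGHT_M = {"car": 1.5, "pedestrian": 1.7, "traffic_sign": 0.8}

def estimate_depth(label: str, bbox_height_px: float) -> float:
    """Approximate range to a detected object from its bounding box."""
    return FOCAL_PX * KNOWN_HEIGHT_M[label] / bbox_height_px

detections = [("car", 75.0), ("pedestrian", 170.0)]
for label, h in detections:
    print(f"{label}: ~{estimate_depth(label, h):.1f} m")
```

The geometry makes the paragraph's error argument visible: a few pixels of bounding-box error at long range translates into meters of depth error, which is exactly where LiDAR's direct 3D measurement retains its edge.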
From a hardware perspective, the pure vision solution relies on low-cost cameras, typically including multi-directional forward, rear, and side-view cameras to form a 360-degree perception system. Compared to LiDAR and millimeter-wave radars, cameras offer higher resolution and detail capture capabilities, particularly in identifying traffic signs, lane lines, and complex environmental textures. Furthermore, cameras are significantly less expensive than sensors like LiDAR, making the pure vision solution advantageous in reducing the hardware cost of autonomous vehicles, which is crucial for mid-to-low-end market penetration. Tesla's continued commitment to its camera-centric "Tesla Vision" solution in 2024 has further accelerated the industry-wide adoption of the pure vision approach.
The pure vision solution's strength lies in its simplified architecture and substantially reduced cost. By eliminating complex hardware components like LiDAR, the system's structure is more streamlined with lower maintenance costs, significantly alleviating the economic burden on vehicle manufacturing and operation. The rapid iteration capability of visual algorithms allows for system performance upgrades via Over-The-Air (OTA) updates without major hardware modifications. Moreover, the minimalist approach of the pure vision solution enhances its adaptability to diverse market demands, fostering more flexible technology deployment.
As with the previously discussed cutting-edge technologies, the limitations of pure vision solutions are apparent. Visual sensors are highly susceptible to lighting and weather conditions; performance can be severely hampered by glare, deep shadow, rain, snow, and fog. In near-dark environments, cameras struggle to function normally, making pure vision solutions less effective in scenarios such as tunnels and night driving. Additionally, accurately obtaining depth information remains a technical hurdle for this approach: despite advancements in depth estimation, visual depth estimates still exhibit larger errors than LiDAR, which directly captures 3D point clouds. The pure vision solution also demands substantial computational resources for dynamic target prediction and trajectory planning, placing stringent demands on onboard computing power; insufficient compute leads to a decline in system performance.
Nonetheless, the commercial application of pure vision solutions is gradually being realized in 2024, particularly by companies like Tesla, which have successfully tapped into the potential of this approach. The future development of pure vision solutions will hinge on the concerted progress of algorithms, computing power, and sensor performance. The introduction of advanced deep learning models, such as Transformers, will further enhance the recognition capabilities of visual algorithms. Meanwhile, higher-performance onboard chips will alleviate the strain on computational resources. By integrating Generative Adversarial Networks (GANs) and simulation technology, the training efficiency and generalization ability of visual algorithms will undergo significant improvements.
As a minimalist and cost-effective autonomous driving solution, pure vision is steering the industry towards greater popularization and economization. While it still confronts challenges in environmental adaptability and technical reliability, its potential in the mid-to-low-end market and specific scenarios has been proven. With ongoing breakthroughs in algorithm and hardware technology, pure vision solutions are poised to become a vital component of the autonomous driving industry, contributing fresh impetus to the widespread adoption of driverless technology.
In 2024, the vigorous development of technological trends such as City NOA, Robotaxi, end-to-end solutions, "Heavy Perception, Light Map," and pure vision solutions underscores the autonomous driving industry's in-depth exploration, spanning from technology research and development to large-scale application. These technology hotspots have not only accelerated the technical maturity of autonomous driving but also laid a solid foundation for future innovations in travel modes. As we move into 2025, with declining hardware costs, further algorithm optimization, and an improved policy environment, autonomous driving technology is poised for explosive growth across multiple fields. What new technologies will emerge in the autonomous driving industry next? We welcome your comments and suggestions!
-- END --