04/22 2025
Produced by Zhineng Technology
The report
After reading this report, I found it quite valuable. Through a systematic analysis of single-channel, symmetric, and asymmetric architectures, it compares how they perform on functional safety (FuSa), safety of the intended functionality (SOTIF), availability, and scalability.
According to the analysis, asymmetric architectures offer significant advantages in fault tolerance and safety in complex scenarios, which gives a reasoned basis for architecture selection and technical optimization. Combining hardware and software implementation considerations with current technology trends, this article also looks ahead to how autonomous driving architectures may develop.
Note: This article focuses on architectural analysis and differs significantly from the architectures currently used by automakers.
01
Autonomous Driving System Architecture
Fundamental Elements Analysis
This analysis uses an SAE Level 4 Highway Pilot (HWP) as its case study, since it represents autonomous driving technology with commercial prospects between 2025 and 2028.
HWP covers complex driving tasks, including lane keeping (U1), automatic lane changing (U2), traffic jam handling (U3), and operation at speeds up to 130 km/h (U6).
◎ The Operational Design Domain (ODD) is clearly defined, with activation requiring manual triggering by the driver (U8) and system status checks. Deactivation or intervention requests are handled through the UI system, ensuring the safety and reliability of human-machine interaction.
◎ The Autonomous Driving Intelligence (ADI) system serves as the core computing unit, working in coordination with the sensor system, actuator system, user interface (UI) system, and diagnostic system.
◎ The sensor system provides high-precision, real-time environmental data through multimodal perception (radar, LiDAR, cameras, etc.), requiring redundancy to cope with single-point failures.
◎ The actuator system is responsible for executing ADI control commands (such as steering, braking, acceleration), requiring low latency and high reliability.
◎ The UI system supports driver interaction with the system, providing status feedback and intervention entry points.
◎ The diagnostic system monitors the status of each subsystem in real-time and generates fault logs to support fault detection and recovery.
These subsystems collectively form the autonomous driving ecosystem, and their architectural design directly determines the system's performance and safety.
● The technical requirements of the ADI system cover multi-dimensional performance indicators.
◎ The output must meet timeliness (S1), ensuring that decision-making and control commands are completed within milliseconds to avoid collision risks due to delays;
◎ Availability (S2) requires the system to maintain safe operation under partial failures, for example, maintaining basic functions through redundant channels when sensors fail;
◎ Correctness (S3) and consistency (S4) ensure that commands align with expected logic, avoiding erroneous decisions;
◎ Additionally, perception fault detection (S5) and system diagnosis (S6) functions quickly identify and isolate faults through real-time monitoring and log analysis, preventing fault propagation.
● When designing architectures, one must address technical limitations and follow scientific principles. Limitations include:
◎ Software design defects in large, complex systems are difficult to eliminate completely (G1), such as logical errors hidden in millions of lines of code;
◎ Hardware is susceptible to single-event upsets (G2), such as bit flips caused by electromagnetic interference or cosmic rays;
◎ High-safety systems cannot be verified solely through simulation and testing (G3), requiring a combination of formal verification and redundancy design.
● To this end, design principles emphasize:
◎ Fault Containment Units (FCUs, D1): Modular design distributes functions to independent units, limiting the scope of fault impacts.
◎ Diversity and Redundancy (D3): Using heterogeneous hardware and algorithms (e.g., sensors or AI models from different vendors) to improve fault tolerance.
◎ Simplified Interactions (D5): Reducing unnecessary communication between subsystems to minimize unpredictable behavior arising from complex interactions.
◎ Swiss Cheese Model (D6): Lowering the probability of risk penetration through multi-layered defense mechanisms (e.g., independent verification of perception, decision-making, and execution).
These principles provide a theoretical basis for architectural design, ensuring high reliability and safety in dynamic, unpredictable highway scenarios.
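The Swiss Cheese Model (D6) can be made concrete with a small calculation. The sketch below is only an illustration: it assumes the defense layers fail independently, and the miss probabilities are hypothetical numbers, not values from the report. The function name `penetration_probability` is likewise an illustrative choice.

```python
from math import prod


def penetration_probability(layer_miss_probs):
    """Probability that a hazard passes every layer undetected.

    Assumption: the layers fail independently, so the individual
    miss probabilities can simply be multiplied.
    """
    if not layer_miss_probs:
        raise ValueError("at least one layer is required")
    for p in layer_miss_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    return prod(layer_miss_probs)


# Hypothetical miss probabilities for independent checks on
# perception, decision-making, and execution.
layers = [1e-2, 1e-2, 1e-3]
print(penetration_probability(layers))
```

Under the independence assumption, three modest layers combine into a much smaller residual risk. When the layers share a weakness (the same software defect or the same faulty sensor input), the assumption no longer holds and the product underestimates the risk, which is why D6 is paired with diversity (D3).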
02
Comprehensive Analysis and Evaluation of Candidate Architectures
Autonomous driving system architectures can be classified into three major categories: single-channel, symmetric, and asymmetric, each with its focus on functional implementation, fault tolerance mechanisms, and applicable scenarios.
● Single-channel architectures center on a single Electronic Control Unit (ECU) that integrates perception, decision-making, and control functions. Typical examples include Audi's zFAS system (for Level 3 functions) and early designs of Tesla's FSD. Their advantages are simple hardware and low development cost, which makes them suitable for lower-level autonomous driving (such as L2+). However, a single-channel architecture relies heavily on one computing unit and lacks redundancy, so a single-point failure may bring down the whole system. It therefore cannot meet the stringent ASIL-D functional safety requirements (ISO 26262) that apply at Level 4 and above. Its scalability is also limited, making it difficult to adapt to the diverse needs of complex scenarios.
● Symmetric architectures are characterized by multi-channel parallel computing, typically implemented as majority voting architectures (e.g., Triple Modular Redundancy, TMR). They consist of multiple channels with identical or near-identical functions, with an arbitrator (voter) deciding the output. For example, three channels run perception and decision-making algorithms, and the voter selects the result on which a majority agrees. This design effectively handles random hardware failures (such as chip failures) and so improves system availability. However, symmetric architectures are susceptible to common cause failures (CCF): when the same software vulnerability or erroneous sensor data makes all channels produce the same error, the voting mechanism is ineffective. Moreover, in complex scenarios the channels may produce different but equally reasonable results for the same situation, which the voter cannot reliably arbitrate. Both weaknesses limit applicability in Level 4 HWP scenarios.
● Asymmetric architectures provide higher fault tolerance and safety through functional decomposition and redundancy design. They fall into the following subtypes:
◎ Channel-based Doer/Checker/Fallback (DCF) architecture: Comprises a Computer-Controlled Driving Subsystem (CCDSS, executing normal driving), Monitoring Subsystem (MSS, real-time checking of CCDSS output), Critical Event Handling Subsystem (CEHSS, executing minimum risk operations such as deceleration and stopping), and Fault-Tolerant Decision Subsystem (FTDSS, selecting the final output). It ensures seamless switching during faults through an Active/Hot Stand-By mechanism, suitable for high-reliability scenarios.
◎ Layered DCF architecture: Introduces Doer/Checker pairs at each layer of perception, decision-making, and execution, forming multi-layer redundancy. For example, the perception layer uses a primary LiDAR and a backup camera for verification, while the decision-making layer compares outputs from heterogeneous AI models, enhancing system robustness.
◎ Distributed Safety Mechanism (DSM) architecture: Implements safety monitoring and control through distributed nodes, each running lightweight safety algorithms independently and collaboratively completing fault detection and recovery, suitable for large-scale, heterogeneous systems.
Through functional separation and diversity design, asymmetric architectures significantly reduce the risk of common cause failures, making them suitable for complex Level 4 scenarios.
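The channel-based DCF flow described above can be sketched in a few lines. This is a minimal illustration rather than the report's implementation: the `Command` type, the function names, and the steering bound are hypothetical, and the only figure taken from the text is the 130 km/h HWP speed limit.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Command:
    source: str              # subsystem that produced the command
    target_speed_kmh: float  # requested speed
    steering_deg: float      # requested steering angle


# Hypothetical safety envelope: 130 km/h is the HWP speed limit from the
# text; the steering bound is an illustrative assumption.
MAX_SPEED_KMH = 130.0
MAX_STEERING_DEG = 5.0


def mss_accepts(cmd: Command) -> bool:
    """Checker (MSS): accept a command only if it stays inside the envelope."""
    return (0.0 <= cmd.target_speed_kmh <= MAX_SPEED_KMH
            and abs(cmd.steering_deg) <= MAX_STEERING_DEG)


def ftdss_select(doer_cmd: Optional[Command], fallback_cmd: Command) -> Command:
    """Selector (FTDSS): forward the Doer output if it exists and the
    Checker accepts it; otherwise switch to the Fallback output."""
    if doer_cmd is not None and mss_accepts(doer_cmd):
        return doer_cmd
    return fallback_cmd


# The CEHSS output is assumed to be computed every cycle (hot standby);
# here it is a minimum-risk stop.
mrm = Command("CEHSS", target_speed_kmh=0.0, steering_deg=0.0)
nominal = Command("CCDSS", target_speed_kmh=120.0, steering_deg=1.5)
faulty = Command("CCDSS", target_speed_kmh=180.0, steering_deg=1.5)

print(ftdss_select(nominal, mrm).source)  # CCDSS: Doer output accepted
print(ftdss_select(faulty, mrm).source)   # CEHSS: Checker rejected the Doer
print(ftdss_select(None, mrm).source)     # CEHSS: Doer produced no output
```

The point of the design is that the Checker and the Selector are far simpler than the Doer, so they are easier to verify, and the Fallback command is always available so that switching does not wait for a new computation.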
● Architecture Evaluation Criteria and Result Analysis
Architecture evaluation is based on the following key criteria, comprehensively measuring technical performance:
◎ Availability: The ability to maintain essential functions under fault conditions. For example, in HWP scenarios, the system is required to safely drive to the shoulder even with a single sensor failure.
◎ Reliability: The consistency in maintaining nominal functions. Frequent functional degradation (e.g., downgrading to L2 mode) reduces user experience and should be minimized.
◎ Cybersecurity: The ability to withstand external attacks. This is achieved by minimizing communication interfaces and implementing encryption and access control to reduce the attack surface.
◎ Scalability: The ability to support different SAE levels and market scenarios, such as expanding from Level 4 HWP to Level 5 urban driving.
◎ Simplicity: Ease of design, verification, and maintenance, lowering development and verification costs.
◎ Safety (FuSa and SOTIF): Compliance with ISO 26262 and ISO 21448 standards, ensuring that functional anomalies or unintended behaviors do not cause accidents.
Evaluation results show that asymmetric architectures excel in multiple metrics. Taking the channel-based DCF architecture as an example, through functional decomposition (separation of CCDSS and MSS) and redundancy design (CEHSS as a hot standby), it significantly outperforms other architectures in availability and safety.
The layered DCF architecture further enhances reliability and SOTIF performance through multi-layer verification mechanisms, which makes it suitable for high-complexity scenarios. The distributed nature of the DSM architecture gives it advantages in scalability and cybersecurity, making it a candidate for future heterogeneous, cloud-edge collaborative autonomous driving systems.
Symmetric architectures like TMR perform well in tolerating random failures, but their sensitivity to common cause failures limits their reliability in complex scenarios. Single-channel architectures excel in simplicity and cost but fall short on availability and safety, making it difficult for them to meet Level 4 requirements.
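The strengths and limits of majority voting can be seen in a minimal voter sketch. The function name `majority_vote` and the string-valued channel outputs are illustrative assumptions; a real voter would compare structured trajectories within tolerances rather than exact labels.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value produced by a strict majority of channels.

    Returns None when no value has a strict majority, which the caller
    must treat as "voter cannot decide".
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None


# Random failure in one channel: the two healthy channels outvote it.
print(majority_vote(["keep_lane", "keep_lane", "brake"]))

# Common cause failure: all channels share the same wrong result, so the
# voter confidently forwards the error.
print(majority_vote(["change_left", "change_left", "change_left"]))

# Three different but individually reasonable results: no majority exists.
print(majority_vote(["keep_lane", "change_left", "slow_down"]))
```

The first call masks the single faulty channel. The second shows why identical channels give no protection against a shared defect, and the third shows the arbitration problem in complex scenarios, where the voter has no basis for choosing.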
Comprehensive evaluation shows that asymmetric architectures offer the best overall performance in HWP scenarios, particularly in functional safety and safety of the intended functionality.
Summary
Through systematic analysis, this article reveals the core challenges and technical pathways in the design and implementation of autonomous driving system architectures. Asymmetric architectures, with their superiority in fault tolerance, safety, and scalability, emerge as the ideal choice for Level 4 autonomous driving.
Channel-based DCF, layered DCF, and DSM architectures address safety and reliability needs in complex scenarios through functional decomposition, multi-layer redundancy, and distributed design, respectively. Architectural optimization must be integrated with specific hardware platforms (such as high-performance SoCs and heterogeneous accelerators) and software stacks (such as ROS 2 and AUTOSAR Adaptive) to ensure real-time performance and resource efficiency.