Why is "Intermediate Representation" Vital for Autonomous Driving?

01/19 2026

In discussions of autonomous driving technology, the term "Intermediate Representation" often crops up. Unlike well-known hardware components such as LiDAR, onboard cameras, and millimeter-wave radar, or software concepts like large models, end-to-end systems, and algorithms, "Intermediate Representation" is a relatively abstract notion. So, what precisely is Intermediate Representation, and what function does it serve?

What is Intermediate Representation?

To grasp the concept of "Intermediate Representation," we can start with the characteristics of the information the system processes. In autonomous driving systems, sensors supply data in diverse forms. Cameras produce color images; LiDAR generates 3D point clouds; millimeter-wave radar reports target distance and speed; GPS and inertial measurement units provide position and motion status. Although this raw data captures the scene in full detail, it is voluminous and unstructured, which makes it hard to use directly for decision-making. Hence, the system must transform these raw inputs into information that is more convenient for reasoning and utilization.

The vehicle therefore runs the raw data from its sensors through a series of processing steps until it becomes accurate, usable information. This processed information, which lies between the raw data and the final driving decision, is what we call "Intermediate Representation."

For instance, the position of a lane line identified from sensor images and point clouds, the relative speed and distance of a vehicle behind, and the current state of a traffic light are all examples of Intermediate Representation. They are more meaningful than raw pixels and point coordinates but are not yet final control instructions. They represent the system's "understanding and summary" of the immediate environment. By converting raw data into these representations, the autonomous driving system can concentrate on environmental elements that have a practical impact on driving, reducing the processing burden and reliance on irrelevant details.
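As a minimal sketch of what such a representation could look like in code, the three examples above (a lane line, a vehicle behind, a traffic light state) can be grouped into one structure. The names SceneSummary, TrackedVehicle, and LightState, as well as the ego-frame convention and units, are assumptions made for illustration and are not taken from any particular system.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class LightState(Enum):
    RED = "red"
    YELLOW = "yellow"
    GREEN = "green"
    UNKNOWN = "unknown"


@dataclass
class TrackedVehicle:
    # Distance along the ego heading in metres; negative means behind the ego vehicle.
    longitudinal_distance_m: float
    # Speed relative to the ego vehicle in m/s; positive means faster than the ego vehicle.
    relative_speed_mps: float


@dataclass
class SceneSummary:
    # Lane line as a polyline of (x, y) points in the ego frame, in metres.
    lane_line: List[Tuple[float, float]] = field(default_factory=list)
    vehicles: List[TrackedVehicle] = field(default_factory=list)
    traffic_light: LightState = LightState.UNKNOWN


# Hypothetical scene: one lane line, one vehicle 12 m behind and closing, red light.
scene = SceneSummary(
    lane_line=[(0.0, 1.75), (10.0, 1.80), (20.0, 1.90)],
    vehicles=[TrackedVehicle(longitudinal_distance_m=-12.0, relative_speed_mps=1.5)],
    traffic_light=LightState.RED,
)
```

A structure like this holds far less data than the images and point clouds it was derived from, yet it keeps the elements that matter for the driving decision.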

Intermediate Representation is often abbreviated as IR, and the concept is akin to the intermediate language used in compilers in computer science. Both act as an intermediate layer between raw input and final output to enhance the efficiency and analyzability of the entire process. For autonomous driving, a well-designed Intermediate Representation must enable subsequent modules to make predictions and plans more easily and accurately without losing critical information.

In brief, Intermediate Representation is responsible for processing raw data into more meaningful information within autonomous driving systems. It can take various forms, such as semantic descriptions of 2D scenes, positional information of objects in 3D space, or predictions of future behaviors. It is more refined than raw perception data and closer to environmental understanding than final control outputs.

The Role of Intermediate Representation in Different Architectures

Autonomous driving systems adopt various architectures, which exhibit subtle differences in defining and using Intermediate Representation. In traditional modular systems, Intermediate Representation is explicitly defined and passed on. The Intermediate Representation output by one module serves as the input for the next module, creating a clear and observable information flow. This design makes the autonomous driving system easier to debug, verify, and optimize.

In such architectures, the Intermediate Representation output by the perception module includes both static features (such as lane lines and obstacle positions) and dynamic features (such as object motion speed and trends). This information is passed to the prediction module in a standard format, which then uses it to judge how the scene will change over a future period. Subsequently, the planning module determines the vehicle's next safe and reasonable trajectory based on the prediction results. In modular systems, Intermediate Representation is a predefined information format that allows each module to be developed and tested independently. An error in the perception output will naturally propagate to prediction and planning, but because the modular design allows the intermediate outputs to be inspected at each stage, the specific fault point can be identified.
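The information flow described above can be sketched as a chain of functions whose intermediate outputs are recorded for inspection. This is a toy illustration, not a real stack: the stage functions, the dictionary keys, and the 3-second braking threshold are all assumptions chosen to keep the example short.

```python
from typing import Any, Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(raw_input: Any, stages: List[Stage]) -> Tuple[Any, Dict[str, Any]]:
    """Run the stages in order and keep every intermediate output in a trace."""
    trace: Dict[str, Any] = {}
    data = raw_input
    for name, stage in stages:
        data = stage(data)
        trace[name] = data
    return data, trace


def perceive(raw: Dict[str, Any]) -> Dict[str, float]:
    # Toy perception: treat the nearest range reading as the obstacle distance.
    return {
        "obstacle_distance_m": min(raw["ranges_m"]),
        "closing_speed_mps": raw["closing_speed_mps"],
    }


def predict(perception: Dict[str, float]) -> Dict[str, float]:
    # Toy prediction: time to collision if the closing speed stays constant.
    speed = perception["closing_speed_mps"]
    ttc = perception["obstacle_distance_m"] / speed if speed > 0 else float("inf")
    return {"time_to_collision_s": ttc}


def plan(prediction: Dict[str, float]) -> str:
    # Toy planning rule; the 3-second threshold is an arbitrary assumption.
    return "brake" if prediction["time_to_collision_s"] < 3.0 else "keep_speed"


raw = {"ranges_m": [42.0, 18.0, 30.0], "closing_speed_mps": 9.0}
decision, trace = run_pipeline(
    raw,
    [("perception", perceive), ("prediction", predict), ("planning", plan)],
)
```

If the final decision looks wrong, the trace shows whether the obstacle distance, the time to collision, or the planning rule was at fault, which is the debugging benefit the modular design aims for.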

Currently, end-to-end applications are widespread. These systems attempt to bypass explicit intermediate modules and map sensor inputs directly to control outputs. This approach can reduce the complexity of manual design in certain scenarios, but it also reduces interpretability. The internal processing of end-to-end systems is concealed within a single neural network model. Without explicit Intermediate Representation, it is difficult to ascertain the specific environmental features on which the system bases its decisions. This black-box decision-making is detrimental to safety verification and debugging.

To strike a balance between the flexibility of end-to-end learning and the interpretability of the system, a "two-stage end-to-end" approach has been proposed. In this architecture, the network first learns to convert raw inputs into an Intermediate Representation that is conducive to driving decisions and then uses this Intermediate Representation to generate control instructions. For example, the system can first generate a semantic bird's-eye view map and a set of predicted trajectory points as Intermediate Representation and then use this information to generate the final control output. This approach maintains the advantages of end-to-end learning while improving model interpretability and system reliability through explicit Intermediate Representation.
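The interface between the two stages can be sketched as follows. In a real two-stage system both stages would be learned networks; here they are replaced with hand-written stand-ins so that only the shape of the explicit Intermediate Representation is shown. The function names, grid extents, cell size, and the braking rule are assumptions for illustration, and the bird's-eye view grid carries plain occupancy rather than full semantic labels.

```python
from typing import Dict, List, Sequence, Tuple

Grid = List[List[int]]


def rasterize_bev(
    points: Sequence[Tuple[float, float]],
    x_range: Tuple[float, float] = (0.0, 20.0),
    y_range: Tuple[float, float] = (-5.0, 5.0),
    cell_m: float = 1.0,
) -> Grid:
    """Stage one stand-in: mark grid cells that contain at least one point.

    Points are (x, y) in the ego frame, x forward and y to the left, in metres.
    """
    cols = int((x_range[1] - x_range[0]) / cell_m)
    rows = int((y_range[1] - y_range[0]) / cell_m)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in points:
        if x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]:
            col = int((x - x_range[0]) / cell_m)
            row = int((y - y_range[0]) / cell_m)
            grid[row][col] = 1
    return grid


def control_from_bev(grid: Grid, corridor_rows: Sequence[int]) -> Dict[str, float]:
    """Stage two stand-in: brake if any cell in the ego corridor is occupied."""
    blocked = any(grid[r][c] for r in corridor_rows for c in range(len(grid[0])))
    if blocked:
        return {"throttle": 0.0, "brake": 1.0}
    return {"throttle": 0.3, "brake": 0.0}


# Rows 4 and 5 cover y in [-1, 1) with the default extents and cell size.
bev = rasterize_bev([(8.2, 0.3), (15.0, -4.0), (25.0, 0.0)])
control = control_from_bev(bev, corridor_rows=(4, 5))
```

Because the grid is an explicit output of stage one, it can be logged, visualized, or compared against labels independently of the control command that stage two derives from it.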

Common Forms and Roles of Intermediate Representation

To gain a better understanding of Intermediate Representation, let's examine its specific forms in the system and the reasons for their usefulness. Different Intermediate Representations carry different information focuses, but their common goal is to transform sensor information into data that is more helpful for subsequent tasks.

A common form of Intermediate Representation is geometric information. This type describes the spatial structure of the environment, such as the geometric shape of roads, the position of lane lines, the location of curbs, and the bounding boxes of vehicles and pedestrians. These data essentially answer the questions "what is around the vehicle" and "where is it." For the planning module, understanding this geometric information is fundamental to determining the vehicle's passable space and path.

Another form is semantic understanding, which includes information such as traffic signs, traffic light states, and road types. This type of Intermediate Representation helps the system understand scene semantics when making path selections and behavioral decisions. If the system knows that the traffic light ahead is red rather than green, it will stop instead of continuing forward. This information is semantic rather than purely geometric.
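The difference between the geometric and semantic forms can be illustrated with a small sketch in which the road ahead is geometrically clear but the traffic light still forces a stop. The Box type, the corridor dimensions, and the behavior labels are hypothetical and serve only to make the distinction concrete.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Box:
    # Axis-aligned bounding box in the ego frame (x forward, y left), in metres.
    x_min: float
    x_max: float
    y_min: float
    y_max: float


def corridor_is_clear(
    boxes: Sequence[Box], half_width_m: float = 1.5, lookahead_m: float = 30.0
) -> bool:
    """Geometric check: does any box overlap the corridor ahead of the vehicle?"""
    for box in boxes:
        overlaps_x = box.x_max > 0.0 and box.x_min < lookahead_m
        overlaps_y = box.y_max > -half_width_m and box.y_min < half_width_m
        if overlaps_x and overlaps_y:
            return False
    return True


def choose_behavior(boxes: Sequence[Box], light_state: str) -> str:
    """Combine the semantic light state with the geometric corridor check."""
    if light_state == "red":
        return "stop"
    if not corridor_is_clear(boxes):
        return "yield"
    return "proceed"


parked_car = Box(x_min=10.0, x_max=14.0, y_min=3.0, y_max=5.0)  # beside the corridor
lead_car = Box(x_min=12.0, x_max=16.0, y_min=-1.0, y_max=1.0)  # inside the corridor
```

With only the parked car present, choose_behavior returns "stop" for a red light and "proceed" for a green one, even though the geometry is identical in both cases.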

More advanced Intermediate Representations encompass dynamic prediction information, which not only describes the current environment but also predicts possible future changes. For example, the system may predict the likely position of a vehicle ahead in the next few seconds based on its current speed and motion direction. These prediction results, combined with probabilities, become important references for the planning module's decision-making. Without such predictions, the vehicle can only make decisions based on the current instantaneous state, which would deprive the autonomous driving system of insight into future risks.
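The simplest version of the prediction described above is a constant-velocity rollout, sketched below. This is a deliberately basic assumption: it yields a single future path with no probabilities attached, whereas a production predictor would typically output several weighted hypotheses. The function name and the ego-frame convention are illustrative.

```python
from typing import List, Tuple


def predict_constant_velocity(
    position: Tuple[float, float],
    velocity: Tuple[float, float],
    horizon_s: float = 3.0,
    step_s: float = 1.0,
) -> List[Tuple[float, float, float]]:
    """Return (time, x, y) samples assuming the velocity stays constant.

    position is in metres and velocity in m/s, both in the ego frame.
    """
    steps = int(round(horizon_s / step_s))
    samples = []
    for k in range(1, steps + 1):
        t = k * step_s
        samples.append((t, position[0] + velocity[0] * t, position[1] + velocity[1] * t))
    return samples


# A vehicle 20 m ahead, closing at 2 m/s and drifting left at 0.5 m/s.
future = predict_constant_velocity(position=(20.0, 0.0), velocity=(-2.0, 0.5))
# future == [(1.0, 18.0, 0.5), (2.0, 16.0, 1.0), (3.0, 14.0, 1.5)]
```

The planner can then check these future positions, rather than only the current one, against its own intended trajectory.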

The design and selection of these Intermediate Representations are not haphazard but are determined by the core judgment abilities required for autonomous driving to operate on actual roads. Good Intermediate Representations enable the system to make more accurate and safer decisions in complex and dynamically changing road environments.

Why Focusing on Intermediate Representation is Important for Autonomous Driving

Understanding Intermediate Representation helps deepen our comprehension of the essence of autonomous driving systems. Simply feeding camera or radar data into a large model does not automatically yield driving instructions. The system needs to convert raw data into information that expresses environmental states and then predict, plan, and control based on this representation. Intermediate Representation is not only an information bridge for engineering implementation but also the cornerstone of performance and safety guarantees.

In modular design, Intermediate Representation makes the functions at each stage clearer and easier to verify. If a particular representation performs unstably in a certain scenario, that specific stage can be optimized. This clear layering also facilitates integration with traditional control theory, thereby enhancing the overall robustness and controllability of the autonomous driving system.

In learning-driven approaches, explicit Intermediate Representation can provide richer supervision signals, enabling the model to learn not only control but also correct scene understanding. For example, during the training phase, labeled Intermediate Representation can be used as an additional constraint to prevent the model from focusing solely on the final control result while neglecting a correct understanding of the scene itself.
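One way to picture this additional constraint is as a second term in the training loss. The sketch below adds a weighted Intermediate Representation error to the control error. The use of mean squared error for both terms and the 0.5 weight are assumptions made for simplicity; a real system might well use a different loss, such as cross-entropy, for a semantic map.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally long sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def total_loss(
    control_pred: Sequence[float],
    control_target: Sequence[float],
    ir_pred: Sequence[float],
    ir_target: Sequence[float],
    ir_weight: float = 0.5,
) -> float:
    """Control loss plus a weighted penalty on the predicted representation."""
    control_loss = mse(control_pred, control_target)
    ir_loss = mse(ir_pred, ir_target)
    return control_loss + ir_weight * ir_loss


# The control output matches the label exactly, but two of the four
# occupancy cells in the predicted representation are wrong.
loss = total_loss(
    control_pred=[0.1, 0.0],
    control_target=[0.1, 0.0],
    ir_pred=[1.0, 0.0, 0.0, 1.0],
    ir_target=[1.0, 1.0, 0.0, 0.0],
)
```

In this example the control term is zero, yet the total loss is still positive, so a model that reaches the right control output with a wrong picture of the scene continues to receive a corrective signal.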

The importance of Intermediate Representation is also evident in the safety assessment of autonomous driving. Autonomous driving systems require extensive testing and verification before being deployed on actual roads. Explicit representation allows this testing to cover more extreme scenarios and makes it easier to diagnose the system's weaknesses in certain types of scenes.

Final Thoughts

Intermediate Representation is a core information structure within autonomous driving systems. It connects the perception layer and the decision-making layer, playing a crucial role in enabling the entire system to understand the environment and make correct judgments. Although the forms and roles of Intermediate Representation vary slightly across different technical architectures, they all undertake the task of transforming vast amounts of raw sensor data into information that is meaningful for future behaviors. Understanding Intermediate Representation helps us grasp the design logic, performance boundaries, and engineering implementation of autonomous driving technology. As autonomous driving technology continues to evolve, the design and optimization of Intermediate Representation remain important directions that require ongoing attention.
