Among the technical approaches to autonomous driving, the pure vision solution has gained popularity among many automakers because it mimics human driving logic and keeps hardware costs low. However, this perception approach, which relies heavily on cameras, suffers a marked drop in performance at night, when the vehicle enters a dark tunnel, or when it encounters intense backlighting, heavy rain, snow, or fog. Why does lighting have such a significant impact on pure vision autonomous driving?
The Physical Limits of Passive Perception
The pure vision perception system is essentially a passive measurement system that depends on ambient light reflection. The camera itself does not emit energy; instead, it acquires information from external light sources, which are photons reflected off object surfaces after being illuminated by the sun, streetlights, or the headlights of other vehicles.
This operational mode is similar to that of the human eye. When ambient light is sufficient and evenly distributed, the camera can capture rich color, texture, and semantic information, which is crucial for identifying traffic signs, judging road markings, and understanding complex traffic intentions. However, when the light source is absent or the lighting environment becomes extreme, the drawbacks of passive perception become fully apparent.
In contrast, LiDAR and other active sensors function like "vision with a built-in flashlight." LiDAR actively emits controlled laser pulses and receives the energy reflected from targets, using the time-of-flight principle to directly calculate the spatial coordinates of objects. This active detection mechanism allows LiDAR to maintain high perception accuracy even in complete darkness and is largely unaffected by ambient light interference.
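The underlying arithmetic is simple enough to sketch directly. The snippet below is a minimal illustration under idealized assumptions (a single clean return pulse and exact timing), not any vendor's actual processing:

```python
# Minimal time-of-flight range calculation (illustrative only).
# Assumes a single, clean return pulse; real LiDARs must also handle
# multiple returns, detection thresholds, and timing jitter.

C = 299_792_458.0  # speed of light, m/s


def tof_range_m(round_trip_time_s: float) -> float:
    """Distance to the target given the pulse's round-trip time."""
    # The pulse travels out to the target and back, hence the division by 2.
    return C * round_trip_time_s / 2.0


if __name__ == "__main__":
    # A return arriving 400 ns after emission corresponds to roughly 60 m,
    # regardless of whether the scene is brightly lit or completely dark.
    print(f"{tof_range_m(400e-9):.1f} m")  # ~60.0 m
```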
In low-light environments, the primary challenge for camera sensors is a sharp decline in the signal-to-noise ratio (SNR). When photons are scarce, the effective signals captured by the sensor may be overwhelmed by thermal noise generated by the circuitry. To "see" objects in the dark, the system must extend exposure time or increase sensitivity (ISO).
Extending exposure time is extremely dangerous in dynamic driving scenarios because the relative motion between the vehicle and targets can cause severe motion blur in the image, making originally clear target outlines appear as ghostly shadows.
On the other hand, blindly increasing sensitivity introduces a large amount of random noise, peppering the image with artifacts and severely interfering with the backend neural network's ability to extract object features. This "raw material," already compromised at the physical level, leaves the pure vision solution struggling in low-light conditions.
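The trade-off can be put in rough numbers. The sketch below uses assumed sensor figures (signal levels in electrons, a read noise of a few electrons, an illustrative focal length), not the specifications of any real automotive camera:

```python
import math

# Back-of-the-envelope SNR and motion-blur estimates for a camera in low light.
# All parameter values are illustrative assumptions.


def snr_db(signal_electrons: float, read_noise_electrons: float) -> float:
    """SNR of a single pixel, combining photon shot noise and read noise."""
    shot_noise = math.sqrt(signal_electrons)            # Poisson statistics
    total_noise = math.hypot(shot_noise, read_noise_electrons)
    return 20.0 * math.log10(signal_electrons / total_noise)


def motion_blur_px(speed_mps: float, exposure_s: float,
                   focal_px: float, distance_m: float) -> float:
    """Approximate blur, in pixels, of an object moving laterally during exposure."""
    lateral_motion_m = speed_mps * exposure_s
    return focal_px * lateral_motion_m / distance_m


# Daylight: plenty of photons -> high SNR at a short exposure.
print(round(snr_db(20_000, 3.0), 1))              # ~43 dB
# Night, same exposure: few photons -> SNR collapses.
print(round(snr_db(50, 3.0), 1))                  # ~16 dB
# Stretching the exposure to 50 ms recovers photons, but a pedestrian crossing
# at 1.5 m/s seen from 20 m now smears across several pixels.
print(round(motion_blur_px(1.5, 0.05, 1200, 20.0), 1))  # ~4.5 px
```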
Interception and Distortion of Light Waves by Environmental Media
Autonomous vehicles do not operate in a vacuum; light must travel through a complex atmospheric environment on its way from object surfaces back to the camera. Adverse weather conditions such as rain, snow, and fog alter the propagation path of light waves, imposing multiple barriers on visual perception through physical phenomena like scattering, refraction, and absorption.
The impact of fog on vision primarily stems from Mie scattering. Fog droplets are comparable in size to, or larger than, the wavelength of visible light, so when light waves encounter these tiny water droplets they are scattered strongly in all directions.
This scattering effect has two severe consequences: first, the intensity of light rapidly attenuates during propagation, causing distant objects to disappear from the image; second, background light and ambient light are scattered into a white "curtain," significantly reducing target contrast.
From a signal processing perspective, fog is equivalent to superimposing a large-scale low-pass filter on the image, filtering out most high-frequency details. When neural networks process such images, it is difficult to identify pedestrian edges or lane markings obscured by fog, leading to a sharp decline in recognition confidence or even complete missed detections.
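The standard atmospheric scattering model used in the image-dehazing literature makes both consequences concrete. In the sketch below, the scattering coefficient and airlight value are assumed illustrative figures for a dense fog, not measured quantities:

```python
import math

# Koschmieder-style atmospheric scattering model, as used in dehazing work:
# observed = radiance * transmission + airlight * (1 - transmission).

def transmission(beta_per_m: float, distance_m: float) -> float:
    return math.exp(-beta_per_m * distance_m)


def observed(radiance: float, airlight: float, t: float) -> float:
    return radiance * t + airlight * (1.0 - t)


BETA = 0.03      # assumed scattering coefficient, 1/m (dense fog)
AIRLIGHT = 0.8   # normalized brightness of the white fog "curtain"

for d in (10, 50, 150):
    t = transmission(BETA, d)
    dark, bright = observed(0.1, AIRLIGHT, t), observed(0.9, AIRLIGHT, t)
    # Contrast between a dark and a bright target collapses with distance,
    # as both are pulled toward the same airlight value.
    print(f"{d:>4} m  t={t:.2f}  contrast={bright - dark:.2f}")
# 10 m: contrast ~0.59, 50 m: ~0.18, 150 m: ~0.01
```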
Rainy scenarios introduce another issue. Falling raindrops are highly transparent and have a distinctive geometry: each one acts like a tiny spherical lens that refracts, and internally reflects, the light passing through it. This causes local distortion and artifacts in the images captured by the camera.
An even more severe problem occurs on the protective glass in front of the lens, where adhering droplets blur large areas of the image. Because these droplets sit far closer than the camera's focus distance, they appear as heavily defocused blobs, rendering key regions of the image illegible.
In snowy environments, the visual system faces the dual challenges of contrast loss and physical obstruction. Snowflakes have extremely high light reflectivity, causing large areas of overexposure in images under strong illumination; on cloudy days, the lack of sufficient contrast between white snow, white vehicles, and white road signs makes it difficult for perception algorithms to distinguish targets from the background. Additionally, sticky snow may directly cover the camera lens, a physical form of "blindness" that no software algorithm can recover from.
These physical-level interferences directly challenge the pure vision system's ability to model spatial geometric structures. Since cameras cannot strip away environmental noise through precise pulse return times like LiDAR, they must rely on probabilistic predictions to guess the existence of objects among chaotic pixels. In such cases, the interception of light by physical laws effectively cuts off the information source on which the visual system depends.
Image Signal Processor: An Overlooked Source of Information Loss
Even if light successfully penetrates the atmosphere and is captured by the camera sensor, there is still a complex step between the raw electrical signals (RAW data) output by the photosensitive unit and the final colored image (RGB image) that enters the autonomous driving "brain": the image signal processor (ISP).
For a long time, the tuning goal of in-vehicle ISPs has been to serve "human viewing," pursuing visual effects with vivid colors, high contrast, and low noise. However, this pursuit of "aesthetics" is actually a disaster for machine vision algorithms.
The ISP processing pipeline includes multiple stages such as demosaicing, white balance correction, denoising, gamma correction, and tone mapping. In low-light or high dynamic range (HDR) scenarios, the side effects of ISP are particularly evident. To suppress noise in dark conditions, ISPs employ powerful spatial-domain or frequency-domain denoising algorithms. While these algorithms remove random noise, they also indiscriminately erase fine texture details, causing the image to take on an "oil painting" appearance.
For human drivers, this smoothing process may enhance visual comfort, but for deep learning models that rely on pixel-level feature gradients for object detection, it means losing the critical high-frequency information needed to judge object edges.
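A toy example makes that cost concrete: blurring a noisy step edge, the way a heavy-handed spatial denoiser would, collapses the very gradient magnitude a detector's early layers respond to. The edge profile, noise level, and kernel width below are arbitrary assumptions:

```python
import numpy as np

# Toy illustration of how aggressive spatial denoising flattens edge gradients.
# A 1-D step edge stands in for an object boundary; a wide box filter stands
# in for an ISP denoiser.

edge = np.concatenate([np.full(16, 0.2), np.full(16, 0.8)])      # clean boundary
noisy = edge + np.random.default_rng(0).normal(0, 0.05, edge.size)

kernel = np.ones(7) / 7.0                                         # heavy smoothing
denoised = np.convolve(noisy, kernel, mode="same")

# Peak gradient magnitude before and after denoising.
print(round(np.abs(np.diff(noisy)).max(), 2))      # ~0.6: the edge is still sharp
print(round(np.abs(np.diff(denoised)).max(), 2))   # ~0.1: the edge is smeared out
```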
Another issue lies in dynamic range processing. The brightness span in the natural world can exceed 140 dB, while the dynamic range of mainstream in-vehicle camera sensors is generally around 120 dB. When a vehicle exits a dark tunnel and suddenly faces blinding sunlight, the ISP must adjust exposure parameters in an extremely short time.
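Dynamic range figures like these come from a logarithmic ratio of the brightest to the darkest luminance a scene or sensor can represent. The luminance values in the sketch below are assumed orders of magnitude, not measurements of any particular tunnel:

```python
import math

# Dynamic range in decibels: 20 * log10(brightest / darkest).
# The luminance values are illustrative assumptions.

def dynamic_range_db(brightest: float, darkest: float) -> float:
    return 20.0 * math.log10(brightest / darkest)


bright_exit = 100_000.0   # sunlit surfaces and glare at the tunnel mouth, cd/m^2 (assumed)
dark_interior = 0.01      # unlit tunnel interior, cd/m^2 (assumed)
sensor_db = 120.0         # typical figure quoted for automotive HDR sensors

scene_db = dynamic_range_db(bright_exit, dark_interior)
print(round(scene_db))               # 140 dB for this scene
print(round(scene_db - sensor_db))   # 20 dB the sensor cannot span in a single exposure
```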
Traditional HDR technology extends the captured dynamic range by synthesizing multiple exposures, but this introduces severe motion artifacts at high speeds. Because of the time difference between the exposure frames, fast-moving objects appear as double images or ghostly shadows in the synthesized image, preventing the autonomous driving algorithm from accurately judging object boundaries.
Additionally, the tone mapping and gamma correction performed by the ISP are essentially nonlinear information compression processes. To map the 20-bit or 24-bit high dynamic RAW data captured by the sensor into an 8-bit or 10-bit RGB space, the ISP forcibly compresses the contrast in shadow and highlight regions.
In this process, subtle brightness differences that were clearly distinguishable in the RAW domain are forcibly merged into the same pixel value. This mathematically irreversible loss deprives the perception network of the fine distinctions it would need to pick targets out of extreme lighting scenarios.
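A toy calculation illustrates the scale of the loss. Assuming a 20-bit linear RAW signal, a standard 1/2.2 gamma curve, and an 8-bit output (none of which describes a specific production ISP), thousands of distinct highlight levels collapse onto a single output code:

```python
import numpy as np

# Toy illustration of irreversible quantization: 20-bit linear RAW values
# pushed through a 1/2.2 gamma curve and quantized to 8 bits.

raw = np.arange(2 ** 20, dtype=np.float64)              # every 20-bit linear level
encoded = (raw / raw.max()) ** (1 / 2.2)                 # gamma encoding
quantized = np.round(encoded * 255).astype(np.uint16)    # 8-bit output codes

# How many distinct RAW levels land on the single brightest 8-bit code?
merged_highlights = np.count_nonzero(quantized == 255)
print(merged_highlights)   # ~4,500 RAW values become mutually indistinguishable
```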
This mismatch between "human-eye-oriented" and "machine-oriented" processing is a major contributing factor to the poor performance of pure vision solutions in extreme scenarios. Currently, some technical solutions are attempting to bypass traditional ISPs and directly use RAW domain data for end-to-end object detection training to preserve all the original information from the photosensor, indirectly proving the limitations of traditional processing pipelines in addressing lighting challenges.
The Cognitive Boundaries of Deep Learning in Extreme Scenarios
Pure vision autonomous driving relies on deep learning algorithms; however, the performance of object detection models based on convolutional neural networks (CNNs) or Transformers is highly dependent on the distribution of training data. When facing significantly deteriorated lighting conditions, "cognition" at the algorithmic level also exhibits serious biases.
The basis for neural networks to extract object features lies in the contrast gradients between pixels. In situations of intense backlighting or direct nighttime high-beam illumination, light produces severe "glare" and "blooming" effects. When an extremely bright point light source (such as the high beams of an oncoming vehicle) shines on the sensor, the generated charge overflows into adjacent pixels, causing large areas of bright spots in the image.
This phenomenon not only obscures the texture of the obstacle itself but also completely destroys its geometric outline. When high-frequency components in the feature map disappear due to overexposure or extremely low brightness, the convolutional kernel fails to capture effective activation signals, causing the system to logically "ignore" the existence of the obstacle.
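The effect on the gradients a detector relies on can be sketched with arbitrary numbers. In the toy example below, the full-well limit is an assumed 12-bit sensor parameter, and glare plus charge spill are assumed to have pushed even the silhouette region above saturation:

```python
import numpy as np

# Toy illustration of highlight clipping: a pedestrian silhouette seen against
# an oncoming high beam. Signal levels are arbitrary units.

full_well = 4095.0   # 12-bit saturation level (assumed)
scene = np.array([300., 300., 9_000., 9_000., 6_000., 6_000., 9_000., 9_000.])
#                  road      |     glare     |  pedestrian   |     glare
captured = np.minimum(scene, full_well)      # everything above full well clips

print(np.abs(np.diff(scene)).max())      # 8700.0: the silhouette edges exist in the scene
print(np.abs(np.diff(captured)).max())   # 3795.0: only the road/glare edge survives
# The pedestrian-vs-glare edges (9000 -> 6000) both clip to 4095 and vanish.
```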
Additionally, the only way for monocular pure vision systems to obtain depth is through algorithmic estimation. The model infers distance by identifying object types and combining empirical values of "closer objects appear larger" or changes in road texture. However, in extremely dark nights, road textures are almost invisible, and object visual features are distorted by noise interference.
Under such conditions, the algorithm's depth estimation becomes highly unstable. Even if the system identifies a pedestrian ahead, it may fail to judge the distance accurately, leading to mistimed emergency braking decisions. In high-speed scenarios, a distance error of just a few meters can be enough to determine whether an accident occurs.
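A pinhole-camera back-of-the-envelope shows how fragile this is: under the usual size-based estimate Z = f·H / h, a few pixels shaved off the detected silhouette translate directly into meters of range error. The focal length, pedestrian height, and pixel error below are assumed illustrative values:

```python
# Monocular distance from apparent size under a pinhole model: Z = f * H / h.
# All parameter values are illustrative assumptions.

FOCAL_PX = 1200.0   # focal length in pixels (assumed)
PERSON_H = 1.7      # assumed real-world pedestrian height, meters


def distance_m(bbox_height_px: float) -> float:
    return FOCAL_PX * PERSON_H / bbox_height_px


true_h = 40.0                                # pixel height of a pedestrian ~51 m away
print(round(distance_m(true_h), 1))          # 51.0 m
# Noise and blur shave a few pixels off the detected silhouette:
print(round(distance_m(true_h - 4.0), 1))    # 56.7 m -> almost 6 m of error
```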
There is also a deeper issue: current pure vision models are essentially performing a form of "pattern matching." When 99% of the scenarios in the training dataset are sunny, well-lit highways, the model develops a prior bias.
When it encounters the bizarre outlines produced by dramatic alternations of light and shadow at the entrance of a nighttime tunnel, the model may incorrectly classify them as non-threatening shadows or road debris. This lack of generalization to long-tail scenarios (edge cases) is a fundamental gap the pure vision solution must close before it can reach L4 or higher levels of autonomous driving.
Final Thoughts
From the low signal-to-noise ratio of passive perception to the interception of photons by atmospheric media, from the irreversible information loss of ISP processing to the cognitive helplessness of neural networks when features disappear, each link adds its own error to the perception of pure vision autonomous driving. Although the boundaries of pure vision solutions keep expanding with sensors offering larger dynamic ranges, end-to-end RAW-domain perception, and cross-modal training data, the "lighting blind spots" determined by its physical properties remain a core issue the industry must weigh carefully when balancing safety and cost.
-- END --