NeurIPS'25 | Lamenting the Thief of Time! Nankai & Samsung Unveil Open-Source Cradle2Cane: Tackling the "Age-ID" Trade-off in Face Aging

11/21 2025

Interpretation: AI Shapes the Future

Highlights

Directly Tackling a Key Issue, the "Age-ID Trade-off": The paper analyzes the inherent conflict between age accuracy and identity preservation in face aging. Existing methods frequently struggle to balance the two, and the proposed framework aims to break out of this zero-sum game.

Innovative Two-Pass Decoupling Architecture, the Cradle2Cane Framework: The first pass uses Adaptive Noise Injection (AdaNI) to concentrate on aging, and the second pass uses Identity-Aware Embedding (IDEmb) to focus on identity recovery. With this divide-and-conquer approach, the combined outcome is greater than the sum of its parts (1+1>2).

Feature Decoupling Design, the SVR-ArcFace and Rotate-CLIP Modules: Singular Value Reweighting (SVR) suppresses age-related interference in ArcFace features, and Spherical Linear Interpolation (Slerp) transfers the age attribute smoothly in CLIP space. Together they separate identity features from age features and allow each to be controlled.

Efficient Inference Powered by SDXL-Turbo: Because it builds on a few-step diffusion model, the method generates a high-quality image in 0.56 seconds. It maintains high fidelity while enabling smooth transformations across the entire age spectrum, from infancy to old age.

Problems Solved

Face aging has long been plagued by a core challenge: how to preserve the original identity while significantly altering age features. The authors term this challenge the "Age-ID trade-off."

Traditional GAN methods (e.g., Lifespan, CUSP): Although they perform reasonably well in identity preservation, they often produce blurry results when dealing with large age spans (e.g., from infancy to old age) and struggle to realistically simulate skin texture and bone structure changes.

Existing Diffusion methods (e.g., FADING): While they generate high-quality images, they frequently suffer from severe identity drift, so the "aged version" can resemble a completely different individual. As depicted in Figure 1, the performance curves of existing methods exhibit a trade-off: the higher the accuracy of the age transition, the lower the identity similarity, and vice versa. Breaking this pattern to achieve high-fidelity aging across the entire lifespan is the central problem addressed in this paper.

Proposed Solution

This paper proposes a two-stage (Two-Pass) diffusion framework based on SDXL-Turbo, known as Cradle2Cane. The core concept of this solution is "decoupling": dividing age transformation and identity preservation into two separate subtasks for optimized performance.

First Stage (1st Pass): Adaptive Noise Injection (AdaNI)

The objective of this stage is "precise aging." The authors discovered that the larger the age span, the more significant the required image structural changes. Consequently, the AdaNI mechanism dynamically adjusts the noise level injected into the latent space based on the disparity between the target age and the source age.

Small Span: Inject low noise for subtle texture adjustments.

Large Span: Inject high noise to enable the model to make substantial modifications to facial shape, wrinkles, and hairlines. Although this step sacrifices some identity information, it establishes the foundation for generating realistic aging features.
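The relationship between age gap and noise level can be sketched as follows. This is only an illustration of the idea, not the authors' implementation: the linear mapping, the strength bounds (0.2 to 0.8), the 80-year cap, the variance-preserving mixing rule, and the function names `adaptive_noise_strength` and `inject_noise` are all assumptions made for this example.

```python
import numpy as np


def adaptive_noise_strength(source_age, target_age,
                            min_strength=0.2, max_strength=0.8, max_gap=80.0):
    """Map an age gap to a noise strength in [min_strength, max_strength].

    Assumption: a linear schedule. The paper may use a different mapping.
    """
    gap = min(abs(target_age - source_age), max_gap)
    return min_strength + (max_strength - min_strength) * gap / max_gap


def inject_noise(latent, strength, rng):
    """Blend a latent with Gaussian noise (assumed variance-preserving mix)."""
    latent = np.asarray(latent, dtype=np.float64)
    noise = rng.standard_normal(latent.shape)
    return np.sqrt(1.0 - strength) * latent + np.sqrt(strength) * noise


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 8, 8))       # stand-in for an image latent

small = adaptive_noise_strength(30, 35)       # small span: low noise
large = adaptive_noise_strength(5, 80)        # large span: high noise
noisy_latent = inject_noise(latent, large, rng)
```

A larger gap yields a larger strength, so more of the source structure is overwritten before denoising toward the target age. That is the lever the first pass uses to trade identity detail for age accuracy.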

Second Stage (2nd Pass): Identity-Aware Embedding (IDEmb)

The goal of this stage is "identity recovery." Building upon the image generated in the first stage, the model incorporates IDEmb for denoising guidance. IDEmb comprises two innovative modules:

SVR-ArcFace: Standard ArcFace features are entangled with age information. The authors apply Singular Value Reweighting (SVR) to suppress the age-varying components of the features and extract a purer "identity core."

Rotate-CLIP: Given that CLIP text features possess directionality, the authors propose performing "rotation" in CLIP space. Through spherical interpolation (Slerp) instead of simple vector subtraction, features are smoothly guided toward the target age while maintaining semantic consistency.
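The two modules above can be sketched with NumPy. This is a rough illustration under stated assumptions rather than the paper's code. For SVR, it assumes the embeddings of one person at several ages are stacked into a matrix, that the leading singular component carries the shared identity, and that the remaining components are damped by a fixed factor. For Rotate-CLIP, it shows only the standard Slerp formula on two stand-in vectors. The names `svr_identity_embedding`, `keep`, and `damp` are hypothetical.

```python
import numpy as np


def svr_identity_embedding(embeddings, keep=1, damp=0.1):
    """Reweight singular values of stacked face embeddings (hypothetical helper).

    embeddings: array of shape (n_images, dim), e.g. features of the same
    person at several ages. The first `keep` singular components are left
    unchanged; the remaining ones are scaled by `damp`.
    """
    E = np.asarray(embeddings, dtype=np.float64)
    U, S, Vt = np.linalg.svd(E, full_matrices=False)
    weights = np.ones_like(S)
    weights[keep:] = damp                      # suppress trailing components
    E_reweighted = (U * (S * weights)) @ Vt    # rebuild with new singular values
    identity = E_reweighted.mean(axis=0)
    return identity / np.linalg.norm(identity)


def slerp(v0, v1, t, eps=1e-7):
    """Spherical linear interpolation between the directions of v0 and v1."""
    u0 = np.asarray(v0, dtype=np.float64)
    u1 = np.asarray(v1, dtype=np.float64)
    u0 = u0 / np.linalg.norm(u0)
    u1 = u1 / np.linalg.norm(u1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))  # angle between them
    sin_omega = np.sin(omega)
    if sin_omega < eps:
        # Nearly parallel (or antiparallel) inputs: fall back to plain lerp.
        return (1.0 - t) * u0 + t * u1
    return (np.sin((1.0 - t) * omega) / sin_omega) * u0 \
        + (np.sin(t * omega) / sin_omega) * u1


rng = np.random.default_rng(0)
face_feats = rng.standard_normal((5, 16))     # stand-in for ArcFace features
id_core = svr_identity_embedding(face_feats)

src_age_dir = rng.standard_normal(16)         # stand-in for a source-age CLIP feature
tgt_age_dir = rng.standard_normal(16)         # stand-in for a target-age CLIP feature
halfway = slerp(src_age_dir, tgt_age_dir, 0.5)
```

Unlike linear interpolation or vector subtraction, Slerp keeps intermediate results on the unit sphere, which matches the article's point that the feature is "rotated" toward the target age rather than shifted.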

Achieved Effects

Balanced Performance

Experiments conducted on the CelebA-HQ and CelebA-HQ (in-the-wild) datasets demonstrate that Cradle2Cane achieves state-of-the-art (SOTA) performance under both Face++ and Qwen-VL multimodal large model evaluation protocols. Notably, on the HCS (Harmonic Consistency Score) metric, this method significantly outperforms baseline models such as SAM, CUSP, and FADING.
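To see why a harmonic-style score rewards balance, consider the minimal sketch below. It assumes the score is the plain harmonic mean of an age-accuracy term and an identity-similarity term, both normalized to [0, 1]. The article does not give the exact HCS definition, so the paper's formula may differ, and `harmonic_score` is a hypothetical name.

```python
def harmonic_score(age_score, id_score):
    """Harmonic mean of two scores in [0, 1] (assumed form, not the paper's exact HCS)."""
    if age_score <= 0.0 or id_score <= 0.0:
        return 0.0
    return 2.0 * age_score * id_score / (age_score + id_score)


balanced = harmonic_score(0.8, 0.8)   # both objectives met moderately well
lopsided = harmonic_score(0.95, 0.3)  # accurate age, but identity has drifted
```

Under this assumed form, a method that does very well on one objective and poorly on the other scores lower than a balanced one, which is the behavior needed to measure the Age-ID trade-off.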

Natural Visual Effects

Whether it's the deepening of wrinkles, skin sagging, or hair color changes (e.g., graying), Cradle2Cane generates extremely natural details. More importantly, even when processing in-the-wild images with occlusions, profile views, or complex lighting conditions, the model still delivers impressive performance.

Flexible Application Extensions

Thanks to the flexibility of two-stage editing, in addition to independently altering age, this method can also simultaneously perform face attribute editing. For instance, it can seamlessly apply various attributes such as glasses, green hair, or hats while continuously aging the face, significantly enhancing the diversity and controllability of generated images.

Summary

Cradle2Cane suggests that, in the era of generative AI, structured decoupling designs tailored to specific tasks still have considerable potential. By abandoning the traditional "one-step" approach in favor of a coarse-to-fine strategy that first focuses on age transition and then restores identity, the research team from Nankai University and Samsung presents a new diffusion-based framework for the long-standing "Age-ID trade-off" in face aging. Beyond the algorithmic contribution, the work points to applications of AI in digital entertainment, film and television special effects, and social welfare fields such as locating missing persons.

References

[1] From Cradle to Cane: A Two-Pass Framework for High-Fidelity Lifespan Face Aging
