December 25, 2025
Alibaba Tongyi has announced the open-sourcing of its Qwen-Image-Edit-2511 image editing model.
This new version is an upgrade of Qwen-Image-Edit-2509, incorporating several key enhancements:
In Qwen-Image-Edit-2511, the model is capable of performing imaginative edits on input portraits while maintaining the subject's identity traits and visual style.
Additionally, the consistency of group photos featuring multiple individuals has been further refined in Qwen-Image-Edit-2511. The model can seamlessly merge two separate character images into a cohesive and high-fidelity group photo.
Since the introduction of Qwen-Image-Edit, the community has crafted a plethora of creative and high-quality LoRA models, significantly expanding its expressive range. Qwen-Image-Edit-2511 directly incorporates some of the most popular and well-selected LoRAs into the base model, allowing their effects to be utilized without the need for extra fine-tuning.
For instance, the Lighting Enhancement LoRA can instantly achieve realistic lighting control.
You can also use the base model directly to generate new perspectives.
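Folding a LoRA's effect into the base model, so that no separate adapter is needed at inference time, amounts to baking the low-rank update into the original weights. The sketch below illustrates that merge step with NumPy; the shapes, scale factor, and layer setup are illustrative assumptions, not Qwen's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Base weight matrix of a linear layer (d_out x d_in).
d_out, d_in, rank = 64, 128, 4
W = rng.standard_normal((d_out, d_in))

# LoRA factors: a low-rank update parameterized as B @ A.
A = rng.standard_normal((rank, d_in))
B = rng.standard_normal((d_out, rank))
alpha = 8.0           # LoRA scaling numerator (assumed value)
scale = alpha / rank  # common LoRA scaling convention

# "Merging" bakes the scaled update into the base weights,
# so the adapter no longer needs to be applied at runtime.
W_merged = W + scale * (B @ A)

# The merged layer matches base-weights-plus-adapter exactly.
x = rng.standard_normal(d_in)
y_adapter = W @ x + scale * (B @ (A @ x))
y_merged = W_merged @ x
print(np.allclose(y_adapter, y_merged))  # True
```

The key property is that the merge is lossless for linear layers: applying the adapter on the side and folding it into the weights produce identical outputs.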
The research team has also given special consideration to practical engineering applications, such as batch industrial product design and material substitution for industrial components.
In terms of reasoning capabilities, Qwen-Image-Edit-2511 introduces enhanced geometric reasoning abilities—for example, it can directly generate auxiliary construction lines for design or annotation purposes.
Qwen-Image serves as the foundational model for image generation within the Qwen series, achieving notable advancements in complex text rendering and precise image editing.
In the technical report released by Tongyi Qianwen in August, the research team outlined a comprehensive data pipeline, encompassing large-scale data collection, filtering, annotation, synthesis, and balancing.
They employed a progressive training strategy, commencing with non-text to text rendering, gradually progressing to simple text input, and ultimately expanding to paragraph-level descriptions. This curriculum learning method significantly boosted the model's innate text rendering capabilities.
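The staged progression described above can be pictured as a curriculum schedule that decides which data stage each training step draws from. The following is a toy sketch of that idea; the stage names, equal-thirds split, and step counts are illustrative assumptions, not Qwen-Image's actual schedule.

```python
# Curriculum stages ordered from easiest to hardest text rendering,
# mirroring the progression described in the technical report.
STAGES = ["non_text", "simple_text", "paragraph_text"]

def stage_for_step(step: int, total_steps: int) -> str:
    """Pick the curriculum stage for a given training step.

    This toy schedule splits training into equal thirds; a real
    curriculum would use data-driven transition criteria.
    """
    frac = step / total_steps
    if frac < 1 / 3:
        return STAGES[0]
    if frac < 2 / 3:
        return STAGES[1]
    return STAGES[2]

schedule = [stage_for_step(s, 90) for s in range(90)]
print(schedule[0], schedule[45], schedule[89])
# non_text simple_text paragraph_text
```

The benefit of such a schedule is that the model masters easy rendering cases before the data distribution shifts toward harder ones.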
Qwen-Image not only excels in alphabetic languages like English but also demonstrates significant progress in more complex ideographic scripts such as Chinese.
To further enhance image editing consistency, the team previously introduced an improved multi-task training paradigm. This approach encompasses not only traditional text-to-image (T2I) and text-image-to-image (TI2I) tasks but also image-to-image (I2I) reconstruction tasks.
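A multi-task paradigm like this is often implemented by mixing the task types when assembling training batches. The sketch below shows one simple way to do that; the mixing weights are illustrative assumptions, not the ratios used to train Qwen-Image.

```python
import random

# Toy multi-task sampler mixing the three task types mentioned above.
# Weights are assumed for illustration only.
TASKS = {
    "T2I": 0.5,   # text-to-image generation
    "TI2I": 0.3,  # text+image-to-image editing
    "I2I": 0.2,   # image-to-image reconstruction (encourages consistency)
}

def sample_task(rng: random.Random) -> str:
    """Draw one task type according to the mixing weights."""
    return rng.choices(list(TASKS), weights=list(TASKS.values()), k=1)[0]

rng = random.Random(0)
counts = {t: 0 for t in TASKS}
for _ in range(10_000):
    counts[sample_task(rng)] += 1
print(counts)  # roughly 5000 / 3000 / 2000
```

Including the reconstruction (I2I) task alongside generation and editing gives the model an explicit signal to preserve input content, which is what drives the editing-consistency improvement described above.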
In multiple public benchmark tests, Qwen-Image has achieved state-of-the-art performance, showcasing its strong capabilities in image generation and editing.

References:
https://www.modelscope.cn/models/Qwen/Qwen-Image-Edit-2511
https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-Image/Qwen_Image.pdf