Year-End Review: Who Will Be China's 'Nano Banana'?

12/17 2025

© Original Content by YouJie UnKnown

Editor | Qian Jiang

Looking back at 2025, the most significant shift in the AI industry was not in models but in Agents truly integrating into workflows.

In November, Nano Banana tore open a gap in the creative tools sector upon its release. Rather than simply assisting with design, it redefined designers' work methods, enabling AI to deliver scalable, usable outputs for the first time.

Nano Banana essentially overhauled all design-related workflows. Similarly, domestic AI Agents rapidly penetrated office scenarios across industries—drafting documents, creating PPTs, editing podcasts, organizing data reports—with more tasks being automated by Agents.

This year, the work methods of office workers were quietly rewritten.

Overseas, multimodal office and creative tool matrices like Microsoft Copilot, Google Gemini, and Notion AI have solidified. Domestic players followed suit: Baidu Wenku and Netdisk launched GenFlow3.0, Kingsoft introduced WPS.AI, Alibaba rolled out QianWen and Kuake, and ByteDance debuted Kouzi Space, integrating document writing, PPT creation, data processing, image generation, and automated distribution into one-stop multimodal Agent systems.

But questions arise—when AI becomes ubiquitous, what do users truly need from an Agent? A complete replacement of existing workflows? Proactive work design? Stacked automation capabilities? Or expanded creativity?

More critically, after all major players enter the field—who will become 'China's Nano Banana,' transforming how Chinese people work? Who will truly define the next generation of creative and office scenarios?

To answer these, we tested five mainstream domestic Agent products: Baidu GenFlow3.0, Tencent ima, Kingsoft WPS.AI, Kouzi Space, and Kuake. Our evaluations revealed three emerging generational tiers:

1. Capable of producing complete workflows;

2. Capable of forming data closed loops;

3. Capable of continuously accumulating cognition and memory.

Currently, only two have reached the third stage: GenFlow3.0 and Kouzi Space.

Three Benchmarks for Evaluating Agents

Before answering further, we must understand how AI has transformed production and creativity.

Traditionally, creative tools had a simple structure, whether early Office suites or later SaaS platforms (e.g., Canva): an editing suite at the core, layered with creative templates, assets, and collaboration tools.

In this phase, productivity remained human-driven; creative platforms were merely 'toolboxes.'

After ChatGPT's debut, Microsoft pioneered integrating AI into Office, initiating the first transformation of traditional tools. Early Agents handled text processing, polishing, expansion, and content generation. This year, as multimodal models matured, Agents began intervening deeply in the entire creative process, propelling AI from standalone tools toward one-stop 'intelligent pipelines.'

Under this backdrop, Agent architectures grew more complex. Based on our research, current one-stop Agents roughly divide into three tiers:

This three-tier structure forms the critical framework for future AI Agents to transition from assistants to leaders.

Within this new architecture, three Agent evaluation benchmarks are paramount:

First, multimodal one-stop generation capability. AI applications are shifting from 'one tool, one problem' to 'one product, multiple tasks.' As Sam Altman noted in a recent interview: 'Most users prefer a single, efficient AI service that adds value across their lives, so ChatGPT must continuously expand features.' Manus' viral success earlier this year accelerated this trend, making general-purpose, multifunctional integration an industry consensus.

Second, knowledge base integration. What truly differentiates Agents and builds barriers lies not in large models, prompts, or tools but in their ability to mobilize data at scale. If materials, corpora, and user preferences can be systematically accumulated in knowledge bases, enabling cross-material retrieval, cross-style understanding, and cross-task transfer, then each creation builds on a learnable, evolvable knowledge structure, allowing AI to improve iteratively like humans.

Third, human-AI collaboration. This is the biggest distinction between Agents and traditional AI tools. Traditional tools are human-controlled; creators' will directly shapes outputs. Agents, however, co-create with humans—acting as partners, assistants, or co-pilots. This demands humans engage more proactively and comprehensively with the traditional 'AI black box,' making human-AI collaborative editing even more critical.

Thus, we can scientifically assess an AI Agent's competitiveness using three core criteria:

1. Does it offer sufficiently rich AI creative tools/Agents to support full-modality, one-stop content generation?

2. Does it have a robust knowledge base capable of accumulating knowledge and memory and feeding this data back into creation?

3. Does it provide a sound human-AI collaboration framework enabling full synergy?

Using these criteria, we systematically evaluated five mainstream Agents and consolidated our findings into this overview:

Product Testing: Who Is China's 'Nano Banana'?

As AI truly enters creative and office scenarios, the first differentiator in user experience lies not in model parameters or algorithmic power but in whether it can complete tasks end-to-end.

The primary metric for this is platform functionality completeness. The table below shows each platform's support for creative depth and task breadth:

From feature coverage, GenFlow3.0 is the only platform with full-modality creative capabilities, covering all core functions of mainstream AI creative platforms. Kuake and WPS rank second, supporting most common creative and office scenarios. In contrast, Kouzi Space and ima have gaps in multimodal abilities, document toolchains, and professional features, remaining in continuous improvement phases.

However, as AI task coverage converges, a second differentiator emerges: the core conflict shifts from 'Can AI generate?' to 'Can humans intervene and correct anytime?'

In this dimension, the key metric for collaboration depth is Office ecosystem compatibility:

Testing reveals clear distinctions here. Excluding WPS, which leverages its native office software advantage, GenFlow3.0 is the only AI platform achieving 'native-level compatibility': it supports direct Office format output and cross-tool, cross-device editing links, truly closing the loop from 'AI generation' to 'execution.'

GenFlow3.0 adopts a unique 'dual-mode editing' strategy:

Lightweight adjustments: When generating documents or PPTs, its right preview window supports conversational modifications—highlighting Excel columns, generating radar charts, or adjusting PPT formats via voice commands.

Deep refinement: For complex layouts, a one-click switch to 'Advanced Editing' mode accesses a professional Office-like interface, with seamless operational and functional continuity.

In contrast, Kouzi Space, Kuake, and ima offer preview windows and basic editing but lack full Office toolkit support, struggling with deep document processing.

Beyond generation capabilities, the smoothness of human-Agent collaboration forms the third experience differentiator. The core question: Does the system empower humans to intervene and optimize during execution?

True creativity is nonlinear—a dynamic, spiral process of thinking while doing and revising logic as inspiration strikes. This 'spiral ascent' mindset defines human work.

Regrettably, most Agents still follow rigid 'one-way execution' logic: once started, they're hard to interrupt. If users spot deviations mid-task, they must wait for completion before restarting, wasting computational power and time.

To mitigate 'process black box' risks, mainstream products adopt compromise 'pre-confirmation' strategies. For example, GenFlow3.0, Kouzi Space, ima, and Kuake require generating outlines and visual styles before PPT or long-document creation, proceeding only after user approval.

Of course, 'pre-communication' isn't universal. Products like WPS still use traditional command logic: users issue demands, and AI executes in a 'black box' until delivering final results.

Notably, GenFlow3.0 showcased the most differentiated interaction in our tests. Beyond standard 'pre-confirmation,' it pioneered 'breakpoint continuation'—allowing users to pause and intervene mid-generation.

During our year-end summary test, we intentionally omitted key information, paused midway, and added instructions to 'highlight annual performance.' GenFlow3.0 didn't mechanically restart; it understood the new request and seamlessly continued from the previous logic flow.

This 'dynamic correction' capability marks AI collaboration's shift from 'command-based' to 'interactive.'

Thus, human-AI relations have qualitatively changed: AI is no longer a mere output tool but has entered an 'employee-like' state of being managed, corrected, and deeply collaborated with.

However, when AI handles long-chain tasks with frequent user interruptions and corrections, a foundational technical challenge arises: How can it remember context, follow new instructions, and improve iteratively?

This touches AI's core memory hub—the knowledge base.

While knowledge bases are now standard in Agent products, the true experience gap lies in whether they function as rigid 'warehouses' or dynamic 'knowledge engines.' Ideal systems should unobtrusively accumulate knowledge during reading, thinking, and creation.

Thus, our evaluation focuses not on 'existence' but on two critical loops: 'how knowledge is acquired' and 'how it's applied.' We scanned each platform's performance:

In 'knowledge acquisition,' each product's ecosystem DNA creates a distinct moat, giving three in total:

First, ima leverages Tencent's ecosystem: it supports one-click Tencent Docs import and directly integrates WeChat Official Account article collections. This gives it unique access to high-value content otherwise locked inside WeChat's private domain.

Kuake's strength lies in its browser gateway: screenshot and word-highlighting functions lower the barrier to collecting fragmented information.

In contrast, GenFlow3.0's advantage is not just breadth but 'specialization.' Relying on Baidu Wenku and Baidu Academic's vast databases, it accesses exclusive content—professional documents, official templates, exam question banks, and academic papers—often gated behind paywalls or permissions. These 'hard-to-find' resources form GenFlow3.0's strongest knowledge foundation.

Additionally, GenFlow3.0 built a unique 'internal circulation' mechanism: all AI-generated content can flow back into the knowledge base, where it can be freely dragged, recombined, and reused in new creations. Knowledge becomes reusable and ever-improving, achieving true 'use anytime, improve with use.'

Of course, knowledge acquisition isn't the goal—empowering creation is.

Shifting to 'knowledge application,' experience gaps emerge:

While Kuake excels at collection, it suffers a clear 'storage-application' disconnect—its saved content exists mainly as 'bookmarks,' unusable in creations, severely limiting the knowledge base's value. Similarly, ima's strong knowledge foundation is underutilized due to its thin toolchain, restricting knowledge application scenarios.

True closed loops emerge with GenFlow3.0 and WPS.

WPS leverages cloud storage to unify files across phones, computers, tablets, and WeChat, supporting real-time upload/download and forming a standard 'material storage-content creation' loop.

GenFlow3.0 builds a more three-dimensional 'four-layer knowledge system': externally connecting Baidu Wenku and Baidu Academic's professional knowledge; internally syncing data via Baidu Netdisk; automatically backing up browsing history and AI-generated content; plus custom knowledge bases. It also closes the 'collect-store-use' loop. Compared to WPS, GenFlow3.0 offers broader knowledge reserves and application scenarios.

Many call 2025 the first year of AI applications, but by year-end, the sector had already advanced.

On one hand, standalone AI tools rapidly evolve into one-stop Agent creative platforms. On the other, industry dynamics shift—tech giants are displacing startups as primary competitors.

These changes outline AI Agents' next phase:

Previously, Agents survived through differentiation—finding niche segments sufficed. Now, as streams converge, Agents enter 'head-to-head' competition. If the past was 'qualifying rounds,' the present is 'knockout stages.'

Knockout competition is multifaceted, encompassing multimodal models (images, videos, audio, text) and ecosystem synergies (traffic, entry points). This explains why tech giants dominate: their resources create advantages that startups struggle to match.

Yet, when only giants remain at the table with equal resources, product excellence becomes the true differentiator.

So, what should an AI-era Agent's ultimate form be? I don't know, but I'm certain its goal isn't to complete fixed tasks within rigid workflows.

It should become a "partner" rather than a "tool"; it should not merely serve specific work scenarios but permeate every aspect of your life and creativity.

Just as Fei-Fei Li said, the ultimate mission of AI is to become a capable partner for humanity in tackling major challenges, enabling greater realization of human potential and the creation of a brighter future.

However, this is precisely the biggest challenge facing many products today—many still adhere to a process-oriented mindset, interpreting creativity as preset steps. Yet, it is clear that if AI merely follows procedures, it will ultimately revert to the traditional SaaS path, losing its inherent creativity and potential.

Therefore, the core competitive edge of future Agents no longer hinges on a single "wow factor" in generation but on their ability to truly integrate into complex work environments and remain there over the long term.

For the vast majority, the essence of work is not idle chatting in dialog boxes but the iterative refinement of documents, PPTs, and spreadsheets. If AI stops at one-time delivery and cannot enter these deep editing scenarios, it will ultimately remain peripheral to core workflows.

Hence, we believe that the true next-generation Agent must reconstruct three foundational capabilities:

Omni-Convergence: The ability to orchestrate text, images, data, and presentations in one place, transforming fragmented instructions into a complete workflow;

Memory Accrual: The capacity to retain your preferences and historical outputs, enabling each creation to build upon past accumulations;

Deep Collaboration: The ability to let humans intervene, interrupt, and correct at any time, truly achieving a closed loop from "drafting" to "delivery."

When Agents possess these qualities, competition transcends the parameter battle over model interfaces and shifts to the ecological niche of being a "long-term collaborative partner" for human employees.

In office and creative scenarios, the true Chinese version of the "Nano Banana" will not emerge from a flashy demo but will only arise from the "super employee" of multimodal creation that you rely on daily.

*The images in the text are sourced from the internet
