Desktop Agent Unleashed! Alibaba’s QoderWork Handles Odd Jobs, But Only at Intern Level - Cloud - AsiaICT

Desktop Agent Unleashed! Alibaba’s QoderWork Handles Odd Jobs, But Only at Intern Level

06/11 2026

From crafting articles and creating PPTs to building web pages, QoderWork does it all with ease.

"AI Intern, Now Officially on Duty."

Alibaba recently unveiled QoderWork, an extension of its original Qoder code agent capabilities into daily office scenarios. Its core mission is straightforward: desktop AI should transcend mere 'question-answering' and embark on 'task-completion'.

(Image source: QoderWork)

This concept might sound familiar. Tencent’s Mavis, Moonshot AI’s KimiWork, and third-party tools like DeepSeek GUI are all vying to 'overthrow Codex.' QoderWork’s main offerings are also well-known: file organization, data analysis, document generation, research integration, and browser automation. It covers the full list.

Of course, the primary advantage of such agents over Codex lies in their practicality. QoderWork operates on Qwen, with Qwen 3.7 Max currently available for a limited-time 15-day free trial—a generous offer.

The term 'desktop AI agent' has been bandied about frequently in the past two months, with every tool claiming to 'get work done.' But does it truly deliver? Here are Leitech's findings from using QoderWork.

QoderWork operates distinctly from most AI tools. For instance, on Qwen's web interface, you typically pose a question and receive an answer, which is then recorded in the chat history. QoderWork, however, functions on tasks: you initiate a goal, it breaks it down into several execution steps, and upon completion, it saves the output as a file. The entire task remains in the task list, allowing for traceability, continuation, and monitoring—akin to Wukong.
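The task-based model described above can be pictured as a small data structure. The following is a minimal sketch, assuming a task is simply a goal, an ordered list of steps, and a list of output files; the class and field names (`Task`, `Step`, `StepStatus`, `task_list`) are hypothetical and are not taken from QoderWork itself. The step descriptions are the ones the task monitor showed in our article test.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class StepStatus(Enum):
    PENDING = "pending"
    DONE = "done"


@dataclass
class Step:
    description: str
    status: StepStatus = StepStatus.PENDING


@dataclass
class Task:
    """Hypothetical model: one goal, its execution steps, and the files it produced."""
    goal: str
    steps: list[Step] = field(default_factory=list)
    output_files: list[str] = field(default_factory=list)

    def next_pending(self) -> Optional[Step]:
        # First step that has not run yet, or None when everything is done.
        return next((s for s in self.steps if s.status is StepStatus.PENDING), None)

    def complete_next(self) -> None:
        step = self.next_pending()
        if step is not None:
            step.status = StepStatus.DONE

    @property
    def finished(self) -> bool:
        return self.next_pending() is None


# The task list keeps every task around, so it can be traced or continued later.
task_list: list[Task] = []

article = Task(
    goal="Apple WWDC2026 Article",
    steps=[Step(text) for text in (
        "Research Leitech's writing style",
        "Gather WWDC 2026 information",
        "Propose topic angles and select a direction",
        "Write a complete article",
        "Generate a Word document",
    )],
)
task_list.append(article)

article.complete_next()
print(article.next_pending().description)  # Gather WWDC 2026 information
```

The difference from a chat log is that the state lives in the task rather than in a transcript: a half-finished task can be reopened and resumed from its next pending step.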

This might seem like a minor distinction, but it's significant. Take one of our actual tasks as an example. In task mode, items like 'Apple WWDC2026 Article,' 'Leitech Business Introduction PPT,' and 'IFA 2026 Special Webpage' were listed as projects on the left sidebar. Clicking on them allowed us to view execution steps, output files, and make adjustments in the original conversation. With AI chat, once the conversation ends, you're left with some answers—and that's it.

(Image source: Leitech Graphics)

On the right side of QoderWork, there's a 'Task Monitoring' area displaying pending steps, final files, working files, and the skills and MCP capabilities invoked. In the first round of article tasks, the task monitor outlined the entire execution chain: 'Research Leitech's writing style - Gather WWDC 2026 information - Propose topic angles and select a direction - Write a complete article - Generate a Word document.' This at least gives users a rough idea of what the AI did at each stage.

(Image source: Leitech Graphics)

Functionally, QoderWork offers 'Expert Suites,' 'Skill Marketplace,' 'Scheduled Tasks,' and 'App Snapshots.' Expert Suites package capabilities for specific roles, such as legal, product, contracts, investment research, and finance. Installing a complete suite enables immediate use without manual tool assembly. The Skill Marketplace resembles a plugin system, offering in-depth research, data analysis, PPT generation, and Notion infographics. In the second round of PPT testing, QoderWork proactively invoked PPT skills and, upon detecting a missing Node.js environment, asked the user whether to install dependencies. In other words, it tries to fill gaps in the toolchain itself so that the task can reach a final file.

(Image source: Leitech Graphics)

Scheduled tasks are straightforward. Examples include 'Lunchtime Charging Station,' 'Weekly Competitor Dynamics Tracking,' 'Daily Download Folder Cleanup,' and 'Daily Data Report Updates.' These tasks can be set to execute automatically on a regular basis. If stable and usable, they offer more long-term value than ordinary chat assistants. Notably, these scheduled tasks currently require the computer to remain awake to execute; they will fail if the internet is disconnected or the screen is turned off.

(Image source: Leitech Graphics)

Additionally, QoderWork offers newer features such as App Snapshots. Simply put, it captures the frontmost application window as a screenshot plus readable text context, allowing QoderWork to 'see' the interface the user is currently working on. This is where desktop agents truly differ from web-based AI tools, and also where the permission threshold is highest. Enabling this feature requires granting QoderWork Computer Use, screen recording, and accessibility permissions. The initial authorization process on macOS may take some time.

(Image source: Leitech Graphics)

Overall, as a desktop-level agent still at version '0.5,' QoderWork essentially has all the necessary functions, offering a rich selection of skills and tasks, as well as a well-developed task chain and thought process. What deserves even more praise is the limited-time free Qwen 3.7 Max, likely one of the strongest code models currently available.

We designed three types of tests for it, aiming to closely align with the actual work needs of a tech media editorial department. In the first round, we had it learn Leitech's writing style and fully automatically write an article about Apple WWDC 2026, generating a Word document. In the second round, we had it create a business introduction PPT for Leitech from scratch. In the third round, we had it build a special webpage for IFA 2026 exhibition coverage, ensuring it included code, interactivity, and responsiveness without any omissions.

The first task was for QoderWork to study the writing style of recent articles on Leitech's official website, organize key information about Apple WWDC 2026, complete a draft in line with Leitech's style, and generate a Word document. Information search, style recognition, topic selection, long-form writing, and document delivery essentially constitute the complete workflow of an editorial assistant.

QoderWork successfully completed the task. It analyzed Leitech's writing style, gathered WWDC 2026 information, provided three topic angles, and continued writing after user confirmation of the direction, ultimately generating a Word document. The 'wait for user confirmation' step is particularly noteworthy: it paused at key decision points rather than proceeding without permission, indicating a certain degree of 'controllable execution' awareness.

(Image source: Leitech Graphics)

The final article, titled 'Siri's Brain Transplant and Rebirth! The Biggest Suspense of Apple WWDC 2026: After Two Years of Catching Up, Can AI Still Win This Fight?', was approximately 3,500 words long, including an introduction, subheadings, opinion judgments, and an interactive conclusion. It strove to emulate a stance-driven tech media piece, with short opening sentences, colloquial judgments, and a structure centered around core issues.

However, the problems were evident. The article included information requiring strong sourcing, such as '$1 billion annually,' '1.2 trillion-parameter Gemini,' 'macOS Golden Gate,' 'abandoning Intel Mac support,' and 'using third-party AI models as the default conversation engine.' Without reliable public sources, including such content in the main text is a typical AI writing issue. While the draft may look presentable, it does not guarantee factual reliability—a critical flaw for tech media.

(Image source: Leitech Graphics)

In terms of style mimicry, expressions like 'Xiaolei babbles,' 'Apple is finally freaking out,' 'slow as a snail,' and 'breaking it down in detail' appeared with unusually high density, which reads more like deliberate cosplay of the style than a truly internalized, judgment-driven and information-dense writing approach. A truly publishable draft should tone down the colloquial feel and elevate judgment and information density.

(Image source: Leitech Graphics)

The first round could be rated 7.5 points. While it completed a full editorial assistant-level workflow, it cannot yet serve as a responsible editor, as factual verification and risk assessment still require manual oversight.

The second task was for QoderWork to create a business introduction PPT for Leitech from scratch, assuming the audience is potential partners. It was required to search public information, organize media positioning, content direction, audience, and cooperation value, and generate an openable PPT file.

(Image source: Leitech Graphics)

An incident during the process clearly illustrated QoderWork's capability boundaries: it detected a missing Node.js and npm environment, requested the user to install Node.js v20 LTS, downloaded and installed dependencies upon approval, proceeded to install the npm packages required for PPT skills, and finally generated the file. Ordinary AI chat tools typically stop at the 'suggestion layer' when environments are missing, telling you what to install but not proceeding themselves. QoderWork's proactive attempt to complete the toolchain and advance the task to file generation represents a qualitative difference.
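The "detect, ask, then proceed" behavior can be sketched in a few lines. This is an illustration under stated assumptions, not QoderWork's implementation: `missing_tools` and `plan_toolchain` are hypothetical names, the check relies only on Python's standard `shutil.which`, and the actual installation step is deliberately left out.

```python
import shutil
from typing import Callable, Optional


def missing_tools(
    required: list[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the required executables that cannot be found on PATH."""
    return [name for name in required if which(name) is None]


def plan_toolchain(
    required: list[str],
    approve: Callable[[list[str]], bool],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Decide what to do next: 'ready', 'install', or 'blocked'.

    `approve` stands in for the confirmation dialog: it receives the list of
    missing tools and returns True only if the user agrees to install them.
    """
    missing = missing_tools(required, which)
    if not missing:
        return "ready"      # nothing to install, continue the task
    if approve(missing):
        return "install"    # user agreed, so the agent may install and continue
    return "blocked"        # user declined, so stop at the suggestion layer


if __name__ == "__main__":
    decision = plan_toolchain(
        ["node", "npm"],
        approve=lambda tools: input(f"Install {tools}? [y/N] ").lower() == "y",
    )
    print(decision)
```

A chat tool effectively always ends at the "blocked" branch. What set QoderWork apart in this test was that the "install" branch existed, and that it was only taken after explicit approval.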

(Image source: Leitech Graphics)

The final output was 'Leitech Business Introduction.pptx,' totaling 13 pages, with a structure including a cover, table of contents, 'Who is Leitech,' 'What We Focus On,' 'Content Strengths and Influence,' 'Why Cooperate,' 'Cooperation Methods,' and an acknowledgment page. The PPT correctly identified itself as a business material for partners, with a logically sound structure and a certain design sense in the cover and layout. Cards, chapter pages, and data highlight pages were largely complete. As a first draft generated in about 15 minutes, its efficiency was undeniable.

(Image source: Leitech Graphics)

However, its most regrettable flaw was the absence of Leitech's actual logo on the first page of the business PPT—it used generated illustrations or generic tech visuals instead. Honestly, the lack of a company logo is quite unprofessional for a business cooperation introduction PPT.

Additionally, the table of contents page still contained leftover template text reading '05 I am the chapter name,' and the last page used an English 'Thank you!' These are very basic but glaring flaws, indicating that although it claimed to have verified the PPT, it did not actually check it page by page. Figures used in the PPT, such as '6 million+ fans across all platforms' and '9 million+ views for a single AWE report,' were claimed to come from public sources but carried no footnotes or citations, so they would need re-verification before use in business materials.
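A page-by-page check of this kind is easy to automate. The sketch below is our own illustration, not something QoderWork ships: it treats the .pptx file as the ZIP archive it is, reads each `ppt/slides/slideN.xml` part, and flags text runs that match a list of residue patterns. The patterns and the function name `find_template_residue` are assumptions; the slide number comes from the file name, which usually but not necessarily matches presentation order, and text split across several runs would be missed.

```python
import re
import zipfile

# Hypothetical residue patterns; extend them for the templates actually in use.
RESIDUE_PATTERNS = [
    re.compile(r"chapter name", re.IGNORECASE),
    re.compile(r"click to add", re.IGNORECASE),
    re.compile(r"lorem ipsum", re.IGNORECASE),
]

SLIDE_PATH = re.compile(r"ppt/slides/slide(\d+)\.xml$")
TEXT_RUN = re.compile(r"<a:t>(.*?)</a:t>", re.DOTALL)


def find_template_residue(pptx) -> list[tuple[int, str]]:
    """Return sorted (slide_number, text) pairs that look like template leftovers.

    `pptx` may be a file path or a binary file-like object.
    Text is compared as stored in the XML, so entities such as &amp; stay escaped.
    """
    hits = []
    with zipfile.ZipFile(pptx) as archive:
        for name in archive.namelist():
            match = SLIDE_PATH.match(name)
            if match is None:
                continue  # skip layouts, masters, media, and other parts
            xml = archive.read(name).decode("utf-8")
            for text in TEXT_RUN.findall(xml):
                if any(p.search(text) for p in RESIDUE_PATTERNS):
                    hits.append((int(match.group(1)), text))
    return sorted(hits)
```

Calling `find_template_residue("Leitech Business Introduction.pptx")` on a deck like ours would be expected to report the leftover chapter-name line. It cannot judge whether a logo is missing or a figure is sourced, so human review is still needed.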

(Image source: Leitech Graphics)

The second round also scored 7.5 points. While it successfully created an openable, structurally complete, and visually designed file from scratch, it still fell short of being 'ready to send directly to clients.' However, considering that almost no agent can achieve 100% satisfaction in PPT creation in a single attempt, this result is still acceptable.

True to form, Qwen 3.7 Max delivered impressive results in the third round, creating a special webpage.

The third task was for QoderWork to build a special webpage for Leitech's IFA 2026 exhibition coverage. It was required to reference Leitech's official website exhibition special pages without copying the design, including a hero headline, exhibition introduction, key reports, live updates, galleries, in-depth reviews, and exhibit categories, generating a locally openable static webpage using HTML, CSS, and JavaScript.

(Image source: Leitech Graphics)

First, let's check whether our requirements were met. The page includes 7 sections: the hero section, introduction, key reports, exhibit overview, live updates, gallery, and in-depth commentary. The navigation bar jumps between sections, cards have hover effects, and exhibit categories support switching between 'All, AI Hardware, Smart Cars, Smart Home, Mobile Devices, Robots'. There is no horizontal overflow on desktop or on a 390px-wide mobile viewport, and the console shows no errors. The mobile version switches to a hamburger menu, and the main content of the page displays normally. Zero errors. It's perfect.

(Image source: Leitech Graphics)

The dark tech style, blue highlights, fixed navigation, geometric decorative elements, and card layout are largely complete. More importantly, it includes real, runnable code where functions work and interactions can be triggered, rather than just generating a screenshot. This round comes closest to the expectation of a 'desktop agent helping users complete a frontend task' and represents QoderWork's most solid performance across the three rounds of testing.

If we must nitpick, it still didn't use the real logo, opting for a blue square with an 'L' instead. This is acceptable for a demo but not for a live version. Additionally, the gallery and product visuals lean heavily on emojis as placeholders, with rows of robots, cars, phones, and headphones. Since no real coverage is live yet, it filled the space with random articles, which is understandable but not aesthetically pleasing.

(Image source: Leitech Graphics)

For the third round, we would assign it a score of 8. This round demonstrates that QoderWork's static webpage generation capability is nearing a ready-to-deploy state, surpassing its article drafting and PPT creation in terms of practicality.

Following these three rounds of comprehensive testing, it is evident that QoderWork has made a substantial leap from merely providing answers to actually accomplishing tasks. Nevertheless, the present quality of its outputs may necessitate multiple iterations and refinements before they can be seamlessly integrated into existing workflows.

The notion of desktop AI agents has been a topic of much discussion over the past year. However, products that genuinely give users the impression of "doing the work for me, rather than just assisting" remain scarce. Has QoderWork achieved this feat? Based on the outcomes of these three rounds, the answer is that it is very close, but true hands-free operation is not yet within reach.

At its heart, this issue revolves around authority and responsibility. The underlying logic of conventional AI chat tools is, "I offer suggestions, and you make the decisions." Users are presented with a block of text and must decide for themselves whether to act upon it. QoderWork, on the other hand, endeavors to shift this dynamic to, "I deliver the finished product for you to utilize or modify." This leap is far more profound than it may seem, as "delivering the finished product" implies that the AI must assume responsibility for the quality of the content, including factual accuracy and compliance with formatting standards, and if errors arise, it may need to start anew.

(Image source: Leitech Graphics)

Currently, QoderWork has addressed the challenge of "transitioning from nothing to a preliminary draft" but has not yet conquered the hurdle of "refining a preliminary draft into something directly usable." Of course, it is important to maintain perspective; no agent can yet claim to deliver a 100% usable product in a single attempt.

Therefore, we prefer to characterize QoderWork as a desktop "AI intern." It is capable of getting things done, but not necessarily with finesse. It significantly reduces initial time investments—for instance, when crafting an article, you at least do not have to gather information piece by piece. As for when it will evolve from "capable of producing a preliminary draft" to "reliable for final delivery"? That may be a question that only time can answer.


Source: Leikeji

The images featured in this article are sourced from the 123RF Licensed Image Library.

