Experiencing Tencent's Marvis: How Close Are We to Emulating 'Iron Man'?

05/26 2026

Image sourced from Tencent's product Marvis © YouJie UnKnown Original

Recently, Tencent unveiled a new AI product, Marvis, which has sparked widespread discussions within the industry.

The name Marvis pays homage to Jarvis—the trusted assistant of Iron Man and one of the most iconic AI characters in popular culture.

Fans of Iron Man or Marvel will undoubtedly recognize the immense power of Jarvis as an AI assistant.

Jarvis acts swiftly to grab a fire extinguisher when Iron Man's experiments go awry (everyday adaptability); it calculates methods for time travel through wormholes with just a single directive from Iron Man (scientific research capability); and it even dials Iron Man's lover's number when he is in critical danger (emotional companionship)...

Jarvis is truly an all-powerful assistant, covering everything from daily life to work. It genuinely understands and anticipates its master's needs, actively assisting in decision-making when appropriate.

It's safe to say that anyone familiar with Jarvis would desire such an assistant for themselves. Hence, this character has become the ideal archetype for AI assistants.

By naming its AI product Marvis, Tencent's intentions are clear—it aims to create an AI product as versatile and human-like as Jarvis.

This is undoubtedly a commendable idea, but can Tencent achieve this goal? Or, to put it another way, has Tencent taken a solid first step toward this objective?

Marvis: A Deliberately Designed 'Lifelike' Persona

To gauge the reality, we thoroughly tested the Marvis product.

Perhaps to make Marvis more akin to Jarvis, every step—from accessing the product page to installing it on the computer—exudes a deliberate sense of 'lifelikeness' crafted by Tencent.

First, Marvis's image: a horse wearing a red scarf, referencing the Year of the Horse, Tencent's Pony Ma, and the self-deprecating term 'cattle and horse' used by countless workers today. This setup immediately aims to bridge the gap with users.

Next, in Marvis's initial interface, Marvis is seen replenishing its Tokens while undergoing 'onboarding procedures.'

This scene is very 'Tencent.' In reality, when Tencent employees complete their onboarding, they also go through a service platform called 'Red Scarf.' The saying goes, 'Put on the red scarf, become part of Tencent.' Now, Marvis has a similar onboarding ritual.

But Marvis's 'lifelikeness' doesn't just come from the little horse. It also stems from the entire office setup built for its Agents.

In this virtual office, 'boss' Marvis sits at the front, with five team members behind: App Agent, File Agent, Computer Agent, Browser Agent, and Search Agent.

Each handles different tasks, yet they are presented not as functional modules but as coworkers going about their day: some snack in the pantry, others work out in the gym, and some scroll leisurely on their phones or help themselves to tissue in the restroom.

Occasionally, they visit each other's desks, watch colleagues play Honor of Kings, and exchange a few words. The entire office is bright, clean, and the employees are neatly dressed, resembling a genuine corporate environment.

From an external perspective, it seems like the product manager has brought Tencent's corporate culture and office routine online, with Marvis the little horse acting like a true Tencent employee.

However, when we discussed this with friends at Tencent, they said this clearly isn't the real Tencent, as actual workers aren't this relaxed.

They joked that this might be Tencent as seen by its bosses or perhaps the ideal Tencent in the minds of some employees.

But jokes aside, no matter how 'lifelike' the product's promotional atmosphere may be, Agent products ultimately must address a fundamental question: Can they get the job done?

Testing Marvis's Capabilities

So, how capable is Marvis in reality?

Before formal use, we saw a lot of hype. Given its positioning as a system-level AI assistant, we had high expectations for Marvis's capabilities.

Thus, our first task was to have it retrieve stored images on the computer and use Photoshop to create a cover for a WeChat Official Account article.

We set a prompt:

'Help me open the provided PSD file, replace the background image layer with the new image I provided, keeping the original PSD's canvas size, layer structure, text, effects, and layout unchanged. After replacement, check the image position, proportion, and cropping for correctness, avoiding distortion or misalignment. Finally, export a high-quality JPG file and save it to the computer's [Downloads] folder.'

We expected a result like this, which is what we normally use:

But what Marvis actually delivered was this:

Logically, as a system-level AI assistant, Marvis should have used Computer Agent to directly open the PSD file, replace the image with the one I provided, and re-export it.

However, during actual execution, Marvis chose File Agent and took a long time.

I specifically confirmed with Marvis, but it insisted on using File Agent. Yet, the actual output remained unsatisfactory.

If the PSD test assessed local software invocation capabilities, the next test examined its ability to perform a series of continuous operations in a web environment.

I attempted to have it search for and open SpaceX's prospectus, focusing on extracting its equity structure, revenue structure, core business proportions, and recent revenue, profits, and main growth businesses.

Marvis dispatched Browser Agent to search online and delivered a table listing each item. However, upon closer inspection, a ridiculous error emerged: key figures in the local table were missing their leading digits:

For instance, SpaceX's annual revenue for 2025 was listed as $18.7 billion, but the Excel table showed '$8.7 billion.'

Marvis essentially completed the task, but the delivered result had significant issues...

However, during this process, I also reflected on whether my request itself exceeded the product's capabilities.

So, in subsequent tests, I shifted from asking 'what I want it to do' to 'what it can do.'

Judging from Marvis's guided features, its default operating system-level capabilities fall into three main areas.

The first category is computer settings and system checks, such as inspecting battery health, reviewing the status of apps on the computer, and flagging which software may need updates.

The second category involves categorizing and processing local documents and galleries. For example, it can attempt to identify file types on the computer and reorganize materials by purpose. However, some 'puzzling operations' occurred in this experience, such as categorizing some Morgan Stanley research reports as 'resumes.'

The third category is scheduling standardized tasks. For example, setting an automatic daily 5:00 PM check for GitHub's trending projects of the day or subscribing to update reminders for 'Hahahahaha 6.' These tasks are essentially 'reminders + fixed processes,' with clear paths and actions, making them suitable for Agent execution.

So, within this scope, how well does Marvis perform?

First, we tried the system-set task ['Can't Remember Mac Shortcuts']. After we clicked it, the task was completed in seconds, generating a quick-reference image:

System-set tasks lack challenge, so what about newly created tasks within this scope?

I asked Marvis to check the installed office apps on my computer, identify which ones weren't updated to the latest version, and attempt to update some of them.

I provided a prompt:

'Please check the office apps installed on my computer and identify which ones aren't updated to the latest version. Focus on software including Office, WPS, Feishu, DingTalk, WeCom, Tencent Meeting, Zoom, Notion, Obsidian, Adobe Acrobat, OneDrive, Google Drive, Dropbox, and other office, collaboration, document, and productivity tools. Only check versions; do not automatically update, uninstall, or modify settings. Finally, list the app names, current versions, latest versions, whether updates are needed, check channels, and remarks in a table; mark items that cannot be confirmed as "Pending Manual Confirmation."'
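To make the request concrete, the comparison step this prompt asks for can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Marvis works internally: the function names, the sample app names, and the version strings are all hypothetical, and the sketch assumes versions are dot-separated numbers. Anything it cannot parse is marked 'Pending Manual Confirmation,' as the prompt requires.

```python
from typing import List, Optional, Tuple

PENDING = "Pending Manual Confirmation"


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    """Turn '3.12.1' into (3, 12, 1); return None if any part is not a number."""
    parts = text.strip().split(".")
    if not all(p.isdecimal() for p in parts):
        return None
    return tuple(int(p) for p in parts)


def update_status(current: Optional[str], latest: Optional[str]) -> str:
    """Return 'Yes', 'No', or PENDING for one app (hypothetical helper)."""
    if current is None or latest is None:
        return PENDING
    cur, new = parse_version(current), parse_version(latest)
    if cur is None or new is None:
        return PENDING
    # Pad the shorter version with zeros so '1.2' and '1.2.0' compare as equal.
    width = max(len(cur), len(new))
    cur += (0,) * (width - len(cur))
    new += (0,) * (width - len(new))
    return "Yes" if cur < new else "No"


def build_report(apps: List[Tuple[str, Optional[str], Optional[str]]]) -> List[Tuple[str, str, str, str]]:
    """Build rows of (app name, current version, latest version, update needed)."""
    return [
        (name, current or "-", latest or "-", update_status(current, latest))
        for name, current, latest in apps
    ]


if __name__ == "__main__":
    # Made-up apps and versions, purely for illustration.
    sample = [
        ("ExampleDocs", "3.9.2", "3.10.0"),
        ("ExampleMeet", "1.2", "1.2.0"),
        ("ExampleNotes", "2024a", "2.0"),
        ("ExampleDrive", None, "5.1"),
    ]
    for row in build_report(sample):
        print(" | ".join(row))
```

The comparison itself is the easy part. The harder part, which the sketch leaves out entirely, is reliably finding the installed and latest versions in the first place.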

Soon, Marvis provided a diagnostic report:

It reminded me to update ChatGPT before June 12, so I asked Marvis to download it, but the result was unsatisfactory—Marvis had almost no browser invocation rights.

After multiple failed attempts, it provided manual download advice:

From these tasks, Marvis's capabilities need to be viewed from two dimensions:

On one hand, for system-set fixed tasks, Marvis has high completion rates; on the other hand, once tasks enter open environments, its performance becomes lackluster.

This can also be understood as Marvis being essentially similar to the OpenClaw-style ('Lobster') products on the market, which rely on pre-packaged skills or workflows to complete tasks.

Judging from our results, the problem is that the initial promotion raised our expectations while the actual capabilities fell short of the promised outcomes, leaving a significant psychological gap during our testing.

However, beyond this psychological gap, the value Marvis provides to users is actually quite basic, even somewhat redundant:

For example, document retrieval and data organization—there are numerous data management and retrieval tools available on the market that are far more efficient and effective than Marvis. The same goes for task breakdown and content generation; there's no need to compare it with international products—even Tencent's own Workbuddy is far more polished than Marvis.

What sets Marvis apart from similar lobster products on the market is that it isolates this capability and packages it as a clearer product selling point. However, the actual experience hasn't proven it to be better than others.

So, after experiencing it, I actually have a question: Why did Tencent make such a bold claim but deliver such a product? What is the significance of Marvis?

Marvis: A Productized OpenClaw?

From a promotional standpoint, Marvis is positioned as an operating system-level AI assistant, meaning it can directly operate the user's computer based on instructions and help complete tasks.

Does this description sound familiar? Yes, the previously popular OpenClaw (Lobster) was marketed in a similar way.

In fact, from a product-level goal perspective, Marvis and Lobster are aligned, which is why many of their capabilities/functions are very similar, such as directly operating the user's computer and helping complete tasks.

The difference lies in that OpenClaw is a framework that gives users more choices and control, requiring them to deploy the Gateway themselves, connect various chat channels, and then configure models, tools, plugins, and permissions.

Marvis, on the other hand, is more like a fully productized Lobster—truly ready to use out of the box.

For example, Marvis downplays the concept of models.

There are no model options in the interface, and users don't need to switch between GPT, Claude, Tongyi Qianwen, or MiniMax. It's as if it has pre-selected a base model for you, most likely Tencent's own Hunyuan large model. In contrast, many Lobster products prominently display their model options.

So, perhaps the core value of Marvis at present lies in its transformation of the concept of "AI simulating human operation of a computer system" into a tangible, experienceable product.

It points users in a clear direction: Future AI assistants will not merely respond to queries within a chatbox; instead, they will be capable of navigating the computer, comprehending data, and executing tasks. However, based on the current experience, it appears that Marvis is more about securing a foothold in this direction, with true breakthroughs in capabilities yet to fully materialize.

Why Didn't Marvis Evolve into 'Jarvis'?

Let's take a moment to reflect on the experience of using Marvis.

To be candid, prior to using it, the "operating system-level" promotion of Marvis did set our expectations high.

Before diving in, my friends and I discussed that this was the direction Agent products should genuinely pursue—building upon the shortcomings of OpenClaw, packaging it into a cohesive product, lowering the barrier for users, and enabling AI to do more than just engage in chat, call tools, or execute pre-packaged processes. It should genuinely immerse itself in the computer environment, open software, process files, navigate web pages, and accomplish tasks continuously, just like a real person.

Given this, our expectations were not merely for another AI assistant but for something that could propel the concept of "simulating human computer operation" beyond the existing products on the market.

However, the actual experience revealed a considerable gap between our expectations and the reality.

Of course, this is not solely a Marvis issue. The challenges it faces are also indicative of the directions the entire Agent market is striving to overcome: How to make AI not just respond to questions or call tools but truly excel in real operational environments.

The first hurdle is application permissions.

For Marvis to function as the "subtenant" of the computer, users must grant it local permissions, enabling it to manage local files, monitor system status, and organize desktop data.

Yet, much of today's data is not stored locally but in platforms like WeChat, Evernote, Feishu, Tencent Docs, cloud drives, and email. A truly effective Agent should be able to access these daily applications, locate, read, and organize scattered information.

The reality, however, is that WeChat is not on Marvis's permission list.

Nor is Evernote operable through Marvis.

Interestingly, Marvis exhibits a tenacious "workhorse spirit." Despite lacking permission to access certain apps, it provides a "camera" feature for me to capture photos of relevant pages and continue with recognition and processing.

This is akin to taking the shortest path between two points but encountering tolls at every intersection, necessitating constant detours. The task may eventually be completed, but efficiency and the overall experience suffer.

For Agents to be genuinely useful, they must seamlessly integrate commonly used applications as tools. However, the interfaces are controlled by various platforms. WeChat, Evernote, Taobao, and Alipay are unlikely to readily open up their ecosystems. Major platforms prefer to construct their own ecosystems rather than relinquish entry points.

The second challenge lies in the technology itself.

An Agent operating within a computer is akin to a humanoid robot performing household chores: It can function effectively in standardized environments but encounters innumerable obstacles in generalized ones.

Computer interfaces do not present structured data. Humans naturally recognize buttons, input fields, mandatory pop-ups, and document-like files when viewing a screen. But AI first encounters a screenshot. It must interpret the screenshot into an operable interface structure before deciding where to click, what to input, or which file to open.
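What an 'operable interface structure' might look like can be sketched as follows. This is a hypothetical illustration, not Marvis's actual design: the `UIElement` type, its fields, and the `click_point` helper are invented for this example, and the sketch assumes some earlier vision step has already turned the screenshot into a list of labeled elements with pixel boxes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class UIElement:
    """One recognized on-screen element (hypothetical structure)."""
    role: str                        # e.g. "button" or "input"
    label: str                       # visible text, e.g. "Export"
    box: Tuple[int, int, int, int]   # left, top, width, height in pixels


def click_point(elements: List[UIElement], role: str, label: str) -> Optional[Tuple[int, int]]:
    """Return the center of the first element matching role and label, or None."""
    for el in elements:
        if el.role == role and el.label.lower() == label.lower():
            left, top, width, height = el.box
            return (left + width // 2, top + height // 2)
    return None


if __name__ == "__main__":
    # A made-up screen with two recognized elements.
    screen = [
        UIElement("input", "File name", (100, 200, 300, 40)),
        UIElement("button", "Export", (420, 200, 80, 40)),
    ]
    print(click_point(screen, "button", "export"))   # (460, 220)
    print(click_point(screen, "button", "Cancel"))   # None
```

Once the structure exists, choosing a click target is trivial. The difficulty described above lies in producing that list correctly from raw pixels, where a mislabeled button or a missed pop-up leads to a wrong action.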

Achieving success with a single click is not difficult; the challenge lies in performing dozens of consecutive steps flawlessly. Searching for files, filtering dates, judging topics, copying data, reading content, generating documents, and saving to the desktop—any mistake will accumulate deviations.
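A back-of-the-envelope calculation shows why long chains are so unforgiving. The sketch below makes two simplifying assumptions that are ours, not measured properties of Marvis: every step succeeds with the same probability, and steps fail independently. Under those assumptions, the chance of a flawless run is the per-step success rate raised to the number of steps.

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that every step succeeds, assuming equal and independent steps."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return per_step ** steps


if __name__ == "__main__":
    # Illustrative per-step success rates, not measurements of any product.
    for per_step in (0.95, 0.99):
        for steps in (1, 10, 20, 50):
            print(f"p={per_step:.2f}, steps={steps:2d} -> {chain_success(per_step, steps):.3f}")
```

With a 95% per-step success rate, a 20-step task finishes cleanly only about 36% of the time. Even at 99% per step, a 50-step task succeeds only about 61% of the time.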

Marvis envisions a promising future: AI enters the computer, becoming a new intermediary between users and the operating system. However, to truly become a "Jarvis" within the computer, it must overcome challenges related to application permissions, ecosystem interfaces, and long-task stability.

Conclusion

In the era of AI, Tencent has always been expected to aim for the stars.

Pony Ma stated at the May 2023 shareholders' meeting: "For opportunities as significant as the Industrial Revolution, whether you introduce the lightbulb a month earlier or later doesn't hold much weight in the long run."

By the May 2026 shareholders' meeting, Pony Ma once again responded to the outside world's claims that Tencent's AI is "lagging": "A year ago, we thought we were on board, but then we realized the ship was leaking. Now we feel like we've stepped on board but can't sit down yet. We still hope the ship can speed up."

He also mentioned: "The company once blindly followed trends into non-core areas, chasing various hot sectors, but most ended in failure. Facing this wave of AI development, we remain rational and clear-headed, determined to avoid past mistakes."

From these remarks, it is evident that Tencent's AI strategy has always prioritized "stability": not rushing to be the first to make noise but hoping AI becomes a "multiplier" for businesses, solving problems in specific scenarios.

However, by 2026, Tencent's AI moves on the consumer side have become noticeably more frequent: "Yuanbao" launched AI social features during the February Spring Festival; "Lobster Array" was introduced in March, with WeChat opening the ClawBot interface; the AI interactive movie game "DreamNow" launched in April; and Marvis in May.

So, Tencent is not devoid of anxiety about AI. It simply expresses that anxiety with more restraint.

This anxiety is not hard to fathom. The reality is that the entire AI industry faces the same issue: More products and entry points exist, but truly game-changing, habit-altering killer apps have yet to fully emerge. As the Marvis product manager candidly put it, "To be honest, we really don't have a killer feature right now."

This statement also clarifies Marvis's position. It is not a product Tencent launched after finding the definitive answer but more like pushing a possible direction to users before the answer is clear: Let AI step out of the chatbox, enter the computer, and take over files, applications, and tasks.

*Images sourced from the internet
