Facing Wall Intelligence fires the first shot for the "Edge-side Agent" as five domestic large model startups take different paths


04/01 2025 472

Sending the model to work is what an Agent does.

If you have been following the AI circle recently, it is almost impossible to avoid the term "Agent". Since Manus AI, which can independently build websites, analyze stocks, and plan travel itineraries, became popular, more companies at home and abroad have begun to bet publicly on "AI Agents".

Miranda Nash, Vice President of Oracle AI Group, said in an interview that AI Agents will be the focus of attention for technology companies this year. Zhang Peng, CEO of Zhipu, released the new agent AutoGLM Reflection at the recent 2025 Zhongguancun Forum Annual Meeting and said that 2025 will be the year AI Agents take off.

Zhipu and AutoGLM Reflection were not alone. The wave of agents at the 2025 Zhongguancun Forum Annual Meeting also produced a landmark edge-side moment: Facing Wall Intelligence officially released the "Mini Cannon Super Assistant cpmGO", which it claims is the world's first pure edge-side intelligent assistant deployed on automotive devices and a technical practice of its "Model as Agent" vision.

Image/Facing Wall Intelligence

Notably, this small intelligent assistant does not run in the cloud. It is fully deployed on automotive devices and can independently complete the entire process of perception, understanding, decision-making, and execution in offline and weak-network environments. Unlike the voice assistants we are familiar with, Facing Wall Intelligence claims that the "Mini Cannon" can see, hear, and speak, and can also help you control the car, read the screen, recognize people, recognize the cabin, and even keep watch over the car.

As a domestic large model startup founded in 2022, Facing Wall Intelligence has always taken "efficient large models" and "edge-side deployment" as its technical mainstay. Its MiniCPM series of models are currently applied to multiple terminal forms such as AI phones, AI PCs, and smart cabins.

At this Zhongguancun Forum, Li Dahai, co-founder and CEO of Facing Wall Intelligence, also quoted Alexander Doria, a well-known AI engineer and co-founder of Pleias, saying that the model itself, rather than the workflow, is the future direction of AI agents—Model as Agent, Model as Product, Model as Interaction:

"Sending the Mini Cannon edge-side model to work is what an Agent does."

Li Dahai, co-founder and CEO of Facing Wall Intelligence, Image/Zhongguancun Forum

Li Dahai also revealed that the Mini Cannon Assistant will be officially launched with vehicles in Q3 this year, and some vehicle manufacturers and Tier 1 suppliers are already conducting tests. In this race of agents, Facing Wall Intelligence has taken the lead in firing the first shot for domestic edge-side agents with an AI agent that can "get on board".

Facing Wall's "Mini Cannon" gets on board as more than a voice assistant

Compared to traditional in-vehicle voice assistants that can only respond to commands such as "turn on the air conditioning", the Mini Cannon Super Assistant cpmGO is significantly more sophisticated.

As a pure edge-side model and super assistant deployed on the vehicle, the "Mini Cannon" does not rely on cloud inference. It runs locally on Facing Wall Intelligence's self-developed MiniCPM-o full-modal model, forming a complete agent closed loop from "perception → understanding → reasoning → tool invocation → actual execution".

First, it can "see". Using cameras inside and outside the cabin, the "Mini Cannon" can recognize visual information such as passenger gestures, faces, and what is happening outside the car, and it combines this with voice perception from microphone arrays to build a multimodal perception system. It can respond to your hand gestures and head turns.

The "Mini Cannon" can also "understand" and "make decisions", not only recognizing command intentions but also understanding context. For example, when you ask the central control screen "How do I brighten this?", it can understand that "this" refers to screen brightness and jump to the corresponding interface to perform the adjustment operation.
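The closed loop described above (perception, understanding, decision, tool invocation, execution) can be illustrated with a toy sketch. This is not Facing Wall's implementation: a real system would use the MiniCPM-o model where this sketch uses hand-written rules, and every name here (Observation, understand, decide, TOOLS, run_once) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Observation:
    """Perception result: what was heard and what is on screen (hypothetical)."""
    speech: str
    screen: str


@dataclass
class Action:
    """A decision expressed as a tool name plus one argument (hypothetical)."""
    tool: str
    argument: str


def understand(obs: Observation) -> str:
    """Resolve a vague 'this' by looking at the current screen context."""
    text = obs.speech.lower()
    if "this" in text and obs.screen == "display_settings":
        return text.replace("this", "screen brightness")
    return text


def decide(intent: str) -> Action:
    """Rule-based stand-in for the model's reasoning step."""
    if "brighten" in intent and "screen brightness" in intent:
        return Action("set_brightness", "up")
    if "air conditioning" in intent:
        return Action("set_climate", "on")
    return Action("noop", "")


# Tool registry: each tool is a local function, so nothing leaves the device.
TOOLS: Dict[str, Callable[[str], str]] = {
    "set_brightness": lambda arg: f"brightness:{arg}",
    "set_climate": lambda arg: f"climate:{arg}",
    "noop": lambda arg: "ignored",
}


def run_once(obs: Observation) -> str:
    """One pass of the loop: perceive -> understand -> decide -> invoke tool."""
    intent = understand(obs)
    action = decide(intent)
    return TOOLS[action.tool](action.argument)


if __name__ == "__main__":
    print(run_once(Observation("How do I brighten this?", "display_settings")))
    print(run_once(Observation("Turn on the air conditioning", "home")))
```

The point of the sketch is the shape of the loop rather than the rules inside it: the same utterance yields a different action depending on the perceived screen context, and the result is a concrete tool call rather than a text reply.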

Image/Facing Wall Intelligence

It is worth mentioning that Facing Wall created the first purely edge-side deployed, Always On "GUI Agent Screen Assistant" in December last year, so the assistant can not only see and speak but also act, realizing "what you see is what you can say" on the screen. Building on this, the "Mini Cannon" has a degree of "tool invocation capability": it supports generalized voice vehicle control, sentinel monitoring, and child and pet monitoring, and it can also automatically complete a series of operational tasks directly on the in-car or mobile phone screen.

In the vehicle scenario in particular, you can avoid touching the screen, start navigation directly by voice, and even ask the Mini Cannon to take photos and share them with friends. More importantly, all of this works in weak-network or even no-network environments, meeting the hard requirements of edge-side agents for low latency and strong privacy.

But the greater significance of the "Mini Cannon" is that it already possesses the three core elements of an "agent": autonomous perception, intention judgment, and tool invocation. It is not a script executor for preset tasks but an intelligent agent and AI assistant that can understand user context, make reasonable judgments, and complete tasks. For now, of course, this "assistant" may still be more execution-oriented than planning-oriented.

At least judging from the current information and demonstrations, Facing Wall's "Mini Cannon" can work but is not yet good at "actively arranging tasks", and it also lacks the personalized settings and emotional expression expected of a "virtual driving assistant". Many of its scenarios have also not yet been widely validated in the market. In addition, although Li Dahai mentioned in his talk that a major common problem of current agents is "poor long-term contextual memory", the "Mini Cannon" has not yet demonstrated the ability to solve this problem.

Image/Facing Wall Intelligence

In other words, with the "Mini Cannon" Facing Wall has addressed whether an agent can "run locally" and "work stably", but whether the agent is "smart enough" and can "continue to grow" still needs to be verified and explored.

Five domestic large model startups, different approaches

If Manus has sparked a new wave of discussion on "agents", then among domestic large model startups, who is truly driving the implementation of agents? And who is taking different paths?

At present, it seems that the five representative domestic large model startups—Facing Wall Intelligence, Zhipu AI, Baichuan Intelligence, Dark Side of the Moon (Kimi), and DeepSeek—have gradually shown their clear and unique approaches.

Facing Wall Intelligence: All in edge-side, ready to go to work

Image/Facing Wall Intelligence

Facing Wall's strategic keywords can be summarized in three points: "lightweight model, pure edge deployment, scenario implementation". The positioning of the MiniCPM series has never been to compete with cloud-based large models but to pursue small models that can run locally and work reliably on terminals such as AI phones, cars, and AI PCs.

Most peers are still giving demonstrations and publishing papers at the "what is an agent" stage. By comparison, Facing Wall Intelligence is one of the faster vendors in implementing "agents": it has already deployed them on vehicles, supports real-time, multimodal, and offline operation, and has entered real usage scenarios.

But this also means that it must do the "heavy lifting" of engineering challenges, chip adaptation, and commercial validation, and the path is not an easy one.

Zhipu AI: Technology platform faction, B-end + model ecosystem

Image/Zhipu

As one of the earliest large model companies, Zhipu follows the route of a large model platform + enterprise services, emphasizing the full-chain capabilities from pre-trained models, to industry datasets, to vertical applications. In terms of agents, Zhipu currently emphasizes "growing agents from large model capabilities" and pays more attention to versatility and ecosystem construction.

At the same time, Zhipu is also one of the earliest domestic vendors to reach the stage of "agents", successively launching the GLM-PC agent, AutoGLM (mobile) agent, and the latest AutoGLM Reflection agent. Among them, the usage scenario of AutoGLM Reflection is similar to the previously popular Manus, capable of industry research, shopping recommendations, lesson plan/tutorial creation, travel planning, etc.

The difference is that behind AutoGLM Reflection lies Zhipu's self-developed full-stack large model technology, integrating the general capabilities of GLM-4, the reflective capabilities of GLM-Z1, the meditative capabilities of GLM-Z1-Rumination, and the automatic execution capabilities of AutoGLM.

Baichuan Intelligence: Commercially application-oriented, All in medical large models

Image/Baichuan Intelligence

Baichuan Intelligence, founded by Wang Xiaochuan, the founder of Sogou, moves quickly but has chosen a more focused direction. Starting with the early general-purpose Baichuan series of large models, it quickly moved into vertical commercial scenarios and has even declared that it is going all in on healthcare, building medical large models and agents.

Unlike Facing Wall, Baichuan takes the combined route of "model + knowledge graph + industry expert system", placing more emphasis on customized service capabilities. Simply put, for Baichuan, an agent is more like an "execution engine in vertical fields" than a general platform.

Dark Side of the Moon (Kimi): Model as a Service, focusing on C-end assistants

Image/Dark Side of the Moon

Compared to the other companies, Dark Side of the Moon has taken a completely different route, focusing almost all its efforts on "Model as a Service": vertically integrating its large model into a single C-end product, Kimi, and refining that product's user experience.

Although Dark Side of the Moon has not yet launched a true agent, one is believed to be on the way. Fundamentally, the AI assistant that Dark Side of the Moon wants to create cannot rely solely on excellent dialogue capabilities. An agent that can learn and complete tasks in place of humans is a natural part of that goal.

On the other hand, the difficulty in creating an agent lies in reasoning ability, memory ability, and tool invocation ability. Reasoning is currently the direction all large model vendors are moving in, including Dark Side of the Moon, while memory (long context) is one of Kimi's main advantages. It can be said that the only thing Kimi lacks is tool invocation ability, but this is also a challenge faced by the industry as a whole.

DeepSeek: Using open-source large models to drive the explosion of agents

Image/DeepSeek

Strictly speaking, DeepSeek has not launched its own agent, but this does not prevent it from playing a pivotal role in the field of agents today. In particular, the open-source DeepSeek R1, with its relatively low cost and relatively strong reasoning capabilities, is arguably the most popular "base model" for building agents at present.

From a strategic perspective, DeepSeek is in no hurry to launch its own agent products. Judging from its current actions, it is still focused on continuously optimizing the efficiency and reasoning capabilities of its large models, building an "affordable, customizable, and scalable" underlying capability platform.

The best example is the large model startup ZeroOne, which decided after its strategic transformation to build the "Windows of the AI era" (the system) on top of DeepSeek (the core). In fact, many agents, including Manus, are built on base large models such as DeepSeek or Claude.

Looking ahead, is Facing Wall Intelligence's choice a good path?

In a side-by-side comparison, Facing Wall is the only one of the five companies that has deployed a pure edge-side agent on physical edge devices and formed a closed interaction loop.

But for Facing Wall, "All in edge-side" is probably not a short-term marketing label but a technical path with little room for turning back once chosen. The benefits of this path are clear: stronger privacy protection, lower latency, higher deployment flexibility, and a closer fit to the actual computing power of smartphones, cars, and other terminal devices.

However, its difficulties are also clear: the model must be lighter, more stable, more accurate, and better at "working". Most critically, the agent has to complete complex tasks despite limited edge-side computing power. This is why Facing Wall Intelligence emphasizes its engineering capabilities:

MiniCPM is lightweight and efficient, the GUI Agent Screen Assistant has been implemented, and cpmGO can operate in the car without a network and perform full-chain tasks.

Deployment is not the end. Can agents handle variables in task chains? Can they grow as user habits change? These are challenges that every agent must overcome on its own, and pure edge-side agents all the more so. A more practical question is what the experience is actually like. This is something that Facing Wall needs to answer.

Source: Leitech
