04/13 2026

Author | Lianyi Zhang
Since Zhuoyu Technology spun off from DJI in 2024, many have asked Shen Shaojie why. "It's because it's a bit different," Shen, CEO of Zhuoyu Technology, replied in an interview following his speech at the High-Level Forum on Intelligent Electric Vehicle Development (2026) on April 11. "DJI focuses on B2C with short product cycles; automotive is B2B, requiring high safety, long cycles, stable supply chains, and is influenced by geopolitics." Independent operation was therefore a natural progression.
Of course, there was also a practical consideration behind it: financing and going public. "This is actually a very natural progression—any company would follow a similar path," he said.
After independence, Zhuoyu Technology redefined itself as a "mobile physical AI company." Shen believes this is the future for all intelligent driving companies.
Zhuoyu Technology CEO Shen Shaojie
He boldly predicts that over the next two years, advances in core technology will lead the global intelligent driving industry to move gradually away from today's fragmented, region-specific delivery models and converge on foundational models. "To add another bold prediction: intelligent driving is merely the most initial form of physical AI, not the final destination. Even companies like ours that survive in the future will transition into mobile physical AI companies," Shen stated, emphasizing that this is not merely a strategic judgment but a survival judgment.
He hopes mobile physical AI can bring intelligent mobility to everything: passenger and commercial vehicles, L2 and L4 autonomy, and even the broader robotics industry. The means is a combined solution of model software and automotive-grade, highly reliable hardware. "These are not just empty promises; they are happening."
01
Report Card: Expanding Passenger Vehicle Market, Breaking Through in Heavy Trucks
Let's look at the report card first.
In the passenger vehicle segment, in addition to the nine clients disclosed last year, three more have been added. Shen specifically emphasized that these are three new "client entities," not just three new brands—"they are bigger." However, he declined to name them for now.

Over 50 mass-produced models have been launched, and the cumulative number of designated models (vehicle programs for which Zhuoyu has been selected as supplier) has reached three digits.
Shen summarized the current model matrix with four phrases: "unified quality for ICE and EV," "shared core for cabin and driving," "equal excellence in parking and cruising," and "synchronized global intelligence." Among them, "synchronized global intelligence" means that whether it's domestic or joint venture brands, ICE or EVs, they can all achieve the same level of intelligence.
In April this year, Zhuoyu's latest "Highly Perceptive End-to-End 4.0" version began rolling out gradually. Shen called it "100% end-to-end flavor"—balancing efficiency and safety, capable of navigating narrow and complex roads efficiently while incorporating dynamic traffic flow into model processing for smarter navigation.
An even bigger breakthrough came in the commercial vehicle sector.
All six of China's top heavy truck companies have become Zhuoyu's clients. The first heavy truck model will enter mass production in June this year, with several dozen models in total to be delivered in sequence between then and the first half of next year.

"Overall development, including acquiring new clients and securing new model designations, has exceeded my expectations," Shen said.
The heavy truck solution uses the same controller computing power as passenger vehicles but adds something unique: Zhuoyu's self-developed "Laser-Vision System"—a fusion module of LiDAR and vision installed inside the cabin. Depending on the installation position, it can achieve detection distances of 300 to 400 meters. Given heavy trucks' large inertia and long braking distances, seeing far is a prerequisite for safety.
"More importantly, this solution can meet stringent standards. I believe it's also the industry's first commercial heavy truck solution capable of meeting such standards," Shen said. Meeting stringent standards means it's not just a demo—it can safely operate on real roads. Besides safety, intelligent driving for heavy trucks offers another significant value: fuel savings. By accelerating slowly uphill and braking less downhill, it can simultaneously reduce fuel consumption and brake wear.
The rapid breakthroughs in commercial vehicles are backed by the implementation of the "mobile intelligence foundation" concept: reusing a single technological core across different scenarios.
Shen broke this down into three layers.
The first layer is a strong foundational model. The same model runs in both passenger vehicles and heavy trucks; from the outset, it was not designed for any specific vehicle type. Its driving style tends to be smooth and predictive rather than abrupt. "This may be subjective in passenger vehicles, but it becomes crucial in heavy trucks. The model's characteristics naturally adapt to such transitions," Shen said.
The second layer is strong engineering capability. How do you adapt control algorithms when vehicle weight varies by a factor of dozens, and when tractor and trailer weights come in different combinations? Models alone cannot solve this; it takes engineers' hard skills. "We are essentially a company that understands how to control robots," Shen said, attributing this capability to experience accumulated since the company began building robots.
The third layer is hardware advantages. Commercial vehicles use a 24V electrical architecture, while passenger vehicles use 12V; commercial vehicles must be built to last under continuous operation, whereas passenger vehicles see intermittent use. "We have the hardware capabilities to handle these differences directly," Shen said. For companies lacking hardware capabilities, disputes would arise before work even started.
"All these together form a systematic capability," Shen summarized. This capability determines that once new vertical opportunities arise, Zhuoyu can quickly enter them.

Of course, not all tracks are run at the same speed. "For example, we are latecomers in Robotaxi," Shen admitted. But he believes being a latecomer has advantages—skipping the early high-investment phase of requiring HD maps for city deployment and jumping directly to deploying at lower costs using native multimodal foundational models. Zhuoyu will officially launch trial operations with partners in Robotaxi and autonomous logistics vehicles in July this year, achieving unified adaptation across multiple verticals from L2 to L4.
While the technological core is reusable, business models differ. In Shen's view, for passenger vehicles, Zhuoyu will either act as a Tier 1 itself or partner to form a Tier 1; for commercial vehicles, it will directly serve as a Tier 1. In the L4 domain, Zhuoyu will act solely as a technology provider, working with partners on operations and sharing profits with them.
02
Next Step: Native Multimodal Foundational Model on Vehicles
If the past year was about "scaling up" in passenger and commercial vehicle sectors for Zhuoyu, the next focus will be upgrading the technological foundation.
Shen revealed that Zhuoyu Technology will officially release its next-generation architecture—the native multimodal foundational model—at this year's Beijing International Automotive Exhibition. This model fundamentally differs from current mainstream end-to-end solutions.

To understand it, one must first look at current mainstream "end-to-end" solutions. Shen calls them "medium models": they have tens to hundreds of millions of parameters and require large amounts of high-quality driving data for training. When they encounter scenarios absent from the training data, they cannot generalize on their own.
He used overseas deployment as an example. The biggest difference between China and Germany is not traffic signs but driving styles. China has weak right-of-way awareness ("whoever squeezes in goes first"); Germany has strong rule adherence. Taking a model trained in China directly to Germany "would work but get heavily criticized." To solve this, data from 30 people in Germany would need to be collected for a year to retrain the model. "It works but at a cost." Repeating this for every country would be financially unsustainable.
The same issue applies across scenarios: with medium models, moving from passenger vehicles to heavy trucks, or to any other new scenario, means redoing everything.
The approach of a "native multimodal large model" is different. During pre-training, besides intelligent driving data, it also ingests internet data, mobile robot data, and even videos of people walking with cameras. "Pouring in as much knowledge as possible" allows the model to learn universal patterns of the physical world on its own. In new scenarios, only a small amount of data is needed to "activate" it for use.

Shen used an analogy to explain: if different solutions are compared to different types of students, a "medium model" is like a moderately intelligent child who performs well on exams if the questions have been practiced extensively but struggles with unseen questions. A "large model" is like a truly top student who, given a few textbooks or even miscellaneous books, can figure things out on their own and easily get into a 985 university, one of China's elite institutions.
Shen acknowledged that this direction was not invented by Zhuoyu. "Tesla is definitely ahead; currently, the industry has achieved this with Tesla FSD V14 and XPENG VLA2.0. But cross-vertical adaptation hasn't happened yet."
Zhuoyu's goal is to actually implement this model.
The timeline is set: this year, the model will be deployed in passenger vehicles and heavy trucks; in two weeks, at the Beijing Auto Show, hands-on demos will open to outside parties; in July this year, trial operations will begin with partners in L4 Robotaxi and autonomous logistics vehicles.

Regarding industry discussions about "skipping L3," Shen's stance is clear: "I agree."
He gave two reasons.
The first is the division of liability: if consumers are required to take over within 10 seconds but are asleep and cannot, who bears the responsibility?
The second is technological: concepts like L2, L3, and L4 were proposed decades ago, before large models existed. It is now clear, he said, that with models that have native multimodal and emergent capabilities, plus remote operations, safety fallbacks, and sensor redundancy, "L2, L3, and L4 can essentially share the same origin." And if they share the same origin, the same technology can serve two states, today's L2 and L4, without a separate L3 stage in between.
As for when L4 will become widespread, Shen argues that it is already here: autonomous Robotaxis can be hailed on the road today. The issue is not capability but cost. As model capabilities improve and generalization costs decrease, the cost line will drop, and adoption will accelerate.
He also envisioned a hybrid model: a vehicle with both L2+ and L4 capabilities. It would operate in L2+ mode in uncertified areas; upon entering certified areas, a cloud monitoring platform would take over, turning it into a "personal taxi" billed by distance. "It doesn't have to operate purely as a Robotaxi."
03
Supply Chain, Costs, and a Survival Judgment
Although Zhuoyu emphasizes winning markets through technology, outsiders have labeled it a "price killer." During the interview, Shen dissected this label.
If "price killer" means only selling cheap, low-end products, "I definitely don't accept that," he said.
His judgment is that good products must cover a full matrix, since different price points and scenarios require different solutions. A clear trend now is that as model capabilities grow stronger, more hardware will be needed. "Undoubtedly, the products we sell will become more expensive, but performance gains will definitely outpace price increases," he said.
If "price killer" refers to supply chain control capabilities, "I accept that," Shen said.
Zhuoyu's self-developed inertial navigation triple-camera system
Shen pointed out an industry reality: among new-generation intelligent driving companies, very few truly possess integrated software-hardware capabilities and supply chain control. Most are pure software companies that, when supply chains fluctuate, can only ask their partners to accept higher prices.
How does Zhuoyu handle this? Taking the recent memory price hike as an example, "we stockpiled a large amount in the second half of last year to ensure no delivery issues this year," Shen said.
When asked about "intelligent driving democratization," Shen's view was counterintuitive. He believes the essence of democratization is not making intelligent driving cheaper but increasing its share of overall vehicle cost. A 100,000-yuan car might previously have allocated only 2,000 yuan to intelligent driving. Now, as acceptance of intelligent features grows, that share is rising, and hardware usage is increasing along with it.
"If you use more things, it can't be cheap," he said, echoing his response to the "price killer" label: Zhuoyu's core competitiveness is not low prices but providing higher performance at lower costs.
Asked whether Zhuoyu would follow some automakers in developing its own chips, Shen said no. "The number of reliable edge chip suppliers exceeds that of reliable intelligent driving solution providers," he said. Partnering with suppliers, or even deeply customizing chips with them, is more cost-effective than doing it oneself.
But underlying all of Zhuoyu's strategies is an even more fundamental judgment.

"Intelligent driving is actually just the most initial form of physical AI, and by no means the final destination. In the future, companies like ours that can survive will all transform into mobile physical AI companies. I want to emphasize that this is not merely a strategic judgment, but rather a survival judgment."
His logical chain is clear: as model capabilities grow stronger, training costs are rising, reaching billions of dollars per year. At the same time, such a model can be applied across verticals. Only by spreading the enormous costs across more applications, and using data from those applications to improve the model, can a virtuous cycle be established.
"In the end, this leads us to only one conclusion: all of us intelligent driving companies must eventually transform into mobile physical AI companies. And Zhuoyu's vision is to become one of the key infrastructures in the era of mobile physical AI."