Unveiling Two 'Ace' Models! Volcano Engine Fires Up, Leaving Translators and Designers in Awe


07/31 2025

Volcano Engine has solidified its position as the cornerstone for AI implementation.

The international scene for large models has been buzzing with activity lately.

Night owls who follow tech circles have been inundated with news: OpenAI is poised to release GPT-5, Musk's Grok 3 sparked debate over its anime-style female characters, and Google AI Studio's latest Gemini 2.5 Pro kept me busy for weeks while giving me a glimpse of Google's nuanced censorship standards.

Switching gears to domestic developments...

Ah yes, on July 30th, Volcano Engine's AI Innovation Tour touched down in Xiamen.

Although I hail from Guangzhou, where Leitech is also based, ByteDance's invitation was too enticing to pass up. A few hundred kilometers couldn't deter my curiosity. I swiftly booked a high-speed train ticket, packed a small bag, and embarked on my journey to the venue.

(Image source: Leitech)

Despite the modest size of the Xiamen stop, the attendees' enthusiasm kept the venue buzzing. As expected, seats were scarce, and many attendees had to stand outside, gathered around the entrance, listening intently.

Just picturing the scene is quite remarkable.

Intriguingly, contrary to the excitement from outside observers, Volcano Engine's agenda at this tour focused not on unveiling flashy new gadgets, but on practical implementations of AI large model technology.

Eager to know what Volcano Engine has up its sleeve? Follow me.

About a month ago, I attended the "2025 Volcano Engine Spring FORCE Motive Power Conference" in Shanghai.

As a twice-yearly event, it delivered numerous noteworthy updates: the official release of Doubao Large Model 1.6, a comprehensive refresh of the Doubao Large Model family, and new information on Coze (Kouzi) and TRAE. The impression was that Volcano Engine aimed to outshine other Chinese large models.

Surprisingly, just a month later, the Doubao Large Model family welcomed two new members.

Indeed! The highlight of the Xiamen stop was undoubtedly the official release of the Doubao Simultaneous Interpretation Model Seed LiveInterpret 2.0 and the Doubao Image Editing Model SeedEdit 3.0.

According to Tan Dai, President of Volcano Engine, the Doubao Simultaneous Interpretation Model 2.0 is the first product-level Chinese-English speech simultaneous interpretation system with latency and accuracy approaching human levels. It achieves industry-leading translation quality in Chinese-English simultaneous interpretation while maintaining extremely low speech latency.

(Image source: Leitech)

For a long time, traditional machine simultaneous interpretation has been hampered by its cascaded architecture of "speech recognition → machine translation (MT) → speech synthesis." Each stage of this pipeline adds delay and loses information, and errors accumulate along the way, resulting in high latency, poor prosody, and stilted translations.

The Doubao simultaneous interpretation model breaks this mold, adopting an industry-leading end-to-end full-duplex speech translation framework. It not only translates directly from the source language to the target language but also retains the rich prosodic information of the source language, making the content closer to the speaker's true intent and compressing latency to an astonishing 2-3 seconds.

Moreover, this model also achieves "zero-shot voice cloning."

Leveraging powerful speaker identity encoding technology, the Doubao simultaneous interpretation model can swiftly extract unique voiceprint features from just the first 3-5 seconds of the speaker's audio and dynamically adjust the output rhythm based on the target language's linguistic habits, bidding farewell to the monotonous, flat "robotic tone" of traditional machine translation.

(Image source: Leitech)

Tan Dai demonstrated on-site that, without prior voice bank training, Doubao can clone, translate, and complete simultaneous interpretation almost immediately after the user finishes speaking, yielding impressive results.

As for the Doubao Image Editing Model 3.0, the upgrade targets familiar problems: models that misunderstand what users actually want, execute instructions poorly, modify areas they were not asked to touch, and produce images lacking aesthetic appeal. The new version offers stronger instruction following, better preservation of the original image, and higher generation quality.

(Image source: Leitech)

The series of Xiamen landscapes transformed by Doubao's on-site demonstration was indeed fascinating, but the true fun of such capabilities unfolds when you try Doubao yourself.

Additionally, Doubao 1.6, the underlying large model released last month, has recently been upgraded in coding, reasoning, mathematics, and other areas. The previously open-sourced Coze-related projects have also won widespread praise from developers.

It's safe to say that this series of releases alone made the attendees feel it was a trip well worth taking.

Beyond the impressive simultaneous interpretation model, Agent (intelligent agent) remains Volcano Engine's core focus area.

Currently, the unification of multimodal models and their associated APIs is a significant trend in the development of all large models on the market.

To this end, the Volcano Ark platform has upgraded its API system and launched the Responses API.

(Image source: Leitech)

According to Wu Di, head of intelligent algorithms at Volcano Engine, the Responses API boasts native context management capabilities, supports chained management of multi-turn dialogues, and can seamlessly integrate text, images, and mixed-modal data, significantly reducing latency and cost. In typical applications, the overall cost reduction can reach 80%.
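To make "chained management of multi-turn dialogues" more concrete, here is a minimal sketch of the general idea: the server keeps the conversation history, and the client sends only the new turn plus a reference to the previous response. Everything in it is an assumption for illustration. `ToyResponsesServer`, `create`, and `previous_response_id` are hypothetical names, not the Volcano Ark API, and character counts are only a crude stand-in for request size.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class StoredResponse:
    response_id: str
    history: List[str]  # full conversation so far, kept on the toy server


class ToyResponsesServer:
    """Hypothetical in-memory stand-in; not the real Responses API."""

    def __init__(self) -> None:
        self._store: Dict[str, StoredResponse] = {}
        self._ids = itertools.count(1)

    def create(self, user_input: str,
               previous_response_id: Optional[str] = None) -> StoredResponse:
        # Earlier turns are looked up server-side instead of being resent.
        prior = self._store[previous_response_id].history if previous_response_id else []
        reply = f"echo: {user_input}"  # placeholder for a real model call
        response = StoredResponse(f"resp_{next(self._ids)}", prior + [user_input, reply])
        self._store[response.response_id] = response
        return response


def run_chained(server: ToyResponsesServer, turns: List[str]) -> Tuple[int, StoredResponse]:
    """Client uploads only each new turn and chains on the previous response id."""
    sent, prev_id, last = 0, None, None
    for turn in turns:
        sent += len(turn)
        last = server.create(turn, previous_response_id=prev_id)
        prev_id = last.response_id
    return sent, last


def run_stateless(turns: List[str]) -> int:
    """Client resends the whole history (including replies) on every turn."""
    sent, history = 0, []
    for turn in turns:
        history.append(turn)
        sent += sum(len(message) for message in history)
        history.append(f"echo: {turn}")
    return sent


turns = ["hello", "translate this", "thanks"]
chained_sent, final = run_chained(ToyResponsesServer(), turns)
stateless_sent = run_stateless(turns)
print(chained_sent, stateless_sent, len(final.history))
```

The gap between the two upload totals widens as the conversation gets longer, which is the intuition behind chaining. The toy says nothing about the 80% figure quoted on stage; real savings would depend on the platform's own pricing and caching.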

Furthermore, the Responses API supports autonomous tool selection. Within a single user request, it can chain multiple built-in tools, custom functions, and multi-turn model calls to solve complex tasks, which saves time and effort in Agent development.
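The pattern described here, one request that triggers several tool calls in sequence, can be sketched as a small loop. This is a toy under stated assumptions: the tool names, `toy_planner`, and `handle_request` are all invented for illustration, and a keyword match stands in for the model's actual decision about which tool to call.

```python
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical custom tools; the names and behavior are invented for this sketch.
def word_count(text: str) -> str:
    return str(len(text.split()))


def to_upper(text: str) -> str:
    return text.upper()


TOOLS: Dict[str, Callable[[str], str]] = {
    "word_count": word_count,
    "to_upper": to_upper,
}


def toy_planner(request: str, already_called: List[str]) -> Optional[str]:
    """Stand-in for the model's tool choice: a keyword match, nothing more."""
    wanted = [("count", "word_count"), ("shout", "to_upper")]
    for keyword, tool in wanted:
        if keyword in request.lower() and tool not in already_called:
            return tool
    return None  # no further tool needed, so the loop can stop


def handle_request(request: str, payload: str,
                   max_steps: int = 5) -> List[Tuple[str, str]]:
    """Keep calling tools within one request until the planner asks for none."""
    trace: List[Tuple[str, str]] = []
    called: List[str] = []
    for _ in range(max_steps):  # cap the steps so a bad planner cannot loop forever
        tool = toy_planner(request, called)
        if tool is None:
            break
        trace.append((tool, TOOLS[tool](payload)))
        called.append(tool)
    return trace


print(handle_request("Count the words and shout them", "volcano engine ark"))
```

In a real Agent the planner would be the model itself and the loop would live on the platform side, which is presumably where the saved development effort comes from.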

For enterprise customers with model customization needs, Volcano Engine has also introduced a corporate-owned model hosting solution.

Relying on the Volcano Ark model unit, enterprises can fully host their self-developed models on Volcano Ark without needing to manage underlying GPU resources or perform complex network configurations. The solution offers highly elastic computing resources, which lowers costs and improves efficiency.

(Image source: Leitech)

Taken together, these announcements show Volcano Engine offering a full-lifecycle solution for Agent implementation, from development and management to deployment. It may only be a matter of time before digital employees are fully integrated into our daily work.

Of course, to prove that good models and tools can effectively accelerate Agent implementation, testimonials from front-line customers are invaluable.

At this tour stop, Volcano Engine invited two representative guests: Director Xu Zhuobin of the Xiamen University Information Center, who shared innovative AI practices in education and scientific research, and Huang Jifeng of the NetDragon Tianqing AI Platform, who discussed how AI enables more intelligent human-computer interaction to help new players through the novice period.

(Image source: Leitech)

These real-world case studies from local and industry sources are far more compelling than mere technical presentations.

Among domestic large model vendors, Volcano Engine's achievements are quite impressive.

According to the latest data, as of the end of May 2025, the daily average number of tokens processed by the Doubao Large Model has soared to over 16.4 trillion, a 136-fold increase compared to the same period last year. Currently, the Doubao Large Model has been widely implemented in industries such as automobiles, smart terminals, the internet, finance, education and scientific research, and retail consumption, covering over 500 million terminal devices—a truly impressive resume.
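As a quick sanity check on these figures, the baseline implied by a 136-fold increase can be worked out directly. The sketch below assumes "136-fold" means the current volume is 136 times the volume a year earlier, and it treats "over 16.4 trillion" as exactly 16.4 trillion, so the result is a rough lower bound rather than a reported number.

```python
# Figures quoted in the article; the baseline is derived, not reported.
daily_tokens_may_2025 = 16.4e12  # "over 16.4 trillion" tokens per day
growth_multiple = 136            # "136-fold increase" year over year

# Assumption: current = growth_multiple * baseline.
implied_baseline = daily_tokens_may_2025 / growth_multiple
implied_baseline_billions = round(implied_baseline / 1e9, 1)

print(f"Implied daily tokens a year earlier: about {implied_baseline_billions} billion")
```

In other words, the quoted numbers imply a daily volume on the order of 120 billion tokens in the same period of 2024.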

(Image source: Leitech)

After reviewing the entire tour agenda, my feelings are quite clear.

Agents are a promising direction for large model implementation, but enterprises lacking technical support struggle to adopt AI and Agents. Poor results, high costs, and difficult implementation have long been the main obstacles.

The products and sessions Volcano Engine showcased at the Xiamen stop ranged from iterations of the underlying large model, the simultaneous interpretation model, and the image editing model to a systematic Agent development and operation platform and in-depth sharing of industry practice. Together they standardize processes, improve efficiency, and promote collaboration in large model implementation, making it easier for enterprises to develop Agents.

As the technical threshold falls, enterprises looking to strengthen their market competitiveness will likely pursue transformation more boldly and accelerate the integration of AI capabilities into their own businesses.

Making large models a genuine part of corporate productivity may no longer be a dream.

Source: Leitech

Images in this article are from: 123RF Authentic Library

