06/12 2026
Translation stands out as one of the most remarkable application scenarios for artificial intelligence (AI).
At the recent Google I/O event, Google officially introduced the latest iteration of its Gemini model, version 3.5, and announced plans to integrate Gemini's capabilities across a broader range of products within the Google ecosystem.
Surprisingly, the first Google ecosystem application to leverage Gemini wasn't one of the major services like Search, YouTube, or Maps. Instead, it was Google Translate, a tool that has seen a diminished presence in recent years. Yesterday, Google rolled out an update for Google Translate, embedding the latest audio model, Gemini 3.5 Live Translate (hereinafter referred to as Gemini 3.5 LT).

Image Source: Google
So, how does Google Translate perform in real-time voice translation with the integration of Gemini 3.5 LT?
To try Google Translate's real-time translation feature, select the "Listen in Real-Time" mode, which is now powered by Gemini 3.5 LT. Interestingly, this mode is only available when external headphones are connected. Since the real-time listening mode is one-way and offers nothing like the two-way conversation of Timekettle's simultaneous interpretation mode, the headphone requirement is somewhat puzzling.
Fortunately, operating the "Listen in Real-Time" mode is straightforward: activate the mode, position your phone near the speaker, and you can listen to the translated audio in real-time through your headphones.

Image Source: Leitech
Compared to Google Translate's previous "Conversation" mode, which required users to hold a button to speak and release it to initiate translation, the "Listen in Real-Time" mode commences translation after the speaker completes a short sentence, significantly enhancing timeliness. However, Google Translate still exhibits noticeable latency during the translation process.
Take, for instance, the translation of ancient poems from Chinese to English: the translation of the first sentence is only audible in the headphones when the third sentence is being spoken. There remains a considerable gap compared to professional simultaneous interpretation headphones, which commence translation almost instantaneously.
To gauge translation accuracy, Leitech tested Google Translate on the original audio of several classic video game dialogue clips and compared the results with Apple Translate and Youdao Translate.
Let's start with the original text. Players of GTA: SA should recognize this food order:
"I'll have 2 number 9s, a number 9 large, a number 6 with extra dip, a number 7, 2 number 45s, one with cheese and a large soda."
Here's how Google Translate "heard" the English original text:
"Now, I have two number nines, but number nine Lord, number six with extra dip, a number seven, two number 45s, one with cheese and a large soda. Okay. Okay."
As shown, even with Gemini 3.5 LT, Google Translate still gets some details of the original wrong:
"I’ll have" is transcribed as "I have" (lost to connected speech);
"a number 9 large" is misheard as "but number nine Lord" (recognition error);
Superfluous words such as "Now" and "Okay. Okay." appear at the beginning and end (recognition hallucination).
Nonetheless, as far as the translation step itself is concerned, Google Translate renders this flawed transcript accurately.
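One rough way to put a number on recognition errors like these is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the recognized text into the reference, divided by the length of the reference. The sketch below is only an illustration. Leitech's test did not report WER, the function names `tokenize` and `word_error_rate` are made up for this example, and it assumes numerals have already been spelled out the same way on both sides (otherwise "9" versus "nine" would count as an error).

```python
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    # Normalize the curly apostrophe so "I’ll" and "I'll" compare equal.
    return re.findall(r"[a-z0-9']+", text.lower().replace("’", "'"))


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = tokenize(reference)
    hyp = tokenize(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical excerpt with numerals spelled out on both sides:
# two substitutions ("a" -> "but", "large" -> "lord") over four words.
print(word_error_rate("a number nine large", "but number nine lord"))  # 0.5
```

Under this metric, hallucinated words such as "Now" and "Okay. Okay." count as insertions, so they raise the score even though no word of the original was lost.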

Image Source: Leitech
Regarding speech rendering, the voice output by Google Translate is indeed more rhythmic and natural compared to pure text-to-speech (TTS), but it's still discernible as AI-synthesized speech.
In contrast, Apple Translate encounters significantly more issues, with numerous recognition errors rendering the translation nearly unusable.

Image Source: Leitech
Youdao's simultaneous interpretation performs more stably, with only one omission due to connected speech ("I’ll have" becomes "I have") and one recognition error ("Soda" becomes "Soup"), maintaining consistent translation quality and speed.

Image Source: Leitech
Additionally, there's an oddity with Google Translate's "Listen in Real-Time" mode: for reasons unknown, it doesn't save translation records from this mode.
While Google Translate offers various translation modes like "Conversation" and "Text," which allow users to view the original and translated texts in the history after translation, all translation records in "Listen in Real-Time" mode vanish after exiting (including forced exits due to headphone disconnection).
If you intend to use Google Translate's "Listen in Real-Time" mode for interview recording, Leitech advises against it.

Image Source: Leitech
Moreover, during testing, Leitech found that Google Translate's "Listen in Real-Time" mode still suffers from mistranslations and subject confusion. On the positive side, human translators make mistakes too, and translation software like Google Translate can iterate and fix such problems far faster.
More importantly, because Google Translate is a frontrunner in general-purpose translation worldwide, its move into the AI camp is likely to pull the whole general translation sector towards AI translation.
Viewing Google Translate's update in isolation, Leitech perceives it as merely catching up on "AI features": with everyone else integrating large models, Google Translate had to follow suit. However, when considering the entire AI hardware market in 2026, Leitech believes translation could emerge as one of the most noteworthy AI application scenarios this year.
The rationale is straightforward: compared to many AI features still in the "show-off" stage, the demand for "translation" is clear and frequent. Unlike categories that necessitate "educating users" or "creating demand," brands don't need to elucidate "what AI can do," and users won't question "why use AI."
For instance, Timekettle's X1 Meeting AI simultaneous interpretation device leverages its AI model capabilities to address traditional shortcomings in simultaneous interpretation, such as sentence boundary recognition, semantic inference, and contextual error correction. Beyond translation, Timekettle also employs AI technology to create bone conduction voice recognition, enabling translation headphones to accurately identify speakers, laying the groundwork for subsequent simultaneous interpretation.

Image Source: Leitech
iFlytek's newly released AI glasses adopt a different strategy. Since their inception, mainstream smart glasses have primarily focused on "photography" as their core scenario. However, iFlytek's AI glasses, released last month, uniquely position "omni-scenario translation" as their core AI feature.

Image Source: Leitech
Replacing photography with translation not only tackles the problem of smart glasses "gathering dust" by building on a more frequent use case, but also plays to iFlytek's strengths. By drawing on its years of technical accumulation in translation, iFlytek can quickly establish its AI glasses in the smart glasses market.
From Leitech's perspective, whether it's Timekettle's simultaneous interpretation headphones or iFlytek's AI glasses, these AI translation devices essentially aim to transform translation from an app feature into a capability that can be "transplanted" into different hardware, covering more scenarios.
Take Google Translate as an example: although it currently faces issues like latency, missed translations, and record loss, in the long run, Gemini 3.5 LT's real-time audio capabilities could be integrated into headphones, glasses, conference devices, and even car cabins.
For professional translation hardware manufacturers like Timekettle and iFlytek, the emergence of Gemini 3.5 LT presents both pressure and opportunity. The "pressure" is evident: once players like Google enter the field, they will inevitably raise user expectations for free translation tools. Last year, various smartphone brands incorporated AI translation features into their TWS earphones, directly compressing the market space for entry-level translation headphones and elevating the "passing grade" for translation headphone capabilities.

Image Source: JD.com
However, from another perspective, general translation also has inherent shortcomings: business meetings require multi-person recognition, interviews necessitate backups, cross-border exhibitions demand long battery life, and noisy environments require stronger sound reception. These issues cannot be resolved with a single model update.
Therefore, Google Translate's improvement doesn't signify the end of the road for translation headphones and glasses.
General translation apps like Google Translate and products utilizing general translation models can only address the issue of "going from nothing to something." Future high-end translation headphones will undoubtedly employ dedicated AI capabilities as the core driver for product iteration, widening the experience gap with faster and more robust dedicated translation models. Only in this manner can they retain their core user base amid the impact of "AI translation headphones" like AirPods and continue their dominance in more niche, high-value markets.
As free apps continue to lower the barrier for basic translation, professional devices must demonstrate their strength and value in professional scenarios. It's certain that with the widespread adoption of AI technology in the translation sector, translation hardware will inevitably undergo a new round of reshuffling.
Technological innovation, survival of the fittest among products, and better consumer experiences: this is what it means for AI technology to drive an industry forward.
Source: Leitech
Images in this article are from the 123RF royalty-free image library.