NOW LET US – AI RAG SaaS Studio TP.HCM

Fluid, natural voice translation with Gemini 3.5 Live Translate

Google has introduced Gemini 3.5 Live Translate, its latest audio model for seamless, real-time speech-to-speech translation with natural intonation across over 70 languages.

Twenty years ago, translation at Google began as one of our pioneering machine learning experiments to turn the science of language into the magic of human connection. That experiment has come a long way: more than a trillion words are now translated for billions of users across our products every month.

Today, we’re taking our next step with the release of Gemini 3.5 Live Translate, our latest audio model for live speech-to-speech translation.

The model automatically detects 70+ languages and generates smooth, natural-sounding translated speech that preserves the speaker's intonation, pacing and pitch. Unlike turn-by-turn systems that wait for the speaker to finish before responding, 3.5 Live Translate generates speech continuously, balancing the trade-off between waiting for more context to improve quality and translating immediately to stay in sync with the speaker. It delivers fluid audio without awkward pauses and stays just a few seconds behind the speaker throughout the session.
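The trade-off between waiting for context and translating immediately can be illustrated with a toy "wait-k" policy, a simple scheme from the simultaneous translation literature. This is not how 3.5 Live Translate works internally (the article does not describe its mechanism); it is a minimal sketch in which `wait_k_schedule` and `toy_translate` are hypothetical names and "translation" is just uppercasing a token.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def wait_k_schedule(
    source: Iterable[str],
    k: int,
    translate: Callable[[List[str], int], str],
) -> Iterator[Tuple[int, str]]:
    """Yield (tokens_read_so_far, output_token) pairs under a wait-k policy.

    The policy reads k source tokens before emitting the first output,
    then emits one output per newly read token. A small k keeps latency
    low; a large k gives the translator more context before it commits.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    read: List[str] = []
    emitted = 0
    for token in source:
        read.append(token)
        if len(read) >= k:
            yield len(read), translate(read, emitted)
            emitted += 1
    # The speaker has finished: flush whatever has not been emitted yet.
    while emitted < len(read):
        yield len(read), translate(read, emitted)
        emitted += 1


def toy_translate(context: List[str], index: int) -> str:
    """Stand-in translator (assumption): uppercases the token at `index`."""
    return context[index].upper()


if __name__ == "__main__":
    words = "a b c d".split()
    for k in (1, 2, 10):
        print(k, list(wait_k_schedule(words, k, toy_translate)))
```

With `k=1` every word is emitted as soon as it is heard, with `k=2` the output trails the speaker by one word, and with a `k` larger than the utterance the policy degenerates into the turn-by-turn behavior described above, where nothing is emitted until the speaker stops.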

Gemini 3.5 Live Translate is rolling out starting today across Google products:

  • For developers in public preview via the Gemini Live API and Google AI Studio
  • For enterprises in private preview starting this month in Google Meet
  • For everyone via Google Translate on Android and iOS

Build with 3.5 Live Translate

Gemini 3.5 Live Translate processes speech as it’s streamed, enabling a more seamless connection across languages. The model handles multilingual inputs without the need to manually configure settings. At the same time, its noise robustness ensures applications can handle loud, unpredictable environments. You can use its capabilities to help facilitate live interpretation for multilingual calls, meetings, lessons, broadcasts and more.

Developer platforms such as Agora, Fishjam, LiveKit, Pipecat, and Vision Agents integrate the Gemini Live API so developers can build and deploy voice translation apps with ease. These integrations handle the complex real-time media streaming infrastructure, so developers can focus on the user experience.

Our partners at Grab are testing the model to enable multilingual communication in near real-time between drivers and travelers at pickups. These users make over 10 million voice calls per month through Grab.

Read the early reviews

In addition to Grab, companies like CJ ENM, LiveKit and others have shared positive feedback on 3.5 Live Translate, highlighting its translation quality, accuracy and low latency.

Experience 3.5 Live Translate in your video meetings

Speech translation in Google Meet will soon use 3.5 Live Translate, improving the experience by:

  • Offering 70+ languages, up from the previous limit of just five languages
  • Enabling conversations across more than 2,000 language combinations in one meeting, where previously translation was only available to and from English
  • Updating the interface to provide instant access to speech translation
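As a rough sanity check on the "2,000 language combinations" figure, the short calculation below counts language pairs. It assumes exactly 70 languages (the article says "70+", so this is a lower bound) and that a "combination" means a pair of distinct languages; the article does not say whether direction is counted.

```python
from math import comb, perm

LANGUAGES = 70  # assumption: lower bound taken from the article's "70+"

# Pairs of distinct languages, ignoring direction (English-Spanish counted once).
unordered_pairs = comb(LANGUAGES, 2)

# Source -> target pairs, where direction matters.
directed_pairs = perm(LANGUAGES, 2)

print(unordered_pairs)  # 2415
print(directed_pairs)   # 4830
```

Under these assumptions, the undirected count of 2,415 is consistent with "more than 2,000". By comparison, translating only to and from English would cover just 69 of those pairs.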

We’re launching this update in private preview for select business Google Workspace customers starting this month, followed by a broader rollout later this year.

Get 3.5 Live Translate in the Google Translate app on Android or iOS

The model is also rolling out on the Google Translate app globally, on both Android and iOS. When using the Live translate feature, simply connect any pair of headphones to experience a more seamless translation that mirrors the speaker’s tone across 70+ languages.

For Android users, we’re also starting to roll out a new ‘listening mode’ with 3.5 Live Translate that lets you hear translations directly through your phone’s earpiece. Simply hold your phone to your ear just like a regular call, and the translated audio streams straight to you. This new experience can be helpful in situations where you want to quickly hear translations without others hearing, and you don’t have your headphones handy.

Using the new listening mode, users can hear a near real-time English translation of a guided tour in Spanish directly through their phone's earpiece.

Watermarked with SynthID

All audio generated by our models is watermarked with SynthID. This imperceptible watermark is woven directly into the audio output, ensuring AI-generated content remains detectable to help prevent misinformation.

© 2026 Now Let Us. All rights reserved.

Source: Google DeepMind Blog
