NOW LET US – AI RAG SaaS Studio TP.HCM
NOW LET US
Digital Product Studio
Back to news
DEV-TOOLS...1 min read

Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

Share
NOW LET US Article – Show HN: Real-time AI (audio/video in, voice out) on an M3 Pro with Gemma E2B

Parlor is an on-device, real-time multimodal AI that enables natural voice and vision conversations using Google's Gemma 4 E2B and Kokoro TTS, running locally on Apple Silicon or Linux GPUs.

On-device, real-time multimodal AI. Have natural voice and vision conversations with an AI that runs entirely on your machine.

Parlor uses Gemma 4 E2B for understanding speech and vision, and Kokoro for text-to-speech. You talk, show your camera, and it talks back, all locally.

Research preview. This is an early experiment. Expect rough edges and bugs.

I'm self-hosting a totally free voice AI on my home server to help people learn speaking English. It has hundreds of monthly active users, and I've been thinking about how to keep it free while making it sustainable.

The obvious answer: run everything on-device, eliminating any server cost. Six months ago I needed an RTX 5090 to run just the voice models in real-time.

Google just released a super capable small model that I can run on my M3 Pro in real-time, with vision too! Sure you can't do agentic coding with this, but it is a game-changer for people learning a new language. Imagine a few years from now that people can run this locally on their phones. They can point their camera at objects and talk about them. And this model is multi-lingual, so people can always fallback to their native language if they want. This is essentially what OpenAI demoed a few years ago.

Voice Activity Detection in the browser (Silero VAD). Hands-free, no push-to-talk. Barge-in. Interrupt the AI mid-sentence by speaking. Sentence-level TTS streaming. Audio starts playing before the full response is generated.

  • Python 3.12+
  • macOS with Apple Silicon, or Linux with a supported GPU
  • ~3 GB free RAM for the model

Models are downloaded automatically on first run (~2.6 GB for Gemma 4 E2B, plus TTS models).

| Stage | Time | |---|---| | Speech + vision understanding | ~1.8-2.2s | | Response generation (~25 tokens) | ~0.3s | | Text-to-speech (1-3 sentences) | ~0.3-0.7s | | Total end-to-end | ~2.5-3.0s |

Decode speed: ~83 tokens/sec on GPU (Apple M3 Pro).

© 2026 Now Let Us. All rights reserved.

Source: Hacker News

Advertisement
Ad slot ready: 5887729102

More in this category

NOW LET US Related – Leaving Mozilla

dev-tools

Leaving Mozilla

A poignant and candid reflection from a 15-year Mozilla veteran upon their departure. The author highlights the leadership's missteps in trying to emulate tech giants and urges Mozilla to return to its core values: community and uniqueness.

NOW LET US Related – Shepherd's Dog: A Game by the Most Dangerous AI Model

dev-tools

Shepherd's Dog: A Game by the Most Dangerous AI Model

A developer tested Anthropic's latest, supposedly 'too dangerous' AI model by asking it to build a long-held game idea in a single shot. The model succeeded, generating a complete 2,319-line game after a 45-minute reasoning session.

NOW LET US Related – Open source AI must win

dev-tools

Open source AI must win

If artificial intelligence becomes a utility rented only from a few closed institutions, humanity loses its operational freedom. Open-source AI is a vital infrastructure for the future of our digital society.

NOW LET US Related – Statement on US government directive to suspend access to Fable 5 and Mythos 5

dev-tools

Statement on US government directive to suspend access to Fable 5 and Mythos 5

The US government has issued an export control directive forcing Anthropic to suspend all access to its Fable 5 and Mythos 5 models due to national security concerns, a move the AI safety startup strongly disputes.

NOW LET US Related – Electric motors with no rare earths

dev-tools

Electric motors with no rare earths

Renault Group is pioneering the development of electrically excited synchronous motors (EESM) that eliminate the need for rare earth magnets, reducing dependency on global monopolies while driving efficiency and sustainability.

NOW LET US Related – Swift at Apple: Migrating the TrueType hinting interpreter

dev-tools

Swift at Apple: Migrating the TrueType hinting interpreter

Apple has rewritten its TrueType hinting interpreter from C to memory-safe Swift for its Fall 2025 OS releases, improving security and boosting performance by an average of 13%.

EXPLORE TOPICS

Discover All Categories

Deep dive into the specific technology sectors that matter most to you.