Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud | Gemma Gem: Running an AI model directly in the browser, no cloud or API required | Now Let Us

Gemma Gem is a browser extension that runs Google's Gemma 4 model entirely on-device via WebGPU, ensuring data privacy while enabling AI to interact directly with web content.

Your personal AI assistant living right inside the browser. Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine. It can read pages, click buttons, fill forms, run JavaScript, and answer questions about any site you visit.

Chrome with WebGPU support
~500MB disk for E2B model, ~1.5GB for E4B (cached after first run)
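The WebGPU requirement can be checked before any model download starts. The following is a minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point; the `GpuLike` type and the `hasWebGpu` helper are hypothetical names used for illustration and are not taken from Gemma Gem.

```typescript
// Minimal structural type so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}

// Hypothetical helper: resolves to true only when an adapter is available.
async function hasWebGpu(nav: { gpu?: GpuLike }): Promise<boolean> {
  if (!nav.gpu) return false; // the browser does not expose WebGPU at all
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat a failed adapter request as "unsupported"
  }
}
```

In a browser this would be called with the global `navigator` object; taking it as a parameter keeps the helper testable outside a browser.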

pnpm install
pnpm build

Load the extension in chrome://extensions (developer mode) from .output/chrome-mv3-dev/

Navigate to any page
Click the gem icon (bottom-right corner) to open the chat
Wait for model to load (progress shown on icon + chat)
Ask questions about the page or request actions

Offscreen Document          Service Worker          Content Script
(Gemma 4 + Agent Loop)  <-> (Message Router)    <-> (Chat UI + DOM Tools)
        |                        |
  WebGPU inference          Screenshot capture
  Token streaming           JS execution

Offscreen document: Hosts the model via @huggingface/transformers + WebGPU. Runs the agent loop.
Service worker: Routes messages between content scripts and the offscreen document. Handles take_screenshot and run_javascript.
Content script: Injects the gem icon + shadow DOM chat overlay. Executes DOM tools (read_page_content, click_element, type_text, scroll_page).

| Tool | Description | Runs in |
|---|---|---|
| read_page_content | Read text/HTML of the page or a CSS selector | Content script |
| take_screenshot | Capture visible page as PNG | Service worker |
| click_element | Click an element by CSS selector | Content script |
| type_text | Type into an input by CSS selector | Content script |
| scroll_page | Scroll up/down by pixel amount | Content script |
| run_javascript | Execute JS in the page context with full DOM access | Service worker |
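The tool table implies a routing decision for every tool call: either forward it to the content script or handle it in the service worker. One way this could be expressed is sketched below. The tool names and their contexts come from the table; `ToolName`, `ExecutionContext`, `TOOL_CONTEXT`, and `routeTool` are hypothetical names and may not match the actual Gemma Gem source.

```typescript
type ExecutionContext = "content-script" | "service-worker";

type ToolName =
  | "read_page_content"
  | "take_screenshot"
  | "click_element"
  | "type_text"
  | "scroll_page"
  | "run_javascript";

// Mirrors the "Runs in" column of the tool table.
const TOOL_CONTEXT: Record<ToolName, ExecutionContext> = {
  read_page_content: "content-script",
  take_screenshot: "service-worker",
  click_element: "content-script",
  type_text: "content-script",
  scroll_page: "content-script",
  run_javascript: "service-worker",
};

// Type guard: narrows an arbitrary string (e.g. from model output) to a known tool.
function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(TOOL_CONTEXT, name);
}

// Decide where a tool call should execute; unknown tools are rejected.
function routeTool(name: string): ExecutionContext {
  if (!isToolName(name)) {
    throw new Error(`Unknown tool: ${name}`);
  }
  return TOOL_CONTEXT[name];
}
```

Validating the name before routing matters here because tool names originate from model output, which is not guaranteed to be well formed.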

Click the gear icon in the chat header:

Model: Switch between Gemma 4 E2B (~500MB) and E4B (~1.5GB). Selection persists across sessions.
Thinking: Toggle native Gemma 4 thinking
Max iterations: Cap on tool call loops per request
Clear context: Reset conversation history for the current page
Disable on this site: Disable the extension per-hostname (persisted)
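Persisted settings such as the model choice and the per-hostname disable list could be modelled roughly as follows. This is a sketch under stated assumptions: the `Settings` shape, the default values, the `KeyValueStore` interface, and all function names are hypothetical, and the real extension presumably persists through an extension storage API such as `chrome.storage` rather than the abstract store used here.

```typescript
// Hypothetical settings shape; field names are illustrative only.
interface Settings {
  model: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

// Placeholder defaults, not the extension's real values.
const DEFAULT_SETTINGS: Settings = {
  model: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

// Minimal key-value abstraction so the sketch runs outside a browser.
interface KeyValueStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

const SETTINGS_KEY = "settings";

function loadSettings(store: KeyValueStore): Settings {
  const raw = store.get(SETTINGS_KEY);
  if (raw === undefined) return { ...DEFAULT_SETTINGS };
  // Merge over defaults so fields added later still receive a value.
  return { ...DEFAULT_SETTINGS, ...(JSON.parse(raw) as Partial<Settings>) };
}

function saveSettings(store: KeyValueStore, settings: Settings): void {
  store.set(SETTINGS_KEY, JSON.stringify(settings));
}

// Returns a new settings object with the hostname added or removed.
function setSiteDisabled(settings: Settings, hostname: string, disabled: boolean): Settings {
  const others = settings.disabledHosts.filter((h) => h !== hostname);
  return { ...settings, disabledHosts: disabled ? [...others, hostname] : others };
}

function isSiteDisabled(settings: Settings, hostname: string): boolean {
  return settings.disabledHosts.indexOf(hostname) !== -1;
}
```

A content script would call `isSiteDisabled` with `location.hostname` before injecting the chat overlay.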

pnpm build # Development build (with logging, source maps)
pnpm build:prod # Production build (logging silenced, minified)

WXT — Chrome extension framework (Vite-based)
@huggingface/transformers — Browser ML inference
marked — Markdown rendering in chat
Gemma 4 E2B / E4B (onnx-community/gemma-4-E2B-it-ONNX, onnx-community/gemma-4-E4B-it-ONNX), q4f16 quantization, 128K context

All logs are prefixed with [Gemma Gem]. In development builds, info/debug/warn logs are active. Production builds only log errors.
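The logging behaviour described here (a fixed prefix, with everything except errors silenced in production) can be sketched as a small logger factory. `createLogger`, `Level`, and `Sink` are hypothetical names, and the injected sink is an assumption made so the sketch stays testable; the actual extension may simply call `console` directly.

```typescript
type Level = "debug" | "info" | "warn" | "error";

// Receives the level and the already-prefixed message.
type Sink = (level: Level, message: string) => void;

const PREFIX = "[Gemma Gem]";

function createLogger(isDev: boolean, sink: Sink) {
  const emit = (level: Level, message: string): void => {
    // Production builds let only errors through.
    if (!isDev && level !== "error") return;
    sink(level, `${PREFIX} ${message}`);
  };
  return {
    debug: (message: string) => emit("debug", message),
    info: (message: string) => emit("info", message),
    warn: (message: string) => emit("warn", message),
    error: (message: string) => emit("error", message),
  };
}
```

In the extension, the sink could forward to the matching `console` method, and `isDev` could be derived from a build-time flag.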

Service worker logs: chrome://extensions → Gemma Gem → "Inspect views: service worker"
Offscreen document logs: chrome://extensions → Gemma Gem → "Inspect views: offscreen.html"
Content script logs: Open DevTools on any page → Console
All extension pages: chrome://inspect#other lists all inspectable extension contexts (service worker, offscreen document, etc.)

The offscreen document logs are the most useful — they show model loading, prompt construction, token counts, raw model output, and tool execution.

The agent/ directory has zero dependencies. It defines interfaces (ModelBackend, ToolExecutor) and can be extracted to a standalone library.
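To illustrate how a dependency-free agent loop might sit on top of these two interfaces, here is a minimal sketch. Only the interface names `ModelBackend` and `ToolExecutor` come from the README; their method signatures, the `ToolCall`, `ModelTurn`, and `ChatMessage` types, and the `runAgent` function are assumptions made for illustration and may differ from the real `agent/` code.

```typescript
// Hypothetical shapes: the README names the interfaces but does not show them.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

interface ModelTurn {
  text: string;
  toolCall?: ToolCall; // absent when the model gives a final answer
}

interface ChatMessage {
  role: "user" | "assistant" | "tool";
  content: string;
}

interface ModelBackend {
  generate(history: ChatMessage[]): Promise<ModelTurn>;
}

interface ToolExecutor {
  execute(call: ToolCall): Promise<string>;
}

// Alternate between model turns and tool executions until the model stops
// requesting tools or the iteration cap is reached.
async function runAgent(
  model: ModelBackend,
  tools: ToolExecutor,
  userPrompt: string,
  maxIterations: number,
): Promise<string> {
  const history: ChatMessage[] = [{ role: "user", content: userPrompt }];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model.generate(history);
    history.push({ role: "assistant", content: turn.text });
    if (!turn.toolCall) return turn.text; // final answer, no tool requested
    const result = await tools.execute(turn.toolCall);
    history.push({ role: "tool", content: result });
  }
  return `Stopped after ${maxIterations} tool iterations without a final answer.`;
}
```

Because the loop depends only on the two interfaces, the model and the tools can be swapped for fakes in tests, which fits the README's point that the directory can be extracted as a standalone library.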

Source: Hacker News
