Show HN: Gemma Gem – AI model embedded in a browser – no API keys, no cloud

Gemma Gem is a browser extension that runs Google's Gemma 4 model entirely on-device via WebGPU, ensuring data privacy while enabling AI to interact directly with web content.
Your personal AI assistant living right inside the browser. Gemma Gem runs Google's Gemma 4 model entirely on-device via WebGPU — no API keys, no cloud, no data leaving your machine. It can read pages, click buttons, fill forms, run JavaScript, and answer questions about any site you visit.
- Chrome with WebGPU support
- ~500MB disk for E2B model, ~1.5GB for E4B (cached after first run)
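
Because everything depends on WebGPU, it can be worth confirming support before the first model download. The following is a minimal sketch, assuming the standard `navigator.gpu.requestAdapter()` entry point. The `GpuHost` type and `detectWebGPU` function are hypothetical names for illustration and are not part of Gemma Gem.

```typescript
// Minimal structural type for the part of `navigator` this check touches.
// Hypothetical helper type, not taken from the Gemma Gem source.
interface GpuHost {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Resolves to true only if the WebGPU API exists and yields an adapter.
async function detectWebGPU(host: GpuHost): Promise<boolean> {
  if (!host.gpu) return false; // API is missing entirely
  try {
    const adapter = await host.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat adapter errors as "unsupported"
  }
}
```

In a Chrome DevTools console, passing the global `navigator` object to `detectWebGPU` should report whether an adapter is available on that machine.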
pnpm install
pnpm build
Load the extension in `chrome://extensions` (developer mode) from `.output/chrome-mv3-dev/`.
- Navigate to any page
- Click the gem icon (bottom-right corner) to open the chat
- Wait for model to load (progress shown on icon + chat)
- Ask questions about the page or request actions
  Offscreen Document        Service Worker         Content Script
(Gemma 4 + Agent Loop) <-> (Message Router) <-> (Chat UI + DOM Tools)
          |                       |
   WebGPU inference      Screenshot capture
   Token streaming          JS execution
- Offscreen document: Hosts the model via `@huggingface/transformers` and WebGPU. Runs the agent loop.
- Service worker: Routes messages between content scripts and the offscreen document. Handles `take_screenshot` and `run_javascript`.
- Content script: Injects the gem icon and the shadow DOM chat overlay. Executes DOM tools (`read_page_content`, `click_element`, `type_text`, `scroll_page`).
| Tool | Description | Runs in |
|---|---|---|
| `read_page_content` | Read text/HTML of the page or a CSS selector | Content script |
| `take_screenshot` | Capture visible page as PNG | Service worker |
| `click_element` | Click an element by CSS selector | Content script |
| `type_text` | Type into an input by CSS selector | Content script |
| `scroll_page` | Scroll up/down by pixel amount | Content script |
| `run_javascript` | Execute JS in the page context with full DOM access | Service worker |
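
The "Runs in" column implies that each tool call has to be routed to the context able to execute it. Below is a minimal sketch of that lookup. The tool names and contexts come from the table, while `TOOL_CONTEXT`, `isToolName`, and `routeToolCall` are hypothetical names and not the extension's actual router.

```typescript
type ExecutionContext = "content-script" | "service-worker";

type ToolName =
  | "read_page_content"
  | "take_screenshot"
  | "click_element"
  | "type_text"
  | "scroll_page"
  | "run_javascript";

// Mirrors the "Runs in" column of the tool table.
const TOOL_CONTEXT: Record<ToolName, ExecutionContext> = {
  read_page_content: "content-script",
  take_screenshot: "service-worker",
  click_element: "content-script",
  type_text: "content-script",
  scroll_page: "content-script",
  run_javascript: "service-worker",
};

// Type guard that ignores inherited keys such as "toString".
function isToolName(name: string): name is ToolName {
  return Object.prototype.hasOwnProperty.call(TOOL_CONTEXT, name);
}

// Returns where a tool call should run, or null for an unknown tool name.
function routeToolCall(name: string): ExecutionContext | null {
  return isToolName(name) ? TOOL_CONTEXT[name] : null;
}
```

Returning `null` for unknown names lets the caller report a bad tool call back to the model instead of throwing inside the router.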
Click the gear icon in the chat header:
- Model: Switch between Gemma 4 E2B (~500MB) and E4B (~1.5GB). Selection persists across sessions.
- Thinking: Toggle native Gemma 4 thinking
- Max iterations: Cap on tool call loops per request
- Clear context: Reset conversation history for the current page
- Disable on this site: Disable the extension per-hostname (persisted)
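
To illustrate how the per-hostname "Disable on this site" setting could work, here is a minimal sketch. The `GemSettings` shape, its field names, and the default values are assumptions made for illustration. The extension's real storage schema and defaults are not described in the source.

```typescript
// Hypothetical settings shape; field names are assumptions.
interface GemSettings {
  model: "E2B" | "E4B";
  thinking: boolean;
  maxIterations: number;
  disabledHosts: string[];
}

// Placeholder values, not the extension's actual defaults.
const DEFAULT_SETTINGS: GemSettings = {
  model: "E2B",
  thinking: false,
  maxIterations: 10,
  disabledHosts: [],
};

// True if the extension has been disabled for the hostname of this page.
function isDisabledFor(settings: GemSettings, pageUrl: string): boolean {
  const host = new URL(pageUrl).hostname;
  return settings.disabledHosts.indexOf(host) !== -1;
}

// Returns a new settings object with the page's hostname toggled.
function toggleHost(settings: GemSettings, pageUrl: string): GemSettings {
  const host = new URL(pageUrl).hostname;
  const disabledHosts = isDisabledFor(settings, pageUrl)
    ? settings.disabledHosts.filter((h) => h !== host)
    : [...settings.disabledHosts, host];
  return { ...settings, disabledHosts };
}
```

Keying on `hostname` rather than the full URL means a single toggle covers every page on that host, while subdomains are treated as separate sites.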
pnpm build # Development build (with logging, source maps)
pnpm build:prod # Production build (logging silenced, minified)
- WXT — Chrome extension framework (Vite-based)
- @huggingface/transformers — Browser ML inference
- marked — Markdown rendering in chat
- Gemma 4 E2B / E4B (`onnx-community/gemma-4-E2B-it-ONNX`, `onnx-community/gemma-4-E4B-it-ONNX`): q4f16 quantization, 128K context
All logs are prefixed with `[Gemma Gem]`. In development builds, info/debug/warn logs are active. Production builds only log errors.
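
The logging behavior described here can be sketched as a small prefixed logger. This is an illustration under assumptions: `createLogger`, the `isDev` flag, and the injectable `Sink` are hypothetical, and the extension's own logger may be structured differently.

```typescript
type Level = "debug" | "info" | "warn" | "error";
type Sink = (level: Level, line: string) => void;

const PREFIX = "[Gemma Gem]";

// Development builds emit every level; production builds emit only errors.
function createLogger(isDev: boolean, sink: Sink) {
  const enabled = (level: Level): boolean => isDev || level === "error";
  const emit = (level: Level) => (message: string): void => {
    if (enabled(level)) sink(level, `${PREFIX} ${message}`);
  };
  return {
    debug: emit("debug"),
    info: emit("info"),
    warn: emit("warn"),
    error: emit("error"),
  };
}
```

In a real build the sink would typically forward to the matching `console` method. Keeping it injectable makes the filtering easy to test.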
- Service worker logs: `chrome://extensions` → Gemma Gem → "Inspect views: service worker"
- Offscreen document logs: `chrome://extensions` → Gemma Gem → "Inspect views: offscreen.html"
- Content script logs: Open DevTools on any page → Console
- All extension pages: `chrome://inspect#other` lists all inspectable extension contexts (service worker, offscreen document, etc.)
The offscreen document logs are the most useful — they show model loading, prompt construction, token counts, raw model output, and tool execution.
The `agent/` directory has zero dependencies. It defines interfaces (`ModelBackend`, `ToolExecutor`) and can be extracted to a standalone library.
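
As a rough illustration of how an agent loop can be written against two such interfaces, here is a minimal sketch. The source names `ModelBackend` and `ToolExecutor` but does not show their members, so the method names, the `ToolCall` and `ModelTurn` types, and `runAgent` are all assumptions and not the project's actual API.

```typescript
// Hypothetical shapes: only the interface names come from the source.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

type ModelTurn =
  | { kind: "answer"; text: string }
  | { kind: "tool"; call: ToolCall };

interface ModelBackend {
  generate(history: string[]): Promise<ModelTurn>;
}

interface ToolExecutor {
  execute(call: ToolCall): Promise<string>;
}

// Alternates model turns and tool calls until the model answers
// or the iteration cap ("Max iterations") is reached.
async function runAgent(
  model: ModelBackend,
  tools: ToolExecutor,
  prompt: string,
  maxIterations: number,
): Promise<string> {
  const history: string[] = [`user: ${prompt}`];
  for (let i = 0; i < maxIterations; i++) {
    const turn = await model.generate(history);
    if (turn.kind === "answer") return turn.text;
    const result = await tools.execute(turn.call);
    history.push(`tool ${turn.call.name}: ${result}`);
  }
  return "Stopped: reached the maximum number of tool iterations.";
}
```

Because the loop only sees the two interfaces, the same code could be driven by a WebGPU-backed model in the offscreen document or by a stub in tests.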
Source: Hacker News












