
April 2026 TLDR Setup for Ollama and Gemma 4 26B on a Mac mini


A guide to setting up Gemma 4 26B on an Apple Silicon Mac mini, covering Ollama's MLX backend, the NVFP4 format, and memory management.

  • Mac mini with Apple Silicon (M1/M2/M3/M4/M5)
  • At least 24GB unified memory for Gemma 4 26B
  • macOS with Homebrew installed
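To check these requirements from a script, a small shell function can compare the CPU architecture and installed memory against the thresholds above and confirm that Homebrew is on the PATH. This is a sketch. The function name check_prereqs is hypothetical. It takes the architecture and memory size as arguments so it can be tested anywhere; on the Mac you would pass it the output of `uname -m` and `sysctl -n hw.memsize` (physical memory in bytes).

```shell
# Hypothetical helper: check_prereqs ARCH MEMSIZE_BYTES
# ARCH is what `uname -m` prints (arm64 on Apple Silicon);
# MEMSIZE_BYTES is what `sysctl -n hw.memsize` prints.
check_prereqs() {
  local arch="$1"
  local mem_gb=$(( $2 / 1073741824 ))   # bytes -> GiB (Apple's "24GB" is 24 GiB)
  if [ "$arch" != "arm64" ]; then
    echo "needs Apple Silicon (arm64), got $arch"
    return 1
  fi
  if [ "$mem_gb" -lt 24 ]; then
    echo "needs at least 24GB unified memory, found ${mem_gb}GB"
    return 1
  fi
  if ! command -v brew >/dev/null 2>&1; then
    echo "Homebrew not found on PATH"
    return 1
  fi
  echo "ok: $arch with ${mem_gb}GB"
}
```

Usage on the Mac: `check_prereqs "$(uname -m)" "$(sysctl -n hw.memsize)"`.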

Install the Ollama macOS app via its Homebrew cask (includes auto-updates and the MLX backend):

brew install --cask ollama-app

This installs:

  • Ollama.app in /Applications/
  • the ollama CLI at /opt/homebrew/bin/ollama

Launch the app:

open -a Ollama

The Ollama icon will appear in the menu bar. Wait a few seconds for the server to initialize.

Verify it's running:

ollama list
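The server takes a few seconds to come up, so a script that runs right after `open -a Ollama` can poll it instead of sleeping for a fixed time. The sketch below assumes the default address http://localhost:11434, where Ollama answers a plain GET on `/` once it is running. The function name wait_for_ollama and its timeout argument are illustrative.

```shell
# Hypothetical helper: wait_for_ollama [URL] [TIMEOUT_SECONDS]
# Polls URL (default: Ollama's standard local address) once per second
# until it answers, or gives up after TIMEOUT_SECONDS (default 30).
wait_for_ollama() {
  local url="${1:-http://localhost:11434}"
  local timeout="${2:-30}"
  local waited=0
  until curl -sf --max-time 2 "$url" >/dev/null 2>&1; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Ollama did not respond at $url within ${timeout}s" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "Ollama is up at $url"
}
```

Usage: `open -a Ollama && wait_for_ollama && ollama list`.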

Pull the model:

ollama pull gemma4:26b

This downloads ~17GB. Verify:

ollama list
# NAME ID SIZE MODIFIED
# gemma4:26b 5571076f3d70 17 GB ...

Test it with a prompt:

ollama run gemma4:26b "Hello, what model are you?"

Check that it's using GPU acceleration:

ollama ps
# Should show CPU/GPU split, e.g. 14%/86% CPU/GPU
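To read that split from a script, you can extract the GPU share from an `ollama ps` row. This sketch assumes the PROCESSOR column takes one of three forms: `100% GPU`, `100% CPU`, or a split such as `14%/86% CPU/GPU`. The helper name gpu_share is hypothetical.

```shell
# Hypothetical helper: gpu_share LINE
# Prints the GPU percentage found in one row of `ollama ps` output.
gpu_share() {
  local proc
  # Pull out the PROCESSOR field, e.g. "14%/86% CPU/GPU" or "100% GPU"
  proc=$(printf '%s\n' "$1" | grep -oE '[0-9]+%/[0-9]+% CPU/GPU|[0-9]+% (CPU|GPU)') || return 1
  case "$proc" in
    *"CPU/GPU") proc=${proc#*/} ;;   # keep the GPU half: "86% CPU/GPU"
    *"% CPU") echo 0; return 0 ;;    # model runs entirely on the CPU
  esac
  echo "${proc%%[%]*}"               # drop everything from the first "%"
}
```

Usage: `gpu_share "$(ollama ps | grep gemma4:26b)"`. For a 14%/86% CPU/GPU split this prints 86. Any value below 100 means part of the model is running on the CPU.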

To start Ollama automatically, click the Ollama icon in the menu bar and enable Launch at Login.

Alternatively, go to System Settings > General > Login Items and add Ollama.

Create a launch agent that loads the model into memory after Ollama starts and keeps it warm:

cat << 'EOF' > ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.preload-gemma4</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/ollama</string>
<string>run</string>
<string>gemma4:26b</string>
<string></string>
</array>
<key>RunAtLoad</key>
<true/>
<key>StartInterval</key>
<integer>300</integer>
<key>StandardOutPath</key>
<string>/tmp/ollama-preload.log</string>
<key>StandardErrorPath</key>
<string>/tmp/ollama-preload.log</string>
</dict>
</plist>
EOF

Load the agent:

launchctl load ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist

This sends an empty prompt via ollama run every 5 minutes (StartInterval is 300 seconds), keeping the model warm in memory.
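Ollama's API offers another way to do the same thing. A request to `/api/generate` that names a model but has no prompt only loads the model, and its `keep_alive` field controls how long the model stays resident (a negative value such as -1 keeps it loaded). The sketch wraps this in two hypothetical helpers. keep_alive_payload only builds the JSON, so you can check it without a running server.

```shell
# Hypothetical helper: keep_alive_payload MODEL [SECONDS]
# JSON with no prompt, so the server only loads MODEL;
# SECONDS defaults to -1 (keep loaded indefinitely).
keep_alive_payload() {
  printf '{"model": "%s", "keep_alive": %s}\n' "$1" "${2:--1}"
}

# Hypothetical helper: preload_model [MODEL] [BASE_URL]
# Sends that payload to a running Ollama server.
preload_model() {
  local model="${1:-gemma4:26b}"
  local base="${2:-http://localhost:11434}"
  curl -sf "$base/api/generate" \
    -H "Content-Type: application/json" \
    -d "$(keep_alive_payload "$model")" >/dev/null
}
```

Usage: `preload_model gemma4:26b && ollama ps`. A later request that omits `keep_alive` is likely to fall back to the server default. For that reason this approach works best alongside the OLLAMA_KEEP_ALIVE setting.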

By default, Ollama unloads models after 5 minutes of inactivity. To keep them loaded forever:

launchctl setenv OLLAMA_KEEP_ALIVE "-1"

Then restart Ollama for the change to take effect.

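Note: this environment variable is session-scoped and is lost on reboot. Adding export OLLAMA_KEEP_ALIVE="-1" to your ~/.zshrc only affects an ollama serve started from a terminal. The menu-bar app does not read shell startup files, so to persist the setting for the app, set it at login with a dedicated launch agent.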
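Such a launch agent might look like the sketch below, which follows the same heredoc pattern as the preload agent. The label com.ollama.keepalive-env, the file name, and the function name are made up for this example. launchd does not guarantee an order between login items, so if Ollama starts before the variable is set, you may need to quit and reopen it once.

```shell
# Hypothetical helper: write_keepalive_agent [PLIST_PATH]
# Writes a launch agent that runs `launchctl setenv OLLAMA_KEEP_ALIVE -1`
# at every login, then prints the path it wrote.
write_keepalive_agent() {
  local plist="${1:-$HOME/Library/LaunchAgents/com.ollama.keepalive-env.plist}"
  mkdir -p "$(dirname "$plist")"
  cat << 'EOF' > "$plist"
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.ollama.keepalive-env</string>
<key>ProgramArguments</key>
<array>
<string>/bin/launchctl</string>
<string>setenv</string>
<string>OLLAMA_KEEP_ALIVE</string>
<string>-1</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
EOF
  echo "$plist"
}
```

Usage: `launchctl load "$(write_keepalive_agent)"`.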

Verify the whole setup:

# Check Ollama server is running
ollama list
# Check model is loaded in memory
ollama ps
# Check launch agent is registered
launchctl list | grep ollama

Expected output from ollama ps:

NAME ID SIZE PROCESSOR CONTEXT UNTIL
gemma4:26b 5571076f3d70 20 GB 14%/86% CPU/GPU 4096 Forever
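For an unattended check, the UNTIL column is the one to watch: Forever shows the keep-alive override is active. The hypothetical is_pinned helper below reads `ollama ps` output on standard input. It succeeds only when the named model is listed with Forever. It assumes the column layout shown above, with the model name first and UNTIL last.

```shell
# Hypothetical helper: is_pinned MODEL
# Reads `ollama ps` output on stdin; exits 0 only if MODEL is listed
# and its last column (UNTIL) is "Forever".
is_pinned() {
  awk -v m="$1" '$1 == m && $NF == "Forever" { found = 1 } END { exit !found }'
}
```

Usage: `ollama ps | is_pinned gemma4:26b && echo "pinned"`.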

Ollama exposes a local API at http://localhost:11434. Coding agents and other tools can talk to it through its OpenAI-compatible endpoint:

# Chat completion (OpenAI-compatible)
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma4:26b",
"messages": [{"role": "user", "content": "Hello"}]
}'
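Building that JSON by hand breaks as soon as a prompt contains a double quote or a backslash. Below is a minimal shell sketch that escapes those characters before calling the same endpoint. json_escape and ollama_chat are hypothetical names. The escaping is deliberately simplified: newlines and tabs become spaces, and other control characters are not handled.

```shell
# Hypothetical helper: json_escape TEXT
# Escapes backslashes and double quotes for use inside a JSON string.
# Simplification: newlines and tabs become spaces.
json_escape() {
  printf '%s' "$1" | tr '\n\t' '  ' | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: ollama_chat PROMPT [MODEL]
# One-shot request to the OpenAI-compatible endpoint; prints raw JSON.
ollama_chat() {
  local prompt model
  prompt=$(json_escape "$1")
  model="${2:-gemma4:26b}"
  curl -sf http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$prompt\"}]}"
}
```

If jq is installed, `ollama_chat 'Explain "keep_alive" in one sentence' | jq -r '.choices[0].message.content'` prints only the reply text. Many tools built on the official OpenAI SDKs read OPENAI_BASE_URL and OPENAI_API_KEY. Pointing them at http://localhost:11434/v1 with a placeholder key (Ollama ignores it) is a common setup, but check each agent's documentation for the variables it actually reads.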

| Command | Description |
|---|---|
| ollama list | List downloaded models |
| ollama ps | Show running models & memory usage |
| ollama run gemma4:26b | Interactive chat |
| ollama stop gemma4:26b | Unload model from memory |
| ollama pull gemma4:26b | Update model to latest version |
| ollama rm gemma4:26b | Delete model |

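To remove everything:

# Remove the preload agent
launchctl unload ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
rm ~/Library/LaunchAgents/com.ollama.preload-gemma4.plist
# Clear the keep-alive override, if you set one
launchctl unsetenv OLLAMA_KEEP_ALIVE
# Delete the downloaded model first; uninstalling the app leaves it on disk
ollama rm gemma4:26b
# Uninstall Ollama
brew uninstall --cask ollama-app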

On Apple Silicon, Ollama automatically uses Apple's MLX framework for faster inference; no manual configuration is needed. M5, M5 Pro, and M5 Max chips get additional acceleration from their GPU Neural Accelerators, while M4 and earlier still benefit from the general MLX speedups.
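To see which of these two cases applies to a given machine, a script can inspect the chip name that `sysctl -n machdep.cpu.brand_string` reports on Apple Silicon (for example `Apple M4` or `Apple M5 Pro`). The mlx_tier helper is hypothetical and only encodes the rule stated above; it does not query Ollama.

```shell
# Hypothetical helper: mlx_tier BRAND
# Classifies a chip name as printed by `sysctl -n machdep.cpu.brand_string`.
mlx_tier() {
  case "$1" in
    "Apple M5"|"Apple M5 "*)
      echo "M5 family: MLX plus GPU Neural Accelerators" ;;
    "Apple M"[1-4]|"Apple M"[1-4]" "*)
      echo "M1-M4: general MLX speedups" ;;
    *)
      echo "unrecognized chip: $1"
      return 1 ;;
  esac
}
```

Usage: `mlx_tier "$(sysctl -n machdep.cpu.brand_string)"`.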

Ollama now leverages NVIDIA's NVFP4 format to maintain model accuracy while reducing memory bandwidth and storage requirements for inference. As more inference providers serve models in NVFP4, Ollama users can get the same results locally as they would in a production environment. It also opens Ollama up to running models optimized with NVIDIA's Model Optimizer.
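A rough worked example shows where the savings come from. NVFP4 stores each weight as a 4-bit E2M1 value and adds one 8-bit FP8 (E4M3) scale per block of 16 weights, plus a per-tensor FP32 scale that is negligible at this size. The arithmetic below counts weights only (no KV cache, no layers kept at higher precision) and takes the "26B" in the model name at face value. It is an illustration, not a statement about how the gemma4:26b download is actually quantized.

$$
b_{\text{NVFP4}} = 4 + \frac{8}{16} = 4.5\ \text{bits/weight}, \qquad b_{\text{BF16}} = 16\ \text{bits/weight}
$$

$$
26 \times 10^{9} \times \frac{4.5}{8}\ \text{bytes} \approx 14.6\ \text{GB}
\qquad\text{vs.}\qquad
26 \times 10^{9} \times \frac{16}{8}\ \text{bytes} = 52\ \text{GB}
$$

That is roughly a 3.6x reduction in bytes per weight. It also means roughly 3.6x less weight data read from memory per generated token, which is why the format helps bandwidth-bound inference.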

  • **Lower memory utilization:** Ollama reuses its cache across conversations, meaning less memory utilization and more cache hits when branching with a shared system prompt; this is especially useful with tools like Claude Code.
  • **Intelligent checkpoints:** Ollama stores snapshots of its cache at intelligent locations in the prompt, resulting in less prompt processing and faster responses.
  • **Smarter eviction:** Shared prefixes survive longer even when older branches are dropped.
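A shared system prompt helps because of prefix matching: cached work can only be reused up to the first point where two prompts differ. The toy shell function below counts that shared prefix, using whitespace-separated words as a stand-in for real tokens. It illustrates the idea and is not Ollama's cache implementation; the name shared_prefix_words is hypothetical.

```shell
# Hypothetical helper: shared_prefix_words A B
# Prints how many leading words A and B have in common.
# Words approximate tokens; prompts are assumed to be single lines.
shared_prefix_words() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    NR == 1 { n = split($0, a) }
    NR == 2 {
      m = split($0, b)
      k = 0
      # advance while both prompts still have words and they match
      while (k < n && k < m && a[k + 1] == b[k + 1]) k++
      print k
    }'
}
```

For `'You are a code reviewer. Check file A'` and `'You are a code reviewer. Check file B'` it prints 7. Move the varying part to the front (`'Check file A. You are a reviewer.'` against `'Check file B. You are a reviewer.'`) and it drops to 2, even though the rest of the text is identical. Putting the shared system prompt first maximizes reuse.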

**Memory:** Gemma 4 26B uses ~20GB when loaded (the ollama ps output above shows 20 GB at a 4096-token context; longer contexts need more). On a 24GB Mac mini, this leaves ~4GB for the system, so close memory-heavy apps before running it.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News

