
Ollama is now powered by MLX on Apple Silicon in preview


Ollama has introduced a preview version powered by Apple's MLX framework, delivering significant performance gains on Apple Silicon. This update accelerates demanding AI tasks like coding agents and personal assistants on macOS.


March 30, 2026

Today, we’re previewing the fastest way to run Ollama on Apple silicon, powered by MLX, Apple’s machine learning framework.

This unlocks new performance to accelerate your most demanding work on macOS:

  • Personal assistants like OpenClaw
  • Coding agents like Claude Code, OpenCode, or Codex

[Figure: Accelerate coding agents like Pi or Claude Code]

[Figure: OpenClaw now responds much faster]

Fastest performance on Apple silicon, powered by MLX

Ollama on Apple silicon is now built on top of Apple’s machine learning framework, MLX, to take advantage of its unified memory architecture.

This results in a large speedup of Ollama on all Apple Silicon devices. On Apple’s M5, M5 Pro and M5 Max chips, Ollama leverages the new GPU Neural Accelerators to accelerate both time to first token (TTFT) and generation speed (tokens per second).

Testing was conducted on March 29, 2026, using Alibaba’s Qwen3.5-35B-A3B model quantized to NVFP4, compared against Ollama’s previous implementation quantized to Q4_K_M, using Ollama 0.18. Ollama 0.19 will see even higher performance (1851 tokens/s prefill and 134 tokens/s decode when running with int4).
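To make the two throughput figures concrete, here is a rough back-of-the-envelope sketch that turns them into latency estimates. It assumes the rates stay constant and ignores model loading, caching, and sampling overhead; the prompt and reply sizes are hypothetical.

```python
# Throughput figures quoted for Ollama 0.19 running with int4.
PREFILL_TOKENS_PER_S = 1851  # prompt processing rate, drives time to first token
DECODE_TOKENS_PER_S = 134    # generation rate


def estimate_latency(prompt_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (time to first token, total time) in seconds.

    Idealized: constant rates, no loading, caching, or sampling overhead.
    """
    ttft = prompt_tokens / PREFILL_TOKENS_PER_S
    total = ttft + output_tokens / DECODE_TOKENS_PER_S
    return ttft, total


# Hypothetical workload: an 8,000-token prompt and a 500-token reply.
ttft, total = estimate_latency(prompt_tokens=8000, output_tokens=500)
print(f"TTFT ~{ttft:.1f} s, total ~{total:.1f} s")
```

Real numbers will vary with the chip, the model, and how much of the prompt is already cached.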

NVFP4 support: higher quality responses and production parity

Ollama now leverages NVIDIA’s NVFP4 format to maintain model accuracy while reducing memory bandwidth and storage requirements for inference workloads.

As more inference providers scale inference using the NVFP4 format, Ollama users can get the same results as they would in a production environment.

It also lets Ollama run models optimized by NVIDIA’s model optimizer. Other precisions will be made available based on the design and usage intent of Ollama’s research and hardware partners.
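As a rough illustration of the storage savings, the sketch below estimates weight memory for a 4-bit block-scaled format. The layout is an assumption based on NVIDIA's public description of NVFP4 (4-bit values with one 8-bit scale per 16-value block), not something stated in this article. The parameter count is read off the "35B" in the model name, and the estimate pretends every weight is quantized while ignoring the KV cache and activations.

```python
def bits_per_weight(element_bits: int, scale_bits: int, block_size: int) -> float:
    """Average storage per weight when each block of weights shares one scale."""
    return element_bits + scale_bits / block_size


def weight_memory_gb(num_params: float, bits: float) -> float:
    """Approximate weight storage in gigabytes (10^9 bytes)."""
    return num_params * bits / 8 / 1e9


# Assumed NVFP4 layout: 4-bit values, one 8-bit scale per block of 16 values.
nvfp4_bits = bits_per_weight(element_bits=4, scale_bits=8, block_size=16)

# Assumption: roughly 35e9 parameters, taken from the "35B" in the model name.
params = 35e9
print(f"NVFP4:  ~{weight_memory_gb(params, nvfp4_bits):.1f} GB")
print(f"16-bit: ~{weight_memory_gb(params, 16):.1f} GB")
```

Under these assumptions the quantized weights take a bit under 20 GB, against roughly 70 GB at 16 bits per weight.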

Improved caching for more responsiveness

Ollama’s cache has been upgraded to make coding and agentic tasks more efficient.

  • Lower memory utilization: Ollama now reuses its cache across conversations, which means less memory use and more cache hits when branching from a shared system prompt with tools like Claude Code.
  • Intelligent checkpoints: Ollama now stores snapshots of its cache at intelligent locations in the prompt, resulting in less prompt processing and faster responses.
  • Smarter eviction: shared prefixes survive longer even when older branches are dropped.
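The idea of reusing a cache across conversations that share a system prompt can be sketched with a toy example. This is not Ollama's implementation: a real cache stores attention key/value tensors and evicts entries, while this hypothetical `PrefixCache` only remembers token ID lists and reports how many tokens would still need processing.

```python
def common_prefix_len(a: list[int], b: list[int]) -> int:
    """Length of the shared leading run of two token sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class PrefixCache:
    """Toy cache that remembers token sequences it has already processed."""

    def __init__(self) -> None:
        self.sequences: list[list[int]] = []

    def tokens_to_process(self, prompt: list[int]) -> int:
        """Return how many tokens still need prefill, then remember the prompt."""
        reused = max(
            (common_prefix_len(prompt, s) for s in self.sequences), default=0
        )
        self.sequences.append(list(prompt))
        return len(prompt) - reused


# Hypothetical token IDs: one shared system prompt, two different user turns.
system_prompt = [101, 102, 103, 104]
cache = PrefixCache()
first = cache.tokens_to_process(system_prompt + [7, 8])        # nothing cached yet
second = cache.tokens_to_process(system_prompt + [9, 10, 11])  # system prompt reused
print(first, second)
```

The second conversation only pays for its own three new tokens, because the four system-prompt tokens match an earlier sequence.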

Get started

This preview release of Ollama accelerates the new Qwen3.5-35B-A3B model, with sampling parameters tuned for coding tasks.

Please make sure you have a Mac with more than 32GB of unified memory.

Claude Code:

ollama launch claude --model qwen3.5:35b-a3b-coding-nvfp4

OpenClaw:

ollama launch openclaw --model qwen3.5:35b-a3b-coding-nvfp4

Chat with the model:

ollama run qwen3.5:35b-a3b-coding-nvfp4
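To check prefill and decode speed on your own machine, one option is to call Ollama's local HTTP API and read the timing fields from the response. The sketch below assumes a server running on the default port 11434 and that the preview build exposes the usual `/api/generate` endpoint; the helper names `generate` and `throughput` are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint
MODEL = "qwen3.5:35b-a3b-coding-nvfp4"              # model tag from the commands above


def generate(prompt: str) -> dict:
    """Send one non-streaming request to a locally running Ollama server."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def throughput(stats: dict) -> tuple[float, float]:
    """Return (prefill tokens/s, decode tokens/s).

    Ollama reports the *_duration fields in nanoseconds.
    """
    prefill = stats["prompt_eval_count"] / (stats["prompt_eval_duration"] / 1e9)
    decode = stats["eval_count"] / (stats["eval_duration"] / 1e9)
    return prefill, decode
```

With the server running, `throughput(generate("Write a binary search in Python."))` would return the two rates for that request. Prompt statistics may be small or missing when most of the prompt is served from cache, so a fresh prompt gives a fairer prefill reading.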

Future models

We are actively working to support future models. For users with custom models fine-tuned on supported architectures, we will introduce an easier way to import models into Ollama. In the meantime, we will expand the list of supported architectures.

Source: Hacker News
