NOW LET US – AI RAG SaaS Studio TP.HCM
NOW LET US
Digital Product Studio
Back to news
DEV-TOOLS...3 min read

Holo3.1: Fast & Local Computer Use Agents

Share
NOW LET US Article – Holo3.1: Fast & Local Computer Use Agents

Hcompany has released Holo3.1, a new family of computer-use agents designed for robust cross-platform performance and fast, private local execution.

Users want to run the same computer-use capabilities across desktop and mobile environments, with seamless integration with different agent frameworks. They want deployment flexibility, from cloud inference to fully local execution on end-user devices.

This is why we are releasing the Holo3.1 family. Holo3.1 improves robustness across the three dimensions that matter most in production: environments (web, desktop, mobile), agent frameworks, and deployment targets. For the first time, we release quantized checkpoints optimized for local inference, including FP8, Q4 GGUF, and NVFP4.

Holo3.1 is a major step toward our vision of universal computer-use agents: systems that can operate across environments, integrate into any agent stack, and run wherever the workflow lives.

Based on the Qwen family, Holo3.1 was designed to improve robustness across the environments where computer-use agents are actually deployed, while retaining state-of-the-art performance.

As teams moved Holo3 from evaluation to production, we repeatedly observed the same challenge: strong performance in one setting does not necessarily transfer to another. Mobile devices, alternative agent harnesses, and different execution frameworks all introduce their own sources of distribution shift.

Holo3.1 expands Holo3's capabilities beyond browser and desktop control, delivering major gains on mobile environments. On AndroidWorld, our 35B-A3B model improves from 67% to 79.3%, while the smaller 4B and 9B variants improve from 58% to 72%.

To better support teams deploying Holo inside third-party agent stacks, Holo3.1 introduces native support for function-calling protocols in addition to the structured JSON outputs already available in Holo3.

Across OSWorld and our internal benchmark suite covering e-commerce, business software, and collaboration workflows, function-calling and native execution now achieve near-parity performance. Holo3.1 also delivers more than a 25% improvement over Holo3 when evaluated inside our Holotab product harness.

To further enable local and on-device inference, we are also releasing new model sizes including small models (0.8B, 4B, and 9B) for cost-effective and private deployment, in addition to the larger 35B-A3B model for state-of-the-art performance.

Performance versus cost for the Holo3.1 and Qwen 3.5 families. Overall performance averages the four H Corporate benchmarks first (so each family is equally weighted), then takes the mean across OSWorld, AndroidWorld, H Corporate, ScreenSpot-Pro, and OSWorld-G.

This is our first release to ship quantized weights. We’re starting with 35B-A3B checkpoints, available in FP8, Q4 GGUF, and NVFP4.

For NVFP4, we used NVIDIA's Model Optimizer in a W4A16 configuration. These checkpoints enable fast local inference for Computer Use Agents with little to no degradation in model performance. FP8 and NVFP4 achieve the same OSWorld scores, only about two points below the full-precision BF16 checkpoint.

The speedups are substantial: on DGX Spark, NVFP4 W4A16 delivers 1.41× the total token throughput of FP8 and 1.74× that of BF16.

We also release Q4 GGUF checkpoints aimed at local deployment of Computer Use Agents on consumer hardware.

The agent itself runs locally on a Windows or Mac machine, while the model can either run on that same machine—we include reference numbers for Apple Silicon—or on a DGX Spark on the same network. In both cases, execution stays fully private and local, with nothing leaving the user's network.

On Spark, agent harness optimizations we developed with NVIDIA combined with the NVFP4 quantization above deliver a compound ~2× end-to-end speedup over the FP8 baseline, cutting average step time from 6.8s to 3.3s.

Agent request rate across platforms and precisions. On DGX Spark, vLLM with NVFP4 achieves the highest request rate in both Default and Fast modes, followed by Q4 GGUF and FP8. These improvements and more will land in an upcoming desktop agent harness.

The Holo3.1 family is available in four sizes:

| Model | Deployment Target | |---|---| | Holo3.1-0.8B | Ultra-lightweight local agents | | Holo3.1-4B | Cost-efficient deployment | | Holo3.1-9B | Balanced performance and latency | | Holo3.1-35B-A3B | State-of-the-art performance |

We are also releasing optimized FP8, NVFP4, and Q4 GGUF checkpoints for local and edge deployment.

© 2026 Now Let Us. All rights reserved.

Source: Hugging Face Blog

Advertisement
Ad slot ready: 5887729102

More in this category

NOW LET US Related – GLM 5.2 Is Out

dev-tools

GLM 5.2 Is Out

Zhipu AI has officially released GLM-5.2, its most powerful open-source model to date, featuring a 1M context window and advanced long-horizon task capabilities. The release underscores Zhipu's commitment to open-source AI and global scientific collaboration amid rising technological restrictions.

NOW LET US Related – Noise infusion banned from statistical products published by Census Bureau

dev-tools

Noise infusion banned from statistical products published by Census Bureau

The U.S. Department of Commerce has banned "noise infusion" from statistical products published by the Census Bureau, a decision that could have severe consequences for both data utility and privacy protection.

NOW LET US Related – Treating pancreatic tumours may have revealed cancer's master switch

dev-tools

Treating pancreatic tumours may have revealed cancer's master switch

A promising new drug called daraxonrasib has shown breakthrough results in treating pancreatic cancer, doubling median survival times. This achievement could pave the way for an entirely new class of cancer treatments.

NOW LET US Related – Every Frame Perfect

dev-tools

Every Frame Perfect

In UI design, perfection isn't just about the start and end states, but every single transition frame in between. Polishing these micro-interactions is key to building user trust.

NOW LET US Related – Leaving Mozilla

dev-tools

Leaving Mozilla

A poignant and candid reflection from a 15-year Mozilla veteran upon their departure. The author highlights the leadership's missteps in trying to emulate tech giants and urges Mozilla to return to its core values: community and uniqueness.

NOW LET US Related – Shepherd's Dog: A Game by the Most Dangerous AI Model

dev-tools

Shepherd's Dog: A Game by the Most Dangerous AI Model

A developer tested Anthropic's latest, supposedly 'too dangerous' AI model by asking it to build a long-held game idea in a single shot. The model succeeded, generating a complete 2,319-line game after a 45-minute reasoning session.

EXPLORE TOPICS

Discover All Categories

Deep dive into the specific technology sectors that matter most to you.