NOW LET US – AI RAG SaaS Studio TP.HCM

Show HN: Morph Reflexes – Multi-head classifiers for agent traces


Morph Reflexes provides fast, cost-effective semantic signals from AI agent traces using a custom multi-head inference engine, eliminating the need for expensive LLM-as-a-judge setups.

The most common failures for production agents are behavioral: looping, reasoning leakage, user frustration, and more. Using a frontier model like GPT or Sonnet to judge every turn is too expensive and slow to run at scale.

To solve this, we built Reflexes: semantic signals from agent traces, served fast and cheap over API. It is built on custom kernels and a custom inference engine forked from vLLM.

Under the hood, it is a small LLM architected around multi-head inference. Small models need to be trained for specific tasks, but running 50 separate small models on the same input for 50 tasks makes no sense.

How it works: We take a modern LLM with hybrid attention and remove the decode step. One shared backbone reads the trace once, then many heads classify different signals. Our inference engine reuses the KV cache and compute across all reflexes, so about 99% of prefill compute is shared from reflex to reflex. This is similar in spirit to older, 2019-era multi-head techniques such as BERT/HYDRA. The result is sub-30ms inference with less than 0.1% overhead for each additional reflex.

We took the same high-level idea and did the hard work to make it work with a modern architecture and attention. With it, inference runs in under 30ms and the full request is served in under 90ms. Whether you run 4 reflexes or 100, the extra overhead is less than 2ms.
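To make the shared-backbone idea concrete, here is a minimal toy sketch in plain Python. It is not the Reflexes engine: `SharedBackbone`, `Head`, `classify`, the hashed bag-of-words encoder, and the head weights are all hypothetical stand-ins chosen for illustration. The only point it shows is that the expensive pass over the trace runs once, while each extra head costs one small dot product.

```python
import hashlib
import math
from typing import Dict, List

DIM = 16  # toy feature width, chosen arbitrarily for this sketch


class SharedBackbone:
    """Hypothetical stand-in for the shared model that reads the trace."""

    def __init__(self) -> None:
        self.calls = 0  # counts how often the expensive pass runs

    def encode(self, trace: str) -> List[float]:
        self.calls += 1
        vec = [0.0] * DIM
        for token in trace.lower().split():
            # Hash each token into a bucket (toy substitute for a real prefill pass).
            bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
            vec[bucket] += 1.0
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        return [v / norm for v in vec]


class Head:
    """One lightweight classifier head: a linear layer plus a sigmoid."""

    def __init__(self, weights: List[float], bias: float = 0.0) -> None:
        self.weights = weights
        self.bias = bias

    def score(self, features: List[float]) -> float:
        z = sum(w * f for w, f in zip(self.weights, features)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))


def classify(backbone: SharedBackbone, heads: Dict[str, Head], trace: str) -> Dict[str, float]:
    features = backbone.encode(trace)  # computed once, shared by every head
    return {name: head.score(features) for name, head in heads.items()}


# Fifty heads with made-up weights, all scored from a single backbone pass.
backbone = SharedBackbone()
heads = {f"reflex_{i}": Head([0.1 * i] * DIM) for i in range(50)}
scores = classify(backbone, heads, "agent retried the same tool call three times")
print(backbone.calls, len(scores))
```

Adding a head here means adding one entry to `heads`; the backbone call count stays at one per trace, which is the property the post describes at much larger scale.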

Why does optimizing this matter?

If you’re even a medium-sized startup, you’re dealing with tens of thousands of agent runs and millions of turns. If you want to track things like user frustration rates over time, frontier LLM-as-judge does not scale.

I built a similar stack at Tesla. When ML engineers needed to sample data across petabytes for signals like is_camera_obfuscated=true, along with 200 other things, they needed to 1) spin them up quickly and 2) run them efficiently at scale.

What it is not: a dashboard. 99% of dashboards go unused. It is 100% API-first and made for devs who want to use it to trigger their own stuff.
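As a rough illustration of what "trigger their own stuff" could look like on the caller's side, the sketch below routes per-turn scores to handlers. The response shape (a dict of signal name to score), the signal names, the thresholds, and the `ReflexRouter` class are all assumptions made for this example, not the actual Reflexes API.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[str, float], None]


class ReflexRouter:
    """Hypothetical helper: fires handlers when a signal crosses a threshold."""

    def __init__(self) -> None:
        self._rules: Dict[str, List[Tuple[float, Handler]]] = {}

    def on(self, signal: str, threshold: float, handler: Handler) -> None:
        self._rules.setdefault(signal, []).append((threshold, handler))

    def dispatch(self, scores: Dict[str, float]) -> int:
        """Run every matching handler and return how many fired."""
        fired = 0
        for signal, score in scores.items():
            for threshold, handler in self._rules.get(signal, []):
                if score >= threshold:
                    handler(signal, score)
                    fired += 1
        return fired


alerts: List[str] = []
router = ReflexRouter()
router.on("looping", 0.8, lambda name, score: alerts.append(f"{name}:{score:.2f}"))
router.on("user_frustration", 0.7, lambda name, score: alerts.append(f"{name}:{score:.2f}"))

# Assumed payload for one turn; a real response may be shaped differently.
fired = router.dispatch({"looping": 0.93, "user_frustration": 0.12})
print(fired, alerts)
```

In practice the handlers would page someone, retry the agent, or write to a metrics store instead of appending to a list.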

Vibetrain a custom reflex in our dashboard, and/or let it self-improve in production.

I’d love feedback from people running agents in prod: what sorts of things do you wish you could track over time across 100% of turns but can't right now?

TLDR: semantic signals from agent traces, super fast, cheap via API

© 2026 Now Let Us. All rights reserved.

Source: Hacker News

