NOW LET US – AI RAG SaaS Studio TP.HCM

An NSFW Filter for Marginalia Search

The developer of Marginalia Search shares the technical journey of building a fast, CPU-efficient NSFW filter, moving from basic domain lists to a custom neural network.

… optional, that is.

I’ve been working on an NSFW filter for Marginalia Search, as that is something some people have asked for, primarily API consumers.

The search engine has had some domain-based filtering for a while, based on the UT1 lists, but that isn’t a very comprehensive approach.

We’ll land on a single hidden layer neural network approach, implemented from scratch, but before landing on that, many other things were tried along the way.

This is largely an abbreviated account of the way there.

There is a tension between speed and generality in classification.

Building something that is both fast and reasonably correct in its assessments is incredibly fiddly work, even if the solution itself is often pretty straightforward.

The main limiting constraint for a filter that runs in a search engine is that it needs to be really fast and run well on CPUs.

This immediately disqualifies transformer-based models and other state-of-the-art approaches; capable as they are, they check neither of those boxes.

Fasttext

One of my early stabs at the problem was fasttext, a classifier library from “Facebook, Inc.” (back when they were).

It’s got a few years on its neck, but it’s well named in that it really is fast. The search engine already uses it for language identification, so no new dependencies! Worth a try at least.

The problem with training a classifier is that you need sample data, and kind of a lot of it. Thankfully, finding candidate samples is easy enough when you run a search engine. You can just search for them! Hook up a little script to the API, search for all manner of depravity, and save the results for labeling.

Training Data

To get a filter that is halfway decent we need tens of thousands of samples, and as exciting as manually labeling them sounds, I can’t help but feel there are better ways to spend a couple of weeks.

While NSFW sample sets nominally do exist, fast classifiers are very sensitive to the context and shape of the data.

Training a classifier on reddit comments would result in a reddit comment classifier, and as such would produce dismal results when fed search results.

We can’t use state-of-the-art techniques as the classifier, but that doesn’t mean we can’t use them to label sample data, and then use that sample data to train a faster and simpler model.

The generative aspects of LLMs have largely overshadowed how good they are at unsupervised classification tasks, which is a decidedly less glitzy problem domain.

Open source, self-hostable models are more than capable enough for this task, so ollama and qwen3.5 make a compelling all-open-source pipeline that can be run on relatively modest consumer hardware.

  • Run search queries.
  • Pass the results to ollama / qwen 3.5 with instructions to output an NSFW classification.
  • Annotate the search results with the label SAFE or NSFW
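
The three steps above can be sketched roughly as follows. This is not the author's script: the prompt wording, the `title`/`snippet` field names, and the `qwen3.5` model tag are assumptions for illustration. The only external call is Ollama's local `/api/generate` endpoint, used in non-streaming mode.

```python
import json
import re
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
MODEL = "qwen3.5"  # assumed model tag; use whatever `ollama list` reports

# Hypothetical prompt; the text does not say what instructions were used.
PROMPT = (
    "Classify the following search result as SAFE or NSFW. "
    "Answer with exactly one word.\n\nTitle: {title}\nSnippet: {snippet}\n"
)


def ask_ollama(prompt: str) -> str:
    """Send one non-streaming generate request and return the model's text."""
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]


def parse_label(reply: str) -> Optional[str]:
    """Map a free-text reply to SAFE or NSFW; None if it names neither or both."""
    found = set(re.findall(r"\b(NSFW|SAFE)\b", reply.upper()))
    return found.pop() if len(found) == 1 else None


def label_results(results, ask=ask_ollama):
    """Return copies of the result dicts with a 'label' key added.

    Results whose reply cannot be parsed are skipped rather than guessed.
    """
    labeled = []
    for r in results:
        reply = ask(PROMPT.format(title=r["title"], snippet=r["snippet"]))
        label = parse_label(reply)
        if label is not None:
            labeled.append({**r, "label": label})
    return labeled
```

Passing the model call in as the `ask` parameter keeps the parsing and bookkeeping testable without a running model.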

This is a slow process, measured in seconds per sample rather than the other way around, but it doesn’t need any actual human attention and can easily be left to cook for a few days.

The results are probably on par with what I’d expect from a human, especially given most humans would be pretty fatigued 10,000 or so labeling decisions in.

There are many ambiguous cases where a sample could be labeled either NSFW or SAFE depending on who is reviewing it, and there’s no getting away from that. Still, the model seemed pretty consistent and made reasonable judgements, certainly good enough for the task.

Evaluation

Thus I’d gathered about 10K samples, split roughly 60/40 between NSFW and SAFE, and fed them through fasttext to output a model. The results were kinda miserable.
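
For reference, feeding labeled samples to fasttext looks roughly like this. It is a minimal sketch, assuming the samples are available as `(label, text)` pairs; the helper names and the 10% holdout are illustrative and not taken from the author's setup. fastText's supervised mode reads one example per line, with the label prefixed by `__label__`.

```python
import random


def to_fasttext_line(label, text):
    """Format one example as '__label__<LABEL> <text>' on a single line."""
    flat = " ".join(text.split())  # collapse newlines and runs of whitespace
    return f"__label__{label} {flat}"


def write_split(samples, train_path, test_path, holdout=0.1, seed=0):
    """Shuffle (label, text) pairs and write train/test files.

    Returns (n_train, n_test).
    """
    rows = list(samples)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * holdout))
    for path, chunk in ((test_path, rows[:n_test]), (train_path, rows[n_test:])):
        with open(path, "w", encoding="utf-8") as f:
            for label, text in chunk:
                f.write(to_fasttext_line(label, text) + "\n")
    return len(rows) - n_test, n_test


def train_and_eval(train_path, test_path):
    """Train a supervised model and report precision/recall on the holdout."""
    import fasttext  # third-party; imported lazily so the helpers above work without it

    model = fasttext.train_supervised(input=train_path)
    _, precision, recall = model.test(test_path)
    return model, precision, recall
```

The holdout here is drawn from the same pool as the training data, so its scores reflect that pool rather than search results at large.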

The reason it wasn’t doing a good job was that the training samples were skewed toward documents that were NSFW-adjacent. Because we’d gathered them by searching for NSFW queries, they all contained search terms that were associated with NSFW content, even if they were not always NSFW in themselves.

Fast classifiers are sensitive to stuff like this, and when shown a broader set of search results, they generated a ton of false positives based on noise in the data.

At this point I could probably have grabbed a document database file from one of the search engine partitions, run the qwen classifier over all ~125M records, and used that instead. But since this would have meant approximately 20 years of constant coil whine, insane electricity bills, and way more BTUs than my ventilation can handle, the idea was dismissed as impractical.
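
The 20-year figure is easy to sanity check. The text only says labeling takes "seconds per sample", so the 5 seconds used below is an assumption, chosen because it reproduces the estimate.

```python
RECORDS = 125_000_000          # "~125M records" from the text
SECONDS_PER_SAMPLE = 5.0       # assumption; the text only says "seconds per sample"
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_label(records, seconds_per_sample):
    """Wall-clock years needed to label `records` samples one at a time."""
    return records * seconds_per_sample / SECONDS_PER_YEAR


print(f"{years_to_label(RECORDS, SECONDS_PER_SAMPLE):.1f} years")  # 19.8 years
```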

The problem is that actual NSFW content is relatively rare, so using a representative sample is extremely expensive given how slow the qwen classifier is on consumer GPUs. It’s doable up to the order of 100K samples, but that’s pretty small given the low base rate.

The Neural Network

My assumption was that fasttext was picking up irrelevant features in the noise of the data.

Can we focus the classifier by fixing the features?

I’m going to be honest: a big part of the following scheme was inspired by the unreasonable success of the naive recipe detector the search engine uses.

The plan is roughly to pick out terms that seem relevant to separating the wheat from the chaff using human eyeballing, then build a classifier model based on those handpicked features.

This means first looking for NSFW terms, which is as easy as grabbing the term frequency list for the NSFW samples and picking the ones that appear in NSFW contexts.

But we also want terms that would put NSFW terms in an SFW context.

| feature | disambiguated by |
|---|---|
| cum | laude |
| balls | golf, basket |
| anal | cancer, fissure, gland |
| sex | change, education |

Some of these are funny, but disambiguating terms like ‘gay’ or ‘lesbian’ is an actual concern, as the filter could easily turn into an inadvertent erasure machine.

You can get a pretty long way on just making educated guesses, but the list of disambiguating terms can be further refined by chi-squared scoring the frequencies of the terms that co-occur with each feature. We might find ‘escort’ to be a feature that captures escort service spam, but SAFE samples that contain the word ‘escort’ often also contain ‘ford’ or ‘destroyer’, so we add those terms too. Rinse and repeat as we approximate the circle.
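
One way to do that scoring is sketched below. The exact procedure is not spelled out in the text, so the details are assumptions: samples are `(label, text)` pairs, terms are whitespace-separated tokens, and each candidate term gets a Pearson chi-squared score from a 2x2 table (term present or absent, SAFE or NSFW) built over only the samples that contain the feature.

```python
from collections import Counter


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared for the 2x2 table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def disambiguators(samples, feature, top=5):
    """Rank terms that co-occur with `feature` more often in SAFE than in NSFW samples.

    samples: iterable of (label, text) pairs with label "SAFE" or "NSFW".
    """
    safe_df, nsfw_df = Counter(), Counter()
    n_safe = n_nsfw = 0
    for label, text in samples:
        terms = set(text.lower().split())
        if feature not in terms:
            continue
        terms.discard(feature)
        if label == "SAFE":
            n_safe += 1
            safe_df.update(terms)
        else:
            n_nsfw += 1
            nsfw_df.update(terms)

    scored = []
    for term, a in safe_df.items():
        c = nsfw_df[term]                  # NSFW samples containing the term
        b, d = n_safe - a, n_nsfw - c      # samples without the term
        # keep only terms that are relatively more common on the SAFE side
        if n_nsfw == 0 or a / n_safe > c / n_nsfw:
            scored.append((chi2_2x2(a, b, c, d), term))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [term for _, term in scored[:top]]
```

The output is a list of candidates for a human to review, not something to add to the feature set automatically.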

The next challenge is to build a classifier that allows a hand-picked feature set. Fasttext does not. This is a job that a basic neural network should theoretically do pretty well.

Conceptually, it’s just binary input signals matching features -> (math) -> probability of NSFW.
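
The input side of that picture could look something like this. The vocabulary is lifted from the table of features and disambiguating terms and is far shorter than a real list would be; whole-token matching is an assumption, and the actual matching rules may differ.

```python
import re

# NSFW-leaning terms plus the terms that put them in a safe context.
FEATURES = ["cum", "laude", "balls", "golf", "basket", "anal", "cancer",
            "fissure", "gland", "sex", "change", "education"]
INDEX = {term: i for i, term in enumerate(FEATURES)}


def featurize(text):
    """Return a 0/1 vector with one slot per feature term present in `text`."""
    vec = [0] * len(FEATURES)
    for token in re.findall(r"[a-z]+", text.lower()):
        i = INDEX.get(token)
        if i is not None:
            vec[i] = 1
    return vec
```

Everything outside the vocabulary is invisible to the model, which is the point: noise terms cannot become accidental signals.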

After trying some things in python, it seemed that the simplest approach that did a good job for this was a single hidden layer neural network. This is also easy enough to implement.
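
To give a sense of scale, a single hidden layer network fits in a few dozen lines of plain Python. The sketch below is not the author's implementation (that derivation lives in Appendix B). The tanh activation, hidden layer size, learning rate, and toy data are all assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyNet:
    """Binary inputs -> tanh hidden layer -> sigmoid output (probability of NSFW)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _hidden(self, x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                for row, b in zip(self.w1, self.b1)]

    def predict(self, x):
        h = self._hidden(x)
        return sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)

    def train_step(self, x, y, lr):
        """One stochastic gradient step on binary cross-entropy for one sample."""
        h = self._hidden(x)
        p = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        d_out = p - y  # gradient of the loss w.r.t. the output pre-activation
        for j, hj in enumerate(h):
            d_hid = d_out * self.w2[j] * (1.0 - hj * hj)  # back through tanh
            self.w2[j] -= lr * d_out * hj
            self.b1[j] -= lr * d_hid
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d_hid * xi
        self.b2 -= lr * d_out

    def fit(self, data, epochs=2000, lr=0.5):
        for _ in range(epochs):
            for x, y in data:
                self.train_step(x, y, lr)


# Toy data over two features [cum, laude]: only "cum" without "laude" is NSFW.
DATA = [([1, 0], 1), ([1, 1], 0), ([0, 0], 0), ([0, 1], 0)]
net = TinyNet(n_in=2, n_hidden=4)
net.fit(DATA)
for x, y in DATA:
    print(x, y, round(net.predict(x), 3))
```

With binary inputs, inference is a handful of multiply-adds per hidden unit, which is what makes this kind of model viable on a CPU in the query path.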

The math in machine learning looks daunting as there’s a lot of dense jargon and partial derivatives, but there’s a Scooby-Doo reveal to be had in that underneath the yeti mask it’s just the sort of algebra and multivariate calculus most STEM students will have learned in their first year, with some basic linear algebra terminology that doesn’t strictly add much understanding.

I will gloss over the implementation details here and set up the equations and derive the math in Appendix B instead.

All said and done, this model performed better. Saving some percentage of the training data for evaluation, false positives and false negatives were about 10-15%, which looks pretty good, but to be honest, these are the same sort of figures that fasttext’s evaluations were claiming as well.

The whole point of the exercise is to get around the low base rate problem. The real test is running the classifier on real data.
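
Why held-out error rates say so little here comes down to Bayes' rule. The sketch below takes the 15% error rates and the 60/40 training mix from the text. The 1% base rate for real search results is an assumption, picked only to illustrate the effect.

```python
def precision(base_rate, false_pos_rate, false_neg_rate):
    """Share of documents flagged NSFW that really are NSFW (Bayes' rule)."""
    true_pos = (1.0 - false_neg_rate) * base_rate
    false_pos = false_pos_rate * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# 15% error rates on the 60/40 training mix
print(f"{precision(0.60, 0.15, 0.15):.2f}")  # 0.89
# the same error rates at an assumed 1% base rate
print(f"{precision(0.01, 0.15, 0.15):.2f}")  # 0.05
```

Under that assumed base rate, about nineteen out of twenty flagged results would be false positives, even though the classifier itself has not changed.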

So another script was built, one that grabs search result metadata, labels it with the new classifier, verifies the label using ollama+qwen, and then saves the labeled “search result” as more training data.
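
The core of such a script might be sketched like this. The function and field names are hypothetical; `fast_label` stands in for the new classifier and `slow_label` for the ollama+qwen check, each taking a result dict and returning "SAFE" or "NSFW".

```python
def harvest(results, fast_label, slow_label):
    """Label each result with the fast classifier, then verify with the slow one.

    The slow label is treated as ground truth and stored as 'label'.
    Returns (training_rows, disagreements).
    """
    rows, disagreements = [], []
    for r in results:
        guess = fast_label(r)
        truth = slow_label(r)
        row = {**r, "label": truth, "predicted": guess}
        rows.append(row)
        if guess != truth:
            disagreements.append(row)
    return rows, disagreements
```

The disagreements show where the fast classifier fails on real traffic, and every row can go back into the training set.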

The results were better. There were a lot of f

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
