NOW LET US – AI RAG SaaS Studio TP.HCM
DEV-TOOLS...3 min read

Miasma: A tool to trap AI web scrapers in an endless poison pit


Miasma is an open-source tool designed to combat unauthorized AI web scraping by trapping bots in an endless loop of poisoned data.

AI companies continually scrape the internet at an enormous scale, swallowing up all of its contents to use as training data for their next models. If you have a public website, they are already stealing your work.

Miasma is here to help you fight back! Spin up the server and point any malicious traffic towards it. Miasma will send poisoned training data from the poison fountain alongside multiple self-referential links. It's an endless buffet of slop for the slop machines.

Miasma is very fast and has a minimal memory footprint - you should not have to waste compute resources fending off the internet's leeches.

Install with cargo (recommended):

cargo install miasma

Or, download a pre-built binary from releases.

Start Miasma with default configuration:

miasma

View all available configuration options:

miasma --help

Let's walk through an example of setting up a server to trap scrapers with Miasma. We'll pick /bots as our server's path to direct scraper traffic. We'll be using Nginx as our server's reverse proxy, but the same result can be achieved with many different setups.

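When we're done, a scraper that follows a hidden link to /bots will be proxied by Nginx to Miasma, which responds with poisoned data plus more links under /bots, so every link the scraper follows leads back into the trap.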

Embedding Hidden Links

Within our site, we'll include a few hidden links leading to /bots.

<a href="/bots" style="display: none;" aria-hidden="true" tabindex="-1">
  Amazing high quality data here!
</a>

The style="display: none;", aria-hidden="true", and tabindex="-1" attributes ensure the links are invisible to human visitors and ignored by screen readers and keyboard navigation (a negative tabindex removes an element from the tab order). Only scrapers that parse the raw HTML will see them.
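To see why hidden links still catch scrapers, here is a minimal sketch of a naive link extractor, assuming a scraper that parses raw HTML and never evaluates CSS or ARIA attributes. The LinkCollector class, the extract_links helper, and the sample page are hypothetical illustrations, not code from any real scraper.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects every href on the page, ignoring CSS and ARIA hints."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # A naive scraper looks only at the tag and its href attribute.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.hrefs


# Hypothetical page: one visible link and one hidden trap link.
PAGE = """
<p>Welcome to my site.</p>
<a href="/about">About</a>
<a href="/bots" style="display: none;" aria-hidden="true">
  Amazing high quality data here!
</a>
"""

print(extract_links(PAGE))  # ['/about', '/bots']
```

The extractor returns the hidden /bots link alongside the visible one, because nothing in plain HTML parsing distinguishes the two.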

Since our hidden links point to /bots, we'll configure this path to proxy Miasma. Let's assume we're running Miasma on port 9855.

location ~ ^/bots($|/.*)$ {
    proxy_pass http://localhost:9855;
}

This will match all variations of the /bots path: /bots, /bots/, /bots/12345, etc.
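If you want to sanity-check the pattern before reloading Nginx, a quick sketch like the following can help. It assumes Python's re module treats this simple pattern the same way Nginx's regex engine does, and the is_proxied helper and test paths are made up for illustration.

```python
import re

# Same pattern as the Nginx location block above.
BOTS_LOCATION = re.compile(r"^/bots($|/.*)$")


def is_proxied(path):
    """Return True if the request path would be routed to Miasma."""
    return BOTS_LOCATION.match(path) is not None


for path in ["/bots", "/bots/", "/bots/12345", "/botsy", "/robots.txt", "/"]:
    print(path, is_proxied(path))
```

Only the first three paths match. The ($|/.*) group is what keeps lookalike paths such as /botsy out of the trap.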

Lastly, we'll start Miasma and specify /bots as the link prefix. This instructs Miasma to start links with /bots/, which ensures scrapers are properly routed through our Nginx proxy back to Miasma.
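To make the role of the link prefix concrete, here is a rough sketch of how self-referential links could be built from it. This is not Miasma's actual implementation: the make_trap_links function and the random hex path segment are assumptions, and only the /bots/ prefix and the default of 5 links per page come from the text.

```python
import secrets


def make_trap_links(link_prefix, link_count=5):
    """Build link_count paths that all start with the given prefix.

    The random hex segment is a hypothetical choice for illustration.
    """
    base = link_prefix.rstrip("/")
    return [f"{base}/{secrets.token_hex(8)}" for _ in range(link_count)]


for link in make_trap_links("/bots"):
    print(link)
```

Because every generated path begins with /bots/, each one is caught by the Nginx location block and proxied straight back to Miasma.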

We'll also limit the maximum number of in-flight connections to 50. At 50 connections, we can expect 50-60 MB peak memory usage. Note that any request exceeding this limit will immediately receive a 429 response rather than being added to a queue.

miasma --link-prefix '/bots' -p 9855 -c 50
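The reject-instead-of-queue behavior can be sketched as a simple counter. This is an illustration of the general technique under assumed names (InFlightLimiter, handle_request), not a description of how Miasma implements its limit internally.

```python
import threading


class InFlightLimiter:
    """Caps concurrent requests; excess requests are refused, not queued."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self._in_flight = 0
        self._lock = threading.Lock()

    def try_acquire(self):
        # Never blocks: returns False immediately when the cap is reached.
        with self._lock:
            if self._in_flight >= self.max_in_flight:
                return False
            self._in_flight += 1
            return True

    def release(self):
        with self._lock:
            self._in_flight -= 1


def handle_request(limiter, serve):
    """Return an HTTP status code: 429 if over the limit, else 200."""
    if not limiter.try_acquire():
        return 429
    try:
        serve()
        return 200
    finally:
        limiter.release()
```

Refusing immediately keeps memory bounded: a queue would hold on to every waiting request, while a 429 lets the server drop it at once.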

Let's deploy and watch as multi-billion dollar companies greedily eat from our endless slop machine!

Be sure to protect friendly bots and search engines from Miasma in your robots.txt!

User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
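Before deploying, it can be worth checking how a well-behaved crawler would read these rules. The sketch below uses Python's standard urllib.robotparser as a stand-in for such a crawler; real crawlers may interpret edge cases differently, and ExampleScraper is a made-up user agent.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: Googlebot
User-agent: Bingbot
User-agent: DuckDuckBot
User-agent: Slurp
User-agent: SomeOtherNiceBot
Disallow: /bots
Allow: /
"""


def load_rules(text):
    """Parse robots.txt content from a string instead of fetching a URL."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


rules = load_rules(ROBOTS_TXT)
checks = [
    ("Googlebot", "/bots/12345"),
    ("Googlebot", "/"),
    ("ExampleScraper", "/bots"),
]
for agent, path in checks:
    print(agent, path, rules.can_fetch(agent, path))
```

Under this parser, Googlebot is kept out of /bots but allowed everywhere else, while an agent that is not listed gets no instruction at all and is free to walk into the trap.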

Miasma can be configured via its CLI options:

| Option | Default | Description |
|---|---|---|
| port | 9999 | The port the server should bind to. |
| host | localhost | The host address the server should bind to. |
| max-in-flight | 500 | Maximum number of allowable in-flight requests. Requests received while the limit is exceeded will receive a 429 response. Miasma's memory usage scales directly with the number of in-flight requests - set this to a lower value if memory usage is a concern. |
| link-prefix | / | Prefix for self-directing links. This should be the path where you host Miasma, e.g. /bots. |
| link-count | 5 | Number of self-directing links to include in each response page. |
| force-gzip | false | Always gzip responses regardless of the client's Accept-Encoding header. Forcing compression can help reduce egress costs. |
| poison-source | https://rnsaffn.com/poison2/ | Proxy source for poisoned training data. |
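To get a feel for why forcing compression can cut egress, here is a small sketch using Python's standard gzip module. The sample page is invented and highly repetitive, so the ratio it shows is only illustrative; real savings depend on the content being served.

```python
import gzip


def gzip_body(body: bytes) -> bytes:
    """Compress a response body the way a gzip Content-Encoding would."""
    return gzip.compress(body)


# Hypothetical response body; repetitive text compresses very well.
page = ("<p>" + "lorem ipsum dolor sit amet " * 200 + "</p>").encode("utf-8")
compressed = gzip_body(page)

print(len(page), "bytes raw")
print(len(compressed), "bytes gzipped")
```

The trade-off is some CPU time per response in exchange for fewer bytes sent to clients that never asked for compression.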

Contributions are welcome! Please open an issue for bug reports or feature requests. Primarily AI-generated contributions will be automatically rejected.

© 2026 Now Let Us. All rights reserved.

Source: Hacker News
