How Do You Find an Illegal Image Without Looking at It?

With more than 60 million files of suspected child abuse material reported in a single year, manual review is impossible. This article explores perceptual hashing, the engineering foundation that lets platforms detect known illegal content at scale while preserving user privacy.

61.8 Million Files. One Year.

Child Sexual Abuse Material — known as CSAM — is any image or video depicting the sexual abuse or exploitation of a child. It is not abstract. Every file is a record of a real crime against a real child. And when that file is shared, the child is victimized again.

In 2025, the National Center for Missing & Exploited Children (NCMEC) received 21.3 million reports of suspected child sexual exploitation. Those reports contained 61.8 million images, videos, and files.

If a human analyst spent one second on each file, it would take nearly two years of unbroken work. No sleep. No meals. Just file after file.

Over 1.5 million of those reports involved generative AI. Some of this material depicts entirely fictional children. But a growing share is generated using the likenesses of real, identifiable children — children who have never suffered contact abuse, but who are now victims nonetheless. And all of it — real or synthetic — floods into the same investigation pipeline, where human analysts must treat every image as potentially depicting a real child in danger.

No human can review this volume. No one should have to.

Can a machine recognize an abusive image without understanding what it's looking at? The answer depends on whether the material has been seen before.

Source: NCMEC 2025 Impact Snapshot

This Is Bigger Than You Think

Child safety is a much bigger problem than CSAM detection alone. It includes online grooming (1.4 million enticement reports in 2025 — up 156%), sextortion, child sex trafficking, victim identification, and survivor support. Each is a discipline with its own tools and challenges.

This essay focuses on one critical piece: how machines detect abuse material at internet scale. This is the engineering foundation. Without detection, there is nothing to report, nothing to block, no one to rescue.

But before we look at any technology, there is something more fundamental to understand.

Privacy, dignity, and the constraint that shapes everything

Every person who uploads a photo online — a family picture, a selfie, a sunset — has a right to privacy. Detection systems scan billions of these images. The overwhelming majority are completely innocent. The system must protect children without violating everyone else's privacy in the process.

This is the design constraint that shapes every technology in this essay. Perceptual hashing, originally developed for content identification and duplicate detection, turns out to have a remarkable property: no human ever needs to see the original image in order to check it. Software reduces the image to a short numerical fingerprint, a string of bits designed so that the original picture cannot be recovered from it. Only the fingerprint is compared. Only the fingerprint travels through the system.

An analogy: imagine airport security could check your luggage by scanning its weight and density profile, without ever opening it. If the profile matches a known threat signature, the bag is flagged. If it doesn't match — which is 99.99% of the time — no one ever looks inside. Your privacy is preserved not because the system trusts you, but because the system was designed to never need to look.

This is why perceptual hashing became the foundation of CSAM detection. It is the only approach where the privacy of billions of innocent users and the safety of exploited children are not in fundamental tension. The system works because it doesn't look.

The Image You've Seen vs. The One You Haven't

There are two fundamentally different kinds of abuse material, and they require different technologies to detect. This distinction is the most important concept in child safety engineering.

Material that has been seen before. Reported, verified by a human expert, and fingerprinted. A database of digital mugshots.

Detection: perceptual hashing. Compute a fingerprint of the new image and compare it against the database. If it is close enough to a stored fingerprint, the image is a copy of known CSAM.

Material that has never been seen. Newly produced abuse. AI-generated content. First-time uploads. Nothing in any database matches it.

Detection: machine learning classifiers. A model trained to recognize what abuse looks like — not by matching, but by learning patterns.

Most real-world systems combine both. Hashing catches the known material cheaply. Classifiers catch the unknown at higher cost. Together, they form a layered defense.
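The layered idea can be sketched as a small triage function. This is a minimal illustration under stated assumptions, not any platform's real pipeline: `triage`, `Verdict`, the action labels, the two callables, and the 0.9 threshold are all hypothetical. What it shows is the ordering: the cheap fingerprint lookup runs on every image, and the classifier (passed in as a callable so it is never run unless needed) is only invoked when the lookup finds nothing.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    action: str  # hypothetical labels: "report", "human_review", or "allow"
    reason: str


def triage(
    image_hash: int,
    matches_known: Callable[[int], bool],
    classifier_score: Callable[[], float],
    review_threshold: float = 0.9,  # hypothetical cut-off
) -> Verdict:
    """Layered check: cheap hash lookup first, expensive classifier only if needed."""
    # Layer 1: compare the fingerprint against verified, known material.
    if matches_known(image_hash):
        return Verdict("report", "matched a known fingerprint")

    # Layer 2: no match, so pay for the classifier. A high score is only a
    # suspicion about possibly new material, so it goes to a human to verify.
    score = classifier_score()
    if score >= review_threshold:
        return Verdict("human_review", f"classifier score {score:.2f}")
    return Verdict("allow", "no match and low classifier score")
```

Sending classifier hits to human review rather than straight to a report follows the distinction drawn above: only material verified by a human expert becomes a fingerprint in the known database.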

256 Bits That Describe Any Image

A photograph is made of millions of colored dots — pixels. A computer sees it as a giant grid of numbers: the exact red, green, and blue intensity of every single dot. A typical 12-megapixel photo is about 36 million numbers.

Now imagine you could describe that photo using just 256 ones and zeroes. Not 36 million numbers. Two hundred and fifty-six. These 256 bits wouldn't tell you the color of any specific pixel. Instead, they'd capture something deeper: the image's structure. The overall pattern of light and dark. Where the big shapes are. Whether the top half is brighter than the bottom. Whether there are strong horizontal lines or mostly smooth gradients.

That's what a perceptual hash is. It's a compact description of what an image looks like to a human eye — its visual essence — stripped of every detail that doesn't matter for recognition.
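To make this concrete, below is about the simplest perceptual hash that yields 256 bits, usually called an average hash. It is a teaching sketch, not PDQ or PhotoDNA (both are considerably more robust). It assumes the image has already been decoded into a grid of grayscale brightness values, and `average_hash_256` is a hypothetical name. It does what the previous paragraphs describe: keep the coarse pattern of light and dark, discard everything else.

```python
def average_hash_256(pixels: list[list[float]]) -> int:
    """Toy 'average hash': summarize a grayscale image as 256 bits.

    `pixels` holds brightness values (0-255), one list per row, and must be
    at least 16x16. Returns the 256 bits packed into one Python integer.
    """
    height, width = len(pixels), len(pixels[0])
    if height < 16 or width < 16:
        raise ValueError("image must be at least 16x16 pixels")

    # Step 1: shrink to a 16x16 grid; each cell is the average brightness
    # of the block of pixels it covers. Fine detail is thrown away here.
    cells = []
    for i in range(16):
        r0, r1 = i * height // 16, (i + 1) * height // 16
        for j in range(16):
            c0, c1 = j * width // 16, (j + 1) * width // 16
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))

    # Step 2: one bit per cell: 1 if that region is brighter than the whole
    # image's average, else 0. This records structure, not exact colors.
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


# Usage: a 32x32 test image with a bright top half, a dark bottom half, and some texture.
image = [[(200 if r < 16 else 40) + (r * c) % 17 for c in range(32)] for r in range(32)]
h = average_hash_256(image)

# Enlarging it 2x or brightening it uniformly leaves all 256 bits unchanged.
bigger = [[image[r // 2][c // 2] for c in range(64)] for r in range(64)]
brighter = [[p + 10 for p in row] for row in image]
print(average_hash_256(bigger) == h, average_hash_256(brighter) == h)  # prints: True True
```

Because every bit compares a region with the image's own average, changes that shift or rescale the whole picture tend to cancel out, which is the robustness the next paragraph relies on.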

And here's the critical property: if you resize the photo, those 256 bits barely change. Add a watermark? Barely change. Screenshot it, re-save it as JPEG at lower quality, crop the edges slightly? The bits stay almost the same. Because the visual essence — the arrangement of light, dark, shapes, and structure — survives these transformations.

This is the opposite of how cryptographic hashes work. A cryptographic hash like SHA-256 is designed so that changing a single pixel produces a completely different output — the "avalanche effect." A perceptual hash is designed so that visually similar images produce similar outputs. Two images that look the same to your eye will have hashes that are nearly identical, even if every single pixel value is technically different.
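The contrast is easy to see on a tiny example. The sketch below uses Python's standard `hashlib.sha256` and, for comparison, a deliberately crude perceptual fingerprint (one bit per pixel: brighter than the average or not), which is illustrative only and not any production algorithm; the function names are hypothetical. Nudging one pixel by a single brightness level scrambles the SHA-256 output but leaves the fingerprint untouched.

```python
import hashlib


def sha256_bits(pixels: list[int]) -> int:
    # Cryptographic hash of the raw pixel bytes, as a 256-bit integer.
    return int.from_bytes(hashlib.sha256(bytes(pixels)).digest(), "big")


def mean_threshold_bits(pixels: list[int]) -> int:
    # Crude perceptual fingerprint: one bit per pixel, 1 if brighter than the mean.
    mean = sum(pixels) / len(pixels)
    bits = 0
    for p in pixels:
        bits = (bits << 1) | int(p > mean)
    return bits


def differing_bits(a: int, b: int) -> int:
    # Hamming distance: the number of bit positions where a and b differ.
    return bin(a ^ b).count("1")


# An 8x8 left-to-right gradient (brightness 0..112), flattened row by row.
original = [16 * col for _ in range(8) for col in range(8)]
edited = original.copy()
edited[0] += 1  # nudge a single pixel by one brightness level

print("SHA-256 bits changed:", differing_bits(sha256_bits(original), sha256_bits(edited)))
print("fingerprint bits changed:",
      differing_bits(mean_threshold_bits(original), mean_threshold_bits(edited)))
```

For SHA-256, typically around half of the 256 output bits flip; the crude fingerprint reports zero changed bits because the nudged pixel stays on the same side of the average.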

This means we can compare these short descriptions against a database of known abuse material without the matching system ever storing, transmitting, or viewing the actual images. Only the 256 bits (32 bytes) need to travel through the system; the image itself never has to be sent anywhere for comparison.
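In code, the comparison step only ever handles integers. The sketch below assumes fingerprints are stored as 256-bit Python integers and that "close enough" means a Hamming distance (the number of differing bits) at or below a threshold, the usual way to compare bit-string hashes. `find_match`, the example fingerprints, and the threshold values are hypothetical; a real system's threshold has to be tuned against false matches, and a database of millions of entries would need an index rather than a linear scan.

```python
from typing import Optional


def hamming(a: int, b: int) -> int:
    # Number of bit positions where two 256-bit fingerprints differ.
    return bin(a ^ b).count("1")


def find_match(query: int, known: list[int], max_distance: int) -> Optional[int]:
    """Return the first known fingerprint within `max_distance` bits of `query`.

    Only integers are compared; no image data is needed. A linear scan is
    used for clarity.
    """
    for candidate in known:
        if hamming(query, candidate) <= max_distance:
            return candidate
    return None


# Usage with made-up fingerprints (not real hash values).
known_db = [0xB << 252, (1 << 256) - 1]
query = (0xB << 252) ^ 0b111  # the first entry with three bits flipped
print(find_match(query, known_db, max_distance=10) == known_db[0])  # prints True
print(find_match(query, known_db, max_distance=2))                  # prints None
```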

The algorithms

Several perceptual hashing algorithms exist, each making different tradeoffs between hash size, robustness, and computational cost. The industry standard for CSAM detection is PhotoDNA (Microsoft, 2009) — it produces a 144-byte hash using gradient analysis and is deployed at Facebook, Google, Twitter, and most major platforms. But it's proprietary: you need a license from the Technology Coalition to use it. The most widely deployed open-source alternative is PDQ (Meta, 2019) — a 32-byte hash based on frequency analysis.

Source: Hacker News
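PDQ's "frequency analysis" can be sketched in the same spirit. Roughly, PDQ converts the image to grayscale, shrinks it to 64×64, takes a two-dimensional discrete cosine transform (DCT), keeps a 16×16 block of low-frequency coefficients, and sets each of the 256 bits by comparing a coefficient with their median. The code below is a simplified, unoptimized illustration of that idea, not Meta's reference implementation, which adds its own downsampling filter, a quality score, and other details omitted here. `dct_hash_256` is a hypothetical name, the input is assumed to be already grayscale and resized to 64×64, and skipping the constant (DC) term is a choice made for this sketch.

```python
import math

N = 64  # assumed input: grayscale image already resized to 64x64
K = 16  # keep 16x16 low-frequency coefficients -> 256 bits

# DCT-II basis vectors for frequencies 1..16. Frequency 0 (the constant "DC"
# term, i.e. overall brightness) is deliberately skipped in this sketch.
DCT = [[math.cos(math.pi * (k + 1) * (2 * n + 1) / (2 * N)) for n in range(N)]
       for k in range(K)]


def dct_hash_256(pixels: list[list[float]]) -> int:
    """Simplified frequency-domain hash in the spirit of PDQ (not Meta's code)."""
    if len(pixels) != N or any(len(row) != N for row in pixels):
        raise ValueError("expected a 64x64 grayscale image")

    # 2D DCT restricted to low frequencies: coeffs = DCT . pixels . DCT^T (16x16).
    tmp = [[sum(DCT[k][n] * pixels[n][c] for n in range(N)) for c in range(N)]
           for k in range(K)]
    coeffs = [sum(tmp[k][c] * DCT[j][c] for c in range(N))
              for k in range(K) for j in range(K)]

    # One bit per coefficient: 1 if it is above the median of all 256 coefficients.
    ordered = sorted(coeffs)
    median = (ordered[127] + ordered[128]) / 2
    bits = 0
    for value in coeffs:
        bits = (bits << 1) | int(value > median)
    return bits


# Usage: a deterministic pseudo-random 64x64 test image (values 0-255).
state, image = 12345, []
for _ in range(N):
    row = []
    for _ in range(N):
        state = (1103515245 * state + 12345) % 2**31
        row.append((state >> 16) % 256)
    image.append(row)

h = dct_hash_256(image)
brighter = [[p + 10 for p in row] for row in image]  # uniform shift, no clipping
print(bin(h).count("1"), bin(h ^ dct_hash_256(brighter)).count("1"))
```

Skipping the DC term makes a uniform brightness shift cancel out (up to floating-point rounding), and thresholding at the median means a hash of distinct coefficients always has 128 ones and 128 zeros, so the bits stay informative even for very dark or very bright images.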