NOW LET US – AI RAG SaaS Studio TP.HCM
NOW LET US
Digital Product Studio
Back to news
DEV-TOOLS...2 min read

Floyd's Sampling Algorithm

Share
NOW LET US Article – Floyd's Sampling Algorithm

An exploration of Robert Floyd's elegant sampling algorithm for generating random subsets, providing two distinct perspectives to understand its counter-intuitive logic.

Floyd's Sampling Algorithm

I love sampling algorithms. Here's the sampling algorithm that I find most magical. We want to generate a subset of {1, 2, ..., n}

of size k

.

def floyd(n, k):
s = set()
for i in range(n - k + 1, n + 1):
t = random.randint(1, i)
if t in s:
s.add(i)
else:
s.add(t)
return s

I learned about this algorithm the canonical way all good algorithm lore is imparted; one of my old coworkers wrote this comment in some code I was looking at once:

// Use Robert Floyd's algorithm to generate numSplits distinct integers
// between 0 and len(splitKeys), just because it's so cool!
chosen := make(map[int]struct{})
for j := len(splitKeys) - numSplits; j < len(splitKeys); j++ {
t := fit.rng.Intn(j + 1)
if _, alreadyChosen := chosen[t]; !alreadyChosen {
// Insert T.
chosen[t] = struct{}{}
} else {
// Insert J.
chosen[j] = struct{}{}
}
}

Unlike, say, Reservoir Sampling or something, this algorithm sort of resists an easy intuition to me. The branch makes it weird and makes a normal combinatorial, or visual argument tricky. In fact, I find even a casual intuition hard to grasp. Here are two possible ways to think about what makes it work.

First Perspective

Here's one way to think about it: each iteration of the algorithm goes from a uniformly distributed k

-subset of [1..n]

and an element of [1..n+1]

to a uniformly distributed k+1

-subset of [1..n+1]

. If we can show that the map from those k

-subsets and that sample is k+1

-to-1, then we've shown that the resulting set is also uniformly distributed.

So, we need to show that each possible k+1

-subset S

of [1..n+1]

has exactly k+1

inputs that map to it:

  • sets containing n+1

come from thek

-subset withoutn+1

, and any of the elements ofS

, and - sets not containing n+1

come from starting with one of itsk

-subsets and adding the missing element.

And that's it; the result is uniformly distributed.

This is not so bad, I think it basically makes sense to me. But I also learned of another way of thinking about this algorithm while trying to write this post.

Second Perspective

This one comes from Paul, and it notes a connection between this algorithm and the Fisher-Yates shuffle for permuting an array uniformly at random. Specifically, the "upward" variation that looks like this:

def shuffle(a):
for i in range(0, len(a)):
j = random.randint(0, i)
a[i], a[j] = a[j], a[i]

This builds up a permutation incrementally. After each step, we have a permutation on [0..i]

, then we augment that and turn it into a permutation on [0..i+1]

.

Imagine plucking off the last k

elements of an array we just shuffled, then that's a uniformly random sample of k

elements. The observation here is that Floyd's algorithm is just doing the last k

swaps here, directly.

If you call the last k

elements of the array the "tail," then for each element i

in the tail originally, either we:

  • swap i

withj

that's in the tail, in which casei

will stay in the tail forever and be part of the sample, or - swap it with j

not in the tail, thenj

will live in the tail forever and be part of the sample.

These two cases correspond directly to the two branches of Floyd's algorithm and we wind up with the same sample following them both (up to some reordering).

© 2026 Now Let Us. All rights reserved.

Source: Hacker News

Advertisement
Ad slot ready: 5887729102

More in this category

NOW LET US Related – Swift at Apple: Migrating the TrueType hinting interpreter

dev-tools

Swift at Apple: Migrating the TrueType hinting interpreter

Apple has rewritten its TrueType hinting interpreter from C to memory-safe Swift for its Fall 2025 OS releases, improving security and boosting performance by an average of 13%.

NOW LET US Related – Where Did Earth Get Its Oceans? Maybe It Made Them Itself

dev-tools

Where Did Earth Get Its Oceans? Maybe It Made Them Itself

For decades, scientists believed Earth's water was delivered by comets or asteroids. However, new research and space missions suggest our planet might have manufactured its own oceans through a mix of magma and hydrogen.

NOW LET US Related – Digital Sovereignty Becomes an Imperative as the US Reads Dutch Emails

dev-tools

Digital Sovereignty Becomes an Imperative as the US Reads Dutch Emails

The reported access of Dutch officials' emails by the U.S. House of Representatives highlights the critical difference between data residency and true digital sovereignty. It underscores why nations must secure legal and operational control over their data, moving beyond mere local storage promises.

NOW LET US Related – Removing 'um' from a recording is harder than it sounds

dev-tools

Removing 'um' from a recording is harder than it sounds

Removing filler words like 'um' and 'uh' from audio recordings is surprisingly difficult due to audio artifacts and AI limitations. The open-source tool 'erm' solves this by combining Whisper with advanced digital signal processing techniques.

NOW LET US Related – If you are asking for human attention, demonstrate human effort

dev-tools

If you are asking for human attention, demonstrate human effort

As AI-generated content floods the workplace, a new etiquette dilemma emerges. This article highlights a crucial principle for modern collaboration: if you want to request human attention, you must first demonstrate human effort.

NOW LET US Related – Raspberry Pi 5 – 16GB RAM

dev-tools

Raspberry Pi 5 – 16GB RAM

The Raspberry Pi 5 features a massive upgrade with a 2.4GHz quad-core processor, up to 16GB of RAM, and in-house silicon for vastly improved I/O performance.

EXPLORE TOPICS

Discover All Categories

Deep dive into the specific technology sectors that matter most to you.