Bloom filter playground
Insert items, watch bits flip; query items, watch the false-positive rate trace the theoretical curve (1 − e^(−kn/m))^k.
What's actually happening:
A bloom filter is a fixed-size bit array (m bits) with k hash functions. Insert sets the k bit-positions an item hashes to. Query says "true if all k bits are set" — false negatives are impossible (you only ever turn bits on), but false positives are, because two items can hash to overlapping positions.
That trade is the whole point: it's a set-membership oracle that uses ~10 bits per item for ~1% FP rate, regardless of how big the items are.
Try this:
- 1Type something into INSERT and submit. Watch the
k=5highlighted bits flip on. - 2Click + 100 a few times to fill it up. The grid lights up;
FP·THEOclimbs. - 3Drag K back and forth. Watch the FP-curve marker move and find the dotted vertical at
k* = (m/n) ln 2— that's the optimum. - 4Query a string you never inserted. If all
kbits already happen to be set, you'll get a false positive — and the bits-checked highlight shows you exactly which collisions caused it.
Where this lives in production:
- Cassandra / LevelDB / RocksDB — skip SSTable lookups for keys that aren't there
- Chrome Safe Browsing — fast pre-check before fetching the full malicious-URL list
- CDN / cache layers — "have we seen this URL before?" without storing every URL
- Bitcoin SPV clients — query peer nodes for transactions of interest without revealing them
The double-hashing trick (idx_i = h1 + i × h2 mod m) means we get k hashes for the cost of two — Kirsch & Mitzenmacher 2006.
→ Set K below K* and the FP rate climbs because too few bits are set per item; set it above and FP climbs again because every query checks more bits, all of which are likely set. The minimum is the sweet spot.