Skip to content

Latest commit

 

History

History
41 lines (36 loc) · 1.95 KB

bloom-filters.md

File metadata and controls

41 lines (36 loc) · 1.95 KB

Bloom filters

  • a data structure
  • usually implemented as a vector or array of bits
  • used to test whether an element is a member of a set
  • space efficient - use when it would not be practical to use a standard hash or set data structure
  • can have false positives, cannot have false negatives
  • when you query the structure for element X you get back one of
    • X is definitely not in the set
    • X is possibly in the set (can't be sure we don't have a false positive)
  • elements can be added to the set but not removed
    • if you use a counting filter then you can also remove elements from the structure
  • concieved by Howard Bloom in 1970
  • variations that store TTL values with each bit exist and are useful for very fast/large datastreams where a normal bloom filter would fill up and give all false positives too soon
  • bloom filters have a maximum on the number of things you can add - a "strict" variant prevents you from adding too many
  • usage ideas:
    • use a bloom filter as the first step in a search or a huge set of logs .e.g we have N directories of daily log files and we are looking for a particular name so maintain N bloom filters (one per day) that can be searched first and will weed out days that don't include the search term.

fewer than 10 bits per element are required for a 1% false positive probability, independent of the size or number of elements in the set

Implementations