More Uses of Hashes

  • Review of hashes
  • Other ways we use hashes
  • Other ways the world uses hashes

Hash Properties

  • Consistent
  • One-way
  • Unlikely to Collide
  • Difficult to Predict
  • Uniform
hash("password") = "5f4dcc3b5aa765d61d8327deb882cf99"

hash("password") = "5f4dcc3b5aa765d61d8327deb882cf99"

hash("Password") = "dc647eb65e6711e155375218212b3964"

hash("abc123##") = "01f41f6f596cf299400cbc4eea0b5213"

Ways We Use Hashes

Change Detection

Problem: need an easy way to detect changes made to an object over time with low overhead.

  1. Serialize data (e.g. "num=100012&date=8/21/2015")
  2. Take the hash (e.g. "682b1e7fcb969d637c117ac3c7ab6")
  3. Store the hash
  4. ... Later ...
  5. Re-serialize data, take hash again
  6. Compare to original hash:
    1. If different, values definitely changed
    2. If same, values probably didn't change

Dirty Navigation Detection

Send Concurrency Detection

Proposed: CI2 Tray Ticket Menu Change Detection

Set and Dictionary Lookups

Problem: need a fast and efficient way to (1) determine if object is in a collection (Set), and (2) retrieve value associated with a key (Dictionary)

Initialize "Zoo" Set:

Array [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
       0   1   2   3   4   5   6   7

Add "Lion":
 1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
 2) Put "Lion" in array at position hash MOD length: 5
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [ ]
       0   1   2   3   4   5   6   7

Add "Tiger":
 1) Take hash("Tiger"): "454c9843110686bf6f67ce5115b66617"
 2) Put "Lion" in array at position hash MOD length: 7
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
       0   1   2   3   4   5   6   7

Check "Lion":
 1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
 2) Look for "Lion" in bucket at position hash MOD length: 5
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
       0   1   2   3   4   5   6   7
 3) Got it, "Lion" is in Zoo

Check "Bear":
 1) Take hash("Bear"): "372137ebb0d053fecd7a594ec5cb5971"
 2) Look for "Lion" in bucket at position hash MOD length: 1
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
       0   1   2   3   4   5   6   7
 3) Not there, "Bear" is not in Zoo

(.NET does all this with the HashSet and Dictionary Classes)

We use HashSets and Dictionaries all the time.

Entity Framework collection navigation properties are HashSets.

Dictionaries are easy ways to look up and use key-value pairs

JavaScript Objects are implemented as HashMaps (Dictionaries)

Other Ways the World Uses Hashes

Bloom Filters

Problem: I need a low-overhead way of knowing if I have an object in a collection, no need to be able to retrieve data.

Initialize "Zoo" Bloom Filter:

Bits [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
      0   1   2   3   4   5   6   7

Add "Lion":
 1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
 2) Set bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [ ]
      0   1   2   3   4   5   6   7

Add "Tiger":
 1) Take hash("Tiger"): "454c9843110686bf6f67ce5115b66617"
 2) Set bit at position hash MOD length: 7
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
      0   1   2   3   4   5   6   7

Check "Lion":
 1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
 2) Check bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
      0   1   2   3   4   5   6   7
 3) Got it, "Lion" is *probably* in Zoo

Check "Bear":
 1) Take hash("Bear"): "372137ebb0d053fecd7a594ec5cb5971"
 2) Check bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
      0   1   2   3   4   5   6   7
 3) Not There, "Bear" is *definitely* not in Zoo

Recap

  • Allows false positives
  • Disallows false negatives
  • No ability to remove item
  • No ability to retrieve a list of items
  • Low memory footprint

 

So how is this useful?

Message Authenticaion

Problem: I need a way to verify that the data I got from somebody actually came from that person, no need to keep data secret

  • Alice and Bob agree on a secret password of "password"

  • Alice sends Bob a message "Meet me at 10:00 PM tonight"

  • Alice also includes a hash of the data:

    • hash("password"+"Meet me at 10:00 PM tonight") = "357e1b7c5fc1645e45d90faba97adfba"

  • Bob gets the message and is able to verify it using the password:

    • hash("password"+"Meet me at 10:00 PM tonight") = "357e1b7c5fc1645e45d90faba97adfba"

  • Even if Trudy intercepts the message and is able to alter it, she cannot recreate the hash properly without the password, so Bob knows it wasn't altered.

Imprecise Matching

Problem: I need a way to find similar items.

Audio Hashing

  • Variable length
  • Imprecise: based on major components of audio stream
  • Comparison then occurs with rough matches against hash
  • Effective regardless of volume, background noise, etc.
  • Fast!!
  • False positives...

Or use a similar technique on text for similarity / plagiarism checking

Or encode geometric coordinates for fast radius checking

Overall

  • The properties of hashes can be very useful: the ability to represent a lot of data in a little space.
  • However, there is always the risk of collisions, so be careful how you use them!

More Uses for Hashes

By seanm

More Uses for Hashes

  • 949