hash("password") = "5f4dcc3b5aa765d61d8327deb882cf99"
hash("password") = "5f4dcc3b5aa765d61d8327deb882cf99"
hash("Password") = "dc647eb65e6711e155375218212b3964"
hash("abc123##") = "01f41f6f596cf299400cbc4eea0b5213"
Problem: need an easy way to detect changes made to an object over time with low overhead.
Problem: need a fast and efficient way to (1) determine if object is in a collection (Set), and (2) retrieve value associated with a key (Dictionary)
Initialize "Zoo" Set:
Array [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
0 1 2 3 4 5 6 7
Add "Lion":
1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
2) Put "Lion" in array at position hash MOD length: 5
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [ ]
0 1 2 3 4 5 6 7
Add "Tiger":
1) Take hash("Tiger"): "454c9843110686bf6f67ce5115b66617"
2) Put "Lion" in array at position hash MOD length: 7
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
0 1 2 3 4 5 6 7
Check "Lion":
1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
2) Look for "Lion" in bucket at position hash MOD length: 5
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
0 1 2 3 4 5 6 7
3) Got it, "Lion" is in Zoo
Check "Bear":
1) Take hash("Bear"): "372137ebb0d053fecd7a594ec5cb5971"
2) Look for "Lion" in bucket at position hash MOD length: 1
Array [ ] [ ] [ ] [ ] [ ] [L] [ ] [T]
0 1 2 3 4 5 6 7
3) Not there, "Bear" is not in Zoo
(.NET does all this with the HashSet and Dictionary Classes)
Problem: I need a low-overhead way of knowing if I have an object in a collection, no need to be able to retrieve data.
Initialize "Zoo" Bloom Filter:
Bits [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ]
0 1 2 3 4 5 6 7
Add "Lion":
1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
2) Set bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [ ]
0 1 2 3 4 5 6 7
Add "Tiger":
1) Take hash("Tiger"): "454c9843110686bf6f67ce5115b66617"
2) Set bit at position hash MOD length: 7
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
0 1 2 3 4 5 6 7
Check "Lion":
1) Take hash("Lion"): "60d28e7d879c0dc48b9a593468cf11e5"
2) Check bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
0 1 2 3 4 5 6 7
3) Got it, "Lion" is *probably* in Zoo
Check "Bear":
1) Take hash("Bear"): "372137ebb0d053fecd7a594ec5cb5971"
2) Check bit at position hash MOD length: 5
Bits [ ] [ ] [ ] [ ] [ ] [1] [ ] [1]
0 1 2 3 4 5 6 7
3) Not There, "Bear" is *definitely* not in Zoo
So how is this useful?
Problem: I need a way to verify that the data I got from somebody actually came from that person, no need to keep data secret
Alice and Bob agree on a secret password of "password"
Alice sends Bob a message "Meet me at 10:00 PM tonight"
Alice also includes a hash of the data:
hash("password"+"Meet me at 10:00 PM tonight") = "357e1b7c5fc1645e45d90faba97adfba"
Bob gets the message and is able to verify it using the password:
hash("password"+"Meet me at 10:00 PM tonight") = "357e1b7c5fc1645e45d90faba97adfba"
Even if Trudy intercepts the message and is able to alter it, she cannot recreate the hash properly without the password, so Bob knows it wasn't altered.
Problem: I need a way to find similar items.