ThisPlusThat.me



Christopher Moody

Standard natural language isn't great for searching

But the state of the art can do amazing things!


King - Man + Woman = Queen
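A minimal sketch of the arithmetic behind this slide. The vectors below are hand-made, hypothetical 3-dimensional toys (real word2vec vectors are learned and much larger), and the helper names `cosine` and `analogy` are assumptions for illustration, not part of any library.

```python
import numpy as np

# Hypothetical toy vectors with dimensions (royalty, maleness, femaleness).
# Real word2vec vectors are learned from text, not written by hand.
vectors = {
    "king":  np.array([1.0, 1.0, 0.0]),
    "queen": np.array([1.0, 0.0, 1.0]),
    "man":   np.array([0.0, 1.0, 0.0]),
    "woman": np.array([0.0, 0.0, 1.0]),
    "apple": np.array([0.1, 0.1, 0.1]),
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(positive, negative, exclude):
    """Add the positive words, subtract the negative ones, return the closest word."""
    target = sum(vectors[w] for w in positive) - sum(vectors[w] for w in negative)
    candidates = [w for w in vectors if w not in exclude]
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# King - Man + Woman
result = analogy(positive=["king", "woman"], negative=["man"],
                 exclude={"king", "man", "woman"})
print(result)
```

The query words themselves are excluded from the candidates, since they usually sit very close to the target vector.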


With these new techniques, you can search for a movie that's more thoughtful


Or find a movie that is less depressing

  • Wikipedia
  • IMDB
  • NYTimes
  • Usenet



Map-reduce to preprocess & transform the text into n-grams
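The slides do not show the actual preprocessing pipeline, so the following is only a single-machine sketch of the map-reduce idea. The function names `tokenize`, `map_ngrams` and `reduce_counts`, the whitespace tokenizer, and the two sample lines are all assumptions for illustration.

```python
from collections import Counter
from functools import reduce

def tokenize(line):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return line.lower().split()

def map_ngrams(line, n=2):
    """Map step: count the n-grams found in one line of text."""
    tokens = tokenize(line)
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def reduce_counts(left, right):
    """Reduce step: merge two partial n-gram counts."""
    return left + right

lines = ["the quick brown fox", "the quick red fox"]
ngram_counts = reduce(reduce_counts, map(map_ngrams, lines), Counter())
print(ngram_counts.most_common(1))
```

Because each line is mapped independently and the merge is associative, the same two functions could in principle be handed to a distributed map-reduce framework.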



word2vec trains a neural network, assigning a vector to every word




NumPy, Cython, Numba & Numexpr

for high-performance vector math
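As a rough illustration of what vectorized similarity search can look like, here is a sketch that uses only NumPy (not Cython, Numba or Numexpr). The random `embeddings` matrix is a hypothetical stand-in for trained word vectors, and `top_k` is an assumed helper name.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for a trained embedding matrix: 1,000 words x 50 dimensions.
embeddings = rng.normal(size=(1000, 50)).astype(np.float32)

# Normalize each row once so cosine similarity becomes a plain dot product.
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

def top_k(query, k=5):
    """Return the indices of the k rows most similar to `query`, best first."""
    q = query / np.linalg.norm(query)
    scores = unit @ q                         # one matrix-vector product scores every word
    best = np.argpartition(-scores, k)[:k]    # unordered top k without a full sort
    return best[np.argsort(-scores[best])]    # sort only those k

neighbors = top_k(embeddings[42])
print(neighbors)
```

Normalizing up front and scoring with a single matrix-vector product avoids a Python-level loop over the vocabulary.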




 



word2vec uses a neural network to assign a vector to every word

Each row is a single word.
Each column is a 'similarity dimension'.
Words that appear in similar contexts have similar vectors.


For each training word, the neural network estimates the probability that each word in the vocabulary appears nearby.

Check the sentence to see if the word appears nearby, then backpropagate. 
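The two steps above can be sketched as a single training update with a full softmax over a tiny vocabulary. This is a simplified illustration, not the word2vec implementation: the vocabulary, matrix sizes, learning rate and the name `train_step` are assumptions, and word2vec itself typically relies on faster approximations such as negative sampling or hierarchical softmax.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "king", "rules", "kingdom", "apple"]  # hypothetical toy vocabulary
V, D = len(vocab), 8
W_in = rng.normal(scale=0.1, size=(V, D))    # one row per word: the word vectors
W_out = rng.normal(scale=0.1, size=(V, D))   # output-layer weights

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(center, context, lr=0.5):
    """One gradient step; returns P(context | center) as estimated before the update."""
    h = W_in[center].copy()
    probs = softmax(W_out @ h)            # probability of every vocabulary word appearing nearby
    error = probs.copy()
    error[context] -= 1.0                 # the word that really did appear nearby
    grad_in = W_out.T @ error             # backpropagate to the word vector
    W_out[...] -= lr * np.outer(error, h)
    W_in[center] -= lr * grad_in
    return probs[context]

center, context = vocab.index("king"), vocab.index("rules")
history = [train_step(center, context) for _ in range(50)]
print(history[0], history[-1])
```

Repeating the update on the same (word, neighbor) pair should push the estimated probability of the observed neighbor upward, which is the behavior the slide describes.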





