ThisPlusThat.me
Christopher Moody
Standard natural language
isn't great for searching
But the state of the art
can do amazing things!
King - Man + Woman = Queen
With these new techniques,
you can search for a movie that's more
thoughtful
Or find a movie that is less
depressing
- Wikipedia
- IMDB
- NYTimes
- Usenet
Map-reduce to preprocess & transform the text into n-grams
word2vec trains a neural network, assigning a vector to every word
NumPy, Cython,
Numba & Numexpr
for high-performance vector math
Christopher Moody
ThisPlusThat.me
word2vec uses a neural network to assign a vector to every word
Rows are a single word
Columns are a 'similarity dimension'
Nearby words have similar vectors.
The neural network estimates the probability that every word in the vocabulary appears around the training word.
Check the sentence to see if the word appears nearby, then backpropagate.