spaCy/website/docs/usage/word-vectors-similarities.jade

//- 💫 DOCS > USAGE > WORD VECTORS & SIMILARITIES

include ../../_includes/_mixins

p
    |  Dense, real-valued vectors representing distributional similarity
    |  information are now a cornerstone of practical NLP. The most common way
    |  to train these vectors is with the #[+a("https://en.wikipedia.org/wiki/Word2vec") word2vec]
    |  family of algorithms. The default
    |  #[+a("/docs/usage/models#available") English model] installs
    |  300-dimensional vectors trained on the Common Crawl
    |  corpus using the #[+a("http://nlp.stanford.edu/projects/glove/") GloVe]
    |  algorithm. The GloVe Common Crawl vectors have become a de facto
    |  standard for practical NLP.

+aside("Tip: Training a word2vec model")
    |  If you need to train a word2vec model, we recommend the implementation in
    |  the Python library #[+a("https://radimrehurek.com/gensim/") Gensim].

+h(2, "101") Similarity and word vectors 101
    +tag-model("vectors")

include _spacy-101/_similarity
include _spacy-101/_word-vectors

+h(2, "custom") Customising word vectors

+under-construction

p
    |  By default, #[+api("token#vector") #[code Token.vector]] returns the
    |  vector for its underlying #[+api("lexeme") #[code Lexeme]], while
    |  #[+api("doc#vector") #[code Doc.vector]] and
    |  #[+api("span#vector") #[code Span.vector]] return an average of the
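    |  vectors of their tokens. You can customise this behaviour by adding
    |  functions to the #[code doc.user_hooks],
    |  #[code doc.user_span_hooks] and #[code doc.user_token_hooks]
    |  dictionaries. Each dictionary maps an attribute name, such as
    |  #[code "vector"] or #[code "similarity"], to a function that computes
    |  that value for the #[code Doc], #[code Span] or #[code Token] in place
    |  of the default.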
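p
    |  The following sketch overrides the #[code "vector"] hook in all three
    |  dictionaries on a single #[code Doc]. It assumes a spaCy version that
    |  provides #[code spacy.blank()] (2.0 or later). The three-dimensional
    |  vectors in the hypothetical #[code TOY_VECTORS] table are invented for
    |  illustration and are not loaded from any model.

```python
import numpy as np
import spacy

# Hypothetical 3-d vectors, invented for illustration (not from any model)
TOY_VECTORS = {
    "cats": np.array([1.0, 0.0, 0.0], dtype="float32"),
    "dogs": np.array([0.8, 0.6, 0.0], dtype="float32"),
}
ZERO = np.zeros(3, dtype="float32")


def token_vector(token):
    # Look up the lowercase form; unknown words get an all-zero vector
    return TOY_VECTORS.get(token.lower_, ZERO)


def span_vector(span):
    # Average, like the default, but over the custom token vectors
    return np.mean([token_vector(t) for t in span], axis=0)


def doc_vector(doc):
    # Illustrative change: sum the token vectors instead of averaging them
    return np.sum([token_vector(t) for t in doc], axis=0)


nlp = spacy.blank("en")  # tokenizer only, no pretrained vectors
doc = nlp("Cats and dogs")

# Each dictionary maps an attribute name to the function that computes it
doc.user_token_hooks["vector"] = token_vector
doc.user_span_hooks["vector"] = span_vector
doc.user_hooks["vector"] = doc_vector

print(doc[0].vector)    # [1. 0. 0.]
print(doc[1:3].vector)  # [0.4 0.3 0. ]
print(doc.vector)       # [1.8 0.6 0. ]
```

Because the hooks live on the #[code Doc] itself, they only affect that object; every new #[code Doc] needs its own hooks.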

+h(2, "similarity") Similarity

+under-construction
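p
    |  As far as we are aware, the default #[code similarity] of a
    |  #[code Doc], #[code Span] or #[code Token] is the cosine of the angle
    |  between the two objects' vectors. The sketch below reproduces that
    |  computation with NumPy on invented two-dimensional vectors, together
    |  with the averaging used for #[code Doc] and #[code Span] vectors. It
    |  does not call spaCy, and the helper names are hypothetical.

```python
import numpy as np


def average_vector(vectors):
    # Mean of the token vectors, like the default Doc.vector / Span.vector
    return np.mean(np.stack(vectors), axis=0)


def cosine_similarity(a, b):
    # Cosine of the angle between a and b: dot(a, b) / (|a| * |b|)
    norm = np.linalg.norm(a) * np.linalg.norm(b)
    if norm == 0:
        # This sketch returns 0.0 when either vector is all zeros
        return 0.0
    return float(np.dot(a, b) / norm)


# Invented 2-d vectors, for illustration only
apple = np.array([1.0, 0.0])
orange = np.array([0.6, 0.8])
car = np.array([0.0, 1.0])

print(cosine_similarity(apple, orange))  # approximately 0.6
print(cosine_similarity(apple, car))     # 0.0 (orthogonal vectors)
print(average_vector([apple, orange]))   # [0.8 0.4]
```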