Update list of available models and info

ines 2017-04-26 16:03:41 +02:00
parent 5a470367df
commit c2006166d3
1 changed file with 24 additions and 37 deletions
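The aside in this diff says all models follow the naming convention `[language]_[type]_[genre]_[size]`. Below is a minimal sketch of splitting a name into those four parts. `ModelName` and `parse_model_name` are hypothetical helpers and not part of spaCy. The sketch assumes every name has exactly four underscore-separated parts, which holds for all names in this commit's tables.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """The four parts of a [language]_[type]_[genre]_[size] model name."""
    language: str    # e.g. "en", "de"
    model_type: str  # e.g. "core", "depent", "vectors"
    genre: str       # e.g. "web", "news", "glove"
    size: str        # e.g. "sm", "md"


def parse_model_name(name: str) -> ModelName:
    """Split a model name into its four underscore-separated parts.

    Raises ValueError unless there are exactly four non-empty parts.
    """
    parts = name.split("_")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected [language]_[type]_[genre]_[size], got {name!r}")
    return ModelName(*parts)


# Names taken from the model table in this commit.
for name in ["en_core_web_sm", "en_depent_web_md", "de_core_news_md"]:
    print(parse_model_name(name))
```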

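The new `+model-row` table in this diff marks each model with Voc/Dep/Ent/Vec flags and a download size. Below is a minimal sketch of treating those rows as data and picking the smallest model that has the features you need. `ModelInfo`, `MODELS` and `smallest_model` are hypothetical names, not spaCy APIs. Sizes are the table's figures, with "1 GB" approximated as 1000 MB.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelInfo:
    name: str
    language: str
    vocab: bool    # "Voc" column
    dep: bool      # "Dep" column
    ent: bool      # "Ent" column
    vectors: bool  # "Vec" column
    size_mb: int   # approximate; "1 GB" recorded as 1000 MB


# Transcribed from the +model-row calls in this commit.
MODELS = [
    ModelInfo("en_core_web_sm", "English", True, True, True, True, 50),
    ModelInfo("en_core_web_md", "English", True, True, True, True, 1000),
    ModelInfo("en_depent_web_md", "English", True, True, True, False, 328),
    ModelInfo("en_vectors_glove_md", "English", False, False, False, True, 727),
    ModelInfo("de_core_news_md", "German", True, True, True, True, 645),
]


def smallest_model(language: str, *, dep: bool = False, ent: bool = False,
                   vectors: bool = False) -> Optional[ModelInfo]:
    """Return the smallest listed model for `language` with every requested feature."""
    candidates = [
        m for m in MODELS
        if m.language == language
        and (m.dep or not dep)
        and (m.ent or not ent)
        and (m.vectors or not vectors)
    ]
    # min() with default=None avoids a ValueError when nothing matches.
    return min(candidates, key=lambda m: m.size_mb, default=None)


print(smallest_model("English", dep=True, ent=True))
```

The table lists `en_core_web_sm` with all four features at 50 MB, so this helper returns it for every English request. Size alone says nothing about accuracy, which is why the diff recommends starting from the default models and choosing based on your use case.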

@@ -13,14 +13,6 @@ p
    | internal alias that tells spaCy where to find the data files for a specific
    | model name.
+infobox("Important note")
    | Due to improvements in the English lemmatizer in v1.7.0, you need to
    | #[strong download the new English models]. The German model is still
    | compatible. If you've trained statistical models that use spaCy's
    | annotations, you should #[strong retrain your models after updating spaCy].
    | If you don't retrain your models, you may suffer train/test skew, which
    | might decrease your accuracy.
+aside-code("Quickstart").
    # Install spaCy and download English model
    pip install spacy
@@ -31,43 +23,38 @@ p
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')
+infobox("Important note")
    | Due to improvements in the English lemmatizer in v1.7.0, you need to
    | #[strong download the new English models]. The German model is still
    | compatible. If you've trained statistical models that use spaCy's
    | annotations, you should #[strong retrain your models after updating spaCy].
    | If you don't retrain your models, you may suffer train/test skew, which
    | might decrease your accuracy.
+h(2, "available") Available models
+table(["Name", "Size", "Description"])
    +row
        +cell #[code en_core_web_sm]
        +cell 50 MB
        +cell Vocab, syntax, entities, word vectors #[+tag default]
    +row
        +cell #[code en_core_web_md]
        +cell 1 GB
        +cell Vocab, syntax, entities, word vectors
    +row
        +cell #[code en_depent_web_md]
        +cell 328 MB
        +cell Vocab, syntax, entities
    +row
        +cell #[code en_vectors_glove_md]
        +cell 727 MB
        +cell
            | #[+a("http://nlp.stanford.edu/projects/glove/") GloVe] Common
            | Crawl vectors
    +row
        +cell #[code de_core_news_md]
        +cell 645 MB
        +cell Vocab, syntax, entities, word vectors #[+tag default]
p
    | Model differences are mostly statistical. In general, we do expect larger
    | models to be "better" and more accurate overall. Ultimately, it depends on
    | your use case and requirements, and we recommend starting with the default
    | models (marked with a star below).
+aside
    | Models are now available as #[code .tar.gz] archives #[+a(gh("spacy-models")) from GitHub],
    | attached to individual releases. They can be downloaded and loaded manually,
    | or using spaCy's #[code download] and #[code link] commands. All models
    | follow the naming convention of #[code [language]_[type]_[genre]_[size]].
    | #[br]#[br]
    +button(gh("spacy-models") + "/releases", true, "primary") View models
    +button(gh("spacy-models"), true, "primary").u-text-tag
        | View model releases
+table(["Name", "Language", "Voc", "Dep", "Ent", "Vec", "Size", "License"])
    +model-row("en_core_web_sm", "English", [1, 1, 1, 1], "50 MB", "CC BY-SA", true)
    +model-row("en_core_web_md", "English", [1, 1, 1, 1], "1 GB", "CC BY-SA")
    +model-row("en_depent_web_md", "English", [1, 1, 1, 0], "328 MB", "CC BY-SA")
    +model-row("en_vectors_glove_md", "English", [0, 0, 0, 1], "727 MB", "CC BY-SA")
    +model-row("de_core_news_md", "German", [1, 1, 1, 1], "645 MB", "CC BY-SA", true, true)
+h(2, "download") Downloading models