mirror of https://github.com/explosion/spaCy.git
Update list of available models and info
parent 5a470367df
commit c2006166d3
@@ -13,14 +13,6 @@ p
 | internal alias that tells spaCy where to find the data files for a specific
 | model name.
 
-+infobox("Important note")
-| Due to improvements in the English lemmatizer in v1.7.0, you need to
-| #[strong download the new English models]. The German model is still
-| compatible. If you've trained statistical models that use spaCy's
-| annotations, you should #[strong retrain your models after updating spaCy].
-| If you don't retrain your models, you may suffer train/test skew, which
-| might decrease your accuracy.
-
 +aside-code("Quickstart").
 # Install spaCy and download English model
 pip install spacy
@@ -31,43 +23,38 @@ p
 nlp = spacy.load('en')
 doc = nlp(u'This is a sentence.')
 
++infobox("Important note")
+| Due to improvements in the English lemmatizer in v1.7.0, you need to
+| #[strong download the new English models]. The German model is still
+| compatible. If you've trained statistical models that use spaCy's
+| annotations, you should #[strong retrain your models after updating spaCy].
+| If you don't retrain your models, you may suffer train/test skew, which
+| might decrease your accuracy.
+
 +h(2, "available") Available models
 
-+table(["Name", "Size", "Description"])
-+row
-+cell #[code en_core_web_sm]
-+cell 50 MB
-+cell Vocab, syntax, entities, word vectors #[+tag default]
-
-+row
-+cell #[code en_core_web_md]
-+cell 1 GB
-+cell Vocab, syntax, entities, word vectors
-
-+row
-+cell #[code en_depent_web_md]
-+cell 328 MB
-+cell Vocab, syntax, entities
-
-+row
-+cell #[code en_vectors_glove_md]
-+cell 727 MB
-+cell
-| #[+a("http://nlp.stanford.edu/projects/glove/") GloVe] Common
-| Crawl vectors
-
-+row
-+cell #[code de_core_news_md]
-+cell 645 MB
-+cell Vocab, syntax, entities, word vectors #[+tag default]
-
+p
+| Model differences are mostly statistical. In general, we do expect larger
+| models to be "better" and more accurate overall. Ultimately, it depends on
+| your use case and requirements, and we recommend starting with the default
+| models (marked with a star below).
+
 +aside
 | Models are now available as #[code .tar.gz] archives #[+a(gh("spacy-models")) from GitHub],
 | attached to individual releases. They can be downloaded and loaded manually,
 | or using spaCy's #[code download] and #[code link] commands. All models
 | follow the naming convention of #[code [language]_[type]_[genre]_[size]].
 | #[br]#[br]
 
-+button(gh("spacy-models") + "/releases", true, "primary") View models
++button(gh("spacy-models"), true, "primary").u-text-tag
+| View model releases
+
++table(["Name", "Language", "Voc", "Dep", "Ent", "Vec", "Size", "License"])
++model-row("en_core_web_sm", "English", [1, 1, 1, 1], "50 MB", "CC BY-SA", true)
++model-row("en_core_web_md", "English", [1, 1, 1, 1], "1 GB", "CC BY-SA")
++model-row("en_depent_web_md", "English", [1, 1, 1, 0], "328 MB", "CC BY-SA")
++model-row("en_vectors_glove_md", "English", [0, 0, 0, 1], "727 MB", "CC BY-SA")
++model-row("de_core_news_md", "German", [1, 1, 1, 1], "645 MB", "CC BY-SA", true, true)
+
 +h(2, "download") Downloading models
 
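
Note on the naming convention: the aside in this diff says all models follow the pattern [language]_[type]_[genre]_[size]. As a minimal sketch of how a name breaks down under that convention, the snippet below splits the five names from the new table into their parts. The parse_model_name helper and the ModelName tuple are hypothetical and only for illustration; they are not part of spaCy.

```python
from collections import namedtuple

# Hypothetical container for the four parts named in the convention
# [language]_[type]_[genre]_[size]; not a spaCy type.
ModelName = namedtuple("ModelName", ["language", "type", "genre", "size"])


def parse_model_name(name):
    """Split a model name into its four underscore-separated parts."""
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(
            "expected [language]_[type]_[genre]_[size], got %r" % name
        )
    return ModelName(*parts)


if __name__ == "__main__":
    # Model names taken from the table added in this commit.
    for model in [
        "en_core_web_sm",
        "en_core_web_md",
        "en_depent_web_md",
        "en_vectors_glove_md",
        "de_core_news_md",
    ]:
        print(parse_model_name(model))
```

Under this reading, en_core_web_sm would be language "en", type "core", genre "web" and size "sm". The strict four-part check is an assumption of the sketch; it simply rejects anything that does not match the stated pattern.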