From c2006166d39895098d70269d0bb9052d3bde15d1 Mon Sep 17 00:00:00 2001
From: ines
Date: Wed, 26 Apr 2017 16:03:41 +0200
Subject: [PATCH] Update list of available models and info

---
 website/docs/usage/models.jade | 61 +++++++++++++---------------------
 1 file changed, 24 insertions(+), 37 deletions(-)

diff --git a/website/docs/usage/models.jade b/website/docs/usage/models.jade
index 9d50dcbc0..69142b351 100644
--- a/website/docs/usage/models.jade
+++ b/website/docs/usage/models.jade
@@ -13,14 +13,6 @@ p
     | internal alias that tells spaCy where to find the data files for a specific
     | model name.
 
-+infobox("Important note")
-    | Due to improvements in the English lemmatizer in v1.7.0, you need to
-    | #[strong download the new English models]. The German model is still
-    | compatible. If you've trained statistical models that use spaCy's
-    | annotations, you should #[strong retrain your models after updating spaCy].
-    | If you don't retrain your models, you may suffer train/test skew, which
-    | might decrease your accuracy.
-
 +aside-code("Quickstart").
     # Install spaCy and download English model
     pip install spacy
@@ -31,43 +23,38 @@ p
     nlp = spacy.load('en')
     doc = nlp(u'This is a sentence.')
 
++infobox("Important note")
+    | Due to improvements in the English lemmatizer in v1.7.0, you need to
+    | #[strong download the new English models]. The German model is still
+    | compatible. If you've trained statistical models that use spaCy's
+    | annotations, you should #[strong retrain your models after updating spaCy].
+    | If you don't retrain your models, you may suffer train/test skew, which
+    | might decrease your accuracy.
+
 +h(2, "available") Available models
 
-+table(["Name", "Size", "Description"])
-    +row
-        +cell #[code en_core_web_sm]
-        +cell 50 MB
-        +cell Vocab, syntax, entities, word vectors #[+tag default]
-
-    +row
-        +cell #[code en_core_web_md]
-        +cell 1 GB
-        +cell Vocab, syntax, entities, word vectors
-
-    +row
-        +cell #[code en_depent_web_md]
-        +cell 328 MB
-        +cell Vocab, syntax, entities
-
-    +row
-        +cell #[code en_vectors_glove_md]
-        +cell 727 MB
-        +cell
-            | #[+a("http://nlp.stanford.edu/projects/glove/") GloVe] Common
-            | Crawl vectors
-
-    +row
-        +cell #[code de_core_news_md]
-        +cell 645 MB
-        +cell Vocab, syntax, entities, word vectors #[+tag default]
-
 p
+    | Model differences are mostly statistical. In general, we do expect larger
+    | models to be "better" and more accurate overall. Ultimately, it depends on
+    | your use case and requirements, and we recommend starting with the default
+    | models (marked with a star below).
+
++aside
     | Models are now available as #[code .tar.gz] archives #[+a(gh("spacy-models")) from GitHub],
     | attached to individual releases. They can be downloaded and loaded manually,
     | or using spaCy's #[code download] and #[code link] commands. All models
     | follow the naming convention of #[code [language]_[type]_[genre]_[size]].
+    | #[br]#[br]
 
-+button(gh("spacy-models") + "/releases", true, "primary") View models
+    +button(gh("spacy-models"), true, "primary").u-text-tag
+        | View model releases
+
++table(["Name", "Language", "Voc", "Dep", "Ent", "Vec", "Size", "License"])
+    +model-row("en_core_web_sm", "English", [1, 1, 1, 1], "50 MB", "CC BY-SA", true)
+    +model-row("en_core_web_md", "English", [1, 1, 1, 1], "1 GB", "CC BY-SA")
+    +model-row("en_depent_web_md", "English", [1, 1, 1, 0], "328 MB", "CC BY-SA")
+    +model-row("en_vectors_glove_md", "English", [0, 0, 0, 1], "727 MB", "CC BY-SA")
+    +model-row("de_core_news_md", "German", [1, 1, 1, 1], "645 MB", "CC BY-SA", true, true)
 
 +h(2, "download") Downloading models