Update list of available models and info

2017-04-26 16:03:41 +02:00 · c2006166d3
parent 5a470367df
commit c2006166d3
1 changed file with 24 additions and 37 deletions
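
The diff below documents the naming convention [language]_[type]_[genre]_[size] for model packages. As a rough illustration only, this Python sketch splits a model name into those four fields. The helper parse_model_name and the ModelName tuple are hypothetical names made up for this example and are not part of spaCy. The sketch assumes every name has exactly four underscore-separated parts, which holds for the five models listed in this commit but may not hold for other packages.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    """Hypothetical container for the four fields of a model name."""
    language: str
    type: str
    genre: str
    size: str


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'en_core_web_sm' into its four fields.

    Assumption: the name follows [language]_[type]_[genre]_[size]
    with no extra underscores inside any field.
    """
    parts = name.split("_")
    if len(parts) != 4:
        raise ValueError(
            "expected [language]_[type]_[genre]_[size], got %r" % name
        )
    return ModelName(*parts)


if __name__ == "__main__":
    # Model names taken from the table in this commit.
    for model in ("en_core_web_sm", "en_vectors_glove_md", "de_core_news_md"):
        print(parse_model_name(model))
```

Shortcut aliases such as 'en' are not full model names under this convention, so the helper raises ValueError for them instead of guessing.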
--- a/website/docs/usage/models.jade
+++ b/website/docs/usage/models.jade
@@ -13,14 +13,6 @@ p
    |  internal alias that tells spaCy where to find the data files for a specific
    |  model name.

-+infobox("Important note")
-    |  Due to improvements in the English lemmatizer in v1.7.0, you need to
-    |  #[strong download the new English models]. The German model is still
-    |  compatible. If you've trained statistical models that use spaCy's
-    |  annotations, you should #[strong retrain your models after updating spaCy].
-    |  If you don't retrain your models, you may suffer train/test skew, which
-    |  might decrease your accuracy.
-
 +aside-code("Quickstart").
    # Install spaCy and download English model
    pip install spacy
@@ -31,43 +23,38 @@ p
    nlp = spacy.load('en')
    doc = nlp(u'This is a sentence.')

++infobox("Important note")
+    |  Due to improvements in the English lemmatizer in v1.7.0, you need to
+    |  #[strong download the new English models]. The German model is still
+    |  compatible. If you've trained statistical models that use spaCy's
+    |  annotations, you should #[strong retrain your models after updating spaCy].
+    |  If you don't retrain your models, you may suffer train/test skew, which
+    |  might decrease your accuracy.
+
 +h(2, "available") Available models

-+table(["Name", "Size", "Description"])
-    +row
-        +cell #[code en_core_web_sm]
-        +cell 50 MB
-        +cell Vocab, syntax, entities, word vectors #[+tag default]
-
-    +row
-        +cell #[code en_core_web_md]
-        +cell 1 GB
-        +cell Vocab, syntax, entities, word vectors
-
-    +row
-        +cell #[code en_depent_web_md]
-        +cell 328 MB
-        +cell Vocab, syntax, entities
-
-    +row
-        +cell #[code en_vectors_glove_md]
-        +cell 727 MB
-        +cell
-            |  #[+a("http://nlp.stanford.edu/projects/glove/") GloVe] Common
-            |  Crawl vectors
-
-    +row
-        +cell #[code de_core_news_md]
-        +cell 645 MB
-        +cell Vocab, syntax, entities, word vectors #[+tag default]
-
 p
+    |  Model differences are mostly statistical. In general, we do expect larger
+    |  models to be "better" and more accurate overall. Ultimately, it depends on
+    |  your use case and requirements, and we recommend starting with the default
+    |  models (marked with a star below).
+
++aside
    |  Models are now available as #[code .tar.gz] archives #[+a(gh("spacy-models")) from GitHub],
    |  attached to individual releases. They can be downloaded and loaded manually,
    |  or using spaCy's #[code download] and #[code link] commands. All models
    |  follow the naming convention of #[code [language]_[type]_[genre]_[size]].
+    | #[br]#[br]

-+button(gh("spacy-models") + "/releases", true, "primary") View models
+    +button(gh("spacy-models"), true, "primary").u-text-tag
+        |  View model releases
+
++table(["Name", "Language", "Voc", "Dep", "Ent", "Vec", "Size", "License"])
+    +model-row("en_core_web_sm", "English", [1, 1, 1, 1], "50 MB", "CC BY-SA", true)
+    +model-row("en_core_web_md", "English", [1, 1, 1, 1], "1 GB", "CC BY-SA")
+    +model-row("en_depent_web_md", "English", [1, 1, 1, 0], "328 MB", "CC BY-SA")
+    +model-row("en_vectors_glove_md", "English", [0, 0, 0, 1], "727 MB", "CC BY-SA")
+    +model-row("de_core_news_md", "German", [1, 1, 1, 1], "645 MB", "CC BY-SA", true, true)

 +h(2, "download") Downloading models