Commit Graph

9630 Commits

Author SHA1 Message Date
Ines Montani 296446a1c8
Tidy up and improve docs and docstrings (#3370)
<!--- Provide a general summary of your changes in the title. -->

## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Ines Montani daaeeb7a2b Merge branch 'master' into develop 2019-03-07 22:07:31 +01:00
Adrien Ball 88909a9adb Fix egg fragments in direct download (#3369)
## Description
The egg fragment in the URL must be of the form `#egg=package_name==version` instead of `#egg=package_name-version`.
One of the consequences of specifying wrong egg fragments is that `pip` does not recognize the package and its version properly, and thus it re-downloads the package systematically.

I'm not sure how this should be tested properly. 
Here is what I had before the fix when running the same direct download twice:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.6MB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 919kB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Requirement already satisfied (use --upgrade to upgrade): en-core-web-sm from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 in ./venv3/lib/python3.6/site-packages
```

And after the fix:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.1MB/s
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Requirement already satisfied: en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0 in ./venv3/lib/python3.6/site-packages (2.0.0)
```

### Types of change
This is an enhancement as it avoids unnecessary downloads of (potentially big) spacy models, when they have already been downloaded.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-07 21:07:19 +01:00
Ines Montani 96b91a8898 Fix noqa [ci skip] 2019-03-07 12:25:00 +01:00
Ines Montani fa7314b221 Clarify train_path and dev_path format (see #3366) [ci skip] 2019-03-07 12:23:27 +01:00
Ines Montani 9d6ca18a10 Tidy up and only use self.vector once 2019-03-07 01:06:12 +01:00
Ines Montani a8f1efd2f5 Merge branch 'master' into develop 2019-03-07 00:56:31 +01:00
Daniel King 5f40229397 Don't use numpy directly for similarity (#3362)
* Don't use numpy directly for similarity

* Contributor agreement
2019-03-06 22:58:38 +00:00
Ines Montani e9babd9973 Update hyperparameters section (see #3352) 2019-03-06 14:40:30 +01:00
Ines Montani 6bd34e9d54 Expose Japanese stop words (closes #3346) 2019-03-06 14:21:15 +01:00
Ines Montani 85deb96278 Fix whitespace 2019-03-06 14:20:34 +01:00
Ines Montani 48a206a95f Fix displaCy visualizations in docs (closes #3357) [ci skip] 2019-03-06 13:20:44 +01:00
Ines Montani 5eadf61327 Update pretraining docs on file format (closes #3354) 2019-03-04 16:30:13 +00:00
Ines Montani 23f6ebf0f3 Add missing " (closes #3343) 2019-02-27 16:37:03 +01:00
Ines Montani 533b580c19 Add test for stray print statements in languages (see #3342) 2019-02-27 16:04:30 +01:00
Ines Montani 48a2046d1c Remove stray print statement (closes #3342) 2019-02-27 15:35:04 +01:00
Ines Montani 07d7c0a1af Fix whitespace 2019-02-27 15:34:21 +01:00
Ines Montani 9b62639d19 Auto-format [ci skip] 2019-02-27 14:24:55 +01:00
Matthew Honnibal 656edcb984 Set version to v2.1.0a10 2019-02-27 12:26:13 +01:00
Ines Montani 1d4ba7678f Auto-format [ci skip] 2019-02-27 12:07:35 +01:00
Matthew Honnibal f1d77eb140
💫 Improve handling of missing NER tags (closes #2603) (#3341)
* Improve handling of missing NER tags

GoldParse can accept missing NER tags, if entities is provided
in BILUO format (rather than as spans). Missing tags can be provided
as None values.

Fix bug that occurred when first tag was a None value. Closes #2603.

* Document specification of missing NER tags.
2019-02-27 12:06:32 +01:00
Ines Montani c478a2ccb6 Update backwards incompat [ci skip] 2019-02-27 11:56:56 +01:00
Ines Montani e359bdd0e3 Auto-format 2019-02-27 11:56:45 +01:00
Ines Montani d7217513c9 Merge branch 'spacy.io' into develop [ci skip] 2019-02-27 11:42:10 +01:00
Matthew Honnibal 4a3371acd5
Make doc[0].is_sent_start == True (closes #2869) (#3340)
* Make doc[0] have sent_start True. Closes #2869

* Document that doc[0].is_sent_start defaults True.
2019-02-27 11:17:17 +01:00
Matthew Honnibal 2d3ce89b78 Improve matcher tests re issue #3328 2019-02-27 10:25:56 +01:00
Matthew Honnibal 8d6954e0e7 Fix matcher bug #3328 2019-02-27 10:25:39 +01:00
Ines Montani cb481aa1fe Merge branch 'spacy.io' into develop [ci skip] 2019-02-26 16:51:22 +01:00
Ines Montani aadf586789 Add xfailing test for #3331 2019-02-25 22:33:30 +01:00
Matthew Honnibal 002c24d8ea Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-25 21:55:43 +01:00
Matthew Honnibal 3cdd3eb518 Set version to v2.1.0a9 2019-02-25 21:55:19 +01:00
Ines Montani 2579ecbb63 Merge branch 'spacy.io' into develop [ci skip] 2019-02-25 21:41:51 +01:00
Matthew Honnibal b449be0f04 Add comment re issue #3170 2019-02-25 21:24:03 +01:00
Matthew Honnibal 29fb7b4a16 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2019-02-25 21:22:02 +01:00
Matthew Honnibal 9ccd6a3062 Fix head-outside-sentence bug. Fixes #3170 2019-02-25 21:21:44 +01:00
Ines Montani 3379ebcaa4 Fix default prop [ci skip] 2019-02-25 20:29:11 +01:00
Ines Montani e711969e3b Add more human-readable class names [ci skip] 2019-02-25 20:22:40 +01:00
Ines Montani 162bd4d75b
💫 Add Algolia DocSearch (#3332)
* Add Algolia DocSearch

* Add human-readable selector for teaser
2019-02-25 20:11:11 +01:00
Matthew Honnibal f2fae1f186 Add batch size argument to Language.evaluate(). Closes #3263 2019-02-25 19:30:33 +01:00
Ines Montani f135d663f7 Update conftest.py 2019-02-25 15:55:29 +01:00
Ines Montani 76ce8b2662 Merge branch 'master' into develop 2019-02-25 15:54:55 +01:00
Julia Makogon f1c3108d52 Fixing pymorphy2 dependency issue (#3329) (closes #3327)
* Classes for Ukrainian; small fix in Russian.

* Contributor agreement

* pymorphy2 initialization split for ru and uk (#3327)

* stop-words fixed

* Unit-tests updated
2019-02-25 15:48:17 +01:00
Ines Montani 1a735e0f1f Add regression test for #3328 2019-02-25 10:12:58 +01:00
Ines Montani 1b6238101a Add table explaining training metrics [closes #2644] 2019-02-25 10:03:43 +01:00
Ines Montani 1981b194cc Fix recomputing of :target [ci skip]
Prevents additional history entry
2019-02-25 10:03:20 +01:00
Ines Montani 55bb570f51 Add [ja] to extras_require 2019-02-25 09:37:05 +01:00
Ines Montani dfbed07d3b Remove unused temp errors 2019-02-24 22:26:08 +01:00
Ines Montani d0b3af9222 Fix remaining inaccuracies in API docs (closes #2329) 2019-02-24 22:21:25 +01:00
Ines Montani 49d0938038 Update version [ci skip] 2019-02-24 22:01:47 +01:00
Ines Montani 62b558ab72 💫 Support lexical attributes in retokenizer attrs (closes #2390) (#3325)
* Fix formatting and whitespace

* Add support for lexical attributes (closes #2390)

* Document lexical attribute setting during retokenization

* Assign variable oputside of nested loop
2019-02-24 21:13:51 +01:00