9724 Commits

Author SHA1 Message Date
Matthew Honnibal bba5f57f91 Add method to export utf8 array to Doc 2019-03-09 11:50:27 +00:00
Matthew Honnibal e1a83d15ed Add support for character features to Tok2Vec 2019-03-09 11:50:08 +00:00
Matthew Honnibal eae384ebb2 Add POS to morphological fields 2019-03-09 11:49:44 +00:00
Matthew Honnibal b6d60d0041 Merge branch 'feature/lemmatizer' of https://github.com/explosion/spaCy into feature/lemmatizer 2019-03-09 00:41:53 +00:00
Matthew Honnibal 4c8730526b Filter bad retokenizations 2019-03-09 00:41:34 +00:00
Matthew Honnibal 42bc3ad73b Fix class mapping for morphologizer 2019-03-09 00:20:29 +00:00
Matthew Honnibal c4df89ab90 Fixes for morphologizer 2019-03-09 00:20:11 +00:00
Matthew Honnibal cc2b2dba14 Neaten set_morphology option on Tagger 2019-03-08 19:16:02 +01:00
Matthew Honnibal afa227e25b Fix setter 2019-03-08 19:10:01 +01:00
Matthew Honnibal b27bd42613 Fix compile error 2019-03-08 19:06:02 +01:00
Matthew Honnibal 27886d626f Don't set morphology in Tagger for ud_train 2019-03-08 19:03:31 +01:00
Matthew Honnibal c91577db02 Add set_morphology cfg option for Tagger 2019-03-08 19:03:17 +01:00
Matthew Honnibal 49cf002ac4 Add missing import 2019-03-08 18:59:25 +01:00
Matthew Honnibal 09b26f5e2e Fix compile error 2019-03-08 18:58:26 +01:00
Matthew Honnibal d7ec1d62cb Fix Morphologizer 2019-03-08 18:54:25 +01:00
Matthew Honnibal 3908911da4 Fix import 2019-03-08 17:04:14 +01:00
Matthew Honnibal 8a9181d95a Merge __init__ 2019-03-08 16:58:42 +01:00
Matthew Honnibal 4cf897e8e1 Update from develop 2019-03-08 16:56:54 +01:00
Ines Montani ad834be494 Tidy up and auto-format 2019-03-08 13:28:53 +01:00
Ines Montani d260aa17fd Merge branch 'develop' into feature/lemmatizer 2019-03-08 13:25:00 +01:00
Ines Montani 296446a1c8
Tidy up and improve docs and docstrings (#3370)
## Description
* tidy up and adjust Cython code to code style
* improve docstrings and make calling `help()` nicer
* add URLs to new docs pages to docstrings wherever possible, mostly to user-facing objects
* fix various typos and inconsistencies in docs

### Types of change
enhancement, docs

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-08 11:42:26 +01:00
Matthew Honnibal 19e6b39786 Test morphological features 2019-03-08 01:38:54 +01:00
Matthew Honnibal 9dceb97570 Extend morphanalysis API 2019-03-08 01:38:34 +01:00
Matthew Honnibal 322b64dca0 Allow lookup of morphology by attribute name 2019-03-08 01:38:15 +01:00
Matthew Honnibal 3c32590243 Add test for morph analysis 2019-03-08 00:10:07 +01:00
Matthew Honnibal 3300e3d7ab Implement more MorphAnalysis API 2019-03-08 00:09:16 +01:00
Matthew Honnibal 9a2d1cc6e0 Add length attribute to MorphAnalysisC 2019-03-08 00:08:57 +01:00
Matthew Honnibal b5f2b7b454 Add list_features() helper, clean up 2019-03-08 00:08:35 +01:00
Ines Montani daaeeb7a2b Merge branch 'master' into develop 2019-03-07 22:07:31 +01:00
Matthew Honnibal a40d73cb2a Build out morphological analysis API 2019-03-07 21:59:25 +01:00
Matthew Honnibal dd9ea478c5 Fix intify_attrs function for obsolete data 2019-03-07 21:59:03 +01:00
Matthew Honnibal 987ee6e884 Fix data reading in morphology 2019-03-07 21:58:43 +01:00
Matthew Honnibal 00cfadbf63 Fix obsolete data in English tokenizer exceptions 2019-03-07 21:58:16 +01:00
Matthew Honnibal 7afe56a360 Fix morphological features in en tag_map 2019-03-07 21:57:56 +01:00
Matthew Honnibal 3a667833d1 Fix morphological features in de tag_map 2019-03-07 21:57:43 +01:00
Adrien Ball 88909a9adb Fix egg fragments in direct download (#3369)
## Description
The egg fragment in the URL must be of the form `#egg=package_name==version` instead of `#egg=package_name-version`.
One consequence of a wrong egg fragment is that `pip` does not recognize the package and its version, so it re-downloads the package every time.

I'm not sure how this should be tested properly.
Here is what I had before the fix when running the same direct download twice:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.6MB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm-2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 919kB/s
  Generating metadata for package en-core-web-sm-2.0.0 produced metadata for project name en-core-web-sm. Fix your #egg=en-core-web-sm-2.0.0 fragments.
Requirement already satisfied (use --upgrade to upgrade): en-core-web-sm from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm-2.0.0 in ./venv3/lib/python3.6/site-packages
```

And after the fix:
```
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Collecting en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0
  Downloading https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz (37.4MB)
    100% |████████████████████████████████| 37.4MB 1.1MB/s
Installing collected packages: en-core-web-sm
  Running setup.py install for en-core-web-sm ... done
Successfully installed en-core-web-sm-2.0.0
$ python -m spacy download en_core_web_sm-2.0.0 --direct
Looking in indexes: https://pypi.python.org/simple/
Requirement already satisfied: en_core_web_sm==2.0.0 from https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.0.0/en_core_web_sm-2.0.0.tar.gz#egg=en_core_web_sm==2.0.0 in ./venv3/lib/python3.6/site-packages (2.0.0)
```

### Types of change
This is an enhancement: it avoids unnecessarily re-downloading (potentially large) spaCy models that are already installed.

## Checklist
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-03-07 21:07:19 +01:00
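To make the egg-fragment fix in #3369 concrete, here is a minimal sketch of how a direct-download URL with the corrected fragment could be assembled. The helper name `build_direct_url` and the constant `BASE_URL` are hypothetical and are not spaCy's actual download code; the URL layout is taken from the pip output quoted in the PR, and the `#egg=name==version` form is the one the PR adopts.

```python
# Hypothetical helper (not spaCy's real implementation): builds the direct-download
# URL for a model release, using the `#egg=name==version` fragment from PR #3369.
BASE_URL = "https://github.com/explosion/spacy-models/releases/download"


def build_direct_url(model: str, version: str) -> str:
    """Return the release tarball URL with an `#egg=name==version` fragment."""
    release = f"{model}-{version}"
    tarball = f"{BASE_URL}/{release}/{release}.tar.gz"
    # Old (broken) form: f"{tarball}#egg={model}-{version}"
    # New form: separate name and version with `==` so pip can match the
    # fragment against the installed project and skip the re-download.
    return f"{tarball}#egg={model}=={version}"


if __name__ == "__main__":
    # Prints the same URL that appears in the "after the fix" pip output above.
    print(build_direct_url("en_core_web_sm", "2.0.0"))
```

Only the fragment changes; the tarball path itself still uses the `name-version` release tag, which is why the two forms are easy to confuse.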
Matthew Honnibal 1a10bf29bc Remove morph_key from token api 2019-03-07 18:33:17 +01:00
Matthew Honnibal c1888b05d2 Export helper functions for morphology 2019-03-07 18:33:06 +01:00
Matthew Honnibal 357066ee2f Work on morphanalysis class 2019-03-07 18:32:51 +01:00
Matthew Honnibal 2669190b85 Normalize props for morph exceptions 2019-03-07 18:32:36 +01:00
Matthew Honnibal e585b50458 Fix features in English tag map 2019-03-07 18:32:09 +01:00
Matthew Honnibal 0ad09b16ad Add header for morphanalysis 2019-03-07 17:24:57 +01:00
Matthew Honnibal fed0371db7 Remove enums from morphology 2019-03-07 17:14:57 +01:00
Matthew Honnibal 932d7dde1c Fix compile error 2019-03-07 14:34:54 +01:00
Matthew Honnibal b9ade7d4e0 Add MorphAnalysisC struct 2019-03-07 14:03:07 +01:00
Matthew Honnibal b69013e2d7 Fix passing of morphological features to lemmatizer 2019-03-07 13:11:38 +01:00
Matthew Honnibal 74db1d9602 Revert "Space out symbols enum, to make maintaining easier"
This reverts commit be5235369c.
2019-03-07 12:52:30 +01:00
Matthew Honnibal c773b5011c Revert "Fix StringStore after symbols changes"
This reverts commit bcfe3bd312.
2019-03-07 12:52:15 +01:00
Matthew Honnibal bcfe3bd312 Fix StringStore after symbols changes 2019-03-07 12:51:11 +01:00
Ines Montani 96b91a8898 Fix noqa [ci skip] 2019-03-07 12:25:00 +01:00