6883 Commits

Author SHA1 Message Date
Matthew Honnibal 188f620046 Improve parser defaults 2017-10-11 09:43:48 +02:00
Matthew Honnibal acba2e1051 Fix metadata in training 2017-10-11 08:55:52 +02:00
Matthew Honnibal 74c2c6a58c Add default name and lang to meta 2017-10-11 08:49:12 +02:00
Matthew Honnibal 3814a161e6 Avoid clobbering preset lemmas 2017-10-11 08:41:03 +02:00
Matthew Honnibal fd47f8e89f Fix failing test 2017-10-11 08:38:34 +02:00
Matthew Honnibal 462b2e26b4 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-11 08:23:04 +02:00
Matthew Honnibal a6ac4699eb Allow Morphology class to setup tokens
Add Morphology.assign_untagged() C-method, and call it from
Doc.push_back() when a token is created. This gives a place
to allow the Morphology class to initialize token data.
2017-10-11 03:24:14 +02:00
Matthew Honnibal 3b527fa52b Call morphology.assign_untagged when pushing token to Doc 2017-10-11 03:23:57 +02:00
Matthew Honnibal c15d8278cb Avoid lemmatizing inappropriate tags in English lemmatizer 2017-10-11 03:23:23 +02:00
Matthew Honnibal d528b6e36d Add assign_untagged method in Morphology 2017-10-11 03:22:49 +02:00
Matthew Honnibal 2c118ab3a6 Add tests for Doc creation 2017-10-11 03:21:23 +02:00
ines f4ae6763b9 Fix consistency of imports from spacy.tokens in examples 2017-10-11 02:30:40 +02:00
ines 820bf85075 Move LookupLemmatizer to spacy.lemmatizer 2017-10-11 02:25:13 +02:00
ines 417d45f5d0 Add lemmatizer data as variable on language data
Don't create lookup lemmatizer within Language class and just pass in
the data so it can be set on Token creation
2017-10-11 02:24:58 +02:00
ines 0c2343d73a Tidy up language data 2017-10-11 02:22:49 +02:00
Matthew Honnibal d84136b4a9 Update add label test 2017-10-10 22:57:41 +02:00
Matthew Honnibal 3065f12ef2 Make add parser label work for hidden_depth=0 2017-10-10 22:57:31 +02:00
ines bfd58dd0fc Merge branch 'develop' into feature/dot-underscore 2017-10-10 22:03:51 +02:00
Matthew Honnibal 73bca3d382 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-10 12:51:37 -05:00
Matthew Honnibal 5156074df1 Make loading code more consistent in train command 2017-10-10 12:51:20 -05:00
Matthew Honnibal d70fba6807 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-10 19:33:10 +02:00
Matthew Honnibal 8143618497 Set prefix length back to 1 2017-10-10 19:32:54 +02:00
Matthew Honnibal 97c9b5db8b Patch spacy.train for new pipeline management 2017-10-09 23:41:16 -05:00
ines 19598ebfee Update migration guide 2017-10-10 06:38:11 +02:00
ines 9c96a6e131 Update pipelines section in v2 overview 2017-10-10 06:33:53 +02:00
Matthew Honnibal a635240398 Add conll_ner2json converter 2017-10-09 22:03:26 -05:00
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
Matthew Honnibal dce8afb9cf Set prefix length to 3 2017-10-09 21:55:55 -05:00
Matthew Honnibal 8265b90c83 Update parser defaults 2017-10-09 21:55:20 -05:00
Matthew Honnibal dd2b0601d1 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-09 21:30:46 -05:00
Matthew Honnibal 09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
ines 6679117000 Add pipeline component examples 2017-10-10 04:26:06 +02:00
ines 7a592d01dc Update pipeline component usage docs 2017-10-10 04:24:39 +02:00
ines 3d5154811a Fix typo 2017-10-10 04:24:22 +02:00
ines 43b70651fb Document extension methods on Doc, Token and Span
set_extension, get_extension, has_extension
2017-10-10 04:23:37 +02:00
ines 67350fa496 Use better logic for auto-generating component name
Instances don't have __name__, so we try __class__.__name__ as well,
before giving up and defaulting to repr(component).
2017-10-10 04:23:05 +02:00
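The fallback order described in commit 67350fa496 can be illustrated with a small standalone sketch. It is not spaCy's actual code: the helper name `get_component_name` is hypothetical, and it only mirrors the three steps the commit message lists (`__name__`, then `__class__.__name__`, then `repr(component)`).

```python
def get_component_name(component):
    """Hypothetical helper: derive a default name for a pipeline component.

    Follows the order from the commit message: the object's own __name__,
    then its class's __name__, then repr() as a last resort.
    """
    # Functions and classes carry __name__ directly.
    if hasattr(component, "__name__"):
        return component.__name__
    # Instances usually lack __name__, so fall back to the class name.
    cls = getattr(component, "__class__", None)
    if cls is not None and hasattr(cls, "__name__"):
        return cls.__name__
    # Give up and use the object's repr.
    return repr(component)


def my_component(doc):
    return doc


class EntityMatcher:
    def __call__(self, doc):
        return doc


print(get_component_name(my_component))     # function name
print(get_component_name(EntityMatcher()))  # instance -> class name
```

Because almost every Python object exposes `__class__.__name__`, the `repr()` branch is rarely reached in practice; it mainly guards against unusual objects that override `__class__`.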
ines b4fc6b203c Rename mixin 2017-10-10 04:22:23 +02:00
ines 3fc4fe61d2 Fix typo 2017-10-10 04:15:14 +02:00
ines 59c4f27499 Add get, set and has methods to Underscore 2017-10-10 04:14:35 +02:00
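A minimal sketch of what method-style `get`, `set` and `has` accessors on an underscore namespace can look like, assuming extension attributes are registered up front with default values. The class `UnderscoreSketch` is hypothetical and deliberately simplified (values live in a plain dict); it is not spaCy's `Underscore` implementation.

```python
class UnderscoreSketch:
    """Hypothetical, simplified namespace for registered extension attributes."""

    def __init__(self, extensions):
        # Registered extension names mapped to their default values.
        object.__setattr__(self, "_extensions", dict(extensions))
        # Values explicitly set on this object.
        object.__setattr__(self, "_values", {})

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for extension names.
        extensions = self.__dict__.get("_extensions", {})
        if name not in extensions:
            raise AttributeError(name)
        return self.__dict__["_values"].get(name, extensions[name])

    def __setattr__(self, name, value):
        # Refuse attributes that were never registered.
        if name not in self._extensions:
            raise AttributeError(name)
        self._values[name] = value

    # Method-style accessors, handy when the attribute name is a variable.
    def get(self, name):
        return getattr(self, name)

    def set(self, name, value):
        setattr(self, name, value)

    def has(self, name):
        return name in self._extensions


underscore = UnderscoreSketch({"is_greeting": False})
underscore.set("is_greeting", True)
print(underscore.get("is_greeting"), underscore.has("is_greeting"))
```

The method forms are equivalent to attribute access (`underscore.is_greeting`) but let calling code pass the extension name as a string.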
Matthew Honnibal 19136fd155 Merge branch 'develop' of https://github.com/explosion/spaCy into develop 2017-10-10 03:58:30 +02:00
Matthew Honnibal 8978212ee5 Patch serialization bug raised in #1105 2017-10-10 03:58:12 +02:00
Matthew Honnibal f0f2739ae3 Add test for serialization issue raised in #1105 2017-10-10 03:57:58 +02:00
Matthew Honnibal 735d18654d Add NER converter for CoNLL 2003 data 2017-10-09 20:06:28 -05:00
Matthew Honnibal 51d18937af Partially apply doc/span/token into method
We want methods to act like they're "bound" to the object, so that you can make your method conditional on the `doc`, `span` or `token` instance --- like, well, a method. We therefore partially apply the function, which works like this:

```
def partial(unbound_method, constant_arg):
    def bound_method(*args, **kwargs):
        return unbound_method(constant_arg, *args, **kwargs)
    return bound_method
```
2017-10-10 02:21:28 +02:00
Matthew Honnibal 808d8740d6 Remove print statement 2017-10-09 08:45:20 -05:00
Matthew Honnibal 0f41b25f60 Add speed benchmarks to metadata 2017-10-09 08:05:37 -05:00
ines de374dc72a Merge branch 'feature/pipeline-management' into feature/dot-underscore 2017-10-09 14:37:51 +02:00
ines 6c253db3fe Add section for developing spaCy extensions 2017-10-09 14:36:56 +02:00
ines 6550d0547c Fix typo 2017-10-09 14:36:36 +02:00
ines 4d248ea920 Fix spacing on bulleted lists 2017-10-09 14:36:30 +02:00