230 Commits

Author SHA1 Message Date
Matthew Honnibal 4b9d69f428 Merge branch 'v2' into develop
* Move v2 parser into nn_parser.pyx
* New TokenVectorEncoder class in pipeline.pyx
* New spacy/_ml.py module

Currently the two parsers live side-by-side, until we figure out how to
organize them.
2017-05-14 01:10:23 +02:00
ines 9d85cda8e4 Fix models error message and use about.__docs_models__ (see #1051) 2017-05-13 13:05:47 +02:00
ines 6b942763f0 Tidy up imports 2017-05-13 13:04:40 +02:00
ines 6129016e15 Replace deepcopy 2017-05-13 12:32:37 +02:00
ines df68bf45ce Set defaults for light and flat kwargs 2017-05-13 12:32:23 +02:00
ines b9dea345e5 Remove old import 2017-05-13 12:32:11 +02:00
ines 293ee359c5 Fix formatting 2017-05-13 12:32:06 +02:00
Matthew Honnibal ee1d35bdb0 Fix merge conflict 2017-05-13 03:20:19 +02:00
Matthew Honnibal b2540d2379 Merge Kengz's tree_print patch 2017-05-13 03:18:49 +02:00
Matthew Honnibal 4efb391994 Fix serializer 2017-05-09 18:45:18 +02:00
Matthew Honnibal 1166b0c491 Implement Doc.to_bytes and Doc.from_bytes methods 2017-05-09 18:11:34 +02:00
Matthew Honnibal 9e167b7bb6 Strip serializer from code 2017-05-09 17:28:50 +02:00
Matthew Honnibal 62ecdea9f2 Add binder class for document serialization 2017-05-09 17:21:00 +02:00
Matthew Honnibal 6782eedf9b Tmp GPU code 2017-05-07 11:04:24 -05:00
Matthew Honnibal 4d98511db7 Make Span hashable. Closes #1019 2017-04-26 19:01:05 +02:00
Matthew Honnibal 6a4221a6de Allow lemma to be set from Python. Re #973 2017-04-16 18:07:53 +02:00
ines 0739ae7b76 Tidy up and fix formatting and imports 2017-04-15 13:05:15 +02:00
ines 3b667a24d4 Remove whitespace 2017-04-01 10:21:08 +02:00
ines e71a1f4bd0 Fix download commands in error messages (see #946) 2017-04-01 10:20:57 +02:00
Matthew Honnibal 51882ee2b8 Fix check for setting ent_id in merge 2017-03-31 19:32:01 +02:00
Matthew Honnibal fc3900e5b2 Allow ent_id to be set in Token 2017-03-31 14:00:14 +02:00
Matthew Honnibal 9720103428 Improve attribute handling in doc.merge(). Still unsatisfying 2017-03-31 13:59:58 +02:00
Matthew Honnibal 0fefdfcbda Merge pull request #935 from ericzhao28/master
Add option to use label=ent_type in doc.merge arguments (Bug fix for issue #862)
2017-03-30 02:51:24 +02:00
Eric Zhao aafdf6ffb8 Add option to use label karg to determine ent_type in doc.merge 2017-03-28 23:35:03 -07:00
Matthew Honnibal 28bb546939 Merge pull request #883 from ericzhao28/master
Add `lower_` and `upper_` properties to `Span` class
2017-03-16 23:35:47 +01:00
ines 66c1f194f9 Use consistent unicode declarations 2017-03-12 13:07:28 +01:00
Em 9c809efc25 Removed mapStr 2017-03-11 16:23:26 -08:00
Em 426d17167f Added string manipulation for spans 2017-03-10 16:50:02 -08:00
Roman Inflianskas 66e1109b53 Add support for Universal Dependencies v2.0 2017-03-03 13:17:34 +01:00
Matvey Ezhov 32a22291bc Small `Doc.count_by` documentation update
Current example doesn't work
2017-01-31 19:18:45 +03:00
Matthew Honnibal 6c665b81df Fix redundant == TAG in from_array conditional 2017-01-31 00:46:21 +11:00
Matthew Honnibal e7f8e13cf3 Make Token hashable. Fixes #743 2017-01-16 13:27:57 +01:00
Matthew Honnibal 12cd27b821 Amend 8ae8b443f: Handle comparison with None tokens. 2017-01-11 13:03:32 +01:00
Matthew Honnibal 44e2b0100d Support TAG attribute in doc.from_array 2017-01-10 22:47:07 +01:00
Matthew Honnibal 8ae8b443f1 Add richcmp method to Token. Closes #631 2017-01-09 19:30:31 +01:00
kengz 73a38bd4d1 Merge remote-tracking branch 'upstream/master' 2016-12-30 12:19:59 -05:00
kengz da44183ae1 move parse_tree logic to a new tokens/printers.py file 2016-12-30 12:19:18 -05:00
Matthew Honnibal 404019ad2f Fix issue #672: ent_iob_ was a string, not unicode, due to missing unicode_literals statement. 2016-12-18 22:33:53 +01:00
Matthew Honnibal f6e356aada Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667 2016-12-02 11:05:50 +01:00
Matthew Honnibal 87613edf8f Add set_struct_attr staticmethod to token 2016-11-25 12:41:47 +01:00
Matthew Honnibal fb69aa648f Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-25 11:35:44 +01:00
Matthew Honnibal 9a03a3f85e Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr. 2016-11-25 11:35:17 +01:00
Pokey Rule 3e3bda142d Add noun_chunks to Span 2016-11-24 10:47:20 +00:00
tiago b38cfd0ef9 now span.merge returns token like it says on documentation 2016-11-09 14:58:19 +00:00
Matthew Honnibal 1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal 293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal f292f7f0e6 Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. 2016-11-02 23:48:43 +01:00
Matthew Honnibal 05a8b752a2 Fix Issue #600: Missing setters for Token attribute. 2016-11-02 23:28:59 +01:00
Matthew Honnibal 11664b9f20 Fix variable error in token 2016-11-01 13:28:00 +01:00
Matthew Honnibal 8c4d1b46ce Fix variable error in Span 2016-11-01 13:27:44 +01:00