192 Commits

Author SHA1 Message Date
Matthew Honnibal f6e356aada Add (and test) Span.sentiment attribute. By default we average token.span, but can override with custom hook. Re Issue #667 2016-12-02 11:05:50 +01:00
Matthew Honnibal 87613edf8f Add set_struct_attr staticmethod to token 2016-11-25 12:41:47 +01:00
Matthew Honnibal fb69aa648f Merge branch 'master' of ssh://github.com/explosion/spaCy 2016-11-25 11:35:44 +01:00
Matthew Honnibal 9a03a3f85e Add get_struct_attr staticmethod to Token, to match Lexeme.get_struct_attr. 2016-11-25 11:35:17 +01:00
Pokey Rule 3e3bda142d Add noun_chunks to Span 2016-11-24 10:47:20 +00:00
tiago b38cfd0ef9 now span.merge returns token like it says on documentation 2016-11-09 14:58:19 +00:00
Matthew Honnibal 1fb09c3dc1 Fix morphology tagger 2016-11-04 19:19:09 +01:00
Matthew Honnibal 293c79c09a Fix #595: Lemmatization was incorrect for base forms, because morphological analyser wasn't adding morphology properly. 2016-11-04 00:29:07 +01:00
Matthew Honnibal f292f7f0e6 Fix Issue #599, by considering empty documents to be parsed and tagged. Implementation is a bit dodgy. 2016-11-02 23:48:43 +01:00
Matthew Honnibal 05a8b752a2 Fix Issue #600: Missing setters for Token attribute. 2016-11-02 23:28:59 +01:00
Matthew Honnibal 11664b9f20 Fix variable error in token 2016-11-01 13:28:00 +01:00
Matthew Honnibal 8c4d1b46ce Fix variable error in Span 2016-11-01 13:27:44 +01:00
Matthew Honnibal e7af6b937f Fix syntax error while fixing doc strings 2016-11-01 13:27:32 +01:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal 4ca31b4d87 Fix clobbering of 'missing' named ent values after assigning ents. 2016-10-26 13:13:56 +02:00
Matthew Honnibal 15c9b59f0e Fix Issue #461: O tag was being clobbered by doc.ents.__set__ 2016-10-23 15:50:26 +02:00
Matthew Honnibal 2c3a67b693 Fix calculation of vector norm, re Issue #522. Need to consolidate the calculations into a helper function. 2016-10-23 14:49:31 +02:00
Matthew Honnibal e80944276f Fix Span.vector_norm 2016-10-20 21:58:56 +02:00
Matthew Honnibal 3588a18fb8 Fix hook names in doc 2016-10-19 21:15:16 +02:00
Matthew Honnibal 5d5742b773 Add sentiment field to doc, rename getters_for_tokens and getters_for_spans, add user_hooks field to Doc. 2016-10-19 20:54:22 +02:00
Matthew Honnibal 9b60186266 Fix doc class 2016-10-17 15:23:47 +02:00
Matthew Honnibal 7fd98fc91c Remove deprecation shim around str/bytes in Token. 2016-10-17 14:02:47 +02:00
Matthew Honnibal b67697a97b Improve API for doc.merge() and span.merge(), to use keyword arguments. 2016-10-17 14:02:13 +02:00
Matthew Honnibal fbb7f3f15c Add user_data attribute to Doc object. 2016-10-17 11:43:22 +02:00
Matthew Honnibal c1abc8f6ed Fix deprecation stuff in Token: Remove the shim for the str/unicode semantics, and raise for has_repvec and repvec 2016-10-17 11:18:41 +02:00
Matthew Honnibal 09ab447a18 Remove tensor property from token. 2016-10-17 02:45:09 +02:00
Matthew Honnibal 5d10e2005c Defer some attributes to Doc, via getters_for_tokens attribute. 2016-10-17 02:44:49 +02:00
Matthew Honnibal 8829984efb Remove tensor attribute from Span and Token. 2016-10-17 02:44:04 +02:00
Matthew Honnibal d15a88c66a Defer some attributes to Doc via getters_for_spans 2016-10-17 02:43:35 +02:00
Matthew Honnibal 62230dd13a Add getters_for_spans and getters_for_tokens attributes to Doc. Fix docstring 2016-10-17 02:42:51 +02:00
Matthew Honnibal ae11ea8240 Add getters_for_tokens and getters_for_spans attributes to Doc object. 2016-10-17 02:42:05 +02:00
Matthew Honnibal 311a985fe0 Add input error handling in Doc 2016-10-16 18:16:42 +02:00
Matthew Honnibal 06322ba99d Add words and spaces keyword arguments to Doc. 2016-10-16 18:13:03 +02:00
Matthew Honnibal f3be9d0a9a Add tensor field to Lexeme, Token, Doc and Span, so that users have a place to hang neural network outputs 2016-10-14 03:24:13 +02:00
Matthew Honnibal ca32a1ab01 Revert "Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good." This reverts commit 8423e8627f. 2016-09-30 20:20:22 +02:00
Matthew Honnibal 6736977d82 Revert "Changes to Doc and Token for new string store scheme" This reverts commit 99de44d864. 2016-09-30 20:11:15 +02:00
Matthew Honnibal 99de44d864 Changes to Doc and Token for new string store scheme 2016-09-30 20:00:21 +02:00
Matthew Honnibal 8423e8627f Work on Issue #285: intern strings into document-specific pools, to address streaming data memory growth. StringStore.__getitem__ now raises KeyError when it can't find the string. Use StringStore.intern() to get the old behaviour. Still need to hunt down all uses of StringStore.__getitem__ in library and do testing, but logic looks good. 2016-09-30 10:14:47 +02:00
Matthew Honnibal d3dc5718b2 Fix syntax error in Doc 2016-09-28 11:39:49 +02:00
Matthew Honnibal 1b520e7bab Improve docstrings for Doc object 2016-09-28 11:15:13 +02:00
Matthew Honnibal fc4a7ad794 Test and fix Issue #411: IndexError when .sents property is used on empty string. 2016-09-27 18:49:14 +02:00
Matthew Honnibal 15e42a1ba9 Allow entities to be set by Span, or by 4-tuple (with entity ID) 2016-09-24 01:17:43 +02:00
Matthew Honnibal e48df859b5 Fix typedef import in span.pyx 2016-09-23 16:02:28 +02:00
Matthew Honnibal 4de13606fd Fix token.pyx 2016-09-23 15:07:07 +02:00
Matthew Honnibal b4de419e19 Import hash_t typedef in token.pyx 2016-09-23 14:22:06 +02:00
Matthew Honnibal c1a2e96604 Clean up notes at end of token.pyx 2016-09-21 20:45:51 +02:00
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal 2735b6247b Fix orths_and_spaces in Doc.__init__ 2016-09-21 14:52:05 +02:00
Matthew Honnibal cdc10e9a1c * Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work. 2016-05-20 10:14:06 +02:00
Matthew Honnibal 5d86c30f0b * Fix Issue #367: Missing has_vector property on Doc and Span objects 2016-05-09 12:36:14 +02:00
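
Commit f6e356aada describes a span-level sentiment value that defaults to an average over the span's tokens and can be overridden by a custom hook. The sketch below illustrates that pattern in plain Python only. It is not spaCy's implementation: the `Token`, `Doc`, and `Span` classes here are simplified stand-ins, and the `user_span_hooks` name, the `"sentiment"` key, and the choice to return 0.0 for an empty span are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Token:
    """Simplified stand-in for a token carrying a sentiment score."""
    text: str
    sentiment: float = 0.0


@dataclass
class Doc:
    """Simplified stand-in for a document: a token list plus optional hooks."""
    tokens: List[Token]
    # Hypothetical hook registry (the name is an assumption for this sketch).
    user_span_hooks: Dict[str, Callable[["Span"], float]] = field(default_factory=dict)


@dataclass
class Span:
    """Half-open slice [start, end) over a Doc's tokens."""
    doc: Doc
    start: int
    end: int

    @property
    def sentiment(self) -> float:
        # A registered hook takes precedence over the default behaviour.
        hook = self.doc.user_span_hooks.get("sentiment")
        if hook is not None:
            return hook(self)
        tokens = self.doc.tokens[self.start:self.end]
        if not tokens:
            # Assumption: an empty span has neutral sentiment.
            return 0.0
        # Default: arithmetic mean of the per-token sentiment scores.
        return sum(t.sentiment for t in tokens) / len(tokens)


doc = Doc([Token("good", 1.0), Token("bad", -0.5), Token("fine", 0.5)])
span = Span(doc, 0, 2)
default_value = span.sentiment  # mean of 1.0 and -0.5

# Override the default with a custom hook that takes the maximum token score.
doc.user_span_hooks["sentiment"] = lambda s: max(
    t.sentiment for t in s.doc.tokens[s.start:s.end]
)
hooked_value = span.sentiment
```

Looking the hook up on the document rather than on the span keeps the override in one place, so every span created from the same document picks it up without extra wiring.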