Commit Graph

1602 Commits

Author SHA1 Message Date
Matthew Honnibal b4de419e19 Import hash_t typedef in token.pyx 2016-09-23 14:22:06 +02:00
Matthew Honnibal c1a2e96604 Clean up notes at end of token.pyx 2016-09-21 20:45:51 +02:00
Matthew Honnibal f6e587b1c7 Fix matcher tests 2016-09-21 20:45:20 +02:00
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal 2735b6247b Fix orths_and_spaces in Doc.__init__ 2016-09-21 14:52:05 +02:00
Matthew Honnibal 070af4af9d Revert "* Working neural net, but features hacky. Switching to extractor." This reverts commit 7c2f1a673b. 2016-09-21 12:26:14 +02:00
Matthew Honnibal 6b202ec43f Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-09-21 12:08:25 +02:00
Mahmoud Lababidi 4c9ccc3b8b Add parameter to download() for application to not exit if a Model exists. The default behavior is unchanged. 2016-09-14 10:04:09 -04:00
Adam Ever Hadani f1c0762443 exit code 0 for when downloading a model that already was downloaded 2016-07-13 16:22:14 -07:00
Matthew Honnibal 7c2f1a673b * Working neural net, but features hacky. Switching to extractor. 2016-05-26 19:06:10 +02:00
Matthew Honnibal cdc10e9a1c * Fix Issue #375: noun phrase iteration results in index error if noun phrases are merged during the loop. Fix by accumulating the spans inside the noun_chunks property, allowing the Span index tricks to work. 2016-05-20 10:14:06 +02:00
Matthew Honnibal 13fad36e49 * Cosmetic change to english noun chunks iterator -- use enumerate instead of range loop 2016-05-20 10:11:05 +02:00
Matthew Honnibal 02276cc444 Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-17 16:56:22 +02:00
Matthew Honnibal 4d7f5468bb * Change Language class to use a .pipeline attribute, instead of having the pipeline hard coded 2016-05-17 16:55:42 +02:00
Daylen Yang 5405e7dd73 Fix get_lang_class parsing (take 2) 2016-05-16 16:40:31 -07:00
Matthew Honnibal b240104f40 Revert "Fix get_lang_class parsing" 2016-05-17 08:04:26 +10:00
Daylen Yang 1692c2df3c Fix get_lang_class parsing. We want the get_lang_class to return "en" for both "en" and "en_glove_cc_300_1m_vectors". Changed the split rule to "_" so that this happens. 2016-05-16 14:38:20 -07:00
Matthew Honnibal 17137f5c0c * Fix issue #372: mistake in Lexeme rich comparison 2016-05-12 12:58:57 +02:00
Matthew Honnibal cc8bf62208 * Fix Issue #360: Tokenizer failed when the infix regex matched the start of the string while trying to tokenize multi-infix tokens. 2016-05-09 13:23:47 +02:00
Matthew Honnibal c61ee8f9fa * Increment version 2016-05-09 13:20:00 +02:00
Matthew Honnibal 5d86c30f0b * Fix Issue #367: Missing has_vector property on Doc and Span objects 2016-05-09 12:36:14 +02:00
Wolfgang Seeker 7b78239436 add fix for German noun chunk iterator (issue #365) 2016-05-06 01:41:26 +02:00
Matthew Honnibal 8c0888d6cb * Fix error in span.sent 2016-05-06 00:28:05 +02:00
Matthew Honnibal bb94022975 * Fix Issue #365: Error introduced during noun phrase chunking, due to use of corrected PRON/PROPN/etc tags. 2016-05-06 00:21:05 +02:00
Matthew Honnibal 41342ca79b Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-06 00:17:58 +02:00
Matthew Honnibal 26095f9722 * Add span.sent property, re Issue #366 2016-05-06 00:17:38 +02:00
Wolfgang Seeker dbf8f5f3ec fix bug in StateC.set_break() 2016-05-05 15:15:34 +02:00
Wolfgang Seeker 3c44b5dc1a call deprojectivization after parsing 2016-05-05 15:10:36 +02:00
Matthew Honnibal 472f576b82 * Deprojectivize German parses 2016-05-05 15:01:10 +02:00
Matthew Honnibal 9bbd6cf031 * Work on Chinese support 2016-05-05 11:39:12 +02:00
Matthew Honnibal a6a25166ba * Remove print from test 2016-05-05 11:10:59 +02:00
Matthew Honnibal e31df66d26 * Fix Issue #361: Lexemes didn't have rich comparison. 2016-05-05 01:32:26 +02:00
Matthew Honnibal 7441ca30ee * Add tests for Issue #361: Lexeme rich comparison 2016-05-05 01:31:58 +02:00
Matthew Honnibal 72564213e3 * Add test for Issue #309 2016-05-04 16:00:28 +02:00
Matthew Honnibal 76f1d871da Merge branch 'master' of ssh://github.com/spacy-io/spaCy 2016-05-04 15:54:00 +02:00
Matthew Honnibal 519366f677 * Fix Issue #351: Indices off when leading whitespace 2016-05-04 15:53:36 +02:00
Matthew Honnibal b4bfc6ae55 * Add test for Issue #351: Indices off when leading whitespace 2016-05-04 15:53:17 +02:00
Matthew Honnibal 76021cb853 * Fix bug in Doc.text, introduced by a862edc 2016-05-04 11:02:16 +02:00
Wolfgang Seeker e4ea2bea01 fix whitespace 2016-05-04 07:40:38 +02:00
Wolfgang Seeker 5bf2fd1f78 make the code less cryptic 2016-05-03 17:19:05 +02:00
Wolfgang Seeker a06fca9fdf German noun chunk iterator now doesn't return tokens more than once 2016-05-03 16:58:59 +02:00
Wolfgang Seeker 7825b75548 add tests for German noun chunker 2016-05-03 15:01:28 +02:00
Wolfgang Seeker 7b246c13cb reformulate noun chunk tests for English 2016-05-03 14:24:35 +02:00
Wolfgang Seeker 1786331cd8 add model sanity test 2016-05-03 12:51:47 +02:00
Matthew Honnibal 1f1532142f * Fix cost calculation on non-monotonic oracle 2016-05-03 00:21:08 +02:00
Matthew Honnibal 377a624046 Merge pull request #358 from wbwseeker/german_lemmatizer_dummy German lemmatizer dummy 2016-05-03 07:38:26 +10:00
Wolfgang Seeker 92bfbebeec remove unnecessary imports 2016-05-02 17:33:22 +02:00
Wolfgang Seeker 857454ffa0 fix indentation -.- 2016-05-02 17:10:41 +02:00
Matthew Honnibal 308a28c26c * Whitespace 2016-05-02 16:08:11 +02:00
Matthew Honnibal 29a114e645 * Don't assign 0-valued tags in Doc.from_array 2016-05-02 16:07:50 +02:00