Commit Graph

709 Commits

Author SHA1 Message Date
Matthew Honnibal e0a9b02b67 Merge Span._ and Span.as_doc methods 2017-10-09 22:00:15 -05:00
Matthew Honnibal 09d61ada5e Merge pull request #1396 from explosion/feature/pipeline-management
💫 Improve pipeline and factory management
2017-10-10 04:29:54 +02:00
Matthew Honnibal f0f2739ae3 Add test for serialization issue raised in #1105 2017-10-10 03:57:58 +02:00
ines de374dc72a Merge branch 'feature/pipeline-management' into feature/dot-underscore 2017-10-09 14:37:51 +02:00
Matthew Honnibal d8a2506023 Merge pull request #1401 from explosion/feature/add-parser-action
💫 Allow labels to be added to pre-trained parser and NER modes
2017-10-09 04:57:51 +02:00
Matthew Honnibal 689349e32f Merge pull request #1400 from explosion/feature/sentence-parsing
💫 Force parser to respect preset sentence boundaries
2017-10-09 04:31:43 +02:00
Matthew Honnibal fad2b8315f Merge branch 'develop' into feature/add-parser-action 2017-10-09 04:13:04 +02:00
Matthew Honnibal 6c79841c0d Fix tests for history features 2017-10-09 04:12:24 +02:00
Matthew Honnibal dde87e6b0d Add tests for adding parser actions 2017-10-09 03:42:35 +02:00
Matthew Honnibal 81a64119db Fix string-to-unicode problem 2017-10-09 00:59:49 +02:00
Matthew Honnibal 02c2af7119 Fix test 2017-10-09 00:29:37 +02:00
Matthew Honnibal 5a67efeccc Add tests for sentence segmentation presetting 2017-10-09 00:02:23 +02:00
Matthew Honnibal 9bd8191739 Add tests for Underscore 2017-10-07 18:56:19 +02:00
Matthew Honnibal 3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
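The commit above explains why an empty token spec broke the Matcher and how it was fixed. Below is a minimal pure-Python sketch of that idea. It is not spaCy's Cython implementation. The attribute IDs (NULL_ATTR, ORTH, LOWER as 0, 1, 2), the tuple used as a lexeme, and the helpers compile_token_spec, make_lexeme, token_matches and find_matches are all hypothetical and exist only for illustration. The one thing taken from the commit is the key step: an empty dict compiles to the constraint NULL_ATTR == 0, and because attribute slot 0 of every lexeme holds 0, that constraint always holds.

```python
# Hypothetical attribute IDs for illustration only (not spaCy's real values).
NULL_ATTR = 0  # slot 0 of every lexeme always holds 0
ORTH = 1
LOWER = 2
ATTR_IDS = {"ORTH": ORTH, "LOWER": LOWER}


def compile_token_spec(spec):
    """Compile a {"ATTR": value} dict into a list of (attr_id, value) constraints."""
    constraints = [(ATTR_IDS[name], value) for name, value in spec.items()]
    if not constraints:
        # A zero-length constraint list was the end-of-pattern signal, so an
        # empty dict becomes a constraint that every lexeme satisfies.
        constraints = [(NULL_ATTR, 0)]
    return constraints


def make_lexeme(text):
    """Represent a lexeme as an attribute array indexed by attribute ID."""
    return (0, text, text.lower())  # (NULL_ATTR, ORTH, LOWER)


def token_matches(constraints, lexeme):
    """A token matches when every (attr_id, value) constraint holds."""
    return all(lexeme[attr] == value for attr, value in constraints)


def find_matches(pattern, words):
    """Return (start, end) spans where a fixed-length token pattern matches."""
    compiled = [compile_token_spec(spec) for spec in pattern]
    lexemes = [make_lexeme(w) for w in words]
    n = len(compiled)
    return [
        (i, i + n)
        for i in range(len(lexemes) - n + 1)
        if all(token_matches(c, lex) for c, lex in zip(compiled, lexemes[i:i + n]))
    ]


if __name__ == "__main__":
    words = ["Hello", "big", "world", "hello", "new", "World"]
    # {} in the middle acts as an "any token" wildcard.
    print(find_matches([{"LOWER": "hello"}, {}, {"LOWER": "world"}], words))
    # → [(0, 3), (3, 6)]
```

Compiling the wildcard into an ordinary, always-true constraint keeps the matching loop uniform: no token spec ever has zero constraints, so a zero length can still safely mean "end of pattern".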
ines 0adadcb3f0 Fix beam parse model test 2017-10-07 02:15:15 +02:00
ines b38a8f4a94 Fix and update pipe methods tests 2017-10-07 02:06:23 +02:00
Matthew Honnibal 3a65a0c970 Start adding tests for new pipeline management 2017-10-07 01:48:23 +02:00
ines 61a503a611 Fix parser test 2017-10-07 00:38:51 +02:00
Matthew Honnibal c6cd81f192 Wrap try/except around model saving 2017-10-05 08:14:24 -05:00
Matthew Honnibal fd4baff475 Update tests 2017-10-05 08:12:27 -05:00
Matthew Honnibal 40edb65ee7 Make test work for Python 2.7 2017-10-04 16:36:50 +02:00
Matthew Honnibal db05d4d582 Add test for #1380. Passes without fix? 2017-10-04 14:56:31 +02:00
Matthew Honnibal 4a59f6358c Fix thinc imports 2017-10-03 19:21:26 +02:00
Ines Montani 959c46eabe Merge pull request #1365 from wannaphongcom/develop
Add Thai language for spaCy v2
2017-09-26 23:43:05 +02:00
Wannaphong Phatthiyaphaibun 7b5263ffa4 fix thai test 2017-09-26 23:54:15 +07:00
Matthew Honnibal 41cc5c4c17 Merge branch 'develop' into feature/phrasematcher 2017-09-26 09:59:17 -05:00
Wannaphong Phatthiyaphaibun 5cba67146c add thai in spacy2 2017-09-26 21:36:27 +07:00
Matthew Honnibal 74f08e1ad5 Update test 2017-09-26 06:45:56 -05:00
Matthew Honnibal 20193371f5 Don't share CNN, to reduce complexities 2017-09-21 14:59:48 +02:00
Matthew Honnibal cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal 43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
Matthew Honnibal c013e5996f Fix parser test 2017-09-17 13:13:20 -05:00
ines ece30c28a8 Don't split hyphenated words in German
This way, the tokenizer matches the tokenization in German treebanks
2017-09-16 20:40:15 +02:00
Matthew Honnibal ebf8942564 Fix test for Python3 2017-09-16 16:22:38 +02:00
Matthew Honnibal 8c945310fb Excuse emoji failure on narrow unicode builds 2017-09-16 16:21:13 +02:00
Matthew Honnibal 3fa5b40b5c Add test for hash consistency 2017-09-16 11:21:35 +02:00
Matthew Honnibal 456bb8a74c Unxfail and close #1305 2017-09-06 19:14:17 +02:00
Matthew Honnibal 99e44fbdbb Update regression test 2017-09-06 19:13:51 +02:00
Matthew Honnibal 497a9308a8 Xfail new lemmatizer test 2017-09-06 18:41:22 +02:00
Matthew Honnibal 5384fff5ce Add test for 1305: Incorrect lemmatization of VBZ for English 2017-09-06 18:40:18 +02:00
Matthew Honnibal d5fbf27335 Fix test 2017-09-04 16:45:11 +02:00
Matthew Honnibal cb4839033c Fix loader for EN tests 2017-09-04 15:19:18 +02:00
Matthew Honnibal 644d6c9e1a Improve lemmatization tests, re #1296 2017-09-04 15:17:44 +02:00
Jim Geovedi fbc62a09c7 added {pre,suf,in}fix tests 2017-08-20 13:43:00 +07:00
Jim Geovedi 713d7c0aa0 added indonesian lang test 2017-08-20 12:17:14 +07:00
Jim Geovedi fa544e6c9a Merge remote-tracking branch 'upstream/develop' into indonesian 2017-08-20 11:49:40 +07:00
Matthew Honnibal 41c2218c53 Fix test for vectors 2017-08-19 22:09:12 +02:00
Matthew Honnibal ef87562741 Restore vectors test utils 2017-08-19 20:35:16 +02:00
Matthew Honnibal 1391f9da37 Restore vectors tests 2017-08-19 20:34:58 +02:00
Matthew Honnibal d55d6e1cfa Fix comparison of Token from different docs. Closes #1257 2017-08-19 16:39:32 +02:00