29 Commits

Author SHA1 Message Date
Suraj Rajan bbdc6456c6 Set up dependency tree pattern matching skeleton (#2732) 2018-09-27 13:27:18 +02:00
ines c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
Matthew Honnibal 4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
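A rough illustration of the end-of-match behaviour described in the commit above. This is a toy greedy matcher over plain strings, not spaCy's implementation; the function name `match_at` and the `(word, op)` pattern shape are hypothetical and chosen only for this sketch. The point is that a pattern ending in `*` or `?` should stop at the last token it actually consumed, not one token later.

```python
def match_at(tokens, start, pattern):
    """Return the exclusive end index of a greedy match, or None.

    pattern is a list of (word, op) pairs, where op is "1" (exactly one),
    "?" (zero or one) or "*" (zero or more). Hypothetical toy format.
    """
    i = start
    for word, op in pattern:
        if op == "1":
            if i >= len(tokens) or tokens[i] != word:
                return None
            i += 1
        elif op == "?":
            if i < len(tokens) and tokens[i] == word:
                i += 1
        elif op == "*":
            while i < len(tokens) and tokens[i] != word and False:
                pass  # never runs; kept out of the real loop below
            while i < len(tokens) and tokens[i] == word:
                i += 1
    # i points just past the last consumed token; the token where the
    # variable-length operator stopped matching is not included.
    return i


tokens = ["a", "b", "b", "c"]
star_end = match_at(tokens, 0, [("a", "1"), ("b", "*")])      # 3, span is a b b
optional_end = match_at(tokens, 0, [("a", "1"), ("b", "?")])  # 2, span is a b
skipped_end = match_at(tokens, 0, [("a", "1"), ("c", "?")])   # 1, span is a
```

An off-by-1 version of this loop would report `4` for the first pattern, pulling `c` into the span even though `c` is where `b*` stopped matching.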
Matthew Honnibal c29927d2e7 Fix matcher test 2017-10-16 17:22:18 +02:00
Matthew Honnibal 748d525801 Add more matcher operator tests 2017-10-16 13:38:01 +02:00
Matthew Honnibal 2534cd57d7 Add bandaid solution to the 'shadowing' problem in #864 2017-10-09 08:59:35 +02:00
Matthew Honnibal 3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
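The idea in the commit above can be sketched in plain Python. This is a simplified model under stated assumptions, not the actual Cython code: tokens are dicts, `compile_spec`, `get_attr` and `find_matches` are hypothetical helper names, and `NULL_ATTR` is modelled as an attribute ID that reads as 0 on every token. An empty spec is compiled into the single constraint `NULL_ATTR == 0`, so it never produces a zero-length constraint list and always matches.

```python
NULL_ATTR = 0  # assumed: an attribute ID that is 0 on every token


def compile_spec(spec):
    """Turn a token spec dict into a non-empty list of (attr, value) pairs."""
    constraints = list(spec.items())
    # An empty dict would give a zero-length list, which the commit says was
    # used as an end-of-array signal. Substitute a constraint that always holds.
    return constraints if constraints else [(NULL_ATTR, 0)]


def get_attr(token, attr):
    """Read an attribute from a toy token; NULL_ATTR is always 0."""
    if attr == NULL_ATTR:
        return 0
    return token.get(attr)


def find_matches(tokens, pattern):
    """Return (start, end) spans where every spec matches consecutive tokens."""
    compiled = [compile_spec(spec) for spec in pattern]
    n = len(compiled)
    spans = []
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + n]
        if all(
            get_attr(token, attr) == value
            for token, constraints in zip(window, compiled)
            for attr, value in constraints
        ):
            spans.append((start, start + n))
    return spans


tokens = [{"LOWER": w} for w in "user name : alice".split()]
pattern = [{"LOWER": "user"}, {}, {"LOWER": ":"}]  # {} means "any token"
matches = find_matches(tokens, pattern)  # [(0, 3)]
```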
Matthew Honnibal cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal 43ad250dd5 Update matcher tests 2017-09-20 21:54:49 +02:00
ines c5714d4fb2 xfail matcher test for now until setting norm via Span.merge works 2017-05-29 10:51:02 +02:00
ines 00b2094dc3 Fix typos, long integers and tests 2017-05-29 01:09:52 +02:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
ines b3c7ee0148 Fix tests and use the new Matcher API 2017-05-22 13:54:20 +02:00
Ines Montani 5f0d196a31 Modernise and merge matcher tests 2017-01-12 22:23:11 +01:00
Ines Montani f8803808ce Remove old unused tests and conftest files 2017-01-12 15:09:05 +01:00
Dmitry Sadovnychyi 86c056ba64 Add basic test for PhraseMatcher
#613
2016-11-09 00:10:32 +08:00
Matthew Honnibal 7d446e5094 Revert "Update matcher test, to reflect character offset return instead of token offset."
This reverts commit f8d3e3bcfe.
2016-10-17 16:49:49 +02:00
Matthew Honnibal f8d3e3bcfe Update matcher test, to reflect character offset return instead of token offset. 2016-10-17 16:00:10 +02:00
Matthew Honnibal 8951bf6989 Update matcher tests 2016-10-17 01:53:24 +02:00
Matthew Honnibal bd7fe6420c Revert "Changes to test for new string-store"
This reverts commit 21e90d7d0b.
2016-09-30 20:11:01 +02:00
Matthew Honnibal 21e90d7d0b Changes to test for new string-store 2016-09-30 20:00:58 +02:00
Matthew Honnibal 95aaea0d3f Refactor so that the tokenizer data is read from Python data, rather than from disk 2016-09-25 14:49:53 +02:00
Matthew Honnibal 83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal b00f683a0c Fix matcher test 2016-09-24 11:20:58 +02:00
Matthew Honnibal 939a791a52 Update tests 2016-09-24 01:17:03 +02:00
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
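The known problem mentioned in the commit above can be reproduced with a small greedy matcher that never backtracks. This is only a sketch of that failure mode, not the Matcher's real algorithm; `greedy_match` and the `(predicate, quantifier)` pattern shape are hypothetical.

```python
def greedy_match(tokens, pattern):
    """Match pattern from tokens[0] greedily, with no backtracking.

    pattern is a list of (predicate, quantifier) pairs; quantifier is
    "1" (exactly one), "*" (zero or more) or "+" (one or more).
    Returns the exclusive end index, or None if the pattern fails.
    """
    i = 0
    for predicate, quantifier in pattern:
        count = 0
        limit = 1 if quantifier == "1" else len(tokens)
        while i < len(tokens) and count < limit and predicate(tokens[i]):
            i += 1
            count += 1
        if count == 0 and quantifier != "*":
            return None
    return i


def is_york(token):
    return token == "York"


def is_new(token):
    return token == "New"


tokens = ["New", "York"]

# "York" satisfies both str.isalpha and the tail, so the one-plus
# expression consumes it and nothing is left for the tail to match.
shadowed = greedy_match(tokens, [(str.isalpha, "+"), (is_york, "1")])

# With a quantified expression that cannot match the tail token,
# the same greedy strategy succeeds.
unambiguous = greedy_match(tokens, [(is_new, "+"), (is_york, "1")])
```

Here `shadowed` is `None` while `unambiguous` is `2`, which is the situation the commit message flags for the zero-plus and one-plus quantifiers.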
Matthew Honnibal 4e16f9e435 * Move tests underneath spacy/ 2015-10-26 00:07:31 +11:00