Commit Graph

85 Commits

Author SHA1 Message Date
Matthew Honnibal 3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
Matthew Honnibal 19c7c09bf7 Fix PhraseMatcher.__contains__ 2017-09-26 08:35:53 -05:00
Ines Montani 7123139b2b Add __contains__ to PhraseMatcher 2017-09-26 13:13:27 +02:00
Ines Montani 50ad50f96a Update matcher.pyx 2017-09-26 13:11:17 +02:00
Matthew Honnibal 842e21de9f Fix int type error for Python 2 2017-09-20 23:55:30 +02:00
Matthew Honnibal 0c93c73e49 Add __reduce__ method for PhraseMatcher 2017-09-20 22:26:40 +02:00
Matthew Honnibal cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal 828cc91545 Fix PhraseMatcher for spaCy 2 2017-09-20 21:54:31 +02:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
ines 4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines 39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines 790435e51c Update docstrings 2017-05-20 14:05:07 +02:00
Matthew Honnibal ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
ines 1d4d3d0ecd Add TODO 2017-05-20 01:38:04 +02:00
ines fe5d8819ea Update Matcher docstrings and API docs 2017-05-19 21:47:06 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal 725249c59a Add merge_phrase callback in matcher.pyx 2017-03-31 13:58:59 +02:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
Dmytro Sadovnychyi e70a7050e1 Remove duplicated line of vocab declaration
As already declared on line 211.
2016-11-13 18:52:49 +08:00
Dmitry Sadovnychyi 9488222e79 Fix PhraseMatcher to work with updated Matcher
#613
2016-11-09 00:14:26 +08:00
Matthew Honnibal 5cd3acb265 Fix #605: Acceptor now rejects matches as expected. 2016-11-06 10:50:42 +01:00
Matthew Honnibal f1605df2ec Fix #588: Matcher should reject empty pattern. 2016-11-03 00:16:44 +01:00
Matthew Honnibal b86f8af0c1 Fix doc strings 2016-11-01 12:25:36 +01:00
Matthew Honnibal d563f1eadb Fix Issue #587: Segfault in Matcher, due to simple error in the state machine. 2016-10-28 17:42:00 +02:00
Matthew Honnibal 2e92c6fb3a Fix JSON encoding issue on load 2016-10-20 21:06:48 +02:00
Matthew Honnibal f189a3cb00 Fix encoding when opening files in Python 2.7, re Issue #539 2016-10-20 14:42:56 +02:00
Matthew Honnibal 05e2a589a4 Fix None label in matcher 2016-10-18 18:05:21 +02:00
Matthew Honnibal 9258db788a Revert "Have the matcher return character offsets, to handle the match better."
This reverts commit 049c937540.
2016-10-17 16:49:51 +02:00
Matthew Honnibal 2fd97c71cc Revert "Don't try to pickle matcher."
This reverts commit 97bd0c9d00.
2016-10-17 16:49:43 +02:00
Matthew Honnibal 97bd0c9d00 Don't try to pickle matcher. 2016-10-17 16:38:40 +02:00
Matthew Honnibal 049c937540 Have the matcher return character offsets, to handle the match better. 2016-10-17 15:58:57 +02:00
Matthew Honnibal 6cbdc94959 Lots of updates to Matcher, to make entity handling sane. 2016-10-17 15:23:31 +02:00
Matthew Honnibal 90baa9c7e6 Revert "Changes to matcher.pyx for new StringStore scheme"
This reverts commit 3ff09614e0.
2016-09-30 20:20:13 +02:00
Matthew Honnibal 3ff09614e0 Changes to matcher.pyx for new StringStore scheme 2016-09-30 19:56:48 +02:00
Matthew Honnibal fd65cf6cbb Finish refactoring data loading 2016-09-24 20:26:17 +02:00
Matthew Honnibal 83e364188c Mostly finished loading refactoring. Design is in place, but doesn't work yet. 2016-09-24 15:42:01 +02:00
Matthew Honnibal eaf4065480 Expose the _patterns private member 2016-09-24 11:20:42 +02:00
Matthew Honnibal 55f1f7edaf Don't automatically write new entities into the Doc in the Matcher. This fixes a long-standing wart, but introduces a *backwards incompatibility.* 2016-09-24 01:16:45 +02:00
Matthew Honnibal 58e83fe34b Initial, limited support for quantified patterns in Matcher, and tracking of ent_id attribute in Token and Span. The quantifiers need a lot more testing, and there are some known problems. The main known problem is that the zero-plus and one-plus quantifiers won't work if a token can match both the quantified pattern expression AND the tail of the match. 2016-09-21 14:54:55 +02:00
Matthew Honnibal 67ce96c9c9 * Make patterns argument to Matcher class optional 2016-04-17 21:32:24 +02:00
Wolfgang Seeker e6945c4d0e bugfix: uppercase attr values before looking them up 2016-04-15 15:46:31 +02:00