Commit Graph

106 Commits

Author SHA1 Message Date
Matthew Honnibal f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Matthew Honnibal 6a8cb905aa Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal 2ad050e668 Fix unpickling of Matcher. Also store correct data in matcher._patterns 2018-01-24 15:42:11 +01:00
greg f50bb1aafc Restructure StateC to eliminate dependency on unordered_map 2018-01-23 14:40:03 -05:00
greg 3a491093ee Import libcpp.map if libcpp.unordered_map doesn't exist 2018-01-22 16:46:25 -05:00
greg d55992bdf0 Switch match dictionary to use final state pointer rather than ID 2018-01-22 15:36:47 -05:00
greg 490bc82c27 Add comments clarifying matcher logic for '*' 2018-01-22 10:03:12 -05:00
greg 8bea62f26e Correct bugs for greedy matching and introduce ADVANCE_PLUS action 2018-01-16 13:21:43 -05:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines 91beacf5e3 Fix Matcher.__contains__ 2017-10-25 16:19:38 +02:00
ines 4d97efc3b5 Add missing docstrings 2017-10-25 12:10:16 +02:00
ines 1262aa0bf9 Implement PhraseMatcher.__contains__ 2017-10-25 12:10:04 +02:00
ines 9c733a8849 Implement PhraseMatcher.__len__ 2017-10-25 12:09:56 +02:00
ines 7eebeeaf85 Fix Matcher.__contains__ 2017-10-25 12:09:47 +02:00
ines 7bcec57462 Remove unused attribute 2017-10-25 12:08:54 +02:00
Matthew Honnibal 4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
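The off-by-1 fix above turns on where a match ends when the pattern closes with a variable-length operator. The sketch below is a toy greedy matcher, not spaCy's Cython implementation. `match_at` and its `(predicate, op)` pattern format are hypothetical, and the toy does no backtracking. It shows the intended behaviour: the returned end index points just past the last token that satisfied the pattern, so the token where matching stopped is never pulled into the span.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical pattern format for illustration only: (predicate, op), op in {"1", "?", "*"}.
Predicate = Callable[[str], bool]
Pattern = List[Tuple[Predicate, str]]


def match_at(tokens: List[str], start: int, pattern: Pattern) -> Optional[int]:
    """Return the exclusive end index of a greedy match at `start`, or None."""
    i = start
    for pred, op in pattern:
        if op == "1":
            # Exactly one token must satisfy the predicate.
            if i < len(tokens) and pred(tokens[i]):
                i += 1
            else:
                return None
        elif op == "?":
            # Zero or one token; running out of tokens is not a failure.
            if i < len(tokens) and pred(tokens[i]):
                i += 1
        elif op == "*":
            # Consume tokens while they match; stop *before* the first failing token.
            while i < len(tokens) and pred(tokens[i]):
                i += 1
        else:
            raise ValueError(f"unknown operator {op!r}")
    # `i` is one past the last matched token, so tokens[start:i] excludes
    # the token on which the pattern stopped matching.
    return i


tokens = ["the", "very", "very", "good"]
pattern = [(lambda t: t == "the", "1"), (lambda t: t == "very", "*")]
end = match_at(tokens, 0, pattern)
print(tokens[0:end])  # ['the', 'very', 'very']
```

With the off-by-1 behaviour described in the commit message, the span would also have swallowed `"good"`. A trailing `?` with no token left (the case in #1434 below) simply returns the current position.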
Matthew Honnibal d8391b1c4d Fix #1434: Matcher failed on ending ? if no token 2017-10-20 16:49:36 +02:00
Matthew Honnibal 56aa42cc5d Fix and document matcher operator 'shadowing' behaviour 2017-10-16 13:38:20 +02:00
Matthew Honnibal 0433181658 Document operator semantics in Matcher docstring 2017-10-16 12:06:33 +02:00
Matthew Honnibal 2534cd57d7 Add bandaid solution to the 'shadowing' problem in #864 2017-10-09 08:59:35 +02:00
Matthew Honnibal 3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that an attribute array of length 0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute (ID 0) is always set to 0 on
every lexeme -- so this constraint always matches.
2017-10-07 03:36:15 +02:00
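A rough Python model of the empty-dict fix described above might look like this. It is not spaCy's actual code. `compile_pattern`, `matches_prefix`, `get_attr` and the attribute IDs in `ATTR_IDS` are hypothetical. Only `NULL_ATTR` being attribute 0 with value 0 on every lexeme comes from the commit message. Each token spec compiles to a list of `(attr_id, value)` constraints, and an empty list marks the end of the pattern. An empty `{}` therefore has to compile to something non-empty.

```python
NULL_ATTR = 0                       # attribute ID 0; every token reports value 0 for it
ATTR_IDS = {"ORTH": 1, "LOWER": 2}  # hypothetical IDs, for illustration only
ID_TO_NAME = {v: k for k, v in ATTR_IDS.items()}


def compile_pattern(pattern):
    """Turn a list of token-spec dicts into constraint lists, terminated by []."""
    compiled = []
    for spec in pattern:
        constraints = [(ATTR_IDS[name], value) for name, value in spec.items()]
        if not constraints:
            # An empty spec would look like the [] terminator, so give it a
            # constraint that every token satisfies instead.
            constraints = [(NULL_ATTR, 0)]
        compiled.append(constraints)
    compiled.append([])  # zero-length entry = end-of-pattern sentinel
    return compiled


def get_attr(token, attr_id):
    """Look up an attribute on a token dict; NULL_ATTR is always 0."""
    return 0 if attr_id == NULL_ATTR else token[ID_TO_NAME[attr_id]]


def matches_prefix(tokens, compiled):
    """True if the pattern matches starting at tokens[0], one token per spec."""
    i = 0
    for constraints in compiled:
        if not constraints:  # sentinel reached: every spec matched
            return True
        if i >= len(tokens):
            return False
        if not all(get_attr(tokens[i], a) == v for a, v in constraints):
            return False
        i += 1
    return True


tokens = [{"ORTH": "Hello", "LOWER": "hello"},
          {"ORTH": ",", "LOWER": ","},
          {"ORTH": "World", "LOWER": "world"}]
print(matches_prefix(tokens, compile_pattern([{"LOWER": "hello"}, {}, {"LOWER": "world"}])))  # True
```

Had `{}` compiled to `[]`, the loop would hit the sentinel after the first token and report a match without ever checking `{"LOWER": "world"}`.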
Matthew Honnibal 19c7c09bf7 Fix PhraseMatcher.__contains__ 2017-09-26 08:35:53 -05:00
Ines Montani 7123139b2b Add __contains__ to PhraseMatcher 2017-09-26 13:13:27 +02:00
Ines Montani 50ad50f96a Update matcher.pyx 2017-09-26 13:11:17 +02:00
Matthew Honnibal 842e21de9f Fix int type error for Python 2 2017-09-20 23:55:30 +02:00
Matthew Honnibal 0c93c73e49 Add __reduce__ method for PhraseMatcher 2017-09-20 22:26:40 +02:00
Matthew Honnibal cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal 828cc91545 Fix PhraseMatcher for spaCy 2 2017-09-20 21:54:31 +02:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
ines 4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines 39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00
ines c00ff257be Update docstrings and API docs for Matcher 2017-05-20 14:26:10 +02:00
ines 790435e51c Update docstrings 2017-05-20 14:05:07 +02:00
Matthew Honnibal ce9234f593 Update Matcher API 2017-05-20 13:54:53 +02:00
ines 1d4d3d0ecd Add TODO 2017-05-20 01:38:04 +02:00
ines fe5d8819ea Update Matcher docstrings and API docs 2017-05-19 21:47:06 +02:00
ines e1efd589c3 Fix json imports and use ujson 2017-04-15 12:13:34 +02:00
ines d24589aa72 Clean up imports, unused code, whitespace, docstrings 2017-04-15 12:05:47 +02:00
ines 561f2a3eb4 Use consistent formatting for docstrings 2017-04-15 11:59:21 +02:00
Matthew Honnibal 725249c59a Add merge_phrase callback in matcher.pyx 2017-03-31 13:58:59 +02:00
Raphaël Bournhonesque f332bf05be Remove unused import statements 2017-03-21 21:08:54 +01:00
Matthew Honnibal 8f94897d07 Add 1 operator to matcher, and make sure open patterns are closed at end of document. Closes Issue #766 2017-02-24 14:27:02 +01:00
Dmytro Sadovnychyi e70a7050e1 Remove duplicated line of vocab declaration
As already declared on line 211.
2016-11-13 18:52:49 +08:00
Dmitry Sadovnychyi 9488222e79 Fix PhraseMatcher to work with updated Matcher
#613
2016-11-09 00:14:26 +08:00