Commit Graph

119 Commits

Author SHA1 Message Date
Ines Montani e89708c3eb 💫 Allow matching non-ORTH attributes in PhraseMatcher (#2925)
* Allow matching non-orth attributes in PhraseMatcher (see #1971)

Usage: PhraseMatcher(nlp.vocab, attr='POS')

* Allow attr argument to be int

* Fix formatting

* Fix typo
2018-11-15 03:00:58 +01:00
Ines Montani 0d5b142c78 Fix typos and whitespace 2018-11-14 19:12:34 +01:00
Ines Montani bd1b0e396a Add deprecation warning for PhraseMatcher max_length 2018-11-14 19:10:46 +01:00
Ines Montani 64257bf3a7 Fix formatting 2018-11-14 19:10:21 +01:00
Suraj Rajan 0bf14082a4 Added more constucts for dependency tree matcher (#2836) 2018-10-29 23:21:39 +01:00
Suraj Krishnan Rajan 356af7b0a1 Fix tests 2018-09-06 01:39:36 +02:00
Matthew Honnibal ce512e1d47 Fix #2671: Incorrect match ID on some patterns 2018-08-15 16:19:08 +02:00
Matthew Honnibal e1569fda4e Fix compile error in matcher 2018-07-06 12:29:23 +02:00
Matthew Honnibal f5703b7a91 Clean up unused stuff in matcher 2018-07-06 12:16:44 +02:00
Matthew Honnibal ad3d56c3ba Fix compile error in matcher 2018-04-29 15:48:34 +02:00
Matthew Honnibal 2c4a6d66fa Merge master into develop. Big merge, many conflicts -- need to review 2018-04-29 14:49:26 +02:00
Ines Montani 3141e04822
💫 New system for error messages and warnings (#2163)
* Add spacy.errors module

* Update deprecation and user warnings

* Replace errors and asserts with new error message system

* Remove redundant asserts

* Fix whitespace

* Add messages for print/util.prints statements

* Fix typo

* Fix typos

* Move CLI messages to spacy.cli._messages

* Add decorator to display error code with message

An implementation like this is nice because it only modifies the string when it's retrieved from the containing class – so we don't have to worry about manipulating tracebacks etc.

* Remove unused link in spacy.about

* Update errors for invalid pipeline components

* Improve error for unknown factories

* Add displaCy warnings

* Update formatting consistency

* Move error message to spacy.errors

* Update errors and check if doc returned by component is None
2018-04-03 15:50:31 +02:00
Matthew Honnibal 1f7229f40f Revert "Merge branch 'develop' of https://github.com/explosion/spaCy into develop"
This reverts commit c9ba3d3c2d, reversing
changes made to 92c26a35d4.
2018-03-27 19:23:02 +02:00
Matthew Honnibal f9f46e5a07 Revert matcher fixes from GregDubbin 2018-02-18 10:59:28 +01:00
Matthew Honnibal 6a8cb905aa
Merge pull request #1876 from GregDubbin/master
Pattern matcher fixes
2018-01-24 16:38:11 +01:00
Matthew Honnibal 2ad050e668 Fix unpickling of Matcher. Also store correct data in matcher._patterns 2018-01-24 15:42:11 +01:00
greg f50bb1aafc Restructure StateC to eliminate dependency on unordered_map 2018-01-23 14:40:03 -05:00
greg 3a491093ee Import libcpp.map if libcpp.unordered_map doesn't exist 2018-01-22 16:46:25 -05:00
greg d55992bdf0 Switch match dictionary to use final state pointer rather than ID 2018-01-22 15:36:47 -05:00
greg 490bc82c27 Add comments clarifying matcher logic for '*' 2018-01-22 10:03:12 -05:00
greg 8bea62f26e Correct bugs for greedy matching and introduce ADVANCE_PLUS action 2018-01-16 13:21:43 -05:00
ines d96e72f656 Tidy up rest 2017-10-27 21:07:59 +02:00
ines c0b55ebdac Fix PhraseMatcher.__contains__ and add more tests 2017-10-25 16:31:11 +02:00
ines 91beacf5e3 Fix Matcher.__contains__ 2017-10-25 16:19:38 +02:00
ines 4d97efc3b5 Add missing docstrings 2017-10-25 12:10:16 +02:00
ines 1262aa0bf9 Implement PhraseMatcher.__contains__ 2017-10-25 12:10:04 +02:00
ines 9c733a8849 Implement PhraseMatcher.__len__ 2017-10-25 12:09:56 +02:00
ines 7eebeeaf85 Fix Matcher.__contains__ 2017-10-25 12:09:47 +02:00
ines 7bcec57462 Remove unused attribute 2017-10-25 12:08:54 +02:00
Matthew Honnibal 4bea65a1a8 Fix Issue #1450: Off-by-1 in * and ? matches
Patterns that end in variable-length operators e.g. * and ? now end on
the correct token. Previously, they were off by 1: the next token was
pulled into the match, even if that's where the pattern failed.
2017-10-24 14:26:27 +02:00
Matthew Honnibal d8391b1c4d Fix #1434: Matcher failed on ending ? if no token 2017-10-20 16:49:36 +02:00
Matthew Honnibal 56aa42cc5d Fix and document matcher operator 'shadowing' behaviour 2017-10-16 13:38:20 +02:00
Matthew Honnibal 0433181658 Document operator semantics in Matcher docstring 2017-10-16 12:06:33 +02:00
Matthew Honnibal 2534cd57d7 Add bandaid solution to the 'shadowing' problem in #864 2017-10-09 08:59:35 +02:00
Matthew Honnibal 3b67eabfea Allow empty dictionaries to match any token in Matcher
Often patterns need to match "any token". A clean way to denote this
is with the empty dict {}: this sets no constraints on the token,
so should always match.

The problem was that having attributes length==0 was used as an
end-of-array signal, so the matcher didn't handle this case correctly.

This patch compiles empty token spec dicts into a constraint
NULL_ATTR==0. The NULL_ATTR attribute, 0, is always set to 0 on the
lexeme -- so this always matches.
2017-10-07 03:36:15 +02:00
Matthew Honnibal 19c7c09bf7 Fix PhraseMatcher.__contains__ 2017-09-26 08:35:53 -05:00
Ines Montani 7123139b2b Add __contains__ to PhraseMatcher 2017-09-26 13:13:27 +02:00
Ines Montani 50ad50f96a Update matcher.pyx 2017-09-26 13:11:17 +02:00
Matthew Honnibal 842e21de9f Fix int type error for Python 2 2017-09-20 23:55:30 +02:00
Matthew Honnibal 0c93c73e49 Add __reduce__ method for PhraseMatcher 2017-09-20 22:26:40 +02:00
Matthew Honnibal cc408fc189 Make PhraseMatcher API like Matcher API 2017-09-20 22:20:35 +02:00
Matthew Honnibal 828cc91545 Fix PhraseMatcher for spaCy 2 2017-09-20 21:54:31 +02:00
Matthew Honnibal fe11564b8e Finish stringstore change. Also xfail vectors tests 2017-05-28 15:10:22 +02:00
Matthew Honnibal e27262f431 Go back to previous matcher signature, with on_match positional 2017-05-23 04:37:40 -05:00
Matthew Honnibal 3959d778ac Revert "Revert "WIP on improving parser efficiency""
This reverts commit 532afef4a8.
2017-05-23 03:06:53 -05:00
Matthew Honnibal 532afef4a8 Revert "WIP on improving parser efficiency"
This reverts commit bdaac7ab44.
2017-05-23 03:05:25 -05:00
Matthew Honnibal bdaac7ab44 WIP on improving parser efficiency 2017-05-23 02:59:31 -05:00
Matthew Honnibal 187f370734 Update tests for matcher changes 2017-05-22 12:59:50 +02:00
ines 4ed6a36622 Update docstrings and API docs for Matcher 2017-05-20 14:43:10 +02:00
ines 39f36539f6 Update docstrings and API docs for Matcher 2017-05-20 14:32:34 +02:00