spaCy/website/docs/_api-matcher.jade

//- ----------------------------------
//- 💫 DOCS > API > MATCHER
//- ----------------------------------
+section("matcher")
+h(2, "matcher", "https://github.com/" + SOCIAL.github + "/spaCy/blob/master/spacy/matcher.pyx")
| #[+tag class] Matcher
p A full example can be found #[a(href="https://github.com/" + SOCIAL.github + "/spaCy/blob/master/examples/matcher_example.py") here].
+table(["Usage", "Description"])
+row
+cell #[code.lang-python nlp(doc)]
+cell As part of the annotation pipeline.
+row
+cell #[code.lang-python nlp.matcher(doc)]
+cell Explicit invocation.
+row
+cell #[code.lang-python nlp.matcher.add(u'FooCorp', u'ORG', {}, [[{u'ORTH': u'Foo'}]])]
+cell Add a pattern to match.
+section("matcher-init")
+h(3, "matcher-init") __init__(self, vocab, patterns)
+table(["Name", "Type", "Description"])
+row
+cell vocab
+cell #[code.lang-python spacy.vocab.Vocab]
+cell Reference to the shared vocabulary object.
+row
+cell patterns
+cell #[code {entity_key: (etype, attrs, specs)}]
+cell Initial patterns to match. See #[code Matcher.add].
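p The shape of the #[code patterns] argument can be sketched in plain Python. The entity keys and patterns below are hypothetical, and #[code iter_add_args] is an illustrative helper, not part of spaCy. It only shows how each dictionary entry lines up with the arguments of #[code Matcher.add].

```python
# Hypothetical initial patterns in the documented shape:
# {entity_key: (etype, attrs, specs)}
patterns = {
    u'FooCorp': (u'ORG', {}, [[{u'ORTH': u'Foo'}]]),
    u'BarInc': (u'ORG', {}, [[{u'ORTH': u'Bar'}, {u'ORTH': u'Inc'}]]),
}


def iter_add_args(patterns):
    """Yield (entity_key, etype, attrs, specs), the argument order of add().

    Illustrative helper only; it is not part of the spaCy API.
    """
    for entity_key, (etype, attrs, specs) in sorted(patterns.items()):
        yield entity_key, etype, attrs, specs


# Each tuple corresponds to one call of the form add(*args).
add_calls = list(iter_add_args(patterns))
```

p Passing the dictionary at construction time should therefore be equivalent to calling #[code add] once per entry, assuming the constructor does nothing else with the patterns.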
+section("matcher-add")
+h(3, "matcher-add") add(self, entity_key, etype, attrs, specs)
+table(["Name", "Type", "Description"])
+row
+cell entity_key
+cell unicode or int
+cell Your arbitrary ID string (or its integer encoding).
+row
+cell etype
+cell unicode or int
+cell A pre-registered entity type, e.g. u'PERSON', u'ORG', etc.
+row
+cell attrs
+cell #[code dict]
+cell Placeholder for future support of entity attributes.
+row
+cell specs
+cell #[code [[{int: unicode}]]]
+cell A list of surface forms, where each surface form is defined as a list of token definitions, and each token definition is a dictionary mapping attribute IDs to attribute values.
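p To make the nesting of #[code specs] concrete, here is a minimal pure-Python sketch of how a list of surface forms could be matched against a sequence of words. It is not spaCy's implementation: the attribute IDs #[code ORTH] and #[code LOWER] are given hypothetical integer values, and #[code token_attrs], #[code token_matches] and #[code find_matches] are illustrative names.

```python
# Hypothetical attribute IDs for illustration; not spaCy's real values.
ORTH = 1   # exact token text
LOWER = 2  # lowercased token text


def token_attrs(text):
    """Map each attribute ID to its value for one word."""
    return {ORTH: text, LOWER: text.lower()}


def token_matches(token_def, text):
    """A token definition matches if every attribute ID has the given value."""
    attrs = token_attrs(text)
    return all(attrs.get(attr_id) == value
               for attr_id, value in token_def.items())


def find_matches(specs, words):
    """Return sorted (start, end) spans where any surface form matches."""
    spans = []
    for surface_form in specs:
        n = len(surface_form)
        for start in range(len(words) - n + 1):
            window = words[start:start + n]
            if all(token_matches(d, w) for d, w in zip(surface_form, window)):
                spans.append((start, start + n))
    return sorted(spans)


# Two surface forms: a single exact token, and a case-insensitive pair.
specs = [
    [{ORTH: u'Foo'}],
    [{LOWER: u'foo'}, {LOWER: u'corp'}],
]
words = u'I work at Foo Corp now'.split()
spans = find_matches(specs, words)  # [(3, 4), (3, 5)]
```

p Each inner list is one surface form and each dictionary constrains one token, so a surface form with two dictionaries can only match a span of two consecutive tokens.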
+section("matcher-saveload")
+h(3, "matcher-saveload") Save and Load
+section("matcher-saveload-dump")
+h(4, "matcher-saveload-dump") dump(loc)
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell Path where the gazetteer.json file is saved.
+section("matcher-saveload-load")
+h(4, "matcher-saveload-load") load(loc)
+table(["Name", "Type", "Description"])
+row
+cell loc
+cell #[+a(link_unicode) unicode]
+cell Path from which to load the gazetteer.json file.
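p The save and load round trip can be illustrated with the standard #[code json] module alone. This sketch assumes the gazetteer.json file stores the patterns as a JSON object of the form #[code {entity_key: [etype, attrs, specs]}]. That layout is an assumption, and #[code dump_patterns] and #[code load_patterns] are hypothetical stand-ins rather than the #[code Matcher] methods themselves.

```python
import json
import os
import tempfile

# Assumed on-disk layout: {entity_key: [etype, attrs, specs]}.
# JSON has no tuples, so each entry is written as a list.
patterns = {u'FooCorp': [u'ORG', {}, [[{u'ORTH': u'Foo'}]]]}


def dump_patterns(patterns, loc):
    """Write the patterns to loc as JSON (hypothetical stand-in for dump)."""
    with open(loc, 'w') as file_:
        json.dump(patterns, file_)


def load_patterns(loc):
    """Read the patterns back from loc (hypothetical stand-in for load)."""
    with open(loc) as file_:
        return json.load(file_)


# Round trip through a temporary directory.
loc = os.path.join(tempfile.mkdtemp(), 'gazetteer.json')
dump_patterns(patterns, loc)
restored = load_patterns(loc)
```

p Note that JSON object keys are always strings, so under this assumed layout integer attribute IDs or entity keys would need to be converted back after loading.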