From f86289566ad48598ce3c0c528fa6e543f9b34690 Mon Sep 17 00:00:00 2001
From: ines
Date: Tue, 30 May 2017 13:53:06 +0200
Subject: [PATCH] Update new in v2 section and add note on Matcher acceptors

---
 website/docs/api/matcher.jade | 4 ++-
 website/docs/usage/v2.jade | 48 ++++++++++++++++++++++-------------
 2 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/website/docs/api/matcher.jade b/website/docs/api/matcher.jade
index e7c0aaaf2..95819e553 100644
--- a/website/docs/api/matcher.jade
+++ b/website/docs/api/matcher.jade
@@ -11,7 +11,9 @@ p Match sequences of tokens, based on pattern rules.
 | patterns and a callback for a given match ID. #[code Matcher.get_entity]
 | is now called #[+api("matcher#get") #[code matcher.get]].
 | #[code Matcher.load] (not useful, as it didn't allow specifying callbacks),
- | and #[code Matcher.has_entity] (now redundant) have been removed.
+ | and #[code Matcher.has_entity] (now redundant) have been removed. The
+ | concept of "acceptor functions" has also been retired – this logic can
+ | now be handled in the callback functions.

 +h(2, "init") Matcher.__init__
 +tag method
diff --git a/website/docs/usage/v2.jade b/website/docs/usage/v2.jade
index 2123a04af..75c8c2d3c 100644
--- a/website/docs/usage/v2.jade
+++ b/website/docs/usage/v2.jade
@@ -3,8 +3,17 @@
 include ../../_includes/_mixins

 p
- | We also re-wrote a large part of the documentation and usage workflows,
- | and added more examples.
+
+p
+ | On this page, you'll find a summary of the #[+a("#features") new features],
+ | information on the #[+a("#incompat") backwards incompatibilities],
+ | including a handy overview of what's been renamed or deprecated.
+ | To help you make the most of v2.0, we also
+ | #[strong re-wrote almost all of the usage guides and API docs], and added
+ | more real-world examples. If you're new to spaCy, or just want to brush
+ | up on some NLP basics and the details of the library, check out
+ | the #[+a("/docs/usage/spacy-101") spaCy 101 guide] that explains the most
+ | important concepts with examples and illustrations.

 +h(2, "features") New features

@@ -14,14 +23,6 @@ p
 | include additional deprecation notes. New methods and functions that
 | were introduced in this version are marked with a #[+tag-new(2)] tag.

-p
- | To help you make the most of v2.0, we also
- | #[strong re-wrote almost all of the usage guides and API docs], and added
- | more real-world examples. If you're new to spaCy, or just want to brush
- | up on some NLP basics and the details of the library, check out
- | the #[+a("/docs/usage/spacy-101") spaCy 101 guide] that explains the most
- | important concepts with examples and illustrations.
-
 +h(3, "features-pipelines") Improved processing pipelines

 +aside-code("Example").
@@ -292,11 +293,10 @@ p

 +h(2, "migrating") Migrating from spaCy 1.x

-+list
- +item Saving, loading and serialization.
- +item Processing pipelines and language data.
- +item Adding patterns and callbacks to the matcher.
- +item Models trained with spaCy 1.x.
+p
+ | If you've mostly been using spaCy for basic text processing, chances are
+ | you won't even have to change your code at all. For all other cases,
+ | we've tried to focus...

 +infobox("Some tips")
 | Before migrating, we strongly recommend writing a few
@@ -341,6 +341,13 @@ p

 +h(3, "migrating-strings") Strings and hash values

+p
+ | The change from integer IDs to hash values may not actually affect your
+ | code very much. However, if you're adding strings to the vocab manually,
+ | you now need to call #[+api("stringstore#add") #[code StringStore.add()]]
+ | explicitly. You can also now be sure that the string-to-hash mapping will
+ | always match across vocabularies.
+
 +code-new.
 nlp.vocab.strings.add(u'coffee')
 nlp.vocab.strings[u'coffee'] # 3197928453018144401
@@ -382,7 +389,7 @@ p
 p
 | If you're using the matcher, you can now add patterns in one step. This
 | should be easy to update – simply merge the ID, callback and patterns
- | into one call to #[+api("matcher#add") #[code matcher.add()]].
+ | into one call to #[+api("matcher#add") #[code Matcher.add()]].

 +code-new.
 matcher.add('GoogleNow', merge_phrases, [{ORTH: 'Google'}, {ORTH: 'Now'}])
@@ -391,4 +398,11 @@ p
 matcher.add_entity('GoogleNow', on_match=merge_phrases)
 matcher.add_pattern('GoogleNow', [{ORTH: 'Google'}, {ORTH: 'Now'}])

-+h(3, "migrating-models") Trained models
+p
+ | If you've been using #[strong acceptor functions], you'll need to move
+ | this logic into the
+ | #[+a("/docs/usage/rule-based-matching#on_match") #[code on_match] callbacks].
+ | The callback function is invoked on every match and will give you access to
+ | the doc, the index of the current match and all total matches. This lets
+ | you both accept or reject the match, and define the actions to be
+ | triggered.
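
A minimal sketch of moving acceptor logic into an on_match callback, as the last paragraph added by this patch describes. It assumes the callback is invoked as on_match(matcher, doc, i, matches) and that each match is a (match_id, start, end) tuple. The names make_on_match, accept and action are hypothetical and used only for illustration, and plain lists stand in for a real Doc so the sketch runs without spaCy installed.

```python
def make_on_match(accept, action):
    """Build an on_match-style callback (hypothetical helper, not part of spaCy).

    accept(span) plays the role of the old acceptor function; action(match_id,
    span) is whatever should happen once a match has been accepted.
    """
    def on_match(matcher, doc, i, matches):
        # Assumption: each match is a (match_id, start, end) tuple.
        match_id, start, end = matches[i]
        span = doc[start:end]
        if not accept(span):
            return  # reject the match: no action is triggered
        action(match_id, span)
    return on_match


# Stand-in data: a list of token strings instead of a real Doc.
accepted = []
on_match = make_on_match(
    accept=lambda span: span[0] == 'Google',  # old acceptor logic lives here
    action=lambda match_id, span: accepted.append((match_id, span)),
)
doc = ['I', 'use', 'Google', 'Now', 'and', 'google', 'now']
matches = [('GoogleNow', 2, 4), ('GoogleNow', 5, 7)]
for i in range(len(matches)):
    on_match(None, doc, i, matches)
print(accepted)
```

With a real matcher, a callback built this way would presumably be passed as the second argument to matcher.add(), in place of merge_phrases in the example from the patch.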