From 14f97cfd207a20f6162e28af33388e30db80bc59 Mon Sep 17 00:00:00 2001
From: ines
Date: Wed, 8 Nov 2017 01:53:36 +0100
Subject: [PATCH] Add note on stream processing to migration guide (see #1508)

---
 website/usage/_v2/_migrating.jade | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/website/usage/_v2/_migrating.jade b/website/usage/_v2/_migrating.jade
index 6443e0592..5ed0fb13e 100644
--- a/website/usage/_v2/_migrating.jade
+++ b/website/usage/_v2/_migrating.jade
@@ -17,6 +17,25 @@ p
     |  runtime inputs must match. This means you'll have to
     |  #[strong retrain your models] with spaCy v2.0.
 
++h(3, "migrating-document-processing") Document processing
+
+p
+    |  The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+    |  to batch documents, which brings a
+    |  #[strong significant performance advantage] in v2.0. The new neural
+    |  networks introduce some overhead per batch, so if you're processing a
+    |  number of documents in a row, you should use #[code nlp.pipe] and
+    |  process the texts as a stream.
+
++code-new docs = nlp.pipe(texts)
++code-old docs = (nlp(text) for text in texts)
+
+p
+    |  To make usage easier, there's now a boolean #[code as_tuples]
+    |  keyword argument that lets you pass in an iterator of
+    |  #[code (text, context)] pairs, so you can get back an iterator of
+    |  #[code (doc, context)] tuples.
+
 +h(3, "migrating-saving-loading") Saving, loading and serialization
 
 p
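
---

For illustration, the `(text, context)` / `(doc, context)` pass-through pattern that this patch documents can be sketched with a self-contained stub in place of a real spaCy pipeline. The `pipe` function below is a hypothetical stand-in (the real API is `nlp.pipe(data, as_tuples=True)`, which yields `Doc` objects rather than strings):

```python
def pipe(items, as_tuples=False):
    # Stub standing in for nlp.pipe: "processes" each text in a stream.
    # With as_tuples=True, items are (text, context) pairs; the context is
    # passed through unchanged alongside each processed result, so callers
    # can keep metadata (IDs, labels, ...) associated with each document.
    if as_tuples:
        for text, context in items:
            yield (text.upper(), context)  # .upper() stands in for a Doc
    else:
        for text in items:
            yield text.upper()


data = [("a text", {"id": 1}), ("another text", {"id": 2})]
for doc, context in pipe(data, as_tuples=True):
    print(context["id"], doc)
```

The point of the pass-through design is that streaming with a generator-based `pipe` would otherwise lose track of which output belongs to which input's metadata; threading the context through the iterator avoids buffering all results in memory.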