mirror of https://github.com/explosion/spaCy.git
Add note on stream processing to migration guide (see #1508)
This commit is contained in:
parent
f929f41bcc
commit
14f97cfd20
@@ -17,6 +17,25 @@ p
     |  runtime inputs must match. This means you'll have to
     |  #[strong retrain your models] with spaCy v2.0.
 
++h(3, "migrating-document-processing") Document processing
+
+p
+    |  The #[+api("language#pipe") #[code Language.pipe]] method allows spaCy
+    |  to batch documents, which brings a
+    |  #[strong significant performance advantage] in v2.0. The new neural
+    |  networks introduce some overhead per batch, so if you're processing a
+    |  number of documents in a row, you should use #[code nlp.pipe] and process
+    |  the texts as a stream.
+
++code-new docs = nlp.pipe(texts)
++code-old docs = (nlp(text) for text in texts)
+
+p
+    |  To make usage easier, there's now a boolean #[code as_tuples]
+    |  keyword argument, that lets you pass in an iterator of
+    |  #[code (text, context)] pairs, so you can get back an iterator of
+    |  #[code (doc, context)] tuples.
+
 +h(3, "migrating-saving-loading") Saving, loading and serialization
 
 p
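To illustrate the streaming behaviour the diff describes, here is a minimal pure-Python sketch of the batching pattern behind `Language.pipe`, including the `as_tuples` mode. The `process_batch` function is a hypothetical stand-in for spaCy's per-batch model call (here it just upper-cases), and `batch_size` is an assumed parameter name for the sketch; it is not spaCy's implementation, only the shape of the lazy, batched iteration.

```python
def process_batch(batch):
    # Hypothetical stand-in for the per-batch neural network call;
    # here we simply upper-case each text.
    return [text.upper() for text in batch]


def pipe(texts, batch_size=2, as_tuples=False):
    """Yield processed docs lazily, batching inputs as they stream in."""
    if as_tuples:
        # Split (text, context) pairs, process the texts, then re-pair
        # each result with its original context.
        items = list(texts)
        texts_only = [text for text, _ in items]
        contexts = [context for _, context in items]
        yield from zip(pipe(texts_only, batch_size=batch_size), contexts)
        return
    batch = []
    for text in texts:
        batch.append(text)
        if len(batch) == batch_size:
            # Process a full batch at once to amortise per-batch overhead.
            yield from process_batch(batch)
            batch = []
    if batch:
        # Flush the final, possibly short, batch.
        yield from process_batch(batch)


docs = list(pipe(["a", "b", "c"]))
# docs == ["A", "B", "C"]

pairs = list(pipe([("a", 1), ("b", 2)], as_tuples=True))
# pairs == [("A", 1), ("B", 2)]
```

The point of the pattern is that consumers iterate over a generator, so texts are only pulled in and processed batch by batch, rather than one `nlp(text)` call per document.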
|