diff --git a/website/docs/api/language.md b/website/docs/api/language.md
index d65b217a4..7799f103b 100644
--- a/website/docs/api/language.md
+++ b/website/docs/api/language.md
@@ -828,8 +828,10 @@ subclass of the built-in `dict`. It supports the additional methods `to_disk`
 
 ## Language.to_disk {#to_disk tag="method" new="2"}
 
-Save the current state to a directory. If a trained pipeline is loaded, this
-will **include all model data**.
+Save the current state to a directory. Under the hood, this method delegates to
+the `to_disk` methods of the individual pipeline components, if available. This
+means that if a trained pipeline is loaded, all components and their weights
+will be saved to disk.
 
 > #### Example
 >
diff --git a/website/docs/usage/linguistic-features.md b/website/docs/usage/linguistic-features.md
index 7d3613cf5..ff08d547c 100644
--- a/website/docs/usage/linguistic-features.md
+++ b/website/docs/usage/linguistic-features.md
@@ -1222,7 +1222,7 @@ print(doc.text, [token.text for token in doc])
 
 Keep in mind that your models' results may be less accurate if the tokenization
 during training differs from the tokenization at runtime. So if you modify a
-trained pipeline' tokenization afterwards, it may produce very different
+trained pipeline's tokenization afterwards, it may produce very different
 predictions. You should therefore train your pipeline with the **same
 tokenizer** it will be using at runtime. See the docs on
 [training with custom tokenization](#custom-tokenizer-training) for details.
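
For context on the `Language.to_disk` wording above, a minimal sketch of the documented behavior. The pipeline name `en_core_web_sm` and the output path are assumptions for illustration; `spacy.load` and `nlp.to_disk` are the public API the docs describe:

```python
import spacy

# Load a trained pipeline; each component carries its own model weights.
# en_core_web_sm is just an example and must be installed separately.
nlp = spacy.load("en_core_web_sm")

# to_disk delegates to each component's to_disk method, so the directory
# ends up containing the config, tokenizer data, and per-component weights.
nlp.to_disk("/tmp/my_pipeline")

# The saved directory can be loaded back the same way as a packaged pipeline.
nlp2 = spacy.load("/tmp/my_pipeline")
```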
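And for the tokenization note in `linguistic-features.md`, a small illustration of how runtime tokenization can be modified after loading, which is exactly the situation the corrected sentence warns about. This uses the tokenizer's documented `add_special_case` method; the specific rule is hypothetical:

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.load("en_core_web_sm")

# Changing token boundaries after training: the statistical components were
# trained on the default tokenization, so tokens they never saw during
# training may lead to different (and possibly worse) predictions.
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])

doc = nlp("gimme that")
print([token.text for token in doc])  # ['gim', 'me', 'that']
```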