💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
---
title: Top-level Functions
menu:
- ['spacy', 'spacy']
- ['displacy', 'displacy']
2020-07-06 16:14:57 +00:00
- ['registry', 'registry']
2020-08-06 17:30:43 +00:00
- ['Batchers', 'batchers']
2020-07-04 12:23:10 +00:00
- ['Data & Alignment', 'gold']
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
- ['Utility Functions', 'util']
---
## spaCy {#spacy hidden="true"}
### spacy.load {#spacy.load tag="function" model="any"}
2020-07-01 19:26:39 +00:00
Load a model using the name of an installed
[model package ](/usage/training#models-generating ), a string path or a
`Path` -like object. spaCy will try resolving the load argument in this order. If
a model is loaded from a model name, spaCy will assume it's a Python package and
import it and call the model's own `load()` method. If a model is loaded from a
2020-08-18 12:39:40 +00:00
path, spaCy will assume it's a data directory, load its
[`config.cfg` ](/api/data-formats#config ) and use the language and pipeline
information to construct the `Language` class. The data will be loaded in via
[`Language.from_disk` ](/api/language#from_disk ).
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> #### Example
>
> ```python
> nlp = spacy.load("en_core_web_sm") # package
2020-07-01 19:26:39 +00:00
> nlp = spacy.load("/path/to/en") # string path
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> nlp = spacy.load(Path("/path/to/en")) # pathlib Path
>
2019-07-25 12:24:56 +00:00
> nlp = spacy.load("en_core_web_sm", disable=["parser", "tagger"])
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `name` | Model to load, i.e. package name or path. ~~Union[str, Path]~~ |
| _keyword-only_ | |
| `disable` | Names of pipeline components to [disable ](/usage/processing-pipelines#disabling ). ~~List[str]~~ |
| `config` < Tag variant = "new" > 3</ Tag > | Optional config overrides, either as nested dict or dict keyed by section value in dot notation, e.g. `"components.name.value"` . ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | A `Language` object with the loaded model. ~~Language~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-08-18 12:39:40 +00:00
Essentially, `spacy.load()` is a convenience wrapper that reads the model's
[`config.cfg` ](/api/data-formats#config ), uses the language and pipeline
information to construct a `Language` object, loads in the model data and
returns it.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
```python
### Abstract example
2020-07-27 16:11:45 +00:00
cls = util.get_lang_class(lang) # get language for ID, e.g. "en"
nlp = cls() # initialize the language
2020-07-26 22:29:45 +00:00
for name in pipeline:
2020-07-27 16:11:45 +00:00
nlp.add_pipe(name) # add component to pipeline
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
nlp.from_disk(model_data_path) # load in model data
```
### spacy.blank {#spacy.blank tag="function" new="2"}
Create a blank model of a given language class. This function is the twin of
`spacy.load()` .
> #### Example
>
> ```python
2020-07-27 16:11:45 +00:00
> nlp_en = spacy.blank("en") # equivalent to English()
> nlp_de = spacy.blank("de") # equivalent to German()
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | -------------------------------------------------------------------------------------------------------- |
| `name` | [ISO code ](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes ) of the language class to load. ~~str~~ |
| **RETURNS** | An empty `Language` object of the appropriate subclass. ~~Language~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-08-17 23:22:59 +00:00
### spacy.info {#spacy.info tag="function"}
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
The same as the [`info` command ](/api/cli#info ). Pretty-print information about
your installation, models and local setup from within spaCy. To get the model
meta data as a dictionary instead, you can use the `meta` attribute on your
`nlp` object with a loaded model, e.g. `nlp.meta` .
> #### Example
>
> ```python
> spacy.info()
2020-07-04 12:23:10 +00:00
> spacy.info("en_core_web_sm")
2020-07-27 16:11:45 +00:00
> markdown = spacy.info(markdown=True, silent=True)
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| -------------- | ------------------------------------------------------------------ |
| `model` | A model, i.e. a package name or path (optional). ~~Optional[str]~~ |
| _keyword-only_ | |
| `markdown` | Print information as Markdown. ~~bool~~ |
| `silent` | Don't print anything, just return. ~~bool~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### spacy.explain {#spacy.explain tag="function"}
Get a description for a given POS tag, dependency label or entity type. For a
list of available terms, see
[`glossary.py` ](https://github.com/explosion/spaCy/tree/master/spacy/glossary.py ).
> #### Example
>
> ```python
2019-09-12 14:11:15 +00:00
> spacy.explain("NORP")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> # Nationalities or religious or political groups
>
2019-09-12 14:11:15 +00:00
> doc = nlp("Hello world")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> for word in doc:
> print(word.text, word.tag_, spacy.explain(word.tag_))
> # Hello UH interjection
> # world NN noun, singular or mass
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | -------------------------------------------------------------------------- |
| `term` | Term to explain. ~~str~~ |
| **RETURNS** | The explanation, or `None` if not found in the glossary. ~~Optional[str]~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### spacy.prefer_gpu {#spacy.prefer_gpu tag="function" new="2.0.14"}
Allocate data and perform operations on [GPU ](/usage/#gpu ), if available. If
data has already been allocated on CPU, it will not be moved. Ideally, this
function should be called right after importing spaCy and _before_ loading any
models.
> #### Example
>
> ```python
> import spacy
> activated = spacy.prefer_gpu()
> nlp = spacy.load("en_core_web_sm")
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | --------------------------------------- |
| **RETURNS** | Whether the GPU was activated. ~~bool~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### spacy.require_gpu {#spacy.require_gpu tag="function" new="2.0.14"}
Allocate data and perform operations on [GPU ](/usage/#gpu ). Will raise an error
if no GPU is available. If data has already been allocated on CPU, it will not
be moved. Ideally, this function should be called right after importing spaCy
and _before_ loading any models.
> #### Example
>
> ```python
> import spacy
> spacy.require_gpu()
> nlp = spacy.load("en_core_web_sm")
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | --------------- |
| **RETURNS** | `True` ~~bool~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
## displaCy {#displacy source="spacy/displacy"}
As of v2.0, spaCy comes with a built-in visualization suite. For more info and
examples, see the usage guide on [visualizing spaCy ](/usage/visualizers ).
### displacy.serve {#displacy.serve tag="method" new="2"}
Serve a dependency parse tree or named entity visualization to view it in your
browser. Will run a simple web server.
> #### Example
>
> ```python
> import spacy
> from spacy import displacy
> nlp = spacy.load("en_core_web_sm")
2019-09-12 14:11:15 +00:00
> doc1 = nlp("This is a sentence.")
> doc2 = nlp("This is another sentence.")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> displacy.serve([doc1, doc2], style="dep")
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| --------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `docs` | Document(s) or span(s) to visualize. ~~Union[Iterable[Union[Doc, Span]], Doc, Span]~~ |
| `style` | Visualization style, `"dep"` or `"ent"` . Defaults to `"dep"` . ~~str~~ |
| `page` | Render markup as full HTML page. Defaults to `True` . ~~bool~~ |
| `minify` | Minify HTML markup. Defaults to `False` . ~~bool~~ |
| `options` | [Visualizer-specific options ](#displacy_options ), e.g. colors. ~~Dict[str, Any]~~ |
| `manual` | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here ](/usage/visualizers#manual-usage ) for formats and examples. Defaults to `False` . ~~bool~~ |
| `port` | Port to serve visualization. Defaults to `5000` . ~~int~~ |
| `host` | Host to serve visualization. Defaults to `"0.0.0.0"` . ~~str~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### displacy.render {#displacy.render tag="method" new="2"}
Render a dependency parse tree or named entity visualization.
> #### Example
>
> ```python
> import spacy
> from spacy import displacy
> nlp = spacy.load("en_core_web_sm")
2019-09-12 14:11:15 +00:00
> doc = nlp("This is a sentence.")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> html = displacy.render(doc, style="dep")
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `docs` | Document(s) or span(s) to visualize. ~~Union[Iterable[Union[Doc, Span]], Doc, Span]~~ |
| `style` | Visualization style, `"dep"` or `"ent"` . Defaults to `"dep"` . ~~str~~ |
| `page` | Render markup as full HTML page. Defaults to `True` . ~~bool~~ |
| `minify` | Minify HTML markup. Defaults to `False` . ~~bool~~ |
| `options` | [Visualizer-specific options ](#displacy_options ), e.g. colors. ~~Dict[str, Any]~~ |
| `manual` | Don't parse `Doc` and instead, expect a dict or list of dicts. [See here ](/usage/visualizers#manual-usage ) for formats and examples. Defaults to `False` . ~~bool~~ |
| `jupyter` | Explicitly enable or disable "[Jupyter](http://jupyter.org/) mode" to return markup ready to be rendered in a notebook. Detected automatically if `None` (default). ~~Optional[bool]~~ |
| **RETURNS** | The rendered HTML markup. ~~str~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### Visualizer options {#displacy_options}
The `options` argument lets you specify additional settings for each visualizer.
If a setting is not present in the options, the default value will be used.
#### Dependency Visualizer options {#options-dep}
> #### Example
>
> ```python
> options = {"compact": True, "color": "blue"}
> displacy.serve(doc, style="dep", options=options)
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------- |
| `fine_grained` | Use fine-grained part-of-speech tags (`Token.tag_`) instead of coarse-grained tags (`Token.pos_`). Defaults to `False` . ~~bool~~ |
| `add_lemma` < Tag variant = "new" > 2.2.4</ Tag > | Print the lemma's in a separate row below the token texts. Defaults to `False` . ~~bool~~ |
| `collapse_punct` | Attach punctuation to tokens. Can make the parse more readable, as it prevents long arcs to attach punctuation. Defaults to `True` . ~~bool~~ |
| `collapse_phrases` | Merge noun phrases into one token. Defaults to `False` . ~~bool~~ |
| `compact` | "Compact mode" with square arrows that takes up less space. Defaults to `False` . ~~bool~~ |
| `color` | Text color (HEX, RGB or color names). Defaults to `"#000000"` . ~~str~~ |
| `bg` | Background color (HEX, RGB or color names). Defaults to `"#ffffff"` . ~~str~~ |
| `font` | Font name or font family for all text. Defaults to `"Arial"` . ~~str~~ |
| `offset_x` | Spacing on left side of the SVG in px. Defaults to `50` . ~~int~~ |
| `arrow_stroke` | Width of arrow path in px. Defaults to `2` . ~~int~~ |
| `arrow_width` | Width of arrow head in px. Defaults to `10` in regular mode and `8` in compact mode. ~~int~~ |
| `arrow_spacing` | Spacing between arrows in px to avoid overlaps. Defaults to `20` in regular mode and `12` in compact mode. ~~int~~ |
| `word_spacing` | Vertical spacing between words and arcs in px. Defaults to `45` . ~~int~~ |
| `distance` | Distance between words in px. Defaults to `175` in regular mode and `150` in compact mode. ~~int~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
#### Named Entity Visualizer options {#displacy_options-ent}
> #### Example
>
> ```python
> options = {"ents": ["PERSON", "ORG", "PRODUCT"],
> "colors": {"ORG": "yellow"}}
> displacy.serve(doc, style="ent", options=options)
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| --------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `ents` | Entity types to highlight or `None` for all types (default). ~~Optional[List[str]]~~ |
| `colors` | Color overrides. Entity types in uppercase should be mapped to color names or values. ~~Dict[str, str]~~ |
| `template` < Tag variant = "new" > 2.2</ Tag > | Optional template to overwrite the HTML used to render entity spans. Should be a format string and can use `{bg}` , `{text}` and `{label}` . See [`templates.py` ](https://github.com/explosion/spaCy/blob/master/spacy/displacy/templates.py ) for examples. ~~Optional[str]~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-07-05 14:11:16 +00:00
By default, displaCy comes with colors for all entity types used by
[spaCy models ](/models ). If you're using custom entity types, you can use the
`colors` setting to add your own colors for them. Your application or model
package can also expose a
2019-09-12 14:11:15 +00:00
[`spacy_displacy_colors` entry point ](/usage/saving-loading#entry-points-displacy )
to add custom labels and their colors automatically.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-07-06 16:14:57 +00:00
## registry {#registry source="spacy/util.py" new="3"}
spaCy's function registry extends
[Thinc's `registry` ](https://thinc.ai/docs/api-config#registry ) and allows you
to map strings to functions. You can register functions to create architectures,
optimizers, schedules and more, and then refer to them and set their arguments
in your [config file ](/usage/training#config ). Python type hints are used to
validate the inputs. See the
[Thinc docs ](https://thinc.ai/docs/api-config#registry ) for details on the
`registry` methods and our helper library
[`catalogue` ](https://github.com/explosion/catalogue ) for some background on the
concept of function registries. spaCy also uses the function registry for
language subclasses, model architecture, lookups and pipeline component
factories.
<!-- TODO: improve example? -->
> #### Example
>
> ```python
> import spacy
> from thinc.api import Model
>
> @spacy.registry.architectures("CustomNER.v1")
> def custom_ner(n0: int) -> Model:
> return Model("custom", forward, dims={"nO": nO})
> ```
| Registry name | Description |
| ----------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `architectures` | Registry for functions that create [model architectures ](/api/architectures ). Can be used to register custom model architectures and reference them in the `config.cfg` . |
| `factories` | Registry for functions that create [pipeline components ](/usage/processing-pipelines#custom-components ). Added automatically when you use the `@spacy.component` decorator and also reads from [entry points ](/usage/saving-loading#entry-points ) |
2020-08-06 17:30:43 +00:00
| `tokenizers` | Registry for tokenizer factories. Registered functions should return a callback that receives the `nlp` object and returns a [`Tokenizer` ](/api/tokenizer ) or a custom callable. |
2020-07-06 16:14:57 +00:00
| `languages` | Registry for language-specific `Language` subclasses. Automatically reads from [entry points ](/usage/saving-loading#entry-points ). |
| `lookups` | Registry for large lookup tables available via `vocab.lookups` . |
| `displacy_colors` | Registry for custom color scheme for the [`displacy` NER visualizer ](/usage/visualizers ). Automatically reads from [entry points ](/usage/saving-loading#entry-points ). |
2020-08-09 23:20:10 +00:00
| `assets` | Registry for data assets, knowledge bases etc. |
2020-08-05 18:29:53 +00:00
| `callbacks` | Registry for custom callbacks to [modify the `nlp` object ](/usage/training#custom-code-nlp-callbacks ) before training. |
2020-08-06 17:30:43 +00:00
| `readers` | Registry for training and evaluation data readers like [`Corpus` ](/api/corpus ). |
| `batchers` | Registry for training and evaluation [data batchers ](#batchers ). |
2020-07-06 16:14:57 +00:00
| `optimizers` | Registry for functions that create [optimizers ](https://thinc.ai/docs/api-optimizers ). |
| `schedules` | Registry for functions that create [schedules ](https://thinc.ai/docs/api-schedules ). |
| `layers` | Registry for functions that create [layers ](https://thinc.ai/docs/api-layers ). |
| `losses` | Registry for functions that create [losses ](https://thinc.ai/docs/api-loss ). |
| `initializers` | Registry for functions that create [initializers ](https://thinc.ai/docs/api-initializers ). |
2020-07-29 16:44:10 +00:00
### spacy-transformers registry {#registry-transformers}
The following registries are added by the
[`spacy-transformers` ](https://github.com/explosion/spacy-transformers ) package.
See the [`Transformer` ](/api/transformer ) API reference and
2020-08-17 22:49:19 +00:00
[usage docs ](/usage/embeddings-transformers ) for details.
2020-07-29 16:44:10 +00:00
> #### Example
>
> ```python
> import spacy_transformers
>
> @spacy_transformers.registry.annotation_setters("my_annotation_setter.v1")
> def configure_custom_annotation_setter():
> def annotation_setter(docs, trf_data) -> None:
> # Set annotations on the docs
>
> return annotation_sette
> ```
2020-08-05 13:01:00 +00:00
| Registry name | Description |
| ----------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [`span_getters` ](/api/transformer#span_getters ) | Registry for functions that take a batch of `Doc` objects and return a list of `Span` objects to process by the transformer, e.g. sentences. |
| [`annotation_setters` ](/api/transformer#annotation_setters ) | Registry for functions that create annotation setters. Annotation setters are functions that take a batch of `Doc` objects and a [`FullTransformerBatch` ](/api/transformer#fulltransformerbatch ) and can set additional annotations on the `Doc` . |
2020-07-29 16:44:10 +00:00
2020-08-06 17:30:43 +00:00
## Batchers {#batchers source="spacy/gold/batchers.py" new="3"}
2020-08-05 13:00:54 +00:00
2020-08-07 18:14:31 +00:00
<!-- TODO: intro -->
2020-08-05 18:29:53 +00:00
#### batch_by_words.v1 {#batch_by_words tag="registered function"}
Create minibatches of roughly a given number of words. If any examples are
longer than the specified batch length, they will appear in a batch by
themselves, or be discarded if `discard_oversize` is set to `True` . The argument
`docs` can be a list of strings, [`Doc` ](/api/doc ) objects or
[`Example` ](/api/example ) objects.
> #### Example config
>
> ```ini
> [training.batcher]
> @batchers = "batch_by_words.v1"
> size = 100
> tolerance = 0.2
> discard_oversize = false
> get_length = null
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `seqs` | The sequences to minibatch. ~~Iterable[Any]~~ |
| `size` | The target number of words per batch. Can also be a block referencing a schedule, e.g. [`compounding` ](https://thinc.ai/docs/api-schedules/#compounding ). ~~Union[int, Sequence[int]]~~ |
| `tolerance` | What percentage of the size to allow batches to exceed. ~~float~~ |
| `discard_oversize` | Whether to discard sequences that by themselves exceed the tolerated size. ~~bool~~ |
| `get_length` | Optional function that receives a sequence item and returns its length. Defaults to the built-in `len()` if not set. ~~Optional[Callable[[Any], int]]~~ |
2020-08-05 18:29:53 +00:00
#### batch_by_sequence.v1 {#batch_by_sequence tag="registered function"}
> #### Example config
>
> ```ini
> [training.batcher]
> @batchers = "batch_by_sequence.v1"
> size = 32
> get_length = null
> ```
2020-08-07 18:14:31 +00:00
Create a batcher that creates batches of the specified size.
2020-08-05 18:29:53 +00:00
2020-08-17 14:45:24 +00:00
| Name | Description |
| ------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `size` | The target number of items per batch. Can also be a block referencing a schedule, e.g. [`compounding` ](https://thinc.ai/docs/api-schedules/#compounding ). ~~Union[int, Sequence[int]]~~ |
| `get_length` | Optional function that receives a sequence item and returns its length. Defaults to the built-in `len()` if not set. ~~Optional[Callable[[Any], int]]~~ |
2020-08-05 18:29:53 +00:00
#### batch_by_padded.v1 {#batch_by_padded tag="registered function"}
> #### Example config
>
> ```ini
> [training.batcher]
2020-08-07 18:14:31 +00:00
> @batchers = "batch_by_padded.v1"
2020-08-05 18:29:53 +00:00
> size = 100
2020-08-07 18:14:31 +00:00
> buffer = 256
2020-08-05 18:29:53 +00:00
> discard_oversize = false
> get_length = null
> ```
2020-08-07 18:14:31 +00:00
Minibatch a sequence by the size of padded batches that would result, with
sequences binned by length within a window. The padded size is defined as the
maximum length of sequences within the batch multiplied by the number of
sequences in the batch.
2020-08-17 14:45:24 +00:00
| Name | Description |
| ------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `size` | The largest padded size to batch sequences into. Can also be a block referencing a schedule, e.g. [`compounding` ](https://thinc.ai/docs/api-schedules/#compounding ). ~~Union[int, Sequence[int]]~~ |
| `buffer` | The number of sequences to accumulate before sorting by length. A larger buffer will result in more even sizing, but if the buffer is very large, the iteration order will be less random, which can result in suboptimal training. ~~int~~ |
| `discard_oversize` | Whether to discard sequences that are by themselves longer than the largest padded batch size. ~~bool~~ |
| `get_length` | Optional function that receives a sequence item and returns its length. Defaults to the built-in `len()` if not set. ~~Optional[Callable[[Any], int]]~~ |
2020-08-05 18:29:53 +00:00
2020-07-04 12:23:10 +00:00
## Training data and alignment {#gold source="spacy/gold"}
### gold.biluo_tags_from_offsets {#biluo_tags_from_offsets tag="function"}
Encode labelled spans into per-token tags, using the
2020-07-05 14:11:16 +00:00
[BILUO scheme ](/usage/linguistic-features#accessing-ner ) (Begin, In, Last, Unit,
Out). Returns a list of strings, describing the tags. Each tag string will be of
the form of either `""` , `"O"` or `"{action}-{label}"` , where action is one of
`"B"` , `"I"` , `"L"` , `"U"` . The string `"-"` is used where the entity offsets
don't align with the tokenization in the `Doc` object. The training algorithm
will view these as missing values. `O` denotes a non-entity token. `B` denotes
the beginning of a multi-token entity, `I` the inside of an entity of three or
more tokens, and `L` the end of an entity of two or more tokens. `U` denotes a
single-token entity.
2020-07-04 12:23:10 +00:00
> #### Example
>
> ```python
> from spacy.gold import biluo_tags_from_offsets
>
> doc = nlp("I like London.")
> entities = [(7, 13, "LOC")]
> tags = biluo_tags_from_offsets(doc, entities)
> assert tags == ["O", "O", "U-LOC", "O"]
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `doc` | The document that the entity offsets refer to. The output tags will refer to the token boundaries within the document. ~~Doc~~ |
| `entities` | A sequence of `(start, end, label)` triples. `start` and `end` should be character-offset integers denoting the slice into the original string. ~~List[Tuple[int, int, Union[str, int]]]~~ |
| **RETURNS** | A list of strings, describing the [BILUO ](/usage/linguistic-features#accessing-ner ) tags. ~~List[str]~~ |
2020-07-04 12:23:10 +00:00
### gold.offsets_from_biluo_tags {#offsets_from_biluo_tags tag="function"}
2020-07-05 14:11:16 +00:00
Encode per-token tags following the
[BILUO scheme ](/usage/linguistic-features#accessing-ner ) into entity offsets.
2020-07-04 12:23:10 +00:00
> #### Example
>
> ```python
> from spacy.gold import offsets_from_biluo_tags
>
> doc = nlp("I like London.")
> tags = ["O", "O", "U-LOC", "O"]
> entities = offsets_from_biluo_tags(doc, tags)
> assert entities == [(7, 13, "LOC")]
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `doc` | The document that the BILUO tags refer to. ~~Doc~~ |
| `entities` | A sequence of [BILUO ](/usage/linguistic-features#accessing-ner ) tags with each tag describing one token. Each tag string will be of the form of either `""` , `"O"` or `"{action}-{label}"` , where action is one of `"B"` , `"I"` , `"L"` , `"U"` . ~~List[str]~~ |
| **RETURNS** | A sequence of `(start, end, label)` triples. `start` and `end` will be character-offset integers denoting the slice into the original string. ~~List[Tuple[int, int, str]]~~ |
2020-07-04 12:23:10 +00:00
### gold.spans_from_biluo_tags {#spans_from_biluo_tags tag="function" new="2.1"}
2020-07-05 14:11:16 +00:00
Encode per-token tags following the
[BILUO scheme ](/usage/linguistic-features#accessing-ner ) into
2020-07-04 12:23:10 +00:00
[`Span` ](/api/span ) objects. This can be used to create entity spans from
token-based tags, e.g. to overwrite the `doc.ents` .
> #### Example
>
> ```python
> from spacy.gold import spans_from_biluo_tags
>
> doc = nlp("I like London.")
> tags = ["O", "O", "U-LOC", "O"]
> doc.ents = spans_from_biluo_tags(doc, tags)
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `doc` | The document that the BILUO tags refer to. ~~Doc~~ |
| `entities` | A sequence of [BILUO ](/usage/linguistic-features#accessing-ner ) tags with each tag describing one token. Each tag string will be of the form of either `""` , `"O"` or `"{action}-{label}"` , where action is one of `"B"` , `"I"` , `"L"` , `"U"` . ~~List[str]~~ |
| **RETURNS** | A sequence of `Span` objects with added entity labels. ~~List[Span]~~ |
2020-07-04 12:23:10 +00:00
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
## Utility functions {#util source="spacy/util.py"}
spaCy comes with a small collection of utility functions located in
[`spacy/util.py` ](https://github.com/explosion/spaCy/tree/master/spacy/util.py ).
Because utility functions are mostly intended for **internal use within spaCy** ,
their behavior may change with future releases. The functions documented on this
page should be safe to use and we'll try to ensure backwards compatibility.
However, we recommend having additional tests in place if your application
depends on any of spaCy's utilities.
2020-07-06 16:14:57 +00:00
<!-- TODO: document new config - related util functions? -->
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.get_lang_class {#util.get_lang_class tag="function"}
Import and load a `Language` class. Allows lazy-loading
[language data ](/usage/adding-languages ) and importing languages using the
two-letter language code. To add a language code for a custom language class,
2020-08-17 14:45:24 +00:00
you can register it using the [`@registry.languages` ](/api/top-level#registry )
decorator.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> #### Example
>
> ```python
> for lang_id in ["en", "de"]:
> lang_class = util.get_lang_class(lang_id)
> lang = lang_class()
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ---------------------------------------------- |
| `lang` | Two-letter language code, e.g. `"en"` . ~~str~~ |
| **RETURNS** | The respective subclass. ~~Language~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2019-05-11 21:03:56 +00:00
### util.lang_class_is_loaded {#util.lang_class_is_loaded tag="function" new="2.1"}
2019-03-11 14:23:53 +00:00
2020-08-17 14:45:24 +00:00
Check whether a `Language` subclass is already loaded. `Language` subclasses are
2019-03-11 14:23:53 +00:00
loaded lazily, to avoid expensive setup code associated with the language data.
> #### Example
>
> ```python
> lang_cls = util.get_lang_class("en")
> assert util.lang_class_is_loaded("en") is True
> assert util.lang_class_is_loaded("de") is False
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ---------------------------------------------- |
| `name` | Two-letter language code, e.g. `"en"` . ~~str~~ |
| **RETURNS** | Whether the class has been loaded. ~~bool~~ |
2019-03-11 14:23:53 +00:00
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.load_model {#util.load_model tag="function" new="2"}
2020-07-01 19:26:39 +00:00
Load a model from a package or data path. If called with a package name, spaCy
will assume the model is a Python package and import and call its `load()`
method. If called with a path, spaCy will assume it's a data directory, read the
2020-08-18 12:39:40 +00:00
language and pipeline settings from the [`config.cfg` ](/api/data-formats#config )
and create a `Language` object. The model data will then be loaded in via
2020-08-17 14:45:24 +00:00
[`Language.from_disk` ](/api/language#from_disk ).
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> #### Example
>
> ```python
2020-07-04 12:23:10 +00:00
> nlp = util.load_model("en_core_web_sm")
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> nlp = util.load_model("en_core_web_sm", disable=["ner"])
> nlp = util.load_model("/path/to/data")
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `name` | Package name or model path. ~~str~~ |
| `vocab` < Tag variant = "new" > 3</ Tag > | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ . |
| `disable` | Names of pipeline components to disable. ~~Iterable[str]~~ |
| `config` < Tag variant = "new" > 3</ Tag > | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"` . ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.load_model_from_init_py {#util.load_model_from_init_py tag="function" new="2"}
A helper function to use in the `load()` method of a model package's
[`__init__.py` ](https://github.com/explosion/spacy-models/tree/master/template/model/xx_model_name/__init__.py ).
> #### Example
>
> ```python
> from spacy.util import load_model_from_init_py
>
> def load(**overrides):
> return load_model_from_init_py(__file__, **overrides)
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------- |
| `init_file` | Path to model's `__init__.py` , i.e. `__file__` . ~~Union[str, Path]~~ |
| `vocab` < Tag variant = "new" > 3</ Tag > | Optional shared vocab to pass in on initialization. If `True` (default), a new `Vocab` object will be created. ~~Union[Vocab, bool]~~ . |
| `disable` | Names of pipeline components to disable. ~~Iterable[str]~~ |
| `config` < Tag variant = "new" > 3</ Tag > | Config overrides as nested dict or flat dict keyed by section values in dot notation, e.g. `"nlp.pipeline"` . ~~Union[Dict[str, Any], Config]~~ |
| **RETURNS** | `Language` class with the loaded model. ~~Language~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-08-17 23:22:59 +00:00
### util.load_config {#util.load_config tag="function" new="3"}
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2020-08-17 23:22:59 +00:00
Load a model's [`config.cfg` ](/api/data-formats#config ) from a file path. The
config typically includes details about the model pipeline and how its
components are created, as well as all training settings and hyperparameters.
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> #### Example
>
> ```python
2020-08-17 23:22:59 +00:00
> config = util.load_config("/path/to/model/config.cfg")
> print(config.to_str())
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> ```
2020-08-17 23:22:59 +00:00
| Name | Description |
| ------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `path` | Path to the model's `config.cfg` . ~~Union[str, Path]~~ |
| `overrides` | Optional config overrides to replace in loaded config. Can be provided as nested dict, or as flat dict with keys in dot notation, e.g. `"nlp.pipeline"` . ~~Dict[str, Any]~~ |
| `interpolate` | Whether to interpolate the config and replace variables like `${paths:train}` with their values. Defaults to `False` . ~~bool~~ |
| **RETURNS** | The model's config. ~~Config~~ |
### util.load_meta {#util.load_meta tag="function" new="3"}
2020-08-18 12:39:40 +00:00
Get a model's [`meta.json` ](/api/data-formats#meta ) from a file path and
validate its contents.
2020-08-17 23:22:59 +00:00
> #### Example
>
> ```python
> meta = util.load_meta("/path/to/model/meta.json")
> ```
| Name | Description |
| ----------- | ----------------------------------------------------- |
| `path` | Path to the model's `meta.json` . ~~Union[str, Path]~~ |
| **RETURNS** | The model's meta data. ~~Dict[str, Any]~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.is_package {#util.is_package tag="function"}
Check if string maps to a package installed via pip. Mainly used to validate
[model packages ](/usage/models ).
> #### Example
>
> ```python
> util.is_package("en_core_web_sm") # True
> util.is_package("xyz") # False
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ----------------------------------------------------- |
| `name` | Name of package. ~~str~~ |
| **RETURNS** | `True` if installed package, `False` if not. ~~bool~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.get_package_path {#util.get_package_path tag="function" new="2"}
Get path to an installed package. Mainly used to resolve the location of
[model packages ](/usage/models ). Currently imports the package to find its path.
> #### Example
>
> ```python
> util.get_package_path("en_core_web_sm")
> # /usr/lib/python3.6/site-packages/en_core_web_sm
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| -------------- | ----------------------------------------- |
| `package_name` | Name of installed package. ~~str~~ |
| **RETURNS** | Path to model package directory. ~~Path~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.is_in_jupyter {#util.is_in_jupyter tag="function" new="2"}
Check if user is running spaCy from a [Jupyter ](https://jupyter.org ) notebook by
detecting the IPython kernel. Mainly used for the
[`displacy` ](/api/top-level#displacy ) visualizer.
> #### Example
>
> ```python
> html = "<h1>Hello world!</h1>"
> if util.is_in_jupyter():
> from IPython.core.display import display, HTML
> display(HTML(html))
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ---------------------------------------------- |
| **RETURNS** | `True` if in Jupyter, `False` if not. ~~bool~~ |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2019-02-24 17:34:10 +00:00
### util.compile_prefix_regex {#util.compile_prefix_regex tag="function"}
Compile a sequence of prefix rules into a regex object.
> #### Example
>
> ```python
> prefixes = ("§", "%", "=", r"\+")
> prefix_regex = util.compile_prefix_regex(prefixes)
> nlp.tokenizer.prefix_search = prefix_regex.search
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The prefix rules, e.g. [`lang.punctuation.TOKENIZER_PREFIXES` ](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py ). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object. to be used for [`Tokenizer.prefix_search` ](/api/tokenizer#attributes ). ~~Pattern~~ |
2019-02-24 17:34:10 +00:00
### util.compile_suffix_regex {#util.compile_suffix_regex tag="function"}
Compile a sequence of suffix rules into a regex object.
> #### Example
>
> ```python
> suffixes = ("'s", "'S", r"(?<=[0-9])\+")
> suffix_regex = util.compile_suffix_regex(suffixes)
> nlp.tokenizer.suffix_search = suffix_regex.search
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The suffix rules, e.g. [`lang.punctuation.TOKENIZER_SUFFIXES` ](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py ). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object. to be used for [`Tokenizer.suffix_search` ](/api/tokenizer#attributes ). ~~Pattern~~ |
2019-02-24 17:34:10 +00:00
### util.compile_infix_regex {#util.compile_infix_regex tag="function"}
Compile a sequence of infix rules into a regex object.
> #### Example
>
> ```python
> infixes = ("…", "-", "—", r"(?<=[0-9])[+\-\*^](?=[0-9-])")
> infix_regex = util.compile_infix_regex(infixes)
> nlp.tokenizer.infix_finditer = infix_regex.finditer
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `entries` | The infix rules, e.g. [`lang.punctuation.TOKENIZER_INFIXES` ](https://github.com/explosion/spaCy/tree/master/spacy/lang/punctuation.py ). ~~Iterable[Union[str, Pattern]]~~ |
| **RETURNS** | The regex object. to be used for [`Tokenizer.infix_finditer` ](/api/tokenizer#attributes ). ~~Pattern~~ |
2019-02-24 17:34:10 +00:00
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
### util.minibatch {#util.minibatch tag="function" new="2"}
Iterate over batches of items. `size` may be an iterator, so that batch-size can
vary on each step.
> #### Example
>
> ```python
> batches = minibatch(train_data)
> for batch in batches:
2020-07-07 17:17:19 +00:00
> nlp.update(batch)
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ---------- | ---------------------------------------- |
| `items` | The items to batch up. ~~Iterable[Any]~~ |
| `size` | int / iterable | The batch size(s). ~~Union[int, Sequence[int]]~~ |
| **YIELDS** | The batches. |
💫 Update website (#3285)
<!--- Provide a general summary of your changes in the title. -->
## Description
The new website is implemented using [Gatsby](https://www.gatsbyjs.org) with [Remark](https://github.com/remarkjs/remark) and [MDX](https://mdxjs.com/). This allows authoring content in **straightforward Markdown** without the usual limitations. Standard elements can be overwritten with powerful [React](http://reactjs.org/) components and wherever Markdown syntax isn't enough, JSX components can be used. Hopefully, this update will also make it much easier to contribute to the docs. Once this PR is merged, I'll implement auto-deployment via [Netlify](https://netlify.com) on a specific branch (to avoid building the website on every PR). There's a bunch of other cool stuff that the new setup will allow us to do – including writing front-end tests, service workers, offline support, implementing a search and so on.
This PR also includes various new docs pages and content.
Resolves #3270. Resolves #3222. Resolves #2947. Resolves #2837.
### Types of change
enhancement
## Checklist
<!--- Before you submit the PR, go over this checklist and make sure you can
tick off all the boxes. [] -> [x] -->
- [x] I have submitted the spaCy Contributor Agreement.
- [x] I ran the tests, and all new and existing tests passed.
- [x] My changes don't require a change to the documentation, or if they do, I've added all required information.
2019-02-17 18:31:19 +00:00
2019-05-08 00:33:40 +00:00
### util.filter_spans {#util.filter_spans tag="function" new="2.1.4"}
Filter a sequence of [`Span` ](/api/span ) objects and remove duplicates or
overlaps. Useful for creating named entities (where one token can only be part
of one entity) or when merging spans with
[`Retokenizer.merge` ](/api/doc#retokenizer.merge ). When spans overlap, the
(first) longest span is preferred over shorter spans.
> #### Example
>
> ```python
> doc = nlp("This is a sentence.")
> spans = [doc[0:2], doc[0:2], doc[0:4]]
> filtered = filter_spans(spans)
> ```
2020-08-17 14:45:24 +00:00
| Name | Description |
| ----------- | --------------------------------------- |
| `spans` | The spans to filter. ~~Iterable[Span]~~ |
| **RETURNS** | The filtered spans. ~~List[Span]~~ |
2020-07-04 12:23:10 +00:00
2020-07-06 16:14:57 +00:00
### util.get_words_and_spaces {#get_words_and_spaces tag="function" new="3"}
2020-07-04 12:23:10 +00:00
2020-08-17 14:45:24 +00:00
Given a list of words and a text, reconstruct the original tokens and return a
list of words and spaces that can be used to create a [`Doc` ](/api/doc#init ).
This can help recover destructive tokenization that didn't preserve any
whitespace information.
2020-07-04 12:23:10 +00:00
2020-08-17 14:45:24 +00:00
> #### Example
>
> ```python
> orig_words = ["Hey", ",", "what", "'s", "up", "?"]
> orig_text = "Hey, what's up?"
> words, spaces = get_words_and_spaces(orig_words, orig_text)
> # ['Hey', ',', 'what', "'s", 'up', '?']
> # [False, True, False, True, False, False]
> ```
| Name | Description |
| ----------- | -------------------------------------------------------------------------------------------------------------------------------------------------- |
| `words` | The list of words. ~~Iterable[str]~~ |
| `text` | The original text. ~~str~~ |
| **RETURNS** | A list of words and a list of boolean values indicating whether the word at this position is followed by a space. ~~Tuple[List[str], List[bool]]~~ |